- Advanced Data Preprocessing. This unit will cover
advanced data reduction methods.
- Advanced data reduction methods:
(1) dimensionality reduction (feature or attribute subset
reduction), (2) numerosity reduction (regression, histogram,
clustering, sampling, singular value decomposition (SVD), and
discretization), and (3) data compression (lossless versus lossy
compression, Fourier and wavelet transformation, and principal
component analysis).
- Data Warehousing, OLAP, and Data Generalization:
This unit covers advanced material in data warehousing, OLAP, and
data generalization.
- The multidimensional data model
- Implementations of data warehouses: data
integration, indexing OLAP data (bitmap index), efficient
processing of OLAP queries, metadata repository, data warehouse
back-end tools and utilities.
- Efficient computation of data cubes: categorization
of measures: distributive, algebraic, and holistic measures, cube
computation methods, iceberg cubes, top-down and bottom-up
computation, computing closed and approximate data cubes.
- Other data generalization approaches:
Attribute-oriented induction, mining class comparisons:
discriminating between different classes.
- Exploration of data warehouse and data mining:
Discovery-driven exploration of data cubes, complex aggregation at
multiple granularity, cube gradient analysis, from on-line
analytical processing to on-line analytical mining.
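The distinction among distributive, algebraic, and holistic measures named in this unit can be illustrated with a minimal sketch. The partition data and the helper names `merge_sum` and `merge_avg` are assumptions made for illustration only; they do not come from the course material.

```python
from statistics import median

def merge_sum(partial_sums):
    # Distributive: the global sum is the sum of the per-partition sums.
    return sum(partial_sums)

def merge_avg(partial_sum_counts):
    # Algebraic: the average is derived from a fixed number of
    # distributive measures, here (sum, count) per partition.
    total = sum(s for s, _ in partial_sum_counts)
    count = sum(c for _, c in partial_sum_counts)
    return total / count

# Hypothetical cube cells, each holding the raw values of one partition.
partitions = [[1, 2, 3], [10, 20], [100]]
flat = [x for part in partitions for x in part]

print(merge_sum([sum(p) for p in partitions]), sum(flat))
print(merge_avg([(sum(p), len(p)) for p in partitions]), sum(flat) / len(flat))

# Holistic: the median of per-partition medians is not, in general,
# the median of all the values, so no bounded summary suffices.
print(median([median(p) for p in partitions]), median(flat))
```

Sum and average can therefore be rolled up from sub-cube summaries, whereas an exact median generally requires access to the underlying values.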
- Advanced association, correlation, and frequent pattern
analysis.
- Advanced frequent pattern mining methods:
(1) vertical format mining, (2) pattern-growth algorithm, (3) mining
closed patterns and max-patterns
- Constraint-based association mining: (1) rule- and
query-guided association mining, (2) anti-monotonicity,
monotonicity, succinctness in constrained mining, (3) convertible
constraints.
- Extensions and applications of frequent pattern
mining: (1) iceberg cube computation, (2) fascicles and semantic
data compression, (3) frequent pattern-based classification and
cluster analysis
- Advanced Classification.
- Bayesian belief networks: (advanced) methods for
choosing a BBN structure and training Bayesian belief networks
- Advanced decision tree construction: (1) enhancements
to basic classification tree induction, (2) scalable algorithms for
classification tree induction, (3) integrating data warehousing
techniques and classification tree induction, (4) classification
with partially labeled data
- Neural network approach for classification: (1) a
multi-layer feed-forward neural network, (2) defining a network
topology, (3) back-propagation, (4) interpretability of
classification results.
- Kernel methods: (1) kernel logistic regression,
(2) kernel discriminant analysis, (3) advanced SVM kernel methods.
- Introduction to learning theory: PAC-learnability,
empirical, true and structural risk, VC-theory.
- Ensemble construction: Weighted voting, bagging,
weak learner, boosting, AdaBoost
- Other classification methods: (1) case-based
reasoning, (2) genetic algorithms, (3) rough set approach, (4)
fuzzy set approach
- Advanced cluster analysis.
- Grid-based clustering: A statistical information
grid approach, clustering by wavelet analysis, clustering
high-dimensional space.
- Clustering high-dimensional data:
Subspace clustering, frequent pattern-based clustering, clustering
by wavelet analysis.
- Advanced outlier analysis: Statistical-based
outlier detection, distance-based outlier detection, deviation-based
outlier detection, analysis of local outliers.
- Collaborative filtering
- Advanced Time-Series and Sequential Data Mining. This unit covers
the advanced techniques for mining sequential data, including the
following topics.
- Similarity search in time-series analysis
- Hidden Markov models
- Periodicity analysis: Transformation-based
approach, mining partial periodicity.
- Sequence segmentation: Hidden Markov model and
Variable Markov model for sequence segmentation.
- Sequence classification and clustering: (1)
n-gram based methods, keyword-based methods; (2) (high order)
Markov chain, hidden Markov model; (3) suffix tree, probabilistic
suffix tree, and probabilistic automata.
- Mining Data Streams: This unit covers the
techniques for mining stream data, including the following topics.
- What is stream data?
- Basic tools: Chernoff bounds, reservoir sampling
- Stream sample counting and frequent pattern
analysis
- Classification of data streams
- Clustering data streams
- Online sensor data analysis
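As an illustration of the reservoir sampling tool listed in this unit, the following is a minimal sketch of the classic single-pass scheme (often called Algorithm R). The function name `reservoir_sample` and its parameters are assumptions chosen for this example.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable
    of unknown length, reading each item exactly once."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

if __name__ == "__main__":
    sample = reservoir_sample(range(1_000_000), 5)
    print(sample)
```

The memory footprint stays at k items however long the stream runs, which is why the method suits stream settings.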
- Mining Spatial, Spatiotemporal, and Multimedia
data. This unit covers the techniques for mining spatial,
spatiotemporal, and multimedia data, including the following
topics.
- Mining spatial and spatiotemporal databases: (1)
Spatial data cube construction and spatial OLAP, (2) spatial
association and co-location analysis, (3) spatial clustering
methods, (4) spatial classification and spatial trend analysis, (5)
spatiotemporal data mining, (6) mining moving objects and
trajectories.
- Mining multimedia databases: (1) multidimensional
analysis of multimedia data, (2) similarity search in multimedia
data, (3) classification and regression analysis of multimedia data,
(4) mining association and correlation in multimedia data, (5)
clustering multimedia data
- Mining object databases: (1) multidimensional
analysis of complex objects, (2) generalization on complex
structured and semi-structured data, (3) methodology for mining
complex object databases: aggregation, approximation, and
progressive refinement.
- Mining Biological Data: This unit covers the
techniques for mining biological data, including the following
topics.
- Mining DNA, RNA, and proteins: (1) Mining motif
patterns, (2) searching homology in large databases, (3)
phylogenetic and functional prediction.
- Mining gene expression data: (1) clustering gene
expression, e.g., for gene regulatory networks, (2) classifying
gene expression, e.g., for disease-sensitive gene discovery.
- Mining mass spectrometry data
- Mining and integrating knowledge from biomedical
literature
- Mining inter-domain associations
- Text mining. This module will cover work that
applies known mining techniques to text, emphasizing the
new issues that arise.
- Text representation: Set-of-words, bag-of-words,
vector-space model; the issue of large raw dimensionality
- Dimensionality reduction: PCA, SVD, latent
semantic indexing
- Text clustering: agglomerative, k-means, EM; effect
of a large number of noise dimensions, partial supervision
- Feature selection in high dimensions
- Naive Bayes classification: Poor density
estimates, small-degree Bayesian belief network induction
- Discriminative learning: maximum entropy, logistic
regression, and support vector learning
- Shallow linguistics: Phrase detection,
part-of-speech tagging, named entity extraction, word sense
disambiguation
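The bag-of-words and vector-space representations listed in this module can be sketched as follows. The whitespace tokenization and the helper names `bag_of_words` and `cosine` are simplifying assumptions for illustration, not a prescribed method.

```python
import math
from collections import Counter

def bag_of_words(text):
    # Assumption: lowercase and split on whitespace; real tokenizers do more.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    d1 = bag_of_words("data mining finds patterns in data")
    d2 = bag_of_words("text mining applies data mining to text")
    print(cosine(d1, d2))
```

Each distinct word becomes one dimension, which is the source of the large raw dimensionality noted above.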
- Hypertext and Web mining. This module will cover
work that is specific to analyzing hypermedia, i.e., involving
hierarchical tagging languages and hyperlinks in conjunction with
text.
- Web modeling: The Web as an evolving,
collaborative, populist social network: aggregate graph-structure
of the Web, preferential attachment linking models and
experimental validation
- Link mining and social network analysis: Links as endorsement: PageRank and
HITS algorithms to identify authoritative Web pages; connections
with bibliometry
- The PageRank algorithm: Integrating page content
and page layout with link structure; topic-sensitive PageRanks;
Google
- Mining by exploiting text and links: Exploiting
text and links for better clustering and classification; unified
probabilistic models for text and links
- Structured data extraction: Information extraction,
exploiting markup structure to extract structured data from pages
meant for human consumption
- Multidimensional Web databases: Automatic
construction of multilayered Web information base; discovering
entities and relations on the Web (WebKB)
- Exploration and resource discovery on the Web:
reinforcement learning, other approaches
- Web usage mining and adaptive Web sites:
Reorganizing Web sites by mining log data
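The idea of links as endorsement, listed in this module under PageRank, can be illustrated with a minimal power-iteration sketch of the basic algorithm (not the topic-sensitive variant). The toy graph, the damping factor of 0.85, and the function name `pagerank` are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every page receives the uniform "random jump" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 4))
```

In the toy graph, page `c` is linked to by both other pages and so ends with the highest score, while `b`, which nothing links to, keeps only the random-jump share.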
- Data Mining Languages, Standards, and System
Architectures. This unit covers the issues related to data
mining languages, standards, and system architectures, including
the following topics.
- Data mining primitives: what defines a data mining
task? task-relevant data, the kind of knowledge to be mined,
background knowledge: concept hierarchies, user-specified
constraints, interestingness measures, presentation of discovered
patterns
- Data mining languages, user interfaces, and
standardization efforts
- Architectures of data mining systems
- Data Mining Applications. This unit covers the
issues related to domain-specific data mining applications,
including the following topics. (Note: Some of these themes, if
concrete and good materials are available, should go into the
Foundations part as case studies.)
- Data mining for financial data analysis
- Data mining for the retail industry
- Data mining for the telecommunication industry
- Data mining for intrusion detection
- Data mining in scientific and statistical
applications
- Data mining in software engineering and computer system analysis
- Data Mining and Society. This unit covers the
issues related to social impacts of data mining, including the
following topics.
- Social impacts of data mining
- Data mining vs. data security and privacy
- Privacy-preserving data mining
- Trends in Data Mining. This unit covers the
major trends in data mining, including the following topics.
- Setting solid theoretical foundations for data
mining
- Mining deep in specific applications
- Ubiquitous and invisible data mining
- Integrated data and information systems
Gabor Melli
2010-06-01