
Dimensionality Reduction

Curated by: Aarti Singh

In modern datasets, the dimensionality of the input data is typically too large to measure, store, compute with, transmit, visualize, or interpret. This necessitates dimensionality reduction methods that can identify a few of the most relevant dimensions. Dimensionality reduction methods can be categorized into feature selection methods, which aim to select the most relevant subset of the given features (aka coordinates, attributes, or dimensions), and feature extraction methods, which aim to identify a few transformations of the given features that are most relevant. Feature extraction methods can yield more parsimonious representations than feature selection methods; however, the latter lead to interpretable solutions, e.g. which genes are most representative of (or predictive of) a disease, rather than transformations of gene expressions that are most representative (or predictive).
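The distinction can be made concrete with a toy sketch (illustrative numpy code; the data, indices, and projection matrix are made up): feature selection keeps a subset of the original columns, while feature extraction forms new columns as transformations (here, linear combinations) of all of them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))  # 100 samples, 5 features

# Feature selection: keep a subset of the original coordinates.
selected = [0, 3]          # e.g. indices chosen by some scoring rule
X_sel = X[:, selected]     # columns remain interpretable original features

# Feature extraction: new features are transformations of all inputs.
W = rng.standard_normal((5, 2))  # some learned 5 -> 2 linear map
X_ext = X @ W              # each column mixes all original features

print(X_sel.shape, X_ext.shape)  # (100, 2) (100, 2)
```

Both reduce the data to two dimensions, but only the selected columns can be read off directly in terms of the original measurements.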

Feature selection based dimensionality reduction methods can be unsupervised, such as uniform subsampling and column subset selection (choosing the set of columns that provides the best rank-k approximation to the data matrix, e.g. using column norm or leverage score sampling), or supervised, scoring individual features by their dependence on the output variable via correlation, mutual information, chi-squared (χ²) dependence, etc. Choosing the best subset of size k is typically a combinatorial problem and hence computationally infeasible. Standard approaches either select the top-k scoring features or iteratively score features, extracting one in each of k rounds. In the last decade, l1 penalized methods have become popular; these balance a cost function against a convex “sparsity” penalty for selecting more features. They include supervised approaches such as l1 penalized regression and classification, unsupervised methods such as l1 penalized clustering, and methods such as learning graphical model structure, which can be either supervised or unsupervised. The l1 penalized methods are computationally efficient and, under some assumptions, achieve nearly the same performance as combinatorial solutions. However, they have been successful mostly in linear settings, and optimal nonlinear feature selection remains open.
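As a minimal illustration of the supervised scoring approach (a sketch, not any specific method from the references: the synthetic data, the noise level, and the top-k rule are made-up choices), the snippet below scores each feature by its absolute correlation with the output and keeps the top k:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 20, 3
X = rng.standard_normal((n, d))
# Only features 0, 1, 2 actually drive the output.
y = X[:, 0] + X[:, 1] - X[:, 2] + 0.1 * rng.standard_normal(n)

# Score each feature by |Pearson correlation| with y.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# Top-k selection: cheap, but scores features one at a time and
# therefore ignores interactions between features.
top_k = np.sort(np.argsort(scores)[-k:])
print(top_k)  # expected to recover features 0, 1, 2
```

An l1 penalized method such as the lasso would instead select features jointly, by trading a prediction loss against a sparsity penalty in a single convex optimization, rather than ranking features independently.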

Feature extraction based dimensionality reduction methods form a few linear or nonlinear transformations of the original features, thus embedding the points in a low-dimensional space. Linear methods include random projections and Multi-Dimensional Scaling (MDS), which seek to preserve pairwise distances; Principal Component Analysis (PCA), which maximizes covariance; Canonical Correlation Analysis (CCA), which maximizes cross-covariance; and Independent Component Analysis (ICA). Nonlinear methods include kernelized PCA or CCA, as well as methods such as Isomap, Locally Linear Embedding (LLE), and Laplacian and diffusion eigenmaps, which preserve local structure (local distances or subspaces). The linear methods can be effectively combined with l1 penalties to perform feature selection along with feature extraction, i.e. to find embeddings that involve a transformation of only a few original features; examples include sparse PCA and sparse CCA. Dimensionality reduction can also be achieved by extracting features relevant for prediction in a supervised manner; examples include linear discriminant analysis, topic modeling, and the hidden layers of neural networks.
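As one concrete linear feature extraction example, PCA can be sketched in a few lines via the singular value decomposition (an illustrative implementation on made-up data lying near a low-dimensional plane; the dimensions and noise level are arbitrary):

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)            # PCA assumes centered data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]       # scores and component directions

rng = np.random.default_rng(0)
# 300 points near a 2-D plane embedded in 10 dimensions.
Z_true = rng.standard_normal((300, 2))
A = rng.standard_normal((2, 10))
X = Z_true @ A + 0.05 * rng.standard_normal((300, 10))

Z, components = pca(X, k=2)
# The 2-D scores should retain almost all of the variance.
total_var = ((X - X.mean(axis=0)) ** 2).sum()
kept_var = (Z ** 2).sum()
print(kept_var / total_var)
```

Note that each extracted coordinate mixes all ten original features; a sparse PCA variant would add an l1 penalty on the component directions so that each involves only a few of them.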


Dimension Reduction: A Guided Tour. Christopher J. C. Burges, Foundations and Trends in Machine Learning, Vol. 2, No. 4 (2009) 275–365.

A survey of dimension reduction techniques. I. Fodor, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Technical Report UCRL-ID-148494, (2002).

Linear Dimensionality Reduction: Survey, Insights, and Generalizations. John P. Cunningham and Zoubin Ghahramani. Journal of Machine Learning Research, Vol. 16 (2015) 2859-2900.

Feature selection for classification: A review. Jiliang Tang, Salem Alelyani, and Huan Liu. Data Classification: Algorithms and Applications, (2014), page 37.

Related KDD2016 Papers

Title & Authors
Mining Subgroups with Exceptional Transition Behavior
Author(s): Florian Lemmerich*, Gesis; Martin Becker, University of Würzburg; Philipp Singer, Gesis; Denis Helic, TU Graz; Andreas Hotho, University of Wuerzburg; Markus Strohmaier,
Bayesian Inference of Arrival Rate and Substitution Behavior from Sales Transaction Data with Stockouts
Author(s): Ben Letham*, Facebook; Lydia Letham, MIT; Cynthia Rudin, MIT
FUSE: Full Spectral Clustering
Author(s): Wei Ye*, University of Munich; Sebastian Goebl, University of Munich; Claudia Plant, ; Christian Boehm, University of Munich
Infinite Ensemble for Image Clustering
Author(s): Hongfu Liu*, Northeastern University; Ming Shao, Northeastern University; Sheng Li, Northeastern University; Yun Fu, Northeastern University
MANTRA: A Scalable Approach to Mining Temporally Anomalous Sub-trajectories
Author(s): Prithu Banerjee*, UBC; Pranali Yawalkar, IIT Madras; Sayan Ranu, IIT Madras
Generalized Hierarchical Sparse Model for Arbitrary-Order Interactive Antigenic Sites Identification
Author(s): Lei Han*, Rutgers University; Yu Zhang, Hong Kong University of Science and Technology; Xiu-Feng Wan, Mississippi State University; Tong Zhang, Rutgers University
Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns
Author(s): Roel Bertens*, Universiteit Utrecht; Jilles Vreeken, Max-Planck Institute for Informatics and Saarland University; Arno Siebes,
Hierarchical Incomplete Multi-source Feature Learning for Spatiotemporal Event Forecasting
Author(s): Liang Zhao*, VT; Jieping Ye, University of Michigan at Ann Arbor; Feng Chen, SUNY Albany; Chang-Tien Lu, Virginia Tech; Naren Ramakrishnan, Virginia Tech
Accelerating Online CP Decompositions for Higher Order Tensors
Author(s): Shuo Zhou*, University of Melbourne; Nguyen Vinh, University of Melbourne; James Bailey, ; Yunzhe Jia, University of Melbourne; Ian Davidson, University of California-Davis
Efficient Shift-Invariant Dictionary Learning
Author(s): Guoqing Zheng*, Carnegie Mellon University; Yiming Yang, ; Jaime Carbonell,
Targeted Topic Modeling for Focused Analysis
Author(s): Shuai Wang*, University of Illinois at Chicago; Zhiyuan Chen, UIC; Geli Fei, Univ of Illinois at Chicago; Bing Liu, Univ of Illinois at Chicago; Sherry Emery, University of Illinois at Chicago
Rebalancing Bike Sharing Systems: A Multi-source Data Smart Optimization
Author(s): Junming Liu, Rutgers University; Leilei Sun, ; Hui Xiong*, Rutgers; Weiwei Chen,
Data-driven Automatic Treatment Regimen Development and Recommendation
Author(s): Leilei Sun*, Dalian University of Technolog; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie,
Overcoming key weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure
Author(s): Ting Kai Ming*, Federation University; YE ZHU, Monash University; Mark Carman, Monash University; Yue Zhu, Nanjing University
Dynamic Clustering of Streaming Short Documents
Author(s): Shangsong Liang*, University College London; Emine Yilmaz, University College London; Evangelos Kanoulas, University of Amsterdam
Continuous Experience-aware Language Model
Author(s): Subhabrata Mukherjee*, Max Planck Informatics; Stephan Günnemann, Technical University of Munich; Gerhard Weikum, Max Planck Institute for Informatics
Structured Doubly Stochastic Matrix for Graph Based Clustering
Author(s): Xiaoqian Wang, Univ. of Texas at Arlington; Feiping Nie, University of Texas at Arlington; Heng Huang*, Univ. of Texas at Arlington
Streaming-LDA: A Copula-based Approach to Modeling Topic Dependencies in Document Streams
Author(s): Hesam Amoualian*, University Grenoble Alps; Marianne Clausel, University of Grenoble Alps; Eric Gaussier, University of Grenoble Alps; Massih-Reza Amini, University of Grenoble Alps
Goal-Directed Inductive Matrix Completion
Author(s): Si Si*, Ut austin; Kai-Yang Chiang, UT Austin; Cho-Jui Hsieh, UT Austin; Nikhil Rao, Technicolor Research; Inderjit Dhillon, UTexas
A Subsequence Interleaving Model for Sequential Pattern Mining
Author(s): Jaroslav Fowkes*, University of Edinburgh; Charles Sutton, University of Edinburgh
node2vec: Scalable Feature Learning for Networks
Author(s): Aditya Grover*, Stanford University; Jure Leskovec, Stanford University
Distributing the Stochastic Gradient Sampler for Large-Scale LDA
Author(s): Yuan Yang*, Beihang University; Jianfei Chen, Tsinghua University; Jun Zhu,
Skinny-dip: Clustering in a Sea of Noise
Author(s): Samuel Maurus*, Helmholtz Zentrum München; Claudia Plant
City-Scale Map Creation and Updating using GPS Collections
Author(s): Chen Chen*, Stanford University; Cewu Lu, Stanford University; Qixing Huang, Stanford University; Dimitrios Gunopulos, ; Leonidas Guibas, Stanford University; Qiang Yang, HKUST
Positive-Unlabeled Learning in Streaming Networks
Author(s): Shiyu Chang*, UIUC; Yang Zhang, UIUC; Jiliang Tang, Yahoo Labs; Dawei Yin, ; Yi Chang, Yahoo! Labs; Mark Hasegawa-Johnson, UIUC; Thomas Huang, UIUC
How to Compete Online for News Audience: Modeling Words that Attract Clicks
Author(s): Joon Hee Kim*, KAIST; Amin Mantrach, Yahoo! Research; Alex Jaimes, Yahoo!; Alice Oh, Korea Advanced Institute of Science and Technology
A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization
Author(s): Jianhua Yin*, Tsinghua University; Jianyong Wang,
AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets
Author(s): Son Mai*, Aarhus University; Ira Assent, ; Martin Storgaard, Aarhus University
Online Feature Selection: A Limited-Memory Substitution Algorithm and its Asynchronous Parallel Vari
Author(s): Haichuan Yang*, University of Rochester; Ryohei Fujimaki, NEC Laboratories America; Yukitaka Kusumura, NEC lab; Ji Liu, University of Rochester
Subjectively Interesting Component Analysis: Data Projections that Contrast with Prior Expectations
Author(s): Bo Kang*, Ghent University; Jefrey Lijffijt, Ghent University; Raul Santos-Rodriguez, University of Bristol; Tijl De Bie, Ghent University
Safe Pattern Pruning: An Efficient Approach for Predictive Pattern Mining
Author(s): Kazuya Nakagawa, Nagoya Institute of Technology; Shinya Suzumura, Nagoya Institute of Technology; Masayuki Karasuyama, ; Koji Tsuda, University of Tokyo; Ichiro Takeuchi*, Nagoya Institute of Technology Japan