Dimensionality Reduction
Curated by: Aarti Singh
In modern datasets, the dimensionality of the input data is often too large for the data to be efficiently measured, stored, processed, transmitted, visualized, or interpreted. This necessitates dimensionality reduction methods that can identify a few of the most relevant dimensions. These methods fall into two categories: feature selection methods, which aim to select the most relevant subset of the given features (aka coordinates, attributes, or dimensions), and feature extraction methods, which aim to identify a few of the most relevant transformations of the given features. Feature extraction methods can yield more parsimonious representations than feature selection methods; however, feature selection leads to more interpretable solutions, e.g. which genes are most representative of (or predictive of) a disease, instead of which transformations of gene expressions are most representative (or predictive).
Feature selection based dimensionality reduction methods can be unsupervised, such as uniform subsampling or column subset selection (choosing the set of columns that provides the best rank-k approximation to the data matrix, e.g. using column norm or leverage score sampling), or supervised, scoring individual features based on their dependence on the output variable via correlation, mutual information, chi-squared dependence, etc. Choosing the best subset of size k is typically a combinatorial problem and hence computationally infeasible. Standard approaches either select the top-k scoring features or iteratively score and select one feature in each of k rounds. In the last decade, l1 penalized methods have become popular; these balance a cost function with a convex “sparsity” penalty that discourages selecting more features. They include supervised approaches such as l1 penalized regression and classification, unsupervised methods such as l1 penalized clustering, as well as methods such as learning graphical model structure, which can be either supervised or unsupervised. The l1 penalized methods are computationally efficient and, under some assumptions, achieve nearly the same performance as combinatorial solutions. However, they have mostly been successful only in linear settings, and optimal nonlinear feature selection remains open.
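To make the two standard recipes above concrete, here is a minimal sketch in Python/NumPy on synthetic data (the data, constants such as lam, and all variable names are illustrative assumptions, not any particular library's API): it first scores features by absolute correlation with the output and keeps the top-k, and then runs l1 penalized regression (the lasso) with a simple proximal-gradient (ISTA) loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 samples, 50 features, only the first 3 features matter.
n, d, k = 100, 50, 3
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(n)

# (1) Filter-style selection: score each feature by |correlation| with y,
#     then keep the k highest-scoring features.
scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
top_k = np.sort(np.argsort(scores)[-k:])
print("top-k features by correlation:", top_k)

# (2) l1 penalized regression (lasso) via proximal gradient (ISTA):
#     minimize (1/2n) * ||y - X w||^2 + lam * ||w||_1
lam = 0.1                               # sparsity penalty weight (assumed)
step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
w = np.zeros(d)
for _ in range(500):
    grad = -X.T @ (y - X @ w) / n       # gradient of the squared-error term
    w = w - step * grad
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-thresholding
print("lasso support:", np.flatnonzero(np.abs(w) > 1e-3))
```

The soft-thresholding step is the proximal operator of the l1 norm; it is what sets most coordinates exactly to zero and thereby performs the selection, while the convexity of the penalty keeps the whole procedure efficient.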
Feature extraction based dimensionality reduction methods form a few linear or nonlinear transformations of the original features, thus embedding the points in a low-dimensional space. Linear methods include random projections and Multi-Dimensional Scaling (MDS), which seek to preserve pairwise distances; Principal Component Analysis (PCA), which finds directions of maximum variance; Canonical Correlation Analysis (CCA), which finds directions of maximum correlation between two sets of variables; and Independent Component Analysis (ICA). Nonlinear methods include kernelized PCA or CCA, as well as methods such as Isomap, Locally Linear Embedding (LLE), Laplacian eigenmaps, and diffusion maps that preserve local structure (local distances or subspaces). The linear methods can be effectively combined with l1 penalties to yield feature selection along with feature extraction, i.e. to find embeddings that involve a transformation of only a few of the original features; examples include sparse PCA, sparse CCA, etc. Dimensionality reduction can also be achieved by extracting features relevant for prediction in a supervised manner; examples include linear discriminant analysis, topic modeling, and the hidden layers of neural networks.
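As a minimal illustration of the linear feature extraction methods mentioned above, the NumPy sketch below (synthetic data; dimensions and names are illustrative assumptions) computes a k-dimensional PCA embedding from the top right singular vectors of the centered data matrix, and a data-independent Gaussian random projection of the same points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points in 50 dimensions, lying close to a 2-D subspace.
n, d, k = 200, 50, 2
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.05 * rng.standard_normal((n, d))

# PCA: project the centered data onto its top-k right singular vectors,
# i.e. the k directions of maximum variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z_pca = Xc @ Vt[:k].T                   # n x k embedding
print("fraction of variance captured:", (S[:k] ** 2).sum() / (S ** 2).sum())

# Random projection: a data-independent linear map that approximately
# preserves pairwise distances (Johnson-Lindenstrauss).
R = rng.standard_normal((d, k)) / np.sqrt(k)
Z_rp = X @ R
print("embedding shapes:", Z_pca.shape, Z_rp.shape)
```

PCA adapts the projection to the data (maximum-variance directions), whereas the random projection is oblivious to the data and only approximately preserves geometry; the nonlinear methods above replace this fixed linear map with one derived from local neighborhood structure.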
References
Dimension Reduction: A Guided Tour. Christopher J. C. Burges, Foundations and Trends in Machine Learning, Vol. 2, No. 4 (2009) 275–365.
A survey of dimension reduction techniques. I. Fodor, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Technical Report UCRL-ID-148494, (2002).
Linear Dimensionality Reduction: Survey, Insights, and Generalizations. John P. Cunningham and Zoubin Ghahramani. Journal of Machine Learning Research, Vol. 16 (2015) 2859-2900.
Feature selection for classification: A review. Jiliang Tang, Salem Alelyani, and Huan Liu. Data Classification: Algorithms and Applications, (2014) page 37.
Related KDD2016 Papers
| Title & Authors |
| --- |
Infinite Ensemble for Image Clustering Author(s): Hongfu Liu*, Northeastern University; Ming Shao, Northeastern University; Sheng Li, Northeastern University; Yun Fu, Northeastern University |
MANTRA: A Scalable Approach to Mining Temporally Anomalous Sub-trajectories Author(s): Prithu Banerjee*, UBC; Pranali Yawalkar, IIT Madras; Sayan Ranu, IIT Madras |
Generalized Hierarchical Sparse Model for Arbitrary-Order Interactive Antigenic Sites Identification Author(s): Lei Han*, Rutgers University; Yu Zhang, Hong Kong University of Science and Technology; Xiu-Feng Wan, Mississippi State University; Tong Zhang, Rutgers University |
Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns Author(s): Roel Bertens*, Universiteit Utrecht; Jilles Vreeken, Max-Planck Institute for Informatics and Saarland University; Arno Siebes, |
Hierarchical Incomplete Multi-source Feature Learning for Spatiotemporal Event Forecasting Author(s): Liang Zhao*, VT; Jieping Ye, University of Michigan at Ann Arbor; Feng Chen, SUNY Albany; Chang-Tien Lu, Virginia Tech; Naren Ramakrishnan, Virginia Tech |
Accelerating Online CP Decompositions for Higher Order Tensors Author(s): Shuo Zhou*, University of Melbourne; Nguyen Vinh, University of Melbourne; James Bailey, ; Yunzhe Jia, University of Melbourne; Ian Davidson, University of California-Davis |
Efficient Shift-Invariant Dictionary Learning Author(s): Guoqing Zheng*, Carnegie Mellon University; Yiming Yang, ; Jaime Carbonell, |
Targeted Topic Modeling for Focused Analysis Author(s): Shuai Wang*, University of Illinois at Chicago; Zhiyuan Chen, UIC; Geli Fei, Univ of Illinois at Chicago; Bing Liu, Univ of Illinois at Chicago; Sherry Emery, University of Illinois at Chicago |
Rebalancing Bike Sharing Systems: A Multi-source Data Smart Optimization Author(s): Junming Liu, Rutgers University; Leilei Sun, ; Hui Xiong*, Rutgers; Weiwei Chen, |
Data-driven Automatic Treatment Regimen Development and Recommendation Author(s): Leilei Sun*, Dalian University of Technology; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie, |
Overcoming key weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure Author(s): Ting Kai Ming*, Federation University; YE ZHU, Monash University; Mark Carman, Monash University; Yue Zhu, Nanjing University |
Dynamic Clustering of Streaming Short Documents Author(s): Shangsong Liang*, University College London; Emine Yilmaz, University College London; Evangelos Kanoulas, University of Amsterdam |
Continuous Experience-aware Language Model Author(s): Subhabrata Mukherjee*, Max Planck Informatics; Stephan Günnemann, Technical University of Munich; Gerhard Weikum, Max Planck Institute for Informatics |
Structured Doubly Stochastic Matrix for Graph Based Clustering Author(s): Xiaoqian Wang, Univ. of Texas at Arlington; Feiping Nie, University of Texas at Arlington; Heng Huang*, Univ. of Texas at Arlington |
Streaming-LDA: A Copula-based Approach to Modeling Topic Dependencies in Document Streams Author(s): Hesam Amoualian*, University Grenoble Alps; Marianne Clausel, University of Grenoble Alps; Eric Gaussier, University of Grenoble Alps; Massih-Reza Amini, University of Grenoble Alps |
Mining Subgroups with Exceptional Transition Behavior Author(s): Florian Lemmerich*, Gesis; Martin Becker, University of Würzburg; Philipp Singer, Gesis; Denis Helic, TU Graz; Andreas Hotho, University of Wuerzburg; Markus Strohmaier, |
Bayesian Inference of Arrival Rate and Substitution Behavior from Sales Transaction Data with Stockouts Author(s): Ben Letham*, Facebook; Lydia Letham, MIT; Cynthia Rudin, MIT |
FUSE: Full Spectral Clustering Author(s): Wei Ye*, University of Munich; Sebastian Goebl, University of Munich; Claudia Plant, ; Christian Boehm, University of Munich |
A Subsequence Interleaving Model for Sequential Pattern Mining Author(s): Jaroslav Fowkes*, University of Edinburgh; Charles Sutton, University of Edinburgh |
node2vec: Scalable Feature Learning for Networks Author(s): Aditya Grover*, Stanford University; Jure Leskovec, Stanford University |
Distributing the Stochastic Gradient Sampler for Large-Scale LDA Author(s): Yuan Yang*, Beihang University; Jianfei Chen, Tsinghua University; Jun Zhu, |
Skinny-dip: Clustering in a Sea of Noise Author(s): Samuel Maurus*, Helmholtz Zentrum München; Claudia Plant |
City-Scale Map Creation and Updating using GPS Collections Author(s): Chen Chen*, Stanford University; Cewu Lu, Stanford University; Qixing Huang, Stanford University; Dimitrios Gunopulos, ; Leonidas Guibas, Stanford University; Qiang Yang, HKUST |
Positive-Unlabeled Learning in Streaming Networks Author(s): Shiyu Chang*, UIUC; Yang Zhang, UIUC; Jiliang Tang, Yahoo Labs; Dawei Yin, ; Yi Chang, Yahoo! Labs; Mark Hasegawa-Johnson, UIUC; Thomas Huang, UIUC |
How to Compete Online for News Audience: Modeling Words that Attract Clicks Author(s): Joon Hee Kim*, KAIST; Amin Mantrach, Yahoo! Research; Alex Jaimes, Yahoo!; Alice Oh, Korea Advanced Institute of Science and Technology |
A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization Author(s): Jianhua Yin*, Tsinghua University; Jianyong Wang, |
AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets Author(s): Son Mai*, Aarhus University; Ira Assent, ; Martin Storgaard, Aarhus University |
Goal-Directed Inductive Matrix Completion Author(s): Si Si*, UT Austin; Kai-Yang Chiang, UT Austin; Cho-Jui Hsieh, UT Austin; Nikhil Rao, Technicolor Research; Inderjit Dhillon, UTexas |
Online Feature Selection: A Limited-Memory Substitution Algorithm and its Asynchronous Parallel Variant Author(s): Haichuan Yang*, University of Rochester; Ryohei Fujimaki, NEC Laboratories America; Yukitaka Kusumura, NEC lab; Ji Liu, University of Rochester |
Subjectively Interesting Component Analysis: Data Projections that Contrast with Prior Expectations Author(s): Bo Kang*, Ghent University; Jefrey Lijffijt, Ghent University; Raul Santos-Rodriguez, University of Bristol; Tijl De Bie, Ghent University |
Safe Pattern Pruning: An Efficient Approach for Predictive Pattern Mining Author(s): Kazuya Nakagawa, Nagoya Institute of Technology; Shinya Suzumura, Nagoya Institute of Technology; Masayuki Karasuyama, ; Koji Tsuda, University of Tokyo; Ichiro Takeuchi*, Nagoya Institute of Technology Japan |