Curated by: Ian Davidson

Cluster analysis, or clustering, takes a collection of objects and divides them into groups such that instances in the same group (cluster) are similar to each other and dissimilar to those in other clusters. It is used extensively in many domains, including image analysis, information retrieval, and bioinformatics. Clustering is traditionally exploratory: it takes no human guidance and aims to uncover the underlying structure in the data. Recent innovations include adding supervision (semi-supervised clustering), adding constraints (constrained clustering), and extensions that handle complex data such as graphs, evolving data, and multi-view data.
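As a concrete illustration of grouping similar instances, here is a minimal sketch of k-means, one of the traditional methods discussed in [2]. It is not taken from any of the papers listed on this page. The function names (squared_distance, farthest_first, kmeans), the toy points, and the farthest-first initialization are all assumptions chosen to keep the example small and deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (an assumed, deterministic
    initialization; random initialization is also common)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=100):
    """Alternate between assigning points to the nearest centroid and
    recomputing each centroid as the mean of its assigned points."""
    centroids = farthest_first(points, k)
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the procedure has converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Toy data: two well-separated groups of 2-D points (hypothetical values).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

On this toy data the first three points should share one label and the last three the other, which reflects the "similar within a cluster, dissimilar across clusters" goal described above. Note that the number of clusters k must be supplied by the user, and the result of k-means in general depends on the initialization.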

A survey of classic methods is given in [1], with a perspective on challenges and future directions given in [2]. A talk based on [2] is freely available: http://videolectures.net/ecmlpkdd08_jain_dcyb/?q=anil%20jain

Lesson 2 of this MOOC covers many traditional clustering methods: https://www.class-central.com/mooc/1848/udacity-machine-learning-unsupervised-learning

[1] Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. “Data clustering: a review.” ACM Computing Surveys (CSUR) 31.3 (1999): 264-323.

[2] Jain, Anil K. “Data clustering: 50 years beyond K-means.” Pattern Recognition Letters 31.8 (2010): 651-666.

Related KDD2016 Papers

Title & Authors
City-Scale Map Creation and Updating using GPS Collections
Author(s): Chen Chen*, Stanford University; Cewu Lu, Stanford University; Qixing Huang, Stanford University; Dimitrios Gunopulos, ; Leonidas Guibas, Stanford University; Qiang Yang, HKUST
A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization
Author(s): Jianhua Yin*, Tsinghua University; Jianyong Wang,
Batch model for batched timestamps data analysis with application to the SSA disability program
Author(s): Qingqi Yue*, NIH; Ao Yuan, NIH; Xuan Che, NIH; Elizabeth Rasch, NIH; Minh Huynh, Impaq; Chunxiao Zhou, NIH
Efficient Frequent Directions Algorithm for Sparse Matrices
Author(s): Mina Ghashami*, University of Utah; Edo Liberty, Yahoo; Jeff Phillips, School of Computing, University of Utah
AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets
Author(s): Son Mai*, Aarhus University; Ira Assent, ; Martin Storgaard, Aarhus University
Data-driven Automatic Treatment Regimen Development and Recommendation
Author(s): Leilei Sun*, Dalian University of Technolog; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie,
Structured Doubly Stochastic Matrix for Graph Based Clustering
Author(s): Xiaoqian Wang, Univ. of Texas at Arlington; Feiping Nie, University of Texas at Arlington; Heng Huang*, Univ. of Texas at Arlington
Infinite Ensemble for Image Clustering
Author(s): Hongfu Liu*, Northeastern University; Ming Shao, Northeastern University; Sheng Li, Northeastern University; Yun Fu, Northeastern University