Curated by: Ian Davidson
Cluster analysis, or clustering, aims to divide a collection of objects into groups such that instances in the same group (cluster) are similar to each other and dissimilar to those in other clusters. It is used extensively in many domains, including image analysis, information retrieval, and bioinformatics. Clustering is traditionally exploratory: it takes no human guidance and aims to uncover the underlying structure in the data. Recent innovations include adding supervision (semi-supervised clustering), constraints (constrained clustering), and extensions to handle complex data such as graphs, evolving data, and multi-view data.
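To make the idea of grouping similar instances concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one of the classic methods. It is illustrative only: the function name `kmeans`, the toy data, and the choice to seed the centroids with the first k points are assumptions made for this example, not something prescribed by the surveys cited on this page.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster equal-length numeric tuples with Lloyd's algorithm.

    Returns (labels, centroids), where labels[i] is the cluster index
    assigned to points[i].
    """
    # Assumption for this sketch: use the first k points as the initial
    # centroids. This is deterministic but not robust; practical
    # implementations usually use randomized seeding.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (Euclidean distance; ties go to the lowest cluster index).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy data (hypothetical): two well-separated groups in the plane.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Points near (1, 1) end up in one cluster and points near (5, 5) in the other, which is the "similar within, dissimilar across" property described above. The number of clusters k has to be supplied by the user in this sketch.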
A survey of classic methods is given in [1], with a perspective on challenges and directions given in [2]. A talk based on [2] is freely available: http://videolectures.net/ecmlpkdd08_jain_dcyb/?q=anil%20jain
Lesson 2 of this MOOC covers many traditional clustering methods: https://www.class-central.com/mooc/1848/udacity-machine-learning-unsupervised-learning
[1] Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. “Data clustering: a review.” ACM Computing Surveys (CSUR) 31.3 (1999): 264-323.
[2] Jain, Anil K. “Data clustering: 50 years beyond K-means.” Pattern Recognition Letters 31.8 (2010): 651-666.
Related KDD2016 Papers
|Title & Authors|
|Efficient Frequent Directions Algorithm for Sparse Matrices|
Author(s): Mina Ghashami*, University of Utah; Edo Liberty, Yahoo; Jeff Phillips, School of Computing, University of Utah
|AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets|
Author(s): Son Mai*, Aarhus University; Ira Assent, ; Martin Storgaard, Aarhus University
|City-Scale Map Creation and Updating using GPS Collections|
Author(s): Chen Chen*, Stanford University; Cewu Lu, Stanford University; Qixing Huang, Stanford University; Dimitrios Gunopulos, ; Leonidas Guibas, Stanford University; Qiang Yang, HKUST
|A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization|
Author(s): Jianhua Yin*, Tsinghua University; Jianyong Wang,
|Batch model for batched timestamps data analysis with application to the SSA disability program|
Author(s): Qingqi Yue*, NIH; Ao Yuan, NIH; Xuan Che, NIH; Elizabeth Rasch, NIH; Minh Huynh, Impaq; Chunxiao Zhou, NIH
|Structured Doubly Stochastic Matrix for Graph Based Clustering|
Author(s): Xiaoqian Wang, Univ. of Texas at Arlington; Feiping Nie, University of Texas at Arlington; Heng Huang*, Univ. of Texas at Arlington
|Infinite Ensemble for Image Clustering|
Author(s): Hongfu Liu*, Northeastern University; Ming Shao, Northeastern University; Sheng Li, Northeastern University; Yun Fu, Northeastern University
|Data-driven Automatic Treatment Regimen Development and Recommendation|
Author(s): Leilei Sun*, Dalian University of Technology; Chuanren Liu, Drexel University; Chonghui Guo, ; Hui Xiong, Rutgers; Yanming Xie,