Cluster analysis, or clustering, takes a collection of objects and divides them into groups (clusters) such that instances in the same cluster are similar to each other and dissimilar to those in other clusters. It is used extensively in many domains, including image analysis, information retrieval and bioinformatics. Traditionally, clustering is exploratory: it takes no human guidance and aims to uncover the underlying structure in the data. Recent innovations include adding supervision (semi-supervised clustering) and constraints (constrained clustering), as well as extensions that handle complex data such as graphs, evolving data and multi-view data.
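
As a concrete illustration of the definition above, here is a minimal sketch of k-means, the classic method named in the title of [2], written in plain Python. It is one possible instance of clustering, not a reference implementation. The function names, the toy data and the farthest-first choice of starting centroids are assumptions made for this example; random initialization is the more common choice in practice.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centroids(points, k):
    """Assumed initialization: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the procedure has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


# Hypothetical toy data: two visibly separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied to the procedure: the grouping comes only from the distances between points, which is the exploratory setting described above. The number of clusters k still has to be chosen by the user.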

A survey of classic methods is given in [1], with a perspective on challenges and future directions given in [2]. A talk based on [2] is freely available: http://videolectures.net/ecmlpkdd08_jain_dcyb/?q=anil%20jain

Lesson 2 of this MOOC covers many traditional clustering methods: https://www.class-central.com/mooc/1848/udacity-machine-learning-unsupervised-learning

[1] Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. “Data clustering: a review.” ACM Computing Surveys (CSUR) 31.3 (1999): 264-323.

[2] Jain, Anil K. “Data clustering: 50 years beyond K-means.” Pattern Recognition Letters 31.8 (2010): 651-666.

