Towards an Optimal Subspace for K-Means

Dominik Mautz (Ludwig-Maximilians-Universität München); Wei Ye (Ludwig-Maximilians-Universität München); Claudia Plant (Universität Wien); Christian Böhm (Ludwig-Maximilians-Universität München)

Abstract

Is there an optimal dimensionality reduction for k-means that reveals the prominent cluster structure hidden in the data? We propose SubKmeans, which extends the classic k-means algorithm. The goal of this algorithm is twofold: find a suitable k-means-style clustering partition and transform the clusters onto a common subspace that is optimal for the cluster structure. Our solution pursues these two goals simultaneously. The dimensionality of this subspace is found automatically, so the algorithm requires no additional parameters. At the same time, this subspace helps to mitigate the curse of dimensionality. The SubKmeans optimization algorithm is intriguingly simple and efficient. It is easy to implement and can readily be adapted to the situation at hand. Furthermore, it is compatible with many existing extensions and improvements of k-means.
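
To make the twofold goal concrete, the following is a minimal NumPy sketch of how a k-means loop could alternate between partitioning the data and updating a common subspace. The abstract does not spell out the update rules, so the details here are assumptions: the rotation is taken from the eigenvectors of the within-cluster scatter minus the total scatter, and the subspace dimensionality `m` is set to the number of clearly negative eigenvalues. The function name `subkmeans_sketch`, the tolerance, and the initialization are hypothetical choices for illustration, not the authors' reference implementation.

```python
import numpy as np

def subkmeans_sketch(X, k, n_iter=50, seed=0):
    """Illustrative alternation of k-means steps and a subspace update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random orthonormal starting rotation (QR of a Gaussian matrix).
    V, _ = np.linalg.qr(rng.normal(size=(d, d)))
    m = max(1, d // 2)  # assumed initial subspace dimensionality
    mu_all = X.mean(axis=0)
    S_total = (X - mu_all).T @ (X - mu_all)  # scatter of the whole data set
    centers = X[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment: nearest center, measured only in the first m rotated axes.
        P = V[:, :m]
        diff = (X @ P)[:, None, :] - (centers @ P)[None, :, :]
        new_labels = (diff ** 2).sum(axis=2).argmin(axis=1)
        converged = np.array_equal(new_labels, labels)
        labels = new_labels
        # Update: cluster means in the full space; accumulate within-cluster scatter.
        S_within = np.zeros((d, d))
        for i in range(k):
            members = X[labels == i]
            if len(members) == 0:
                continue  # an empty cluster keeps its previous center
            centers[i] = members.mean(axis=0)
            S_within += (members - centers[i]).T @ (members - centers[i])
        # Subspace update (assumption): eigh returns eigenvalues in ascending
        # order, so the most negative directions come first in V.
        eigvals, V = np.linalg.eigh(S_within - S_total)
        tol = 1e-8 * np.abs(eigvals).max()  # hypothetical relative tolerance
        m = max(1, int((eigvals < -tol).sum()))
        if converged:
            break
    return labels, V, m
```

Under these assumptions, `V[:, :m]` spans the subspace in which the clusters are separated, and the remaining columns span directions treated as carrying no cluster structure. Because the matrix being decomposed equals the negated between-cluster scatter, `m` can be at most `k - 1` in this sketch.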