Are your data gathered?
Alban Siffer (Univ. Rennes, Inria, CNRS, IRISA, Amossys); Pierre-Alain Fouque (Univ. Rennes, CNRS, IRISA, IUF); Alexandre Termier (Univ. Rennes, Inria, CNRS, IRISA); Christine Largou
Understanding data distributions is one of the most fundamental research topics in data analysis. The literature provides a great many powerful statistical learning algorithms for gaining knowledge about the underlying distribution from multivariate observations. Such algorithms may reveal a dependence between features, the appearance of clusters, or the presence of outliers. Before such deep investigations, we propose the folding test of unimodality. As a simple statistical description, it detects whether data are gathered or not (unimodal or multimodal). To the best of our knowledge, this is the first multivariate and purely statistical unimodality test. It makes no distributional assumption and relies only on a straightforward p-value. Through experiments on real-world data, we show its relevance and how it could be useful for clustering.