Subjectively Interesting Component Analysis: Data Projections that Contrast with Prior Expectations
Bo Kang*, Ghent University; Jefrey Lijffijt, Ghent University; Raul Santos-Rodriguez, University of Bristol; Tijl De Bie, Ghent University
Methods that find insightful low-dimensional projections are essential for effectively exploring high-dimensional data. Principal Component Analysis is used pervasively to find low-dimensional projections, not only because it is straightforward to use, but also because it is often effective: the variance in data is often dominated by relevant structure. However, even if the projections highlight real structure in the data, not all structure is interesting to every user. If a user is already aware of, or not interested in, the dominant structure, Principal Component Analysis is less effective for finding interesting components. We introduce a new method called Subjectively Interesting Component Analysis (SICA), designed to find data projections that are subjectively interesting, i.e., projections that truly surprise the end-user. It is rooted in information theory and employs an explicit model of a user’s prior expectations about the data. The corresponding optimization problem is a simple eigenvalue problem, and the result is a trade-off between explained variance and novelty. We present five case studies on synthetic data, images, time-series, and spatial data, to illustrate how SICA enables users to find (subjectively) interesting projections.
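To make the contrast with Principal Component Analysis concrete, the following is a minimal toy sketch in Python, not the SICA objective itself. It assumes NumPy, synthetic data, and a hypothetical prior covariance `prior_cov` standing in for what a user already expects. The "contrast" direction is taken as the top eigenvector of the difference between the observed and the expected covariance. That is a deliberately simple stand-in, chosen only because it is also an eigenvalue problem, and it should not be read as the formulation developed in this paper.

```python
import numpy as np

def top_eigvec(sym):
    """Return (largest eigenvalue, its unit eigenvector) of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    return vals[-1], vecs[:, -1]

rng = np.random.default_rng(0)
n = 20000

# Synthetic data: axis 0 carries the dominant variance (std 5),
# axis 1 carries moderate variance (std 2), axis 2 is unit noise.
X = rng.normal(size=(n, 3)) * np.array([5.0, 2.0, 1.0])
X -= X.mean(axis=0)
cov = X.T @ X / n

# PCA: the direction of maximal variance is the top eigenvector of the covariance.
_, w_pca = top_eigvec(cov)

# Hypothetical prior (an assumption for illustration): the user already
# expects std 5 along axis 0 and unit variance elsewhere.
prior_cov = np.diag([25.0, 1.0, 1.0])

# Toy contrast (NOT the SICA objective): the direction in which the observed
# variance most exceeds the variance the user expected.
_, w_contrast = top_eigvec(cov - prior_cov)

print("PCA direction:     ", np.round(np.abs(w_pca), 2))
print("Contrast direction:", np.round(np.abs(w_contrast), 2))
```

In this toy setting PCA aligns with axis 0, the structure the hypothetical user already knows, while the contrast direction aligns with axis 1, where the data departs from the assumed expectation.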