xStream: Outlier Detection in Feature-Evolving Data Streams
Emaad Ahmed Manzoor (Carnegie Mellon University); Hemank Lamba (Carnegie Mellon University); Leman Akoglu (Carnegie Mellon University)
This work addresses the outlier detection problem for feature-evolving streams, which has not been studied before. In this setting both (1) data points may evolve, with feature values changing, as well as (2) feature space may evolve, with newly-emerging features over time. This is notably different from row-streams, where points with fixed features arrive one at a time. We propose a density-based ensemble outlier detector, called xStream, for this more extreme streaming setting which has the following key properties: (1) it is a constant-space and constant-time (per incoming update) algorithm, (2) it measures outlierness at multiple scales or granularities, it can handle (3 i ) high-dimensionality through distance-preserving projections, and (3$ii$) non-stationarity via $O(1)$-time model updates as the stream progresses. In addition, xStream can address the outlier detection problem for the (less general) disk-resident static as well as row-streaming settings. We evaluate xStream rigorously on numerous real-life datasets in all three settings: static, row-stream, and feature-evolving stream. Experiments under static and row-streaming scenarios show that xStream is as competitive as state-of-the-art detectors and particularly effective in high-dimensions with noise. We also demonstrate that our solution is fast and accurate with modest space overhead for evolving streams, on which there exists no competition.