A PHP Error was encountered

Severity: 8192

Message: Non-static method URL_tube::usage() should not be called statically, assuming $this from incompatible context

Filename: url_tube/pi.url_tube.php

Line Number: 13

Fast Unsupervised Online Drift Detection Using Incremental Kolmogorov-Smirnov Test

KDD Topics

Abstract

Data stream research has grown rapidly over the last decade. Two major features distinguish data stream from batch learning: stream data are generated on the ﬂy, possibly in a fast and variable rate; and the underlying data distribution can be non-stationary, leading to a phenomenon known as concept drift. Therefore, most of the research on data stream classiﬁcation focuses on proposing eﬃcient models that can adapt to concept drifts and maintain a stable performance over time. However, speciﬁcally for the classiﬁcation task, the majority of such methods rely on the instantaneous avail-ability of true labels for all already classiﬁed instances. This is a strong assumption that is rarely fulﬁlled in practical applications. Hence there is a clear need for eﬃcient methods that can detect concept drifts in an unsupervised way. One possibility is the well-known Kolmogorov-Smirnov test, a statistical hypothesis test that checks whether two samples diﬀer. This work has two main contributions. The ﬁrst one is the Incremental Kolmogorov-Smirnov algorithm that allows performing the Kolmogorov-Smirnov hypothesis test instantly using two samples that change over time, where the change is an insertion and/or removal of an observation. Our algorithm employs a randomized tree and is able to per-form the insertion and removal operations in O(log N) with high probability and calculate the Kolmogorov-Smirnov test in O(1), where N is the number of sample observations. This is a signiﬁcant speed-up compared to the O(N log N) cost of the non-incremental implementation. The second contribution is the use of the Incremental Kolmogorov-Smirnov test to detect concept drifts without true labels. Classiﬁcation algorithms adapted to use the test rely on a limited portion of those labels just to update the classiﬁcation model after a concept drift is detected.

Filed under: Big Data | Semi-Supervised Learning

A PHP Error was encountered

KDD Topics

Fast Unsupervised Online Drift Detection Using Incremental Kolmogorov-Smirnov Test

Abstract

Comments