WavingSketch: An Unbiased and Generic Sketch for Finding Top-k Items in Data Streams
Jizhou Li: Peking University Shenzhen Graduate School; Zikun Li: Peking University; Yifei Xu: Peking University; Shiqi Jiang: Peking University; Tong Yang: Peking University; Bin Cui: Peking University; Yafei Dai: Peking University; Gong Zhang: Huawei Technologies
Finding top-k items in data streams is a fundamental problem in data mining. Existing algorithms that can achieve unbiased estimation suffer from poor accuracy. In this paper, we propose a new sketch, WavingSketch, which is much more accurate than existing unbiased algorithms. WavingSketch is generic, and we show how it can be applied to four applications: finding top-k frequent items, finding top-k heavy changes, finding top-k persistent items, and finding top-k Super-Spreaders. We theoretically prove that WavingSketch can provide unbiased estimation, and then give an error bound of our algorithm. Our experimental results show that, compared with the state-of-the-art, WavingSketch has 4.50 times higher insertion speed and up to 9 x 106 times (2 x 104 times in average) lower error rate in finding frequent items when memory size is tight. For other applications, WavingSketch can also achieve up to 286 times lower error rate. All related codes are open-sourced and available at Github anonymously.
How can we assist you?
We'll be updating the website as information becomes available. If you have a question that requires immediate attention, please feel free to contact us. Thank you!
Please enter the word you see in the image below: