HeavyGuardian: Separate and Guard Hot Items in Data Streams
Tong Yang (Peking University); Junzhi Gong (Peking University); Haowei Zhang (Peking University); Lei Zou (Peking University); Lei Shi (SKLCS, Institute of Software, Chinese Academy of Sciences); Xiaoming Li (Peking University)
Data stream processing is a fundamental issue in many fields, such as data mining, databases, and network traffic measurement. There are five typical tasks in data stream processing: frequency estimation, heavy hitter detection, heavy change detection, frequency distribution estimation, and entropy estimation. Different algorithms have been proposed for different tasks, but they seldom achieve high accuracy and high speed at the same time. To address this issue, we propose a novel data structure named HeavyGuardian. The key idea is to intelligently separate and guard the information of hot items while approximately recording the frequencies of cold items. We deploy HeavyGuardian on the above five typical tasks. Extensive experimental results show that HeavyGuardian achieves both much higher accuracy and higher speed than the state-of-the-art solutions for each of the five typical tasks. The source code of HeavyGuardian and other related algorithms is available on GitHub.
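
To make the key idea concrete, the following is a minimal sketch of a structure that keeps exact counters for a few hot items per bucket and small, saturating shared counters for cold items. It is an illustration only, not the implementation released with this work: the class name `HotColdSketch`, every parameter and default value, the CRC32-based hashing, and the probabilistic decay rule used to replace the weakest guarded item are all assumptions made for this example.

```python
import random
import zlib


class HotColdSketch:
    """Illustrative hot/cold separation; all names and defaults are hypothetical."""

    def __init__(self, num_buckets=64, heavy_cells=4, light_counters=16,
                 decay_base=1.08, light_max=15, seed=0):
        self.num_buckets = num_buckets
        self.heavy_cells = heavy_cells
        self.light_counters = light_counters
        self.decay_base = decay_base      # assumed decay parameter
        self.light_max = light_max        # cold counters saturate at this value
        # Heavy part: per-bucket map from key to exact count (guarded hot items).
        self.heavy = [dict() for _ in range(num_buckets)]
        # Light part: per-bucket array of small shared counters (cold items).
        self.light = [[0] * light_counters for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _hash(self, key, salt):
        # CRC32 over a salted representation gives a process-independent hash.
        return zlib.crc32((salt + repr(key)).encode("utf-8"))

    def _bucket(self, key):
        return self._hash(key, "bucket:") % self.num_buckets

    def _light_slot(self, key):
        return self._hash(key, "light:") % self.light_counters

    def insert(self, key):
        b = self._bucket(key)
        heavy = self.heavy[b]
        if key in heavy:                      # already guarded: exact increment
            heavy[key] += 1
            return
        if len(heavy) < self.heavy_cells:     # free cell: start guarding the key
            heavy[key] = 1
            return
        # Bucket full: decay the weakest guarded item with a probability that
        # shrinks as its count grows, so large counts are hard to evict.
        weakest = min(heavy, key=heavy.get)
        if self.rng.random() < self.decay_base ** (-heavy[weakest]):
            heavy[weakest] -= 1
            if heavy[weakest] == 0:           # weakest item fully decayed
                del heavy[weakest]
                heavy[key] = 1
                return
        # Otherwise treat the key as cold and record it approximately.
        slot = self._light_slot(key)
        self.light[b][slot] = min(self.light[b][slot] + 1, self.light_max)

    def query(self, key):
        b = self._bucket(key)
        if key in self.heavy[b]:
            return self.heavy[b][key]
        return self.light[b][self._light_slot(key)]
```

In this sketch, an item with a large guarded count is almost never decayed, because the decay probability falls exponentially with the count, while items seen only a few times are quickly displaced or pushed into the shared cold counters. Cold estimates are approximate: they can be inflated by collisions and are capped at `light_max`.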