BigIN4: Instant, Interactive Insight Identification for Multi-Dimensional Big Data
Qingwei Lin (Microsoft); Weichen Ke (Peking University); Jian-Guang Lou (Microsoft); Hongyu Zhang (The University of Newcastle); Kaixin Sui (Microsoft); Yong Xu (Microsoft); Ziyi Zhou (Microsoft); Bo Qiao (Microsoft); Dongmei Zhang (Microsoft)
The ability to identify insights from multi-dimensional big data is important for business intelligence. To enable interactive identification of insights, a large number of dimension combinations need to be searched and a series of aggregation queries need to be quickly answered. The existing approaches answer interactive queries on big data through data cubes or approximate query processing. However, these approaches can hardly satisfy the performance or accuracy requirements for ad-hoc queries demanded by interactive exploration. In this paper, we present BigIN4, a system for instant, interactive identification of insights from multi-dimensional big data. BigIN4 gives insight suggestions by enumerating subspaces and answers queries by combining data cube and approximate query processing techniques. If a query cannot be answered by the cubes, BigIN4 decomposes it into several low dimensional queries that can be directly answered by the cubes through an online constructed Bayesian Network and gives an approximate answer within a statistical interval. Unlike the related works, BigIN4 does not require any prior knowledge of queries and does not assume a certain data distribution. Our experiments on ten real-world large-scale datasets show that BigIN4 can successfully identify insights from big data. Furthermore, BigIN4 can provide approximate answers to aggregation queries effectively (with less than 10% error on average) and efficiently (50x faster than sampling-based methods).