As data becomes dynamic, large, and distributed, there is increasing demand for what have become known as distributed stream algorithms. Since continuously collecting the data at a central server and processing it there incurs very high communication and computation complexity, it is advantageous to define local conditions at the nodes such that, as long as they are maintained, some desirable global condition holds.

A generic algorithm that has proved very useful for reducing communication in distributed streaming environments is geometric monitoring (GM). Alas, applying GM to many important tasks is computationally very demanding, as it requires solving a notoriously difficult problem: computing the distance between a point and a surface, which is often very time-consuming even in low dimensions. Thus, while useful for reducing communication, GM often imposes an exceedingly heavy computational burden on the nodes. This makes it very problematic to apply, especially on “thin”, battery-operated sensors, which are prevalent in numerous applications, including the “Internet of Things” paradigm.

Here we propose a very different approach, designated CB (for Convex/Concave Bounds). CB is based on directly bounding the monitored function by suitably chosen convex and concave functions that naturally enable monitoring of distributed streams. These functions can be checked on the fly, yielding far simpler local conditions than those applied by GM. We demonstrate that CB reduces computational complexity relative to GM, by several orders of magnitude in some cases. As an added bonus, CB also reduced communication overhead in all application scenarios we tested.
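
To make the idea of convex/concave bounding concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes the monitored global quantity is f applied to the average of the nodes' local vectors, and that the global condition is lo ≤ f(average) ≤ hi. The toy function f(x, y) = xy, the reference point, the thresholds, and all helper names (`make_bounds`, `local_ok`) are hypothetical choices made for illustration. The bounds are obtained by writing xy = ((x+y)² − (x−y)²)/4 and linearizing one of the two squares around the reference point.

```python
# Toy monitored function (an assumption for illustration): f(x, y) = x * y,
# which is neither convex nor concave.
def f(x, y):
    return x * y

def make_bounds(px, py):
    """Build a convex upper bound and a concave lower bound of f,
    both tight at the reference point (px, py).

    Uses x*y = ((x+y)^2 - (x-y)^2) / 4 and replaces one convex square
    by its tangent line at the reference point.
    """
    s0, d0 = px + py, px - py

    def upper(x, y):
        # Convex: keeps (x+y)^2, linearizes (x-y)^2 around d0.
        s, d = x + y, x - y
        return (s * s - (d0 * d0 + 2 * d0 * (d - d0))) / 4

    def lower(x, y):
        # Concave: linearizes (x+y)^2 around s0, keeps -(x-y)^2.
        s, d = x + y, x - y
        return ((s0 * s0 + 2 * s0 * (s - s0)) - d * d) / 4

    return upper, lower

def local_ok(v, upper, lower, lo, hi):
    """Local condition checked at a single node, with no communication."""
    return lower(*v) >= lo and upper(*v) <= hi

# Hypothetical setup: reference point, thresholds, and three local vectors.
upper, lower = make_bounds(2.0, 3.0)
lo, hi = 4.0, 8.0
nodes = [(2.5, 2.8), (1.6, 3.3), (2.1, 3.0)]

all_local = all(local_ok(v, upper, lower, lo, hi) for v in nodes)
avg = (sum(v[0] for v in nodes) / len(nodes),
       sum(v[1] for v in nodes) / len(nodes))
global_ok = lo <= f(*avg) <= hi
print(all_local, global_ok)
```

Under these assumptions, the local checks imply the global condition by Jensen's inequality: a convex function of the average is at most the average of its values, so if every node has upper(v_i) ≤ hi then f(average) ≤ upper(average) ≤ hi, and symmetrically for the concave lower bound. Each local check is a handful of arithmetic operations and requires no point-to-surface distance computation.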
