This paper describes Sparse Frequent Directions, a variant of Frequent Directions for sketching sparse matrices. It resembles the original algorithm in many ways: both receive the rows of an input matrix An⇥d one by one in the streaming setting and compute a small sketch B 2 R`⇥d. Both share the same strong (provably optimal) asymptotic guarantees with respect to the space-accuracy tradeoff in the streaming setting. However, unlike Frequent Directions which runs in O(nd`) time regardless of the sparsity f the input ma rix A, Sparse Frequent Directions runs in O˜ nnz(A)` + n`2 time. Our analysis loosens the dependence on computing the Sin-gular Value Decomposition (SVD) as a black box within the Frequent Directions algorithm. Our bounds require recent results on the properties of fast approximate SVD computations. Finally, we empirically demonstrate that these asymptotic improvements are practical and significant on real and synthetic data.

Filed under: Clustering | Frequent Pattern Mining | Big Data | Time Series and Stream Mining