Streaming-LDA: A Copula-based Approach to Modeling Topic Dependencies in Document Streams
Hesam Amoualian*, University Grenoble Alps; Marianne Clausel, University of Grenoble Alps; Eric Gaussier, University of Grenoble Alps; Massih-Reza Amini, University of Grenoble Alps
We propose in this paper two new models for modeling topic and word-topic dependencies between consecutive documents in document streams. The ﬁrst model is a direct extension of Latent Dirichlet Allocation model (LDA) and makes use of a Dirichlet distribution to balance the inﬂuence of the LDA prior parameters wrt to topic and word-topic distribution of the previous document. The second extension makes use of copulas, which constitute a generic tools to model dependencies between random variables. We rely here on Archimedean copulas, and more precisely on Franck copulas, as they are symmetric and associative and are thus appropriate for exchangeable random variables. Our experiments, conducted on three standard collections that have been used in several studies on topic modeling, show that our proposals outperform previous ones (as dynamic topic models and temporal LDA), both in terms of perplexity and for tracking similar topics in a document stream.
Filed under: Dimensionality Reduction