A Text Clustering Algorithm Using an Online Clustering Scheme for Initialization

KDD Topics

Abstract

In this paper, we propose FGSDMM+, a text clustering algorithm that uses an online clustering scheme for initialization. FGSDMM+ assumes that there are at most Kmax clusters in the corpus and initially regards these Kmax potential clusters as one large potential cluster. During initialization, FGSDMM+ processes the documents one by one in an online clustering scheme. The first document chooses the potential cluster, and FGSDMM+ creates a new cluster to store it. Each later document chooses either one of the non-empty clusters or the potential cluster, with probabilities derived from the Dirichlet multinomial mixture model. Each time a document chooses the potential cluster, FGSDMM+ creates a new cluster to store that document and decreases the probability that later documents choose the potential cluster. After initialization, FGSDMM+ runs a collapsed Gibbs sampling algorithm several times to obtain the final clustering result. Our extensive experimental study shows that FGSDMM+ achieves better performance than three other clustering methods on both short and long text datasets.
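
The online initialization pass described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the probability formulas, so the weights below assume the usual collapsed Dirichlet multinomial mixture conditional with symmetric hyperparameters. The names `online_init`, `_log_word_term`, `alpha`, and `beta` are hypothetical, and the Gibbs sampling iterations that follow initialization are omitted.

```python
import math
import random
from collections import Counter


def _log_word_term(doc_counts, doc_len, word_counts, total_words, vocab_size, beta):
    """Log word-likelihood of a document under a cluster (assumed DMM form)."""
    log_p = 0.0
    for word, freq in doc_counts.items():
        for j in range(freq):
            log_p += math.log(word_counts.get(word, 0) + beta + j)
    for i in range(doc_len):
        log_p -= math.log(total_words + vocab_size * beta + i)
    return log_p


def online_init(docs, k_max, alpha=0.1, beta=0.1, seed=0):
    """One online pass over tokenized docs; assumes k_max >= 1 and alpha, beta > 0."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    clusters = []  # each cluster: doc count "m", word total "n", word Counter
    labels = []
    for doc in docs:
        counts, length = Counter(doc), len(doc)
        # Unnormalized log-weights for every non-empty cluster. The shared
        # normalizing denominator is dropped because it cancels out.
        log_weights = [
            math.log(c["m"] + alpha)
            + _log_word_term(counts, length, c["words"], c["n"], vocab_size, beta)
            for c in clusters
        ]
        # The "potential cluster" stands in for all still-empty clusters, so
        # its weight shrinks each time a new cluster is created.
        remaining = k_max - len(clusters)
        if remaining > 0:
            log_weights.append(
                math.log(alpha * remaining)
                + _log_word_term(counts, length, {}, 0, vocab_size, beta)
            )
        top = max(log_weights)
        weights = [math.exp(lw - top) for lw in log_weights]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(clusters):  # the potential cluster was chosen
            clusters.append({"m": 0, "n": 0, "words": Counter()})
        chosen = clusters[choice]
        chosen["m"] += 1
        chosen["n"] += length
        chosen["words"].update(counts)
        labels.append(choice)
    return labels, clusters
```

Calling `online_init(docs, k_max=3)` on a list of token lists returns one cluster label per document together with the per-cluster counts. Because no cluster exists when the first document arrives, it always opens a new cluster, which matches the behavior described in the abstract.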

Filed under: Clustering | Dimensionality Reduction
