Statistical Emerging Pattern Mining with Multiple Testing Correction
Junpei Komiyama (The University of Tokyo); Masakazu Ishihata (Hokkaido University); Hiroki Arimura (Hokkaido University); Takashi Nishibayashi (VOYAGE GROUP, Inc.); Shin-Ichi Minato (Hokkaido University)
Abstract
Emerging patterns are patterns whose support differs significantly between two databases. We study the problem of listing emerging patterns with a multiple testing guarantee. Recently, Terada et al. proposed the Limitless Arity Multiple-testing Procedure (LAMP), which controls the family-wise error rate (FWER) in statistical association mining. LAMP reduces the number of "untestable" hypotheses without compromising its statistical power. Still, FWER is a restrictive criterion, and as a result the statistical power of FWER-controlling procedures is inherently limited when the number of patterns is large.
On the other hand, the false discovery rate (FDR) is less restrictive than FWER, and thus controlling FDR yields a larger number of significant patterns. We propose two emerging pattern mining methods: the first one controls FWER, and the second one controls FDR. The effectiveness of the methods is verified in computer simulations with real-world datasets.
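The contrast between FWER and FDR control can be illustrated with two textbook procedures. The sketch below is not the method proposed in this paper and is not LAMP; it assumes the Bonferroni correction as a stand-in for FWER control and the Benjamini-Hochberg step-up procedure as a stand-in for FDR control, applied to a hypothetical list of p-values invented for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices rejected under the Bonferroni correction (controls FWER)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices rejected under the Benjamini-Hochberg step-up procedure (controls FDR)."""
    m = len(p_values)
    # Sort hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most alpha * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values, one per candidate pattern (illustrative only).
p_values = [0.001, 0.008, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9]

fwer_hits = bonferroni(p_values)
fdr_hits = benjamini_hochberg(p_values)
print("FWER (Bonferroni):", fwer_hits)
print("FDR (Benjamini-Hochberg):", fdr_hits)
```

On this toy input the FDR-controlling procedure rejects more hypotheses than the FWER-controlling one, which matches the qualitative claim that FDR control yields a larger number of significant patterns.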