Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity
Jonas Fischer: Max Planck Institute for Informatics; Jilles Vreeken: CISPA Helmholtz Center for Information Security
Pattern mining is one of the core topics of data mining. We consider the problem of mining a succinct set of patterns that together explain the data in terms of mutual exclusivity and co-occurence. That is, we extend the traditional pattern languages beyond conjunctions, enabling us to capture more complex relationships, such as replacable sub-components or antagonists in biological pathways.
We formally define this problem in terms of the Minimum Description Length principle, by which we identify the best set of patterns as the one that most succinctly describes the data. To avoid spurious results—-in sparse data mutual exclusivity is likely just due to chance—-we propose an efficient statistical test for K-ary mutual exclusivity. As the search space for the optimal model is enormous and unstructured, we propose Mexican, a heuristic algorithm to efficiently discover high quality sets of patterns of co-occurences and mutual exclusivity. Through extensive experiments we show that Mexican recovers the ground truth on synthetic data, and meaningful results on real-world data. Both in stark contrast to the state of the art, that result in millions of spurious patterns.
How can we assist you?
We'll be updating the website as information becomes available. If you have a question that requires immediate attention, please feel free to contact us. Thank you!
Please enter the word you see in the image below: