MetaPAD: Meta Pattern Discovery from Massive Text Corpora

Meng Jiang (University of Illinois at Urbana-Champaign); Jingbo Shang (University of Illinois at Urbana-Champaign); Taylor Cassidy (Army Research Lab); Xiang Ren (University of Illinois at Urbana-Champaign); Lance Kaplan (Army Research Lab); Timothy Hanratty (Army Research Lab); Jiawei Han (University of Illinois at Urbana-Champaign)

Abstract

Mining textual patterns in news, tweets, papers, and many other kinds of text corpora has been an active theme in text mining and NLP research. Previous studies adopt a dependency parsing-based pattern discovery approach. However, the parsing results lose rich context around entities in the patterns, and the process is costly for a large-scale corpus. In this study, we propose a novel typed textual pattern structure, called meta pattern, which is extended to a frequent, informative, and precise subsequence pattern in certain context. We propose an efficient framework, called MetaPAD, which discovers meta patterns from massive corpora with three techniques: (1) it develops a context-aware segmentation method to carefully determine the boundaries of patterns with a learnt pattern quality assessment function, which avoids costly dependency parsing and generates high-quality patterns; (2) it identifies and groups synonymous meta patterns from multiple facets: their types, contexts, and extractions; and (3) it examines type distributions of entities in the instances extracted by each group of patterns, and looks for appropriate type levels to make the discovered patterns precise. Experiments demonstrate that our proposed framework discovers high-quality typed textual patterns efficiently from different genres of massive corpora and facilitates information extraction.
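To make the idea of a typed, frequent subsequence pattern concrete, the following is a minimal sketch and not the MetaPAD algorithm itself. It assumes entity mentions have already been recognized and mapped to type tokens, and it simply counts contiguous token sequences that contain at least one type token. The learnt quality function, the segmentation step, synonym grouping, and type-level adjustment described above are all omitted. The names `type_sentence`, `candidate_meta_patterns`, `ENTITY_TYPES`, the `$TYPE` token notation, and the toy corpus are hypothetical and chosen for illustration only.

```python
from collections import Counter

def type_sentence(tokens, entity_types):
    """Replace each known entity mention with its type token (assumed lookup table)."""
    return [entity_types.get(tok, tok) for tok in tokens]

def candidate_meta_patterns(typed_sentences, min_support=2, max_len=4):
    """Keep contiguous n-grams that contain a type token and occur at least min_support times."""
    counts = Counter()
    for tokens in typed_sentences:
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                # Only sequences with at least one typed slot can act as patterns.
                if any(tok.startswith("$") for tok in gram):
                    counts[gram] += 1
    return {gram: c for gram, c in counts.items() if c >= min_support}

# Hypothetical entity-to-type table and toy corpus.
ENTITY_TYPES = {
    "Obama": "$PERSON",
    "Macron": "$PERSON",
    "US": "$COUNTRY",
    "France": "$COUNTRY",
}
corpus = [
    "US president Obama spoke today".split(),
    "France president Macron visited Berlin".split(),
]

typed = [type_sentence(sentence, ENTITY_TYPES) for sentence in corpus]
patterns = candidate_meta_patterns(typed)
for gram, support in sorted(patterns.items()):
    print(" ".join(gram), support)
```

On this toy input the frequent candidates overlap heavily: the three-token sequence and both of its two-token pieces reach the support threshold. Frequency counting alone cannot decide where a pattern should begin and end, which is the boundary problem the segmentation step in technique (1) is meant to address.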

KDD Papers
