Data Mining: Good, Bad, or Just a Tool?
Chair: Raghu Ramakrishnan, University of Wisconsin-Madison, USA
This panel is intended as a forum to argue for continued efforts in developing data mining as a technology, while giving privacy advocates the opportunity to articulate their concerns. Three main issues will be discussed:

(1) There is significant value to society in developing the science underpinning data mining, but also significant risk of misuse of the technology. The same techniques that can accurately identify malignant tumors could be used to classify individuals as potential terrorists, and the medical information that can help doctors in emergency situations can also be used for invasive marketing. What should our response be? To disallow data mining altogether? To apply it only to "non-controversial" areas? To accept some risk if the need is acute or the benefits are compelling?

(2) If our response is to develop data mining techniques and apply them with care when appropriate or necessary, what checks and balances are required to safeguard individual rights? How can we constrain when and to what ends the technology is applied, and how the results are interpreted? What are the parallels to existing legal protections? What differences make the problem of electronic privacy more challenging?

(3) The Technology and Privacy Advisory Committee (TAPAC) recently issued its report. What are its main recommendations? How will, or should, it influence data mining research and practice?

Panelists
Can Natural Language Processing Help Text Mining?
Chair: Anne Kao, Boeing Phantom Works, USA
Natural Language Processing (NLP) has been around for a number of decades. It has developed various techniques that are typically linguistically inspired: text is syntactically parsed using information from a formal grammar and a lexicon, and the resulting information is then interpreted semantically and used to extract information about what was said. NLP may be deep or shallow, and may even use statistical means to disambiguate word senses or multiple parses of the same sentence. It tends to focus on one document or piece of text at a time and to be rather computationally expensive. It includes techniques such as word stemming, multiword phrase grouping, synonym normalization, anaphora resolution, and role determination. Text mining is more recent and uses techniques developed primarily in statistics and machine learning. Its aim is typically not to understand all or even a large part of what a given speaker or writer has said, but rather to extract patterns across a large number of documents. It includes tasks such as classifying texts into a fixed set of categories, automatic text clustering, extracting topics from texts or groups of texts, and analyzing trends.
In this panel, we will discuss: (1) Can traditional NLP methods help text mining? If so, can they help all areas of text mining, or only some? Which NLP areas and techniques are useful? (2) What is novel about text mining versus NLP? In light of the requirements of text mining, what are some promising new directions for NLP?
Panelists