KDD95 Description of Demos


Demo Session Chair: Tej Anand, AT&T Global Information Solutions
6:00-8:00 PM, Sunday, August 20th
Room 407A, Palais des Congrès

Demo Presentation List:

Ronen Feldman
Hing-Yan Lee, Hwee-Leng Ong and Lee-Hian Quek
Christopher Matheus and Gregory Piatetsky-Shapiro
Arun Sanjeev and Jan Zytkow
Dr. Jiawei Han and Yongjian Fu
Richard Scheines, Peter Spirtes, Clark Glymour, and Christopher Meek



Knowledge Discovery in Textual Databases

Ronen Feldman,
Bar-Ilan University
Most KDD systems only handle structured databases. However, much online information is in the form of unstructured text. KDT is a system for the browsing and analysis of collections of unstructured texts. Each document in the collection is annotated by a set of keywords organized in a hierarchical structure. KDT enables the user to browse the textual database by selecting keywords from the hierarchy and viewing their distributions against other classes of keywords. KDT also enables the user to compare distributions of similar keywords and view the results using tables and graphs. Finally, as in traditional KDD systems, KDT searches for irregular distributions, correlations, and associations based on conditions and thresholds supplied by the user. KDT includes a browsing facility in which the user can click on any discovered pattern and get the list of all documents that contributed to this pattern. KDT is implemented on MS-Windows, and was designed with a special emphasis on efficiency and ease of use.
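As a rough illustration of viewing one keyword's distribution against a class of other keywords, the sketch below computes, for a selected keyword, the share of its documents that also carry each keyword in a chosen class. The document collection, the keyword names, and the function `keyword_distribution` are hypothetical and are not taken from KDT itself.

```python
from collections import Counter

# Hypothetical collection: each document is annotated with a set of keywords.
documents = [
    {"iran", "oil", "trade"},
    {"iran", "oil"},
    {"japan", "trade"},
    {"japan", "oil", "trade"},
]

# Hypothetical keyword class (for example, one node of the hierarchy and its children).
topics = {"oil", "trade"}


def keyword_distribution(docs, selected, keyword_class):
    """Return, for each keyword in `keyword_class`, the share of documents
    containing `selected` that also carry that keyword."""
    matching = [d for d in docs if selected in d]
    if not matching:
        return {}
    counts = Counter(k for d in matching for k in d if k in keyword_class)
    return {k: counts[k] / len(matching) for k in sorted(keyword_class)}


# Compare the distributions of two similar keywords over the same class.
iran_topics = keyword_distribution(documents, "iran", topics)
japan_topics = keyword_distribution(documents, "japan", topics)
print(iran_topics)
print(japan_topics)
```

Comparing the two dictionaries side by side is the tabular analogue of comparing the distributions of similar keywords; a real system would also need the hierarchy itself and the user-supplied thresholds.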



WinViz and Machine Learning: An Integrated Approach to Data Mining

Hing-Yan Lee, Hwee-Leng Ong and Lee-Hian Quek,
Information Technology Institute
Knowledge Discovery in Databases encompasses many technologies, such as visualization and machine learning. At ITI, we have developed WinViz, which uses a multidimensional visualization (MDV) technique to discover patterns and trends in multi-dimensional data. However, we find that a synergistic combination of WinViz with machine learning provides even greater leverage for KDD. To this end, we have developed a prototype that seamlessly integrates these two technologies. The use of WinViz for KDD is motivated by the adage that a picture is worth a thousand words. WinViz is a visual data analysis tool. It presents a global view of the data in a single picture. It also has an interactive visual query interface that allows one to formulate hypotheses and drill down through the data to discover hidden patterns and trends. Using WinViz, we can quickly discover relationships between different attributes in a dataset. We have integrated WinViz with the popular machine learning algorithm C4.5. The if-then rules generated by C4.5 can be visualized in WinViz to spot potential exceptions to the rules. The integration combines the interactivity and visual representation of WinViz with the generalization capability of C4.5.
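The idea of spotting exceptions to an if-then rule can be sketched in a few lines: select the records a rule covers, then keep those whose class disagrees with the rule's conclusion. The records, the rule, and the function `rule_exceptions` below are hypothetical; they do not reproduce C4.5 output or the WinViz interface.

```python
# Hypothetical records and rule; neither comes from WinViz or C4.5 output.
records = [
    {"outlook": "sunny", "humidity": 85, "play": "no"},
    {"outlook": "sunny", "humidity": 90, "play": "no"},
    {"outlook": "sunny", "humidity": 95, "play": "yes"},
    {"outlook": "rain", "humidity": 80, "play": "yes"},
]

# A rule of the form "IF all conditions hold THEN target = predicted".
rule = {
    "conditions": [
        lambda r: r["outlook"] == "sunny",
        lambda r: r["humidity"] > 75,
    ],
    "target": "play",
    "predicted": "no",
}


def rule_exceptions(rows, rule):
    """Return (covered, exceptions): the rows matching every condition of the
    rule, and those among them whose class differs from the prediction."""
    covered = [r for r in rows if all(cond(r) for cond in rule["conditions"])]
    exceptions = [r for r in covered if r[rule["target"]] != rule["predicted"]]
    return covered, exceptions


covered, exceptions = rule_exceptions(records, rule)
print(len(covered), "covered;", len(exceptions), "exception(s)")
```

In a visual tool the exceptional records would be highlighted rather than listed, but the selection logic is the same.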



KEFIR: The Key Findings Reporter for the Analysis of Healthcare Information

Christopher Matheus and Gregory Piatetsky-Shapiro,
GTE Labs.
The Key Findings Reporter (KEFIR), a system for discovering and explaining ``key findings'' in large, changing databases, is currently being applied to the analysis of GTE healthcare data. The system performs an automatic drill-down through the data along multiple dimensions to determine the most interesting deviations of specific quantitative measures relative to their previous and expected values. It explains ``key'' deviations through their relationship to other deviations in the data and, where appropriate, generates recommendations for actions in response to these deviations. KEFIR uses Netscape, a WWW browser, to present its findings in a hypertext report with natural language and business graphics.
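A minimal sketch of ranking deviations relative to previous and expected values follows. The abstract does not say how KEFIR scores interestingness, so the absolute deviation from the expected value used here is an assumption, as are the sector names, the figures, and the function `rank_deviations`.

```python
# Hypothetical values of one measure per sub-population:
# (previous, expected, current). All figures are invented for illustration.
payments = {
    "inpatient": (100.0, 105.0, 140.0),
    "outpatient": (80.0, 84.0, 82.0),
    "pharmacy": (40.0, 42.0, 30.0),
}


def rank_deviations(measures):
    """Rank sub-populations by the absolute deviation of the current value
    from the expected value (an assumed stand-in for interestingness)."""
    findings = []
    for sector, (previous, expected, current) in measures.items():
        findings.append({
            "sector": sector,
            "change_from_previous": current - previous,
            "deviation_from_expected": current - expected,
        })
    findings.sort(key=lambda f: abs(f["deviation_from_expected"]), reverse=True)
    return findings


ranked = rank_deviations(payments)
for finding in ranked:
    print(finding)
```

A drill-down would repeat the same ranking inside the top-ranked sector along a further dimension, which this sketch leaves out.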

Status: Application in beta-testing.



Automated Large-scale Data Mining by Forty-Niner (49er)

Arun Sanjeev and Jan Zytkow
Universities all over the world vary widely in their student populations, environmental settings, academic programs offered, etc. Yet the problems of higher education that universities face, such as enrollment, attrition, and retention, are strikingly similar. Universities maintain large databases consisting of hundreds of thousands of student records. These student databases are a useful source of knowledge for resolving the problems universities face, but the knowledge is implicit in the data and must be mined and expressed in a useful form. We demonstrate an application of Forty-Niner (49er) to our university's student database.

49er is an automated discovery system that explores databases in search of knowledge. 49er discovers knowledge in the form of regularities, that is, statements of the form ``Pattern P holds for data in range R''. We show how 49er systematically searches a large number of data subsets, discovering even patterns that occur only in limited circumstances. 49er evaluates and reports only those patterns that pass the user's thresholds. As an example, we demonstrate a focused search (evaluating remedial programs) in which the thresholds are controlled so that even the weakest patterns in the data are selected. The regularities discovered through incremental exploration are useful for managing enrollment at our university.
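The search for statements of the form ``Pattern P holds for data in range R'' can be sketched as a loop over data subsets with a user threshold on pattern strength. The abstract does not describe 49er's actual tests, so the simple difference in rates used below is an assumed stand-in; the student records, attribute names, and the function `search_regularities` are likewise hypothetical.

```python
# Hypothetical student records; attribute names and values are invented.
students = [
    {"college": "arts", "remedial": True, "retained": True},
    {"college": "arts", "remedial": True, "retained": True},
    {"college": "arts", "remedial": False, "retained": False},
    {"college": "arts", "remedial": False, "retained": True},
    {"college": "eng", "remedial": True, "retained": True},
    {"college": "eng", "remedial": False, "retained": True},
    {"college": "eng", "remedial": True, "retained": False},
    {"college": "eng", "remedial": False, "retained": False},
]


def rate(rows, attr):
    """Fraction of rows where the boolean attribute is true, or None if empty."""
    return sum(r[attr] for r in rows) / len(rows) if rows else None


def search_regularities(rows, range_attr, cause, effect, threshold):
    """For each range R (one value of `range_attr`), measure how much the
    rate of `effect` differs between rows with and without `cause`, and
    report the ranges whose difference reaches `threshold`."""
    found = []
    for value in sorted({r[range_attr] for r in rows}):
        subset = [r for r in rows if r[range_attr] == value]
        with_cause = rate([r for r in subset if r[cause]], effect)
        without_cause = rate([r for r in subset if not r[cause]], effect)
        if with_cause is None or without_cause is None:
            continue  # the pattern cannot be evaluated in this range
        strength = abs(with_cause - without_cause)
        if strength >= threshold:
            found.append((value, strength))
    return found


strong = search_regularities(students, "college", "remedial", "retained", 0.3)
everything = search_regularities(students, "college", "remedial", "retained", 0.0)
print(strong)
print(everything)
```

Lowering the threshold to zero admits every evaluable range, including those with no effect at all, which mirrors the focused search in which thresholds are relaxed to let weak patterns through.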

Status: Fielded application



Mining various kinds of knowledge by DBMiner (previously DBLearn)

Dr. Jiawei Han and Mr. Yongjian Fu
Database System Research Lab., Computing Science,
Simon Fraser University
The major features of DBMiner (an earlier version was named DBLearn) include:
1. integration of machine learning and database technologies,
2. discovery of different kinds of knowledge from large databases, including characteristic, discriminant, association, and classification rules,
3. high speed and efficiency in analyzing large databases,
4. interactive knowledge mining, and
5. smooth integration with commercial relational database systems.

The system will be demonstrated using a large database, an SQL-like data mining language, and an interactive graphical user interface.

Status: A research prototype, seeking commercialization and applications



TETRAD II: Tools for Discovery

Richard Scheines, Peter Spirtes, Clark Glymour, and Christopher Meek,
Carnegie Mellon University
TETRAD II is a multi-module program that assists in the construction of Bayes networks or causal models for sample data and in the use of Bayes networks in prediction. With continuous variables the program will aid in the search for "path models" or "structural equation models"; with discrete data the program will construct and update a Bayes network from sample data and user knowledge of the domain; the program includes Monte Carlo facilities. Proofs of the asymptotic correctness of all but one of the search modules are available in P. Spirtes, C. Glymour and R. Scheines, Causation, Prediction and Search, Springer Lecture Notes in Statistics, 1993.

Platform(s): DOS

A Unix version may be available soon.

The DOS software comes with a 250-page manual with chapters on theoretical foundations and interpreting output, and a chapter on each of the software modules. Each chapter includes many detailed examples.

Status: Commercially available