KDD Cup 2003: Network mining and usage log analysis

I. Citation Prediction Task

First Place: J N Manjunatha, Raghavendra Pandey, Sivaramakrishnan R., and M Narasimha Murty (1329)
First Runner Up: Claudia Perlich, Foster Provost, and Sofus Macskassy (1360)
Second Runner Up: David Vogel (1398)

The number in parentheses after each winner is the L_1 difference between the solution and the submission.
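As a rough illustration of this score, an L_1 difference can be read as the sum of absolute per-paper differences between the submitted and the true values. The sketch below is only an assumption about the mechanics, not the organizers' scoring script: the function name `l1_difference`, the dictionary layout, the made-up paper ids, and the choice to treat a paper missing from the submission as a prediction of 0 are all hypothetical.

```python
def l1_difference(submission, solution):
    """Sum of absolute differences over every paper id in the solution.

    Both arguments map a paper id to a number. A paper that is absent
    from the submission is treated as a prediction of 0 (an assumption).
    """
    return sum(abs(submission.get(pid, 0) - true) for pid, true in solution.items())


# Made-up ids and values, for illustration only.
solution = {"paper-a": 3, "paper-b": -2, "paper-c": 0}
submission = {"paper-a": 1, "paper-b": -2, "paper-c": 4}

score = l1_difference(submission, solution)
print(score)  # 6
```

A lower score means the submission is closer to the solution, which is consistent with the ordering of the winners above.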

The solution for Task 1 is now available. The first column is the hep-th arxiv-id and the second column is (# of citations from May-July) - (# of citations from Feb-April) for all papers that received at least 6 citations between Feb and April.
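The second column described above could be derived from raw citation records roughly as follows. This is a minimal sketch under stated assumptions: the input is taken to be two lists of cited paper ids (one entry per citation received in each period), and the function name `citation_change`, the parameter `min_citations`, and the sample ids are hypothetical rather than part of the released data.

```python
from collections import Counter


def citation_change(feb_apr_cited, may_jul_cited, min_citations=6):
    """Return {paper id: (May-July count) - (Feb-April count)}.

    Only papers with at least `min_citations` citations in the
    Feb-April period are kept, matching the description above.
    """
    before = Counter(feb_apr_cited)
    after = Counter(may_jul_cited)  # a missing id counts as 0
    return {pid: after[pid] - n for pid, n in before.items() if n >= min_citations}


# Made-up citation records, one list entry per citation received.
feb_apr = ["paper-a"] * 6 + ["paper-b"] * 2
may_jul = ["paper-a"] * 4 + ["paper-b"] * 9

print(citation_change(feb_apr, may_jul))  # {'paper-a': -2}
```

Here `paper-b` is dropped because it received fewer than 6 citations in the earlier period, even though its count grew.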

In addition, the full list of new citations for all papers between May and July is also available.

II. Data Cleaning Task

First Place: David Vogel (421,582)
First Runner Up: Sunita Sarawagi, Kapil M. Bhudhia, Sumana Srinivasan, and V.G. Vinod Vydiswaran (516,242)
Second Runner Up: Martine Cadot and Joseph di Martino (538,013)

The number in parentheses after each winner is the size of the symmetric difference between the submission and the solution.
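To make the measure concrete, the symmetric difference between two citation graphs can be sketched as the set of edges that appear in exactly one of them. The code below assumes each graph is given as a collection of (citing, cited) pairs; the function name `symmetric_difference_size` and the sample edges are hypothetical, and this is not the organizers' evaluation code.

```python
def symmetric_difference_size(submitted_edges, true_edges):
    """Count edges present in exactly one of the two graphs.

    Each edge is a (citing paper, cited paper) tuple, so direction matters.
    """
    return len(set(submitted_edges) ^ set(true_edges))


# Made-up edges for illustration.
true_edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3")]
submitted_edges = [("p1", "p2"), ("p3", "p1"), ("p2", "p4")]

print(symmetric_difference_size(submitted_edges, true_edges))  # 4
```

Both missing edges and spurious edges add to the count, so a reversed citation such as `("p3", "p1")` is penalized twice in this sketch.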

The solution for Task 2 is a citation graph for hep-ph papers, provided by SLAC/SPIRES and available as a zip file. Papers in the left column cite papers in the right column.

III. Download Estimation Task

First Place: Janez Brank and Jure Leskovec (21,232) [slides]
First Runner Up: Joseph Milana, Joseph Sirosh, Joel Carleton, Gabriela Surpi, Daragh Hartnett, and Michinari Momma (21,950.6)
Second Runner Up: Kohsuke Konishi (23,759)

The number in parentheses after each winner is the L_1 difference between the contestant's submission and the solution.

The actual download counts for the top 150 papers (50 from each of the three missing periods) are available. The left column is the number of downloads the paper received in its first 60 days and the right column is the hep-th arxiv-id.

IV. Open Task

First Place: Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, and David Jensen. "Exploiting Relational Structure to Understand Publication Patterns in High-Energy Physics" [slides]
First Runner Up: Shou-de Lin and Hans Chalupsky. "Using Unsupervised Link Discovery Methods to Find Interesting Facts and Connections in a Bibliography Dataset"
Second Runner Up: Shawndra Hill and Foster Provost. "The Myth of the Double-Blind Review"

The submissions for Task 4 were evaluated by a small program committee consisting of the three KDD Cup 2003 co-chairs, Mark Craven (University of Wisconsin-Madison), David Page (University of Wisconsin-Madison), and Soumen Chakrabarti (Indian Institute of Technology Bombay).

