KDD Cup 2006: Pulmonary embolisms detection from image data
Challenge of Pulmonary Emboli Detection
Pulmonary embolism (PE) is a condition that occurs when an artery in the lung becomes blocked. In most cases, the blockage is caused by one or more blood clots that travel to the lungs from another part of the body. While PE is not always fatal, it is nevertheless the third most common cause of death in the US, with at least 650,000 cases occurring annually [1]. The clinical challenge, particularly in an Emergency Room scenario, is to correctly diagnose patients who have a PE and then send them on to therapy. This is not easy, because the primary symptom of PE is dyspnea (shortness of breath), which has a variety of causes, some of them relatively benign. That makes it hard to separate out the critically ill patients suffering from PE.
The two crucial clinical challenges for a physician, therefore, are to diagnose whether a patient is suffering from PE and to identify the location of the PE. Computed Tomography Angiography (CTA) has emerged as an accurate diagnostic tool for PE. However, each CTA study consists of hundreds of images, each representing one slice of the lung. Manual reading of these slices is laborious, time-consuming and complicated by various PE look-alikes (false positives), including respiratory motion artifacts, flow-related artifacts, streak artifacts, partial volume artifacts, stair step artifacts, lymph nodes, and vascular bifurcation, among many others. A computer-aided detection (CAD) system can assist the reader by marking suspected PE's, but false marks carry a cost. When PE is diagnosed, medications are given to prevent further clots, and because the patient must stay on them for a number of weeks after the diagnosis, these medications can sometimes lead to subsequent hemorrhage and bleeding. Thus, the physician must review each CAD output carefully for correctness in order to prevent overdiagnosis. Because of this, the CAD system must produce only a small number of false positives per patient scan.
The goal of a CAD system, therefore, is to automatically identify PE's. In an almost universal paradigm for CAD algorithms, this problem is addressed by a three-stage system:
1. Identification of candidate regions of interest (ROI) from a medical image,
2. Computation of descriptive features for each candidate, and
3. Classification of each candidate (in this case, whether it is a PE or not) based on its features.
In this year's KDD Cup data, Steps 1 and 2 have been done for you. Your goal is to design a series of classifiers related to Step 3.
Task 1: The first classification task is to label individual PE's. For clinical acceptability, it is critical to control false positive rates - a system that "cries wolf" too often will be rejected out of hand by clinicians. Thus, the goal is to detect as many true PE's as possible, subject to a constraint on false positives.
For this task, we make the following definitions:
- PE sensitivity is defined as the number of PE's correctly identified in a patient. A PE is correctly identified if at least one of the candidates associated with that PE is correctly labeled as a positive. Note: identifying 2 or more candidates for the same PE does not change the sensitivity.
- False positives are defined as the number of candidates falsely labeled as a PE in a patient - i.e., the total of all negative candidates labeled as PEs in the patient.
- The average FP rate for a test set is the average number of FPs produced across all patients in that test set.
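The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the representation of a candidate as a `(patient_id, pe_id)` pair, with `pe_id` set to `None` for a candidate that is not a PE, and the function name `task1_metrics` are hypothetical and not part of the competition data format.

```python
from collections import defaultdict

def task1_metrics(candidates, predictions):
    """Per-patient PE sensitivity and average FP rate, per the Task 1 definitions.

    candidates:  list of (patient_id, pe_id); pe_id is None for a non-PE
                 candidate (an assumed encoding, not the official one).
    predictions: list of bool, True if the candidate was labeled as a PE.
    """
    found = defaultdict(set)      # patient -> set of distinct PE ids detected
    false_pos = defaultdict(int)  # patient -> count of negatives labeled PE
    patients = set()
    for (patient_id, pe_id), predicted in zip(candidates, predictions):
        patients.add(patient_id)
        if not predicted:
            continue
        if pe_id is None:
            false_pos[patient_id] += 1
        else:
            # A set ignores repeated hits on the same PE.
            found[patient_id].add(pe_id)
    pe_sensitivity = {p: len(found.get(p, ())) for p in patients}
    avg_fp_rate = sum(false_pos.values()) / len(patients)
    return pe_sensitivity, avg_fp_rate

# Toy data: patient A has one PE covered by two candidates, patient B has one PE.
candidates = [("A", 1), ("A", 1), ("A", None), ("B", None), ("B", 2), ("B", None)]
predictions = [True, True, True, False, False, True]
sensitivity, avg_fp = task1_metrics(candidates, predictions)
```

In this toy example the two positive candidates for PE 1 in patient A count once, the PE in patient B is missed, and each patient contributes one false positive.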
You may (probably should) use different classifiers for each sub-task below:
- Task 1a. Build a system where the false positive rate is at most 2 per patient.
- Task 1b. Build a system where the false positive rate is at most 4 per patient.
- Task 1c. Build a system where the false positive rate is at most 10 per patient.
In each task, the classifiers will be ranked based on PE sensitivity, as long as the false positive rate meets the specified threshold.
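One way to approach the three FP limits, assuming the classifier outputs a real-valued score per candidate, is to choose a separate score cut-off for each limit. The sketch below is not a prescribed method: the function name `threshold_for_fp_budget`, the score representation, and the rule "label a candidate positive when its score is strictly greater than the cut-off" are all assumptions made for illustration.

```python
import math

def threshold_for_fp_budget(scores, is_pe, n_patients, max_avg_fp):
    """Lowest cut-off t such that labeling `score > t` as positive keeps the
    average number of false positives per patient at or below max_avg_fp.

    scores:     classifier score per candidate (higher means more PE-like)
    is_pe:      bool per candidate, True if it belongs to a true PE
    n_patients: number of patients the candidates come from
    """
    # Total number of false positives allowed across all patients.
    allowed = math.floor(max_avg_fp * n_patients)
    negatives = sorted((s for s, pe in zip(scores, is_pe) if not pe), reverse=True)
    if len(negatives) <= allowed:
        return -math.inf  # every candidate can be labeled positive
    # Only the `allowed` highest-scoring negatives may exceed the cut-off;
    # the strict comparison keeps ties on the safe side.
    return negatives[allowed]

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
is_pe = [True, False, True, False, False, False]
t_loose = threshold_for_fp_budget(scores, is_pe, n_patients=2, max_avg_fp=1)
t_tight = threshold_for_fp_budget(scores, is_pe, n_patients=2, max_avg_fp=0.5)
```

Because lowering the cut-off can only add positive labels, the lowest admissible cut-off also gives the highest sensitivity on the data it was tuned on. A cut-off tuned on training data may still exceed the limit on the test set, so leaving some margin below 2, 4, or 10 is probably prudent.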
Task 2: The second classification task is to label each patient as having a PE or not. The reason this is important is that patient treatment for PE is systemic - i.e., many aspects of the treatment are the same whether the patient has one or many PE's. For this task, we make the following definitions:
- Patient sensitivity is defined as the number of patients for whom at least one true PE is correctly identified. As above, a PE is correctly identified if any one of the candidates associated with that PE is correctly labeled, and multiple correct identifications in a single patient do not increase the sensitivity score.
- False positives are defined as the number of candidates falsely labeled as a PE in a patient.
- The average FP rate for a test set is the average number of FPs produced across all patients in that test set.
Again, for this task, 3 classifiers should be built, and any classifier that yields an average FP rate above the specified FP threshold on any sub-task will be disqualified.
- Task 2a. Build a system where the false positive rate is at most 2 per patient.
- Task 2b. Build a system where the false positive rate is at most 4 per patient.
- Task 2c. Build a system where the false positive rate is at most 10 per patient.
In each task, the classifiers will be ranked based on patient sensitivity, as long as the false positive rate does not exceed the specified threshold. You may use the same classifier(s) as in Task 1, or build different classifiers for this task.
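Patient sensitivity as defined for Task 2 can be computed as in the following sketch. As before, the `(patient_id, pe_id)` candidate encoding, with `pe_id` set to `None` for a non-PE candidate, and the function name are assumptions for illustration only.

```python
def patient_sensitivity(candidates, predictions):
    """Number of patients with at least one true PE candidate labeled positive.

    candidates:  list of (patient_id, pe_id); pe_id is None for a non-PE
                 candidate (an assumed encoding).
    predictions: list of bool, True if the candidate was labeled as a PE.
    """
    detected = set()
    for (patient_id, pe_id), predicted in zip(candidates, predictions):
        if predicted and pe_id is not None:
            # A patient counts once, however many PE's are found.
            detected.add(patient_id)
    return len(detected)

# Patient A: two PE's both found. Patient B: no PE, one false positive.
# Patient C: one PE missed, one false positive.
task2_candidates = [("A", 1), ("A", 2), ("B", None), ("C", 3), ("C", None)]
task2_predictions = [True, True, True, False, True]
n_detected = patient_sensitivity(task2_candidates, task2_predictions)
```

Here only patient A is counted: finding both of A's PE's adds nothing beyond finding one, and the false positives in B and C do not count as detections.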
Task 3: One of the most useful applications for CAD would be a system with very high (100%?) Negative Predictive Value. In other words, if the CAD system had zero positive candidates for a given patient, we would like to be very confident that the patient was indeed free from PE's. In a very real sense, this would be the "Holy Grail" of a PE CAD system.
Unfortunately, as our training set contains relatively few negative PE cases (20 in all), building such a classifier may be a very hard task. However, we anticipate that we will have a larger number of negative cases by the time the test data set is released, allowing a better measure of the performance of the system on this task.
For this task, we make the following definitions:
- A patient is identified as negative when the CAD system produces no positive labels for any of that patient's candidates.
- The negative predictive value (NPV) for a classifier is TN/(TN+FN) (i.e., number of true negatives divided by the total of true and false negatives).
- Note that the NPV is maximized by a classifier that correctly identifies some negative patients but produces no false negatives (no positive patients identified as negative).
To qualify for this task, a classifier must have 100% NPV (i.e., when it says a patient has no positive marks, the patient must have no true PE's). The primary criterion will be the highest number of negative patients identified in the test set (largest TN), subject to a minimum cut-off of identifying 40% of the negative patients on the test set. The first tie breaker will be the sensitivity on PE's (as defined in Task 1), followed by the false positive rate on the entire test set.
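The qualification rule for Task 3 can be checked with a short sketch like the one below. The per-patient dictionaries and the name `task3_summary` are hypothetical; only the NPV formula, the 100% NPV requirement, and the 40% cut-off come from the task description.

```python
def task3_summary(patient_has_pe, patient_has_positive_mark, min_fraction=0.4):
    """Return (TN, FN, NPV, qualifies) at the patient level.

    patient_has_pe:            dict patient_id -> True if the patient has a PE
    patient_has_positive_mark: dict patient_id -> True if any candidate of
                               that patient was labeled positive
    """
    tn = fn = n_negative = 0
    for patient_id, has_pe in patient_has_pe.items():
        if not has_pe:
            n_negative += 1
        if not patient_has_positive_mark[patient_id]:
            # The system called this patient negative.
            if has_pe:
                fn += 1
            else:
                tn += 1
    npv = tn / (tn + fn) if (tn + fn) > 0 else None  # undefined with no negative calls
    # 100% NPV means no false negatives; also require >= 40% of negatives found.
    qualifies = fn == 0 and n_negative > 0 and tn >= min_fraction * n_negative
    return tn, fn, npv, qualifies

has_pe = {"A": True, "B": False, "C": False, "D": False, "E": True}
marks = {"A": True, "B": False, "C": True, "D": False, "E": True}
summary_ok = task3_summary(has_pe, marks)

# Same patients, but the system now misses patient E entirely.
marks_missed = dict(marks, E=False)
summary_bad = task3_summary(has_pe, marks_missed)
```

In the first case two of the three negative patients are identified with no false negatives, so the classifier qualifies. In the second case a single missed positive patient drops the NPV below 100% and disqualifies it.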
More task descriptions are provided in this PDF file.
Each submission will be evaluated according to the criteria set forth under each task, above. The winner for each task will be the group with the best score according to the specified metric for that task. In the event of a tie, multiple winners may be awarded or, at the chair's option, a tie-breaking metric may be employed. Results of the competition will be announced individually to participants in advance of KDD; public announcement of results will be during the opening ceremony of KDD.