We propose to conduct two different yet closely related challenges based on this data. On the test data, the participants have to return two different files, one corresponding to each challenge.
1. The prevalence of malignant patients in a screening environment is extremely low (on average, only about 5-10 of every 1000 screening patients have breast cancer). Therefore, in the first challenge, participating entries will be judged by the area under the FROC curve in the clinically relevant region of 0.2-0.3 false positives per image. To support this, participants have to return a file with a confidence score for every candidate in the test set (from -infinity to +infinity) that indicates the classifier's confidence that the candidate is malignant. A score of +infinity corresponds to absolute confidence that the candidate is malignant, and a score of -infinity indicates absolute confidence that the candidate is benign.
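As a rough illustration of the first metric, the sketch below computes the area under a FROC curve between 0.2 and 0.3 false positives per image. It is not the official scoring code. It assumes sensitivity is counted per lesion (a lesion is found once any of its candidates is reached), that ties in scores can be ignored, and that the curve is treated as a step function; the names `froc_points`, `partial_area`, and `lesion_ids` are hypothetical.

```python
def froc_points(scores, is_malignant, lesion_ids, n_images):
    """Sweep the threshold from high to low and return (FP per image, sensitivity) points."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Assumption: sensitivity is the fraction of distinct malignant lesions found.
    total_lesions = len({l for l, m in zip(lesion_ids, is_malignant) if m})
    found, false_pos = set(), 0
    points = [(0.0, 0.0)]
    for i in order:
        if is_malignant[i]:
            found.add(lesion_ids[i])
        else:
            false_pos += 1
        points.append((false_pos / n_images, len(found) / total_lesions))
    return points


def partial_area(points, lo=0.2, hi=0.3):
    """Area under the step-function FROC curve for lo <= FP per image <= hi."""
    area = 0.0
    next_points = points[1:] + [(float("inf"), None)]
    for (x0, y0), (x1, _) in zip(points, next_points):
        left, right = max(x0, lo), min(x1, hi)
        if right > left:
            area += (right - left) * y0
    return area


# Toy data: 5 candidates spread over 10 images, two malignant lesions "a" and "b".
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
is_malignant = [1, 0, 1, 0, 0]
lesion_ids = ["a", None, "b", None, None]
area = partial_area(froc_points(scores, is_malignant, lesion_ids, n_images=10))
```

Dividing the result by `hi - lo` gives the mean sensitivity over the region, which can be easier to compare across entries.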
2. In the second challenge, our aim is to reduce the workload for radiologists by asking them to read only the subset of cases that the algorithm deems at least somewhat unclear or suspicious. Thus our second challenge is evaluated in terms of the fraction of patients who are labeled as completely normal (not requiring radiologist review of images), subject to the CAD algorithm achieving 100% sensitivity on the malignant patients. CAD systems that fail to achieve 100% sensitivity will be disqualified from the challenge. To support this challenge, the participants have to return another file with a binary classification decision about whether each patient in the test set should be reviewed by a radiologist.
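The second metric can be sketched as follows. This is only an illustration of the rule described above, not the official evaluation script; the function name `workload_reduction` and the dictionary-based inputs are assumptions, since the submission file format is not specified here.

```python
def workload_reduction(needs_review, is_malignant):
    """Return the fraction of patients labeled normal, or None if disqualified.

    needs_review: dict patient_id -> True if a radiologist should read the case.
    is_malignant: dict patient_id -> True if the patient truly has cancer.
    """
    # 100% sensitivity requirement: every malignant patient must be sent for review.
    missed = [p for p, m in is_malignant.items() if m and not needs_review[p]]
    if missed:
        return None  # disqualified
    n_normal = sum(1 for flag in needs_review.values() if not flag)
    return n_normal / len(needs_review)


# Toy example: four patients, one malignant.
truth = {"p1": True, "p2": False, "p3": False, "p4": False}
decision = {"p1": True, "p2": True, "p3": False, "p4": False}
score = workload_reduction(decision, truth)  # 0.5: half the patients skip review
```

Labeling every patient for review trivially satisfies the sensitivity constraint but scores 0, so the interesting entries are those that safely exclude as many normal patients as possible.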
Background in Breast Cancer
Breast cancer is a disease in which malignant (cancer) cells form in the tissues of the breast. Breast cancer is the second leading cause of cancer deaths in women today (after lung cancer) and is the most common cancer among women, except for skin cancers. About 1.3 million women are expected to be diagnosed annually with breast cancer worldwide, and about 465,000 will die from the disease. In the United States alone, an estimated 240,510 women were expected to be diagnosed with breast cancer in 2007, and 40,460 women were expected to die from it.
Screening is looking for cancer in asymptomatic people - i.e., before a person has any symptoms of the disease. Cancer screening can help find cancer at an early stage. When abnormal tissue or cancer is found early, it is often easier to treat. By the time symptoms appear, cancer may have begun to spread. The good news is that breast cancer death rates have been dropping steadily since 1990, both because of earlier detection via screening and better treatments.
The most common breast cancer screening test is a mammogram. A mammogram is an x-ray of the breast. The ability of a mammogram to find breast cancer may depend on the size of the tumor, the density of the breast tissue, and the skill of the radiologist. The mammogram is considered the standard of care for most asymptomatic women. For instance, in the US, insurance companies routinely reimburse for an annual screening mammography examination, for all asymptomatic women over the age of 40. These exams are credited with reducing the breast cancer death rate by approximately 30% since 1990.
However, the reading of screening mammograms is challenging. Findings on a screening mammogram leading to further recall are identified in approximately 5%-10% of patients, even though breast cancer is ultimately confirmed in only three to ten cases in every 1,000 women screened. Perhaps even more importantly, there is compelling evidence that many breast cancers detected at screening mammography are, in retrospect, visible on the previously obtained mammograms but were missed by the interpreting radiologist in the prior year. There are several reasons why cancer is not detected by mammography: the complex radiographic structure of breast tissue, particularly in dense breasts; the subtle nature of many mammographic characteristics of early breast cancer; human oversight; poor quality films; and even fatigue or distraction.
To overcome the known limitations of human observers, second (i.e., double) reading of screening mammograms by another radiologist has been implemented at many sites. Studies indicate a potential 4%-15% increase in the number of cancers detected with double reading. In a radiology practice that performs 10,000 screening examinations per year, generally between 30 and 100 cancers per year will be detected. Thus, double reading in this practice could contribute to the diagnosis of 1-15 additional cancers per year. However, this approach doubles the radiologist effort, so it is not financially viable.
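The range of 1-15 additional cancers follows directly from the figures quoted in this section, as the short calculation below shows. The variable names are illustrative only; the rate of three to ten confirmed cancers per 1,000 women screened is the one stated earlier in this background.

```python
screenings_per_year = 10_000

# Three to ten confirmed cancers per 1,000 women screened.
cancers_low = screenings_per_year * 3 // 1000     # 30
cancers_high = screenings_per_year * 10 // 1000   # 100

# Double reading: a potential 4%-15% increase in detected cancers.
extra_low = cancers_low * 4 / 100      # 1.2, i.e. about 1 additional cancer
extra_high = cancers_high * 15 / 100   # 15.0 additional cancers
```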
Rapid and continuing advances in computer technology, as well as the ready adaptation of radiology images to digital formats, have increased the interest in computer prompting to enable the attending radiologist to act as his or her own second reader. One very promising adaptation of computer-prompting technology is computer-aided detection (CAD) in screening mammography. Current CAD systems demonstrate a high rate of detecting cancerous features on mammograms, but further improvements in both sensitivity and specificity would lead to tremendous benefits, both in terms of lives saved each year and in terms of reduction in the workload of radiologists. Over the last 8-10 years, US insurance companies have begun to provide additional reimbursement to mammographers who run CAD algorithms on the mammograms. In other words, physicians are now reimbursed for running a machine learning algorithm to help them better detect cancer.
In an almost universal paradigm, the CAD problem is addressed by a four-stage system:
- candidate generation which identifies suspicious unhealthy candidate regions of interest (candidate ROIs, or simply candidates) from a medical image;
- feature extraction which computes descriptive features for each candidate so that each candidate is represented by a vector x of numerical values or attributes;
- classification which differentiates candidates that are malignant cancers from the rest of the candidates based on x; and
- visual presentation of CAD findings to the radiologist.
In this challenge, we focus on stage 3, learning the classifier to differentiate malignant cancers from other candidates.
The obvious method of classification is to try to build classifiers that simply label each candidate independently. Below we present a few ideas that participants in the challenge may want to consider to potentially improve their algorithms.
Leverage two views of the same breast: Almost always, a cancerous lesion is visible in both views (MLO, CC) of the breast, and radiologists routinely try to correlate the two views while diagnosing the patient. In rare cases, however, some lesions may only be visible in one view, especially in certain areas of the breast. Negative candidates, on the other hand, may be present either in one view (e.g., for image artifacts) or in both views (e.g., if generated by the presence of a benign cyst).
Unfortunately, since each view is a 2D image obtained from an orthogonal direction, it is not possible to perfectly register (i.e., correlate the locations across) the X-ray images using simple algorithms, e.g., using affine transformations. However, some of a lesion's features are typically preserved across the two views; particularly, the distance of a lesion from the nipple, and perhaps some of the features themselves relating to size of the lesion, texture, etc. Thus the first idea that may be useful for this challenge is to develop algorithms that simultaneously classify candidates from a pair of images from the same breast. These algorithms could try to exploit correlations in classification decisions for the same region of a breast. To support this, training and testing data sets will include features that identify the (x,y) location of the nipple as well as the (x,y) location of the candidate.
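One simple way to use the nipple and candidate locations is sketched below: compute each candidate's distance from the nipple, then measure how closely that distance is matched by any candidate in the other view of the same breast. This is a hypothetical heuristic for illustration, not a method prescribed by the challenge; the dictionary keys `xy` and `nipple_xy` and both function names are assumptions about how the location features might be loaded.

```python
import math


def nipple_distance(candidate):
    """Euclidean distance from the candidate to the nipple within its own view."""
    (cx, cy), (nx, ny) = candidate["xy"], candidate["nipple_xy"]
    return math.hypot(cx - nx, cy - ny)


def cross_view_gap(candidate, other_view_candidates):
    """Smallest difference in nipple distance to any candidate in the other view.

    Returns None when the other view has no candidates. A small gap hints that
    the same structure may have been detected in both views.
    """
    if not other_view_candidates:
        return None
    d = nipple_distance(candidate)
    return min(abs(d - nipple_distance(o)) for o in other_view_candidates)


# Toy example: one MLO candidate compared against two CC candidates.
mlo = {"xy": (3.0, 4.0), "nipple_xy": (0.0, 0.0)}
cc = [{"xy": (0.0, 13.0), "nipple_xy": (0.0, 0.0)},
      {"xy": (6.0, 8.0), "nipple_xy": (0.0, 0.0)}]
gap = cross_view_gap(mlo, cc)  # 5.0
```

The gap could be appended to the feature vector of each candidate, or used to pair candidates before classifying them jointly. Distances are taken relative to the nipple in each view separately because the two images do not share a coordinate frame.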
Class Imbalance: Participants will be able to leverage ideas from classifier design under extreme class imbalance (the vast majority of the regions are normal, and only a small fraction of the regions are actually malignant), and feature selection (a large number of features are proposed and several of them may not be very useful for the task). The prevalence rate (malignant patients as a fraction of all patients) may differ between the training and testing sets.
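A common starting point for extreme class imbalance is to weight each class inversely to its frequency, so that the rare malignant candidates carry as much total weight as the abundant normal ones. The sketch below shows that weighting in isolation. It is one standard heuristic rather than a recommendation from the organizers, and the function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class so that every class has the same total weight."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


# Toy example: 2 malignant candidates among 100.
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)  # weights[1] is 25.0
```

Because the prevalence may differ between the training and testing sets, weights derived from the training data should be treated as a tunable starting value rather than a fixed constant.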
Exploit correlations within an image: Participants may develop novel algorithms for exploiting potential correlations between the diagnoses of suspicious regions within a single image (e.g., if they are spatially adjacent).
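As a minimal sketch of how spatial adjacency might be exposed to a classifier, the code below derives two extra features per candidate from the other candidates in the same image: how many lie within a given radius, and the highest first-pass score among them. Both the features and the name `neighbor_features` are hypothetical, and the radius is a free parameter that would need tuning.

```python
import math


def neighbor_features(candidates, radius):
    """candidates: list of (x, y, score) tuples from a single image.

    Returns one (neighbor_count, max_neighbor_score) pair per candidate;
    max_neighbor_score is None when no other candidate lies within the radius.
    """
    features = []
    for i, (xi, yi, _) in enumerate(candidates):
        near = [s for j, (xj, yj, s) in enumerate(candidates)
                if j != i and math.hypot(xi - xj, yi - yj) <= radius]
        features.append((len(near), max(near) if near else None))
    return features


# Toy example: two adjacent candidates and one isolated candidate.
image_candidates = [(0, 0, 0.9), (3, 4, 0.2), (100, 100, 0.5)]
feats = neighbor_features(image_candidates, radius=10)
```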
Optimize AUC only in narrow FP range: It may be useful to develop training algorithms to maximize the area under the ROC curve (AUC) in a clinically relevant false positive (FP) range, a problem that has not been adequately addressed in the current machine learning/data-mining literature.
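One way to think about this objective is in terms of pairs: the normalized area under the ROC curve for false positive rates from 0 up to k / (number of negatives) equals the fraction of (positive, negative) pairs that are ranked correctly when only the k highest-scoring negatives are considered. The sketch below computes that quantity. It assumes a range starting at 0 and counts ties as errors; the function name `partial_auc_top_k` is hypothetical, and this is offered as a way to reason about the objective, not as the challenge's scoring rule.

```python
def partial_auc_top_k(pos_scores, neg_scores, k):
    """Normalized partial AUC over the k highest-scoring (hardest) negatives."""
    hardest = sorted(neg_scores, reverse=True)[:k]
    # Count positive/negative pairs in which the positive is ranked strictly higher.
    correct = sum(1 for p in pos_scores for n in hardest if p > n)
    return correct / (len(pos_scores) * len(hardest))


# Toy example: two positives, four negatives.
pos = [0.9, 0.4]
neg = [0.8, 0.5, 0.3, 0.1]
narrow = partial_auc_top_k(pos, neg, k=2)  # 0.5: only the two hardest negatives count
full = partial_auc_top_k(pos, neg, k=4)    # 0.75: ordinary AUC over all negatives
```

The example shows why the narrow range matters: a classifier can look acceptable on the full AUC while ranking poorly against the few high-scoring negatives that determine performance at low false positive rates.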