1. Field of the Invention
This invention pertains generally to image analysis, and more particularly to cellular image analysis.
2. Description of Related Art
While the specific factors for developing breast cancer are diverse and not completely understood, it is estimated (based on 2000-2002 statistics) that 13.2% of women born today in the United States will be diagnosed with breast cancer.
This statistic, generally reported as “1 in 8,” is the cumulative risk if a woman were to live 110 years. The actual risk depends on age and follows a bimodal distribution, with the first peak at premenopause (40-50 years) and the second at approximately 65 years. Using data from 1969-2002, it was estimated that 211,240 new cases of invasive breast cancer and 58,490 new cases of non-invasive breast cancer would be diagnosed in 2005, and that 40,410 women would die of the disease. Breast cancer is now known to consist of a variety of diseases. Diagnosis and treatment are further complicated by a large number of conditions that mimic breast cancer and may present an increased risk for developing breast cancer.
Of particular interest are benign diseases and the possibility that these may be precursors to malignant conditions (i.e., “premalignancy”). Established cytologic (cell) criteria can be contradictory. Even histologic (tissue) criteria, considered the gold standard for diagnosis, are subject to varied interpretation.
It has become common clinical practice to remove all breast masses because of the possibility of breast cancer, even though 80% of these excised lesions turn out to be benign. There is thus a need to quantitatively define characteristics of breast cancer to better coordinate clinical care of women presenting with breast masses.
Infrared vibrational and Fourier transform spectroscopy have been used for classification of prostate and cervical tissue. These methods use very specific spectral information at the molecular level but generally do not utilize spatial attributes. Infrared spectroscopy uses infrared light to excite vibrations in the molecular structure of a specimen. These vibrations manifest as energies that give insight into the underlying molecular constituents. In the more common case of Fourier transform infrared spectroscopy, an interferogram is produced instead of an array of energies; the Fourier transform of this interferogram yields the energies. Fernandez et al. used a pool of features hand-picked by spectroscopist analysis of pathologist-marked regions. They achieved accuracies of 90.1%-100.0% for classification of histologic classes from 250 tissue microarray cores from 40 patients, and 100% classification of adenocarcinoma versus normal tissue. Focusing on glycogen depletion in dysplastic (abnormal) tissue, Shaw et al. [30] achieved accuracies of 60%-70% in separating normal from abnormal Pap smears.
Zhao et al. used a back-propagation neural network with feature sets derived from intensity statistics and the wavelet domain. For a dataset of 40 images, pixel-level classification demonstrated a sensitivity of 95.2%, and nuclear-level classification demonstrated a sensitivity of 97.6%.
Segmentation of nuclei in fluorescence imagery is presented by Lin et al. [31] and Wählby et al. [32] for 2D imagery and 3D confocal imagery. Lin et al. [31] used a recursive, tree-based algorithm and reported an accuracy of 96.3% for 10 images of rodent brains. Wählby et al. [32] used a marker-based watershed transformation, region merging, and shape-based cluster separation. They achieved an accuracy of 91%-96% for a total of six 2D images of cervical and prostatic carcinomas.
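The Fourier transform step in FTIR spectroscopy described above can be illustrated with a toy signal. The sketch below assumes a synthetic "interferogram" built from two cosine components standing in for two absorption bands. A naive discrete Fourier transform then recovers the two frequencies as peaks in the spectrum. Real FTIR processing also involves apodization and phase correction, which are omitted. The signal, bin numbers, and function name are all hypothetical.

```python
import cmath
import math

def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

# Hypothetical interferogram: two cosines at frequency bins 5 and 12,
# standing in for two absorption bands of different strength.
N = 64
interferogram = [
    1.0 * math.cos(2 * math.pi * 5 * t / N) + 0.5 * math.cos(2 * math.pi * 12 * t / N)
    for t in range(N)
]

spectrum = dft_magnitude(interferogram)
half = spectrum[: N // 2]  # real input: keep the non-negative frequencies only
peaks = sorted(range(len(half)), key=lambda k: half[k], reverse=True)[:2]
print(sorted(peaks))  # [5, 12]
```

The peak heights scale with the component amplitudes (N/2 = 32 for bin 5, 16 for bin 12). This is how band strength is read off the transformed spectrum.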
Luck et al. [33] describe segmentation for in vivo confocal reflectance imagery of cervical tissue based on nuclear modeling, anisotropic median diffusion, Gauss-Markov random fields, and a Bayesian classifier. The authors reported a 90% match to hand-segmented nuclei with an average of 6 false positives per frame.
Ciocchetta et al. [34] combined gross examination and manual feature extraction for 212 liver nodules with correlation-based feature selection. They then applied several classification schemes, including decision trees, k-Nearest Neighbor (k-NN) classification, naive Bayes, and neural networks. This achieved 95%-100% sensitivity and specificity for diagnosis as one of three conditions.
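The text does not give the selection criterion used by Ciocchetta et al. [34]. A common formulation of correlation-based feature selection (Hall's CFS) scores a subset of k features as:

Merit = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff)

Here r̄_cf is the mean feature-class correlation and r̄_ff is the mean feature-feature correlation. The sketch below assumes that formulation and substitutes absolute Pearson correlation for Hall's symmetrical uncertainty. It shows the key property: adding a redundant copy of a feature does not raise the merit. The data and function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(features, labels):
    """CFS-style merit of a feature subset (assumed Pearson-based variant)."""
    k = len(features)
    r_cf = sum(abs(pearson(f, labels)) for f in features) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = list(combinations(features, 2))
        r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

y = [0, 0, 0, 1, 1, 1]                  # hypothetical: 0 = benign, 1 = malignant
size = [1.0, 1.2, 0.9, 2.1, 2.0, 2.2]    # hypothetical nodule measurement
merit_single = cfs_merit([size], y)
merit_redundant = cfs_merit([size, list(size)], y)  # add an exact duplicate
print(merit_single, merit_redundant)     # equal up to rounding: no gain from redundancy
```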
Demir et al. [35] and Gunduz et al. [36] present a nuclei segmentation algorithm for H&E stained brain biopsies using the L*a*b* color space and a k-means algorithm. The extracted features are based on the concept of cell graphs [36] and augmented cell graphs [35]. They include features such as degree and eccentricity, which are commonly defined for graph structures. Classification of normal, inflamed, and cancerous tissue was performed by an artificial neural network. Accuracies were 96.9%-97.1% for 64 patients in [35] and 85.7%-94.0% for 12 patients in [36].
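The graph features mentioned above (degree and eccentricity) can be sketched on a simple cell graph. This sketch assumes a hypothetical rule: nuclei are nodes, and two nuclei are joined when their centroids lie within a fixed distance. The cited works define their own edge rules. Eccentricity here is the largest hop count from a node to any node reachable from it. The centroids, radius, and function names are hypothetical.

```python
import math
from collections import deque

def build_cell_graph(centroids, radius):
    """Connect two nuclei if their centroids are within `radius` pixels."""
    n = len(centroids)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centroids[i], centroids[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def eccentricity(adj, source):
    """Largest hop distance from `source` to any node reachable from it (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

# Hypothetical nucleus centroids: a chain of four plus one isolated nucleus.
centroids = [(0, 0), (10, 0), (20, 0), (30, 0), (100, 100)]
adj = build_cell_graph(centroids, radius=12)
degrees = [len(adj[i]) for i in range(len(centroids))]
eccs = [eccentricity(adj, i) for i in range(len(centroids))]
print(degrees)  # [1, 2, 2, 1, 0]
print(eccs)     # [3, 2, 2, 3, 0]
```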
Sammouda et al. [37] used the HSV color space for segmentation of H&E stained lung nuclei, with a Hopfield neural network, a maximum drawable circle algorithm, and nuclear radii as features. In 16 images, 92%-97% of cancerous nuclei were correctly detected.
Roula et al. [38] used a multispectral dataset (33 bands) of H&E stained prostate tissue. They extracted texture and mathematical morphology features, reduced dimensionality using principal components analysis (PCA), and classified using quadratic discriminant analysis. The classification error for 230 hand-picked nuclei representing 3 conditions was 5.1%.
GENIE, an automated feature extraction system developed at Los Alamos National Laboratory, has been applied to a multispectral dataset (31 bands) of Pap-stained urine cytology images, as presented by Angeletti et al. [39]. With a training set of 12 cases and validation sets of 17 and 8 cases, GENIE discriminated between malignant and benign urothelial cells with a sensitivity of 85%-87% and a specificity of 96%. GENIE also classified atypical urothelial cell clusters as benign or malignant, as determined clinically by 1-year follow-up. This classification achieved an area under the receiver operating characteristic curve (AUC) of 0.728.
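The AUC reported above has a simple interpretation. It is the probability that a randomly chosen malignant case receives a higher classifier score than a randomly chosen benign case, with ties counting one half. The sketch below computes it directly from that pairwise (Mann-Whitney) definition. The scores are hypothetical and do not come from the cited study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

malignant = [0.9, 0.8, 0.4]        # hypothetical classifier scores
benign = [0.7, 0.3, 0.2, 0.1]
score = auc(malignant, benign)
print(score)  # 11 of 12 pairs ranked correctly, about 0.917
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation, so 0.728 indicates moderate discrimination.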
Narrowband red images (610 nm, 621 nm) have been used for detection of laryngopharyngeal cancer [8], discrimination of cervical cancers and atypias [9], and separation of benign hyperplastic prostatic lesions from true prostatic carcinoma [10]. Additionally, Brewer et al. [40] used the red channel from standard RGB light microscopy to classify epithelial and stromal (connective tissue) nuclei in ovarian tissue. In Brewer et al. [40], 7 features were hand-selected from 93 karyometric (nuclear) features to discriminate cancerous and benign conditions. This yielded an accuracy of 66%-78% for H&E stained sections from 20 patients. Zahniser et al. [9] used narrowband images of Feulgen and Orange II stained cervical Pap smears at 621 nm (for nuclear material) and 497 nm (for cytoplasmic material). They applied linear discriminant analysis at both the single-cell and cell-cluster level. Zahniser et al. [9] reported classification accuracy of 100% for normal (14 cases), 45% for benign change (11 cases), and 97% for abnormal (29 cases). Both Neher et al. [8] and Mairinger et al. [10] used the CytoSavant image analysis system from OncoMetrics, Inc. (Vancouver, BC, Canada). They extracted 114 nuclear features (plus the mean, maximum, minimum, and variance of each) from 610 nm narrowband images of Feulgen stained tissue, and used stepwise linear discriminant function analysis for classification. Neher et al. [8] reported a sensitivity and specificity of 72.7% and 82.4% for 145 cases. Mairinger et al. [10] reported a sensitivity and specificity of 92% and 95% for 240 cases.
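Several of the studies above classify with linear discriminant analysis. The sketch below shows only the core two-class Fisher discriminant on two features. The weight vector is w = S_w⁻¹(m₁ − m₀), where S_w is the pooled within-class scatter matrix and m₀, m₁ are the class means. The threshold is assumed to lie halfway between the projected class means, i.e. equal priors. The stepwise feature selection used in [8] and [10] is not shown. The feature values and function names are hypothetical.

```python
def mean_vec(rows):
    """Mean of a list of 2-feature samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def fisher_lda_2d(class0, class1):
    """Return (w, c); a sample x is assigned to class 1 if w . x > c."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    # Pooled within-class scatter matrix S_w (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold halfway between the projected class means (equal priors assumed).
    c = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, c

def predict(w, c, x):
    return 1 if w[0] * x[0] + w[1] * x[1] > c else 0

# Hypothetical (nuclear area, integrated optical density) pairs, arbitrary units.
normal = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.8, 2.1)]
abnormal = [(3.0, 4.0), (3.5, 3.6), (2.8, 4.4), (3.2, 3.9)]
w, c = fisher_lda_2d(normal, abnormal)
print([predict(w, c, x) for x in normal + abnormal])  # [0, 0, 0, 0, 1, 1, 1, 1]
```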
Similarly, narrowband green images (565 nm) were used for analysis of Feulgen-stained lung tissue [20, 41] and prostate tissue [41]. Weyn et al. [20] used 82 features (plus the mean and standard deviation of each). These included densitometry (optical density-related), morphometry, texture, and syntactic structure analysis (SSA, related to spatial arrangement) measures. The dataset comprised 39 cases of malignant mesothelioma, 20 cases of pulmonary adenocarcinoma, and 7 cases of hyperplastic mesothelium. On this dataset, k-NN classification yielded accuracies of 83.9%-96.8% for discrimination of the three conditions, 79.5%-94.9% in typing malignant mesothelioma, and 60.0%-82.9% for prediction of prognosis for malignant mesothelioma [20]. In [41], by contrast, Weyn et al. derived features from the imagery using Voronoi diagrams, Gabriel's graphs, and minimum spanning trees, all of which quantify the spatial arrangement of tissue constituents. k-NN classification yielded correct disease classification of 73.9% (51 cases), correct typing of malignant mesothelioma of 82.6% (44 cases), and correct grading of prostatic adenocarcinoma of 75.7% (38 cases).
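The minimum-spanning-tree features mentioned above can be sketched as follows. A Euclidean MST is built over nucleus centroids with Prim's algorithm, and the edge lengths are summarized. The exact MST-derived features in [41] are not stated here. Using the mean and standard deviation of edge lengths is an assumption, and the centroids are hypothetical.

```python
import math
import statistics

def mst_edge_lengths(points):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    best[0] = 0.0
    lengths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:        # the first node joins the tree without an edge
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return lengths

# Hypothetical nucleus centroids (pixels).
nuclei = [(0, 0), (3, 0), (3, 4), (10, 4)]
edges = mst_edge_lengths(nuclei)
print(sorted(edges))                                   # [3.0, 4.0, 7.0]
print(statistics.mean(edges), statistics.pstdev(edges))  # assumed summary features
```

Tightly packed nuclei give short, uniform MST edges, while disorganized tissue gives longer and more variable ones. This is the kind of spatial arrangement such features aim to quantify.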
Weyn et al. [22] analyzed immunostained imagery using a CD31 immunostain (highlighting endothelial cells) with a Hematoxylin counterstain. The aim was to quantify prognosis based on vascular patterns in colorectal, cervical, and lung tissue. Extracted features include fractal analysis, vessel-derived features (some obtained manually), syntactic structure analysis, and clinical data (obtained manually), as well as the mean, standard deviation, skewness, and kurtosis of each feature. Prognostic accuracies using k-NN classification were 83.3% for cervical (78 images), 70.6% for colorectal (74 cases), and 86.4% for lung (27 images).
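The four summary statistics listed above (mean, standard deviation, skewness, and kurtosis of each feature) can be computed as follows. The sketch assumes population moments and excess kurtosis, which is 0 for a normal distribution. The cited work may use sample-corrected or non-excess definitions. The input values are hypothetical.

```python
import statistics

def moment_summary(values):
    """Mean, population std. dev., skewness, and excess kurtosis (values must not all be equal)."""
    n = len(values)
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skew = m3 / sd ** 3
    kurt = m4 / sd ** 4 - 3.0   # excess kurtosis: 0 for a normal distribution
    return mu, sd, skew, kurt

# Hypothetical per-vessel measurements from one image.
mu, sd, skew, kurt = moment_summary([1, 2, 3, 4, 5])
print(mu, sd, skew, kurt)  # symmetric input: skewness 0, excess kurtosis about -1.3
```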
Ballerini and Franzén [42] (2004) used light microscopy of breast cancer tissue with immunohistochemically stained epithelium and Feulgen-stained nuclei. This method used fuzzy c-means clustering and conditional dilation to segment nuclei, and a neural network for classification. Extracted features include granulometric moments, fractal analysis, and mathematical morphology. Twenty cases, with 10 images per case, were analyzed with this method. The result was 87%-93% correct classification of normal tissue, fibroadenosis (a benign condition), and ductal and lobular cancer.
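Fuzzy c-means, used above and also in [17] for nuclear segmentation, assigns each pixel a degree of membership in every cluster. The standard update alternates between two steps. Memberships are computed as u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m−1)), where d_ij is the distance from pixel i to center j. Centers are then recomputed as membership-weighted means. The sketch below applies this to scalar grey levels only. It is not the cited pipeline, which also used conditional dilation. The intensities, initial centers, and function name are hypothetical.

```python
def fuzzy_c_means_1d(x, centers, m=2.0, iters=100, tol=1e-9):
    """Fuzzy c-means on scalar intensities.

    x: intensities; centers: initial cluster centers; m: fuzzifier (> 1).
    Returns (centers, memberships), where memberships[i][j] is the degree to
    which pixel i belongs to cluster j (from the last membership update).
    """
    c = len(centers)
    centers = list(centers)
    for _ in range(iters):
        u = []
        for xi in x:
            d = [abs(xi - cj) for cj in centers]
            if 0.0 in d:                        # pixel sits exactly on a center
                k = d.index(0.0)
                u.append([1.0 if j == k else 0.0 for j in range(c)])
            else:
                u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                          for j in range(c)])
        # Centers become membership-weighted means (weights u^m).
        new = [sum(u[i][j] ** m * x[i] for i in range(len(x))) /
               sum(u[i][j] ** m for i in range(len(x))) for j in range(c)]
        shift = max(abs(a - b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers, u

# Hypothetical grey levels: dark stained nuclei and bright background.
intensities = [48, 50, 52, 55, 195, 200, 205, 210]
centers, memberships = fuzzy_c_means_1d(intensities, [0.0, 255.0])
labels = [max(range(len(centers)), key=lambda j: row[j]) for row in memberships]
print(centers, labels)  # labels: [0, 0, 0, 0, 1, 1, 1, 1]
```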
Harvey et al. [43] (2003) used the GENIE automated feature extraction system for detection of cancerous nuclei in multispectral H&E stained histopathology images of breast tissue. Using a training set of 7 images and a test set of 8 images, GENIE attained an average detection rate of 82.3%-87.4% and average false alarm rate of 0.4%-15.8%.
Lee and Street [44] (2003) present a neural network-based method to automatically detect, segment, and classify breast cancer nuclei in gray-scale cytological images from fine needle aspirations (FNA) of the breast. Nuclear features include size, perimeter, smoothness, concavity, and 24 radii from each nucleus. Overall, 94.1% of nuclei were correctly delineated in a dataset of 140 images, and 94%-96% of nuclei were correctly classified as malignant.
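Three of the nuclear features listed above can be computed from a traced nuclear boundary: size, perimeter, and a fixed number of radii. The sketch assumes the boundary is a closed polygon of (x, y) points. Area is computed by the shoelace formula. The center is taken as the vertex mean, and the 24 radii are sampled at evenly spaced contour points. Lee and Street's exact definitions, including smoothness and concavity, are not given here, so this sampling scheme is an assumption. The test contour is a hypothetical circle.

```python
import math

def shape_features(contour, n_radii=24):
    """Area (shoelace), perimeter, and n_radii center-to-boundary distances."""
    n = len(contour)
    area = 0.5 * abs(sum(contour[i][0] * contour[(i + 1) % n][1]
                         - contour[(i + 1) % n][0] * contour[i][1] for i in range(n)))
    perimeter = sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n))
    cx = sum(p[0] for p in contour) / n     # vertex mean used as the nucleus center
    cy = sum(p[1] for p in contour) / n
    step = n / n_radii
    radii = [math.dist((cx, cy), contour[int(k * step)]) for k in range(n_radii)]
    return area, perimeter, radii

# Hypothetical nucleus: a circle of radius 10 pixels traced with 240 points.
circle = [(10 * math.cos(2 * math.pi * i / 240), 10 * math.sin(2 * math.pi * i / 240))
          for i in range(240)]
area, perimeter, radii = shape_features(circle)
print(round(area, 1), round(perimeter, 2), len(radii))
```

For an irregular nucleus, the spread of the 24 radii grows, which is what radius-based shape descriptors exploit.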
Latson et al. [17] (2003) implemented an automated segmentation algorithm for epithelial cell nuclei. The algorithm applies fuzzy c-means to the hue band of the HSV color space, followed by a marker-based watershed transform. For a dataset of 39 H&E histopathology images, 57.2%-71.6% of nuclei were correctly segmented. Performance varied across typical hyperplasia, atypical hyperplasia, cribriform ductal carcinoma in situ, and solid ductal carcinoma in situ. Clumps, poorly segmented individual nuclei, and missed nuclei accounted for 4.5%-16.7%, 22.5%-26.3%, and 0.4%-1.4%, respectively.
van de Wouwer et al. [45] (2000) used green-filtered (565 nm) light microscopy images of Feulgen-stained breast tissue sections to extract features for k-NN classification of breast tissue. Features included densitometry, first- and second-order texture parameters, wavelets, and mathematical morphology. For a dataset of 20 normal and 63 invasive ductal carcinomas, 67.1% of nuclei and 100% of patients were classified correctly.
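k-NN classification, used above and in several of the preceding studies, labels a sample by majority vote among its k closest training samples in feature space. The sketch below assumes Euclidean distance and k = 3. The two-feature vectors and class names are hypothetical stand-ins for the densitometric and texture features described in the text.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (mean optical density, texture contrast) per nucleus.
train_x = [(0.2, 0.1), (0.25, 0.15), (0.3, 0.1), (0.8, 0.7), (0.85, 0.75), (0.9, 0.6)]
train_y = ["normal"] * 3 + ["carcinoma"] * 3
print(knn_predict(train_x, train_y, (0.28, 0.12)))  # normal
print(knn_predict(train_x, train_y, (0.82, 0.7)))   # carcinoma
```

An odd k avoids tied votes in a two-class problem. In practice, features are usually scaled first so that no single feature dominates the distance.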
Herrera-Espiñeira et al. [46] (1998) used two different segmentation algorithms. One handled nonoverlapping nuclei (histogram-based threshold) and the other handled overlapping nuclei (edge detection and ellipse fitting). The human observer chose the algorithm based on the image at hand. Nuclear features allowed 89.4%-91.5% average accuracy in discriminating benign (47 cases) from malignant (95 cases) Pap-stained grayscale cytology imagery.
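The text does not specify which histogram-based threshold was used. Otsu's method is one widely used choice and is swapped in here for illustration. It picks the grey level that maximizes the between-class variance w₀·w₁·(μ₀ − μ₁)² of the two pixel classes it separates. The histogram below is hypothetical: dark nuclei occupy levels 1-3 and bright background occupies levels 5-7.

```python
def otsu_threshold(hist):
    """Return threshold t maximizing between-class variance.

    hist[g] is the number of pixels with grey level g; levels <= t form one class.
    """
    total = sum(hist)
    sum_all = sum(g * h for g, h in enumerate(hist))
    w0 = 0
    sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue                      # one class is empty; no valid split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

hist = [0, 5, 10, 5, 0, 4, 8, 4]  # hypothetical 8-level histogram
print(otsu_threshold(hist))        # 3
```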
Weyn et al. [18] (1998) used the same imagery as [45] and a similar analysis. This study achieved 76.1% accuracy in benign versus malignant classification of images and 100% accuracy for patients. Cancer grading was also studied, with 61.5% image accuracy and 78.5% patient accuracy.
Wang et al. [15] (1997) present a method for detection of breast cancer nuclei in light microscopy images of tissue immunostained for estrogen and progesterone receptors and counterstained with Hematoxylin. This method used receptive field theory, soft thresholding, and lighting correction to segment nuclei; the classification of nuclei was based on the YUV color space and derived features (average and variance) as well as a local texture measure. For a dataset of 28 images, the authors achieved a sensitivity of 83%.
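The color features described above (per-channel average and variance in YUV space) can be sketched as follows. The sketch assumes the analog BT.601-style conversion: Y = 0.299R + 0.587G + 0.114B, U = 0.492(B − Y), V = 0.877(R − Y), on RGB values scaled to 0-1. Wang et al. may have used a different YUV variant or scaling. The pixel values and function names are hypothetical.

```python
import statistics

def rgb_to_yuv(r, g, b):
    """RGB in [0, 1] to YUV using BT.601 luma weights (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

def yuv_stats(pixels):
    """[(mean, population variance)] for the Y, U, and V channels of a nucleus's pixels."""
    channels = list(zip(*(rgb_to_yuv(*p) for p in pixels)))
    return [(statistics.fmean(ch), statistics.pvariance(ch)) for ch in channels]

# Hypothetical pixels from one segmented nucleus.
nucleus = [(0.30, 0.25, 0.60), (0.28, 0.22, 0.65), (0.35, 0.30, 0.55)]
for name, (mean, var) in zip("YUV", yuv_stats(nucleus)):
    print(name, round(mean, 4), round(var, 6))
```

Separating luminance (Y) from chrominance (U, V) lets stain color be measured somewhat independently of overall brightness.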
Anderson et al. [19] (1997) applied a knowledge-guided approach, previously developed by Thompson et al. [47] for segmentation of cribriform gland tissue, to the segmentation and architectural discrimination of H&E stained ductal breast lesions. Features were computed at the glandular and lumen level. The dataset was composed of 215 images from 22 cases of ductal carcinoma in situ and 21 cases of ductal hyperplasia. Glandular features provided 63% correct image and 65% correct patient classification. Lumen features provided 70% correct image and 72% correct patient classification. Combined features provided 83% correct patient classification.
Overall, classification accuracy for breast cancer image analysis varies widely across studies. In general, however, accuracy increases as classification progresses from individual nuclei to the image level to the patient level. In particular, previous studies on H&E imagery report image-level classification accuracy below 90%.
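One plausible reason for the trend noted above is aggregation: when per-nucleus decisions are combined by majority vote into image and patient decisions, scattered nucleus-level errors tend to be outvoted. This holds provided errors are not strongly correlated. The cited studies may aggregate differently, so voting is an assumption here. The predictions below are hypothetical for a single, truly malignant patient. About 56% of nuclei (5 of 9) are labeled correctly, 2 of 3 images are correct, and the patient-level call is correct.

```python
from collections import Counter

def majority(labels):
    """Most frequent label (ties resolved by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical per-nucleus predictions, grouped by image, for one malignant patient.
patient_images = [
    ["malignant", "benign", "malignant"],   # image 1
    ["malignant", "malignant", "benign"],   # image 2
    ["benign", "benign", "malignant"],      # image 3
]
all_nuclei = [lab for img in patient_images for lab in img]
image_labels = [majority(nuclei) for nuclei in patient_images]
patient_label = majority(image_labels)
print(all_nuclei.count("malignant") / len(all_nuclei))  # 5 of 9 nuclei correct
print(image_labels)   # ['malignant', 'malignant', 'benign']
print(patient_label)  # malignant
```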
Table 1 shows performance of general state-of-the-art histo/cytopathology image analysis. Table 2 shows performance of state-of-the-art histo/cytopathology image analysis for breast cancer.
Accordingly, an object of the present invention is to quantitatively define characteristics of breast cancer to better coordinate clinical care of women presenting with breast masses.
A further object of the present invention is to provide quantitative cytologic and histologic analysis of breast biopsy specimens, using expert (pathologist) input to guide the classification process.