Breast cancer (BCa) is an increasingly common cancer diagnosis in women. Advancements in screening, diagnostic, and therapeutic techniques in BCa have improved survival rates in recent years. (Jemal, A. et al., “Declining death rates reflect progress against cancer,” PLoS One, vol. 5, no. 3, p. e9584, 2010. Available at dx.doi.org/10.1371/journal.pone.0009584) One of the earliest and most popular diagnostic criteria is the Bloom-Richardson (BR) grade. (Bloom, H. J. et al., “Histological grading and prognosis in breast cancer; a study of 1409 cases of which 359 have been followed for 15 years,” Br J Cancer, September 1957, 11(3): 359-377). The BR grade is determined by a pathologist via visual analysis of hematoxylin and eosin (H & E) stained histopathology. The importance of the BR grading system for the purpose of predicting disease outcome has been widely studied. (Contesso, G. et al., “The importance of histologic grade in long-term prognosis of breast cancer: a study of 1,010 patients, uniformly treated at the institut gustave-roussy,” J Clin Oncol, September 1987, 5(9): 1378-1386; Henson, D. E. et al., “Relationship among outcome, stage of disease, and histologic grade for 22,616 cases of breast cancer. the basis for a prognostic index,” Cancer, November 1991, 68(10): 2142-2149; Elston, C. W. et al., “Pathological prognostic factors in breast cancer. i. the value of histological grade in breast cancer: experience from a large study with long-term follow-up,” Histopathology, November 1991, 19(5): 403-410). Yet clinical usage of the BR grading system is often limited by concerns about intra- and inter-rater variability. Meyer et al. found only moderate agreement among seven pathologists (κ=0.50-0.59). (Meyer, J. S. et al., “Breast carcinoma malignancy grading by bloom-richardson system vs proliferation index: reproducibility of grade and advantages of proliferation index,” Mod Pathol, August 2005, 18(8): 1067-1078). Dalton et al. further noted the suboptimal treatment that can result from incorrect BR grading. (Dalton, L. W. et al., “Histologic grading of breast cancer: linkage of patient outcome with level of pathologist agreement,” Mod Pathol, July 2000, 13(7): 730-735). Boiesen et al. showed similar levels of reproducibility (κ=0.50-0.54) across a number of pathology departments. (Boiesen, P. et al., “Histologic grading in breast cancer—reproducibility between seven pathologic departments. south Sweden breast cancer group,” Acta Oncol, 2000, 39(1): 41-45). A possible reason for such variability is that pathologists currently lack automated image analysis tools to accurately and efficiently quantify BR grade in histopathology. There is thus a need for an inexpensive image-based computerized grading scheme for predicting disease outcome.
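The reproducibility figures quoted above are agreement statistics. As a minimal sketch, assuming they are Cohen-style kappa values, the following shows how such a statistic could be computed for two raters. The function name and the grade lists are hypothetical illustrations, not data from the cited studies, and studies with more than two raters would typically use pairwise averages or a multi-rater generalization.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Fraction of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical grades (1 = low, 2 = intermediate, 3 = high) for ten slides.
pathologist_1 = [1, 1, 2, 2, 2, 3, 3, 3, 1, 2]
pathologist_2 = [1, 2, 2, 2, 3, 3, 3, 2, 1, 1]
kappa = cohen_kappa(pathologist_1, pathologist_2)
print(f"kappa = {kappa:.2f}")
```

Kappa discounts the agreement that two raters would reach by chance, which is why it is preferred over raw percent agreement when comparing pathologists.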
The BR grading system encompasses three visual signatures (degree of tubule formation, nuclear pleomorphism, and mitotic activity), each of which is scored on a scale of 1 to 3 to produce a combined BR score ranging from 3 to 9. Computerized modeling of the phenotypic appearance of BCa histopathology has traditionally focused on the size and shape of nuclei (Wolberg, W. H. et al., “Computer-derived nuclear features distinguish malignant from benign breast cytology,” Hum Pathol, July 1995, 26(7): 792-796; Doyle, S. et al., “Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features,” in Proc. 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2008, pp. 496-499) as well as various textural representations (Wolberg, W. H. et al., “Computer-derived nuclear features distinguish malignant from benign breast cytology,” Hum Pathol, July 1995, 26(7): 792-796; Weyn, B. et al., “Automated breast tumor diagnosis and grading based on wavelet chromatin texture description,” Cytometry, September 1998, 33(1): 32-40; Petushi, S. et al., “Large-scale computations on histology images reveal grade differentiating parameters for breast cancer,” BMC Med Imaging, 2006, 6: 14; Karaali, B. et al., “Automated detection of regions of interest for tissue microarray experiments: an image texture analysis,” BMC Med Imaging, 2007: 2; and Hall, B. H. et al., “Computer-assisted assessment of the human epidermal growth factor receptor 2 immunohistochemical assay in imaged histologic sections using a membrane isolation algorithm and quantitative analysis of positive controls,” BMC Med Imaging, 2008, 8: 11).
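The scoring arithmetic of the BR system can be sketched as follows. The three component scores and the 3 to 9 range come from the description above; the mapping of the combined score to low, intermediate, and high grade (3-5, 6-7, 8-9) is a commonly used convention assumed here, and the function names are hypothetical.

```python
def br_score(tubule, pleomorphism, mitosis):
    """Sum the three component scores (each 1 to 3) into a combined score (3 to 9)."""
    components = (("tubule", tubule), ("pleomorphism", pleomorphism), ("mitosis", mitosis))
    for name, value in components:
        if value not in (1, 2, 3):
            raise ValueError(f"{name} score must be 1, 2, or 3")
    return tubule + pleomorphism + mitosis

def br_grade(score):
    """Map a combined score to a grade label (cut-offs are an assumed convention)."""
    if not 3 <= score <= 9:
        raise ValueError("combined score must lie between 3 and 9")
    if score <= 5:
        return "low"
    if score <= 7:
        return "intermediate"
    return "high"

# Hypothetical slide: moderate tubule formation, marked pleomorphism, few mitoses.
score = br_score(tubule=2, pleomorphism=3, mitosis=1)
print(score, br_grade(score))
```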
Nuclear architecture refers to the 2D spatial arrangement of cancer nuclei, whose variations allow clinicians to distinguish between normal and cancerous tissues. This concept is modeled by the construction of graphs, whereby individual cancer nuclei are used as vertices and statistics related to the size, shape, and length of the graphs are then extracted from each image. A method using such graph-based features to distinguish variations of lymphocytic infiltration was described by Basavanhally et al. (Basavanhally, A. N. et al., “Computerized image-based detection and grading of lymphocytic infiltration in her2+ breast cancer histopathology,” IEEE Trans Biomed Eng, March 2010, 57(3): 642-653). Doyle et al. described a method to distinguish variations in tumor grade by extracting such graph-based features (Doyle, S. et al., “Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features,” in Proc. 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2008, pp. 496-499). Basavanhally et al. described a method that uses such graph-based features, together with hierarchical tissue structure, for prognosis from digitized BCa histopathology (Basavanhally, A. et al., “Computer aided prognosis of er+ breast cancer histopathology and correlating survival outcome with oncotype dx assay,” in Proc. IEEE Int. Symp. Biomedical Imaging: From Nano to Macro ISBI '09, 2009, pp. 851-854). Similar graph-based approaches have been applied to glioma, as described by Demir et al. (Demir, C. et al., “Augmented cell-graphs for automated cancer diagnosis,” Bioinformatics, September 2005, 21(Suppl 2): i7-i12), and to distinguishing tumor grade in prostate cancer, as described by Doyle et al. (Doyle, S. et al., “Automated grading of prostate cancer using architectural and textural image features,” in IEEE International Symposium on Biomedical Imaging (ISBI), Washington D.C., 2007, pp. 1284-87).
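As a minimal sketch of the graph construction described above, the following treats nuclear centroids as vertices, builds a Euclidean minimum spanning tree with Prim's algorithm, and summarizes its edge lengths. The choice of a minimum spanning tree, the feature names, and the centroid coordinates are assumptions made for illustration; the cited works may use other graphs and statistics.

```python
import math
import statistics

def mst_edge_lengths(points):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from each vertex to the tree
    best[0] = 0.0          # start the tree at the first nucleus
    lengths = []
    for step in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            lengths.append(best[u])
        # Update the attachment cost of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lengths

def architecture_features(centroids):
    """Summary statistics of the MST built on nuclear centroids."""
    lengths = mst_edge_lengths(centroids)
    if not lengths:
        raise ValueError("at least two nuclei are required")
    return {
        "mst_total_length": sum(lengths),
        "mst_mean_edge": statistics.mean(lengths),
        "mst_std_edge": statistics.pstdev(lengths),
    }

# Hypothetical nuclear centroids (x, y) in pixels.
centroids = [(0, 0), (1, 0), (2, 0), (2, 3)]
features = architecture_features(centroids)
print(features)
```

Tightly packed nuclei yield short, uniform edges, while scattered nuclei yield longer and more variable edges, which is the kind of variation such features are meant to capture.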
The extraction of textural information from nuclear regions (i.e. nuclear texture) represents the variation in chromatin arrangement, as described by Weyn et al. (Weyn, B. et al., “Automated breast tumor diagnosis and grading based on wavelet chromatin texture description,” Cytometry, September 1998, 33(1): 32-40). Such variation of chromatin structure is related to different stages in the cell cycle. Here, this concept is modeled via segmentation of all nuclear regions within an image, from which Haralick texture features are calculated, as described by Haralick et al. (Haralick, R. M. et al., “Textural features for image classification,” IEEE Transactions on Systems, Man and Cybernetics, November 1973, 3(6): 610-621). Weyn et al. described a method using wavelet, Haralick, and densitometric features to distinguish nuclei from low-, intermediate-, and high-grade BCa tissues (Weyn, B. et al., “Automated breast tumor diagnosis and grading based on wavelet chromatin texture description,” Cytometry, September 1998, 33(1): 32-40). Doyle et al. also utilized Haralick texture features to discriminate low and high grade BCa histopathology (Doyle, S. et al., “Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features,” in Proc. 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2008, pp. 496-499).
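Haralick texture features are derived from a gray-level co-occurrence matrix. The sketch below computes a symmetric, normalized co-occurrence matrix for one pixel offset and two of the Haralick features (contrast and angular second moment). It is a simplified illustration: it operates on a whole quantized patch rather than on segmented nuclear regions, and the function names and the 4x4 patch are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset does not fit inside the image")
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """Haralick contrast: large when neighboring gray levels differ strongly."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))

def angular_second_moment(p):
    """Haralick angular second moment: large for uniform texture."""
    return sum(v * v for row in p for v in row)

# Hypothetical 4x4 patch quantized to four gray levels (0 to 3).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(patch, levels=4)
print(contrast(p), angular_second_moment(p))
```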
The extraction of relevant features is important to a computerized BCa grading system; however, the selection of appropriate fields of view (FOVs) must also be considered in the analysis of large histopathology slides. This step is important due to the heterogeneous nature of BCa, whereby predictions based on isolated FOVs may not accurately reflect the level of malignancy or heterogeneity in an entire histopathology slide, as described in Connor et al. (Connor, A. J. M. et al., “Intratumoural heterogeneity of proliferation in invasive breast carcinoma evaluated with mibi antibody,” The Breast, 1997, 6(4): 171-176). The heterogeneous nature of BCa histopathology is illustrated in FIG. 1. Prior work in histological image analysis has traditionally involved empirical selection of individual FOVs at a fixed size based on experimental results, as described by Doyle et al. (Doyle, S. et al., “Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features,” in Proc. 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2008, pp. 496-499), Weyn et al. (Weyn, B. et al., “Automated breast tumor diagnosis and grading based on wavelet chromatin texture description,” Cytometry, September 1998, 33(1): 32-40), Petushi et al. (Petushi, S. et al., “Large-scale computations on histology images reveal grade differentiating parameters for breast cancer,” BMC Med Imaging, 2006, 6: 14), Basavanhally et al. (Basavanhally, A. N. et al., “Computerized image-based detection and grading of lymphocytic infiltration in her2+ breast cancer histopathology,” IEEE Trans Biomed Eng, March 2010, 57(3): 642-653), Basavanhally et al. (Basavanhally, A. et al., “Computer aided prognosis of er+ breast cancer histopathology and correlating survival outcome with oncotype dx assay,” in Proc. IEEE Int. Symp. Biomedical Imaging: From Nano to Macro ISBI '09, 2009, pp. 851-854), Gurcan et al. (Gurcan, M. N. et al., “Histopathological image analysis: A review,” IEEE Rev Biomed Eng, 2009, 2: 147-171), and Sertel et al. (Sertel, O. et al., “Computer-aided prognosis of neuroblastoma on whole slide images: Classification of stromal development,” Pattern Recognit, June 2009, 42(6): 1093-1103).
In image processing, multi-scale (i.e. multi-resolution) frameworks are traditionally used to interpret contextual information at different scales of an image scene. (Doyle, S. et al., “Detecting prostatic adenocarcinoma from digitized histology using a multi-scale hierarchical classification approach,” IEEE EMBS, 2006, 1: 4759-4762). While clinicians perform this task implicitly, the a priori selection of an optimal FOV (i.e. image patch) size for computerized analysis of entire histopathology slides is not straightforward. Most multi-scale frameworks operate by exposing a single field of view (FOV) to classifiers at multiple image resolutions, for example as described in Doyle et al. (Doyle, S. et al., “A boosted bayesian multi-resolution classifier for prostate cancer detection from digitized needle biopsies,” Biomedical Engineering, IEEE Transactions on, 2010, PP(99): 1). FIG. 2(a) is an illustration of a multi-scale framework that is useful for quantifying large-scale image patterns. However, analyzing domain-specific image architecture is more challenging, since it remains invariant to changes in scale (although our visual perception and ability to detect objects within the image will vary). Contrary to a multi-scale framework, a multi-field of view (multi-FOV) framework uses a fixed scale (i.e. resolution) and extracts features at FOVs of different sizes. FIG. 2(b) is an illustration of a multi-FOV scheme. A multi-FOV framework is advantageous for highly heterogeneous images, where it is not clear which FOV sizes will produce discriminatory features.
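The multi-FOV idea described above, in which the resolution is fixed and only the FOV size varies, can be sketched as a simple tiling step. This is a minimal illustration under assumed parameters: the image dimensions, the list of FOV sizes, and the use of non-overlapping square tiles are all hypothetical choices.

```python
def fov_grid(width, height, fov_size):
    """Top-left corner and size of each non-overlapping square FOV that fits in the image."""
    return [
        (x, y, fov_size)
        for y in range(0, height - fov_size + 1, fov_size)
        for x in range(0, width - fov_size + 1, fov_size)
    ]

def multi_fov(width, height, fov_sizes):
    """Tile the same fixed-resolution image at several FOV sizes."""
    return {size: fov_grid(width, height, size) for size in fov_sizes}

# Hypothetical 4000 x 4000 pixel image tiled at five FOV sizes (in pixels).
fovs = multi_fov(4000, 4000, [4000, 2000, 1000, 500, 250])
for size, tiles in fovs.items():
    print(size, len(tiles))
```

Every tile is cut from the same pixel grid, so features computed at different FOV sizes differ only in how much tissue they cover, not in resolution. A multi-scale framework would instead keep the region fixed and resample it.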
For example, in the exemplary histological image from a breast cancer digital slide shown in FIG. 2(b), the smallest FOV (i.e. leftmost image) simply looks like necrotic tissue, whereas the medium-sized FOV (i.e. center image) would be accurately classified as ductal carcinoma in situ (DCIS). At the other end of the spectrum, the largest FOV (i.e. rightmost image), which contains both DCIS and invasive cancer, would be classified ambiguously because it is too heterogeneous.
Hierarchical classifiers are commonly used in conjunction with multi-scale frameworks, such as described in Doyle et al. (Doyle, S. et al., “Detecting prostatic adenocarcinoma from digitized histology using a multi-scale hierarchical classification approach,” IEEE EMBS, 2006, 1: 4759-4762). In a hierarchical classifier, data input at a zero level are analyzed for feature extraction, and a zero-level classifier outputs a zero-level prediction. The zero-level prediction is then input to the first level for analysis, feature extraction, and a first-level prediction by the first-level classifier. This sequential decision making continues through each level of the hierarchical classifier until the final-level classifier outputs the class decision. FIG. 3(a) is an illustration of a hierarchical classifier that involves three hierarchical levels of decision making: a zero level, a first level, and a second level. In such a multi-scale scheme, the data S1 exposed to classifier C1 depend on the prediction made by classifier C0 at the previous level. Hierarchical classifiers are able to increase computational efficiency by analyzing only relevant data. However, they are inherently serial processes that cannot leverage integration techniques (e.g. consensus) to more effectively combine predictions returned by C0, C1, and C2.
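The serial behavior of a hierarchical classifier can be sketched as a cascade in which each level sees only the data retained by the previous level. The sketch below is a simplified illustration: the samples are hypothetical scalar feature values, and the three threshold rules stand in for the classifiers C0, C1, and C2.

```python
def hierarchical_classify(samples, classifiers):
    """Run classifiers in order; each level only sees what the previous level kept."""
    surviving = list(samples)
    evaluated_per_level = []
    for classify in classifiers:
        evaluated_per_level.append(len(surviving))
        surviving = [s for s in surviving if classify(s)]
    return surviving, evaluated_per_level

# Hypothetical per-region feature values and increasingly strict level rules.
samples = [0.1, 0.4, 0.55, 0.7, 0.9]
levels = [
    lambda s: s > 0.3,  # stand-in for C0
    lambda s: s > 0.5,  # stand-in for C1
    lambda s: s > 0.8,  # stand-in for C2
]
kept, workload = hierarchical_classify(samples, levels)
print(kept, workload)
```

The shrinking workload shows the efficiency gain, and the loop shows the limitation: a later level cannot start before the earlier one finishes, and the intermediate predictions are consumed rather than combined.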
On the other hand, parallel classifiers simultaneously expose different amounts of data to extract relevant features, and perform classification independently before finding a consensus (Σ) among the individual predictors. FIG. 3(b) is an illustration of a parallel classifier using three parallel/simultaneous classifiers. In such a parallel classifier, different amounts of data, S0, S1, and S2, are input in parallel, the corresponding features, f0, f1, and f2, are extracted simultaneously, and the three parallel predictors, C0, C1, and C2, make independent and simultaneous predictions, which are then combined to find a consensus (Σ) class decision. For instance, Breiman showed that a parallel classifier scheme known as bagging, in which many independent weak learners are aggregated, could lead to improved performance over a majority of the individual weak learners. (Breiman, L., “Bagging predictors,” Machine Learning, 1996, 24: 123-140). Another popular classifier ensemble scheme is boosting, described in Schapire (Schapire, R. E., “The boosting approach to machine learning: An overview,” in Nonlin. Est. and Class., Springer 2003, pp. 1-23), which aims to improve overall performance by identifying and weighting individual weak learners that contain class discriminatory information.
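The parallel scheme described above can be sketched as independent feature extraction and prediction on each input, followed by a consensus step. In this minimal illustration the consensus is a simple majority vote, the feature is a mean, and the predictors are threshold rules; these choices, the data values, and the function names are all assumptions. The loop runs sequentially here, but no step depends on another, so the work could be dispatched concurrently.

```python
import statistics
from collections import Counter

def consensus(predictions):
    """Majority vote; ties resolve to the label that was seen first."""
    return Counter(predictions).most_common(1)[0][0]

def parallel_classify(inputs, extractors, predictors):
    """Extract features and predict independently per input, then combine the votes."""
    votes = [
        predict(extract(data))
        for data, extract, predict in zip(inputs, extractors, predictors)
    ]
    return consensus(votes), votes

def threshold_rule(feature):
    """Hypothetical predictor: label by comparing one feature against 0.5."""
    return "high" if feature > 0.5 else "low"

# Hypothetical inputs S0, S1, S2 of different sizes.
inputs = [[0.2, 0.9], [0.6, 0.7, 0.8], [0.1, 0.2, 0.3, 0.9]]
extractors = [statistics.mean] * 3   # stand-ins for f0, f1, f2
predictors = [threshold_rule] * 3    # stand-ins for C0, C1, C2
decision, votes = parallel_classify(inputs, extractors, predictors)
print(decision, votes)
```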
In microscopy, changes in scale (i.e. magnification or resolution) are inextricably linked to changes in FOV size. Thus, pathologists naturally incorporate different scales and FOV sizes before arriving at a diagnosis based on a patient slide. Although clinicians perform this task implicitly, the a priori selection of an optimal FOV (i.e. image patch) size for computerized analysis of an entire histopathology slide is not straightforward. Sertel et al. (Sertel, O. et al., “Computer-aided prognosis of neuroblastoma on whole-slide images: Classification of stromal development,” Patt Recognit, 2009, 42(6): 1093-1103) and Basavanhally et al. (Basavanhally, A. et al., “Computer-aided prognosis of er+ breast cancer histopathology and correlating survival outcome with oncotype dx assay,” in IEEE ISBI, 2009, pp. 851-854) described histological image analysis methods for detection or grading that involve selecting a fixed FOV size empirically based on classification results. The application of hierarchical, multi-scale classifiers has been considerably more popular for analysis of large, high-resolution digitized histopathology images. (Doyle, S. et al., “Detecting prostatic adenocarcinoma from digitized histology using a multi-scale hierarchical classification approach,” IEEE EMBS, 2006, 1: 4759-4762; Gurcan, M. et al., “Computerized pathological image analysis for neuroblastoma prognosis,” AMIA Annu Symp Proc, 2007, pp. 304-308). Petushi et al. (Petushi, S. et al., “Large-scale computations on histology images reveal grade-differentiating parameters for breast cancer,” BMC Med Imaging, 2006, 6: 14) describe a parallel classifier scheme in which two specific FOV sizes are chosen to help classify breast cancer (BCa) nuclei into morphological categories; however, the choice of FOV sizes was not justified.
The present invention provides a method for analyzing a large, heterogeneous image using a parallel, boosted, multi-field of view (multi-FOV) classifier. The parallel, boosted multi-FOV classifier automatically integrates image features from multiple FOVs at various sizes to differentiate entire ER+ BCa histopathology slides based on their BR grades. The multi-FOV framework of the present invention uses a fixed image scale and extracts image features at FOVs of different sizes, a highly desirable attribute in heterogeneous images where it is not clear which FOV sizes will contain class discriminatory information. The extracted image features characterize both the architecture and the texture of BCa nuclei, both of which reflect various aspects of the BR grading system. The present invention circumvents the need for empirically determining an optimal FOV size by calculating image features at multiple FOV sizes and embedding the problem within a boosted framework that combines the best discriminatory information across all FOV sizes.
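One way to picture a boosted combination across FOV sizes is to treat the slide-level prediction obtained at each FOV size as a weak learner and let a boosting procedure weight them. This section does not specify the boosting algorithm, so the sketch below substitutes discrete AdaBoost as an illustrative stand-in. The labels, the per-FOV predictions, and the function names are hypothetical.

```python
import math

def adaboost_weights(weak_predictions, labels, rounds):
    """Discrete AdaBoost over a fixed pool of weak learners.

    weak_predictions[k][i] is the +1/-1 prediction of learner k for slide i.
    Returns a dict mapping learner index to its accumulated weight.
    """
    n = len(labels)
    sample_w = [1.0 / n] * n
    alphas = {}
    for _ in range(rounds):
        # Weighted error of every learner under the current sample weights.
        errors = [
            sum(w for w, p, y in zip(sample_w, preds, labels) if p != y)
            for preds in weak_predictions
        ]
        k = min(range(len(errors)), key=errors.__getitem__)
        err = min(max(errors[k], 1e-10), 1.0 - 1e-10)
        if err >= 0.5:
            break  # no learner is better than chance on the reweighted data
        alpha = 0.5 * math.log((1.0 - err) / err)
        alphas[k] = alphas.get(k, 0.0) + alpha
        # Increase the weight of slides that learner k misclassified.
        sample_w = [
            w * math.exp(-alpha * y * p)
            for w, p, y in zip(sample_w, weak_predictions[k], labels)
        ]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return alphas

def boosted_predict(alphas, learner_outputs):
    """Weighted vote of the selected learners for one slide."""
    score = sum(a * learner_outputs[k] for k, a in alphas.items())
    return 1 if score >= 0 else -1

# Hypothetical slide labels (+1 = high grade, -1 = low grade) and the
# slide-level predictions obtained at three hypothetical FOV sizes.
labels = [1, 1, 1, -1, -1, -1]
per_fov_predictions = [
    [1, 1, -1, -1, -1, -1],  # largest FOV size
    [1, -1, 1, -1, -1, 1],   # medium FOV size
    [-1, 1, 1, 1, -1, -1],   # smallest FOV size
]
alphas = adaboost_weights(per_fov_predictions, labels, rounds=3)
combined = [
    boosted_predict(alphas, [preds[i] for preds in per_fov_predictions])
    for i in range(len(labels))
]
print(alphas, combined)
```

In this toy example no single FOV size classifies every slide correctly, yet the weighted combination does, which illustrates why no single FOV size has to be chosen in advance.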