The current gold standard for identifying many disease states is the subjective visual interpretation of microscopic histology of fixed tissue sections of the involved organ. Examples include the diagnosis of cancer as well as many inflammatory and degenerative diseases. Over the past decade, the increasing ability to assay genomic information has led to improved classification of a variety of pathological processes using diagnostic and prognostic patterns of gene expression and/or genomic changes. The present invention details an automated, computerized system and method for analyzing histopathology imagery that will produce a quantitative and reproducible metric, i.e., an Image-based Risk Score, for predicting disease outcome and patient survival. Following are two specific embodiments of the present invention that use breast cancer as a model disease state in which well-validated gene expression-based classifiers have led to significant clinical impact.
Breast cancer (BC) is one of the leading causes of cancer-related deaths in women, with an estimated annual incidence greater than 192,000 in the United States in 2009 (source: American Cancer Society).
One embodiment of the present invention involves a subset of BC that comprises cancer cells that have not spread to the lymph nodes and that overexpress the estrogen receptor (LN−, ER+ BC). Although cases of LN−, ER+ BC are treated with a combination of chemotherapy and adjuvant hormone therapy, the specific prognosis and treatment are often determined by the Oncotype DX gene expression assay [1]. The Oncotype DX gene expression assay produces a Recurrence Score (RS) between 0 and 100 that is positively correlated with the likelihood of distant recurrence and the expected benefit from chemotherapy [1].
The manual detection of BC nuclei in histopathology is a tedious and time-consuming process that is infeasible in the clinical setting. Previous approaches to cell segmentation (thresholding [2], clustering [3], and active contour models [4]) are not very robust to the highly variable shapes and sizes of BC nuclei, or to artifacts in the histological fixing, staining, and digitization processes.
Previous work [1] has shown that the Oncotype DX RS is correlated with BC grade. Cancer grade reflects the architectural arrangement of the tissue and is correlated with survival (high grade implies poor outcome). Pathologists often disagree on the grade of a BC study. With the recent advent of digital pathology, researchers have begun to explore automated image analysis of BC histopathology. Wolberg et al. [6] used nuclear features from manually segmented BC nuclei to distinguish benign and malignant images. Bilgin et al. [7] explored the use of hierarchical graphs to model the architecture of BC histopathology. Textural features were used by Hall et al. [8] to examine variations in immunohistochemical staining.
A second embodiment of the present invention involves a subset of invasive BC that includes the presence of lymphocytic infiltration (LI) and exhibits amplification of the HER2 gene (HER2+ BC). Most HER2+ BC is currently treated with agents that specifically target the HER2 protein. Researchers have shown that the presence of LI in histopathology is a viable prognostic indicator for various cancers, including HER2+ BC [13]-[15]. The function of LI as a potential antitumor mechanism in BC was first shown by Aaltomaa et al. [14]. More recently, Alexe et al. [15] demonstrated a correlation between the presence of high levels of LI and tumor recurrence in early stage HER2+ BC. Pathologists do not routinely report on the presence of LI, especially in HER2+ BC. A possible reason for this is that pathologists currently lack the automated image analysis tools to accurately, efficiently, and reproducibly quantify the presence and degree of LI in BC histopathology.
While some researchers [9],[16]-[21] have recently begun to develop computer-aided diagnosis (CADx) systems and methods for the analysis of digitized BC histopathology, they have mostly either focused on finding suspicious regions of interest (ROI) or attempted to determine cancer grade from manually isolated ROIs. The methods for both applications use image-based features to discriminate between 2 classes: either normal and benign regions or low and high grade ROIs. Specifically, the size and shape of cancer nuclei have been shown to distinguish low and high grade histology images [16], [9]. Textural features and filter banks have also been employed [16]-[19], [21] to model the phenotypic appearance of BC histopathology.
While several researchers have been developing algorithms for detection of nuclei [18], [23]-[29] in digitized histopathology, there have been no attempts to automatically detect or quantify the extent of LI on BC histopathology. Some popular approaches to automated nuclear detection are based on adaptive thresholding [18], [23] and fuzzy c-means clustering [25], [27]. These techniques rely on differences in staining to distinguish nuclei from surrounding tissue. However, they are not appropriate for the task of LI detection due to the similarity in appearance between BC and lymphocyte nuclei (FIG. 4(a)). Techniques such as active contours [24], [28], [29] have utilized gradient (edge) information to automatically isolate nuclei in histological images. These methods, however, might be limited in their ability to handle variations in the appearance of BC nuclei (FIGS. 4(b), (c)) and image acquisition artifacts (FIGS. 4(e), (f)). Some researchers have developed hybrid techniques in order to improve nuclear detection and segmentation results. For example, Glotsos et al. [28] used Support Vector Machine clustering to improve initialization for active contour models. More recently, semi-automated probabilistic models have used pixel-wise intensity information to detect cancer [26] and lymphocyte nuclei [30] in digitized BC histopathology. Probabilistic models, however, are usually limited by the availability of expert-annotated training data.
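The limitation of staining-based detection noted above can be illustrated with a minimal sketch of local-mean adaptive thresholding. This is a generic textbook formulation, not the method of any cited reference; the function name `adaptive_threshold`, the parameters `radius` and `offset`, and the toy intensity values are all hypothetical and chosen only for illustration.

```python
def adaptive_threshold(image, radius=1, offset=10):
    """Return a binary mask: 1 where a pixel is darker than its local mean minus offset.

    `image` is a 2D list of grayscale intensities (assumed: stained nuclei are dark).
    `radius` and `offset` are illustrative parameters, not values from the source.
    """
    rows, cols = len(image), len(image[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Clip the square neighborhood at the image border.
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            local_mean = sum(window) / len(window)
            if image[r][c] < local_mean - offset:
                mask[r][c] = 1
    return mask


# Toy 5x5 image: bright background (200) with a single dark "nucleus" pixel (50).
toy_image = [[200] * 5 for _ in range(5)]
toy_image[2][2] = 50
toy_mask = adaptive_threshold(toy_image)
```

Because the decision uses intensity alone, any two nuclei with similar staining receive the same label, which is consistent with the difficulty described above in separating lymphocyte nuclei from BC nuclei.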
Detection of lymphocytes alone, however, cannot completely characterize the abnormal LI phenotype because a baseline level of lymphocytes is present in all tissues. Gunduz et al. [20] explored automated cancer diagnosis by using hierarchical graphs to model tissue architecture, whereby a graph is defined as a set of vertices (nuclei) with corresponding edges connecting all nuclei.
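The graph definition above can be sketched in a few lines. This is only an illustration under stated assumptions: "edges connecting all nuclei" is read here as a complete graph, nuclei are represented by hypothetical (x, y) centroid coordinates, and edges are weighted by Euclidean distance. It is not the hierarchical construction of Gunduz et al. [20], and the name `build_complete_graph` is introduced for this example only.

```python
import math
from itertools import combinations


def build_complete_graph(centroids):
    """Build a complete graph over nuclear centroids.

    Vertices are indices into `centroids`; every pair of vertices is joined by
    an edge whose weight is the Euclidean distance between the two centroids.
    Returns a dict mapping (i, j) with i < j to the edge length.
    """
    edges = {}
    for (i, p), (j, q) in combinations(enumerate(centroids), 2):
        edges[(i, j)] = math.dist(p, q)
    return edges


# Three hypothetical nuclear centroids, in pixel coordinates.
example_centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
example_edges = build_complete_graph(example_centroids)
```

A complete graph on n nuclei has n(n-1)/2 edges, so this form grows quadratically with the number of detected nuclei.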