Field of the Subject Disclosure
The present subject disclosure relates to a computer-implemented method and/or digital pathology enabled machine learning system for predicting the risk of cancer recurrence in early stage breast cancer patients. More particularly, the present subject disclosure relates to predicting breast cancer recurrence risk directly from a set of image features computed from digitized immunohistopathology (for example, H&E, IHC) tissue slides.
Background of the Subject Disclosure
Prognosis of hormone-positive early-stage breast cancer patients offers the opportunity to make more informed follow-up choices, for example, the addition of adjuvant chemotherapy. Traditionally, pathologists have prognosticated these cancers using conventional staging, tumor proliferation index, and a small set of morphological features (gland formation, nuclear grade, and mitosis) that are manually scored from H&E slides. Alternatively, in some prior art methods, image features computed directly from the H&E slide alone are used to train a machine learning module that builds a prognostic model for predicting overall survival in breast cancer patients. For further breast cancer subtyping and for prognostic and predictive evaluation, protein expression of the tumor is evaluated by analyzing the patient's immunohistochemical (IHC) tissue slides. Hormone-positive status of the cancer is determined by interpreting the estrogen receptor (ER) and progesterone receptor (PR) marker slides. Tumor proliferation and the aggressive nature of the tumor are determined from the Ki67 biomarker tissue slide. The patient's suitability for targeted therapy (such as Herceptin) is determined by analyzing the HER2 IHC tissue slide. The information inferred from the H&E tissue slide, along with the IHC protein expression of the tumor for markers such as ER, PR, HER2, and Ki67, given as IHC marker slide-level quantitative scores such as (marker) percent positivity or H-score, may be used to prognosticate a patient.
The classification of breast cancer based on estrogen receptor (ER) and HER2 testing is known as such from the prior art, cf. http://www.cancer.org/cancer/breastcancer/detailedguide/breast-cancer-classifying.
Prior art methods for tissue slide based diagnostics include those disclosed in Jack Cuzick, et al., Prognostic Value of a Combined Estrogen Receptor, Progesterone Receptor, Ki-67, and Human Epidermal Growth Factor Receptor 2 Immunohistochemical Score and Comparison With the Genomic Health Recurrence Score in Early Breast Cancer, JCO Nov. 10, 2011:4273-4278; published online on Oct. 11, 2011 at http://jco.ascopubs.org/content/29/32/4273.long. The cited reference discloses predicting breast cancer risk of recurrence using slide-level clinical scores computed from IHC slides (ER, PR, Ki67 and HER2). Individual IHC marker slides are scored (either manually or algorithmically), and slide-level scores of percent positivity and H-score are computed. Using the slide-level scores (percent positivity, H-scores) from the four markers, an IHC4 metric is computed. Based on the computed value of the IHC4 metric, the patient is categorized as belonging to a low or a high recurrence risk group.
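By way of illustration only, the two slide-level scores named above can be sketched as follows. The sketch assumes the commonly used definitions, in which percent positivity is the fraction of counted tumor cells that stain positive, and H-score is the intensity-weighted sum of the percentages of weakly (1+), moderately (2+) and strongly (3+) stained cells, giving a range of 0 to 300. The `CellCounts` type and both function names are hypothetical and are not taken from the cited reference; the IHC4 combination itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class CellCounts:
    """Hypothetical per-slide tally of tumor cells by staining intensity."""
    negative: int   # unstained tumor cells
    weak: int       # 1+ intensity
    moderate: int   # 2+ intensity
    strong: int     # 3+ intensity

    @property
    def total(self) -> int:
        return self.negative + self.weak + self.moderate + self.strong


def percent_positivity(counts: CellCounts) -> float:
    """Percentage of counted tumor cells with any positive staining (0 to 100)."""
    if counts.total == 0:
        raise ValueError("no tumor cells were counted on this slide")
    positive = counts.weak + counts.moderate + counts.strong
    return 100.0 * positive / counts.total


def h_score(counts: CellCounts) -> float:
    """Intensity-weighted score: 1*%weak + 2*%moderate + 3*%strong (0 to 300)."""
    if counts.total == 0:
        raise ValueError("no tumor cells were counted on this slide")
    weighted = 1 * counts.weak + 2 * counts.moderate + 3 * counts.strong
    return 100.0 * weighted / counts.total


if __name__ == "__main__":
    # Made-up counts for a single marker slide, for demonstration only.
    er_slide = CellCounts(negative=50, weak=20, moderate=20, strong=10)
    print(percent_positivity(er_slide))  # 50.0
    print(h_score(er_slide))             # 90.0
```

Both scores reduce an entire slide to a single number per marker, which is the kind of summarization that the slide-level prior art approach relies upon.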
While there is potentially a large amount of prognostic information for a given patient tissue sample, typically each sample-containing slide is analyzed separately from a clinical standpoint and summarized in a few quantitative metrics such as percent positivity and H-score, without systematic evaluation and holistic integration of all the information contained in the tissue into a single comparative prognostic.