The present invention is in the technical field of image processing (e.g., of tissue images) and of feature extraction from such images for use in, for example, treating, diagnosing, and/or predicting the occurrence (e.g., recurrence) of one or more medical conditions (e.g., cancer or other types of disease).
Conventional approaches to analyzing tissue images have been prone to misclassifying objects in tissue and may produce incorrect results or misdetections. These problems are exacerbated by inherent tissue heterogeneity, variations in image acquisition conditions, imprecise labeling, and image artifacts.
The availability of tissue images processed with specific procedures to emphasize certain characteristics has allowed computerized methods to be applied to tissue imaging. Immunohistochemistry (IHC) staining, in particular, uses multicolor visualization to reveal target protein expression in cells of human tissue, and is used to identify the patients most likely to respond to targeted therapy. Currently, IHC image analysis focuses on staining intensity and is performed mostly manually, which makes it low-throughput, labor-intensive, and subjective. Emerging computational techniques use metrics such as the H-score or the Aperio metric. Recent studies, however, show that finer-granularity grading is necessary to tailor a patient's treatment and to monitor treatment progression. Thus, the analysis needs to go beyond staining intensity and take into account the morphological and cellular architectures that continue to define cancer and many other diseases.
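For context, the H-score is commonly computed by assigning each cell a staining-intensity level from 0 (none) to 3 (strong) and summing each level multiplied by the percentage of cells at that level, which yields a value between 0 and 300. The following Python sketch illustrates that commonly used form only; it is not part of the present invention, and the function name `h_score` and its per-level cell-count input are illustrative assumptions.

```python
from typing import Mapping


def h_score(cell_counts: Mapping[int, int]) -> float:
    """Hypothetical helper: compute an H-score from per-intensity cell counts.

    cell_counts maps an intensity level (0 = none, 1 = weak, 2 = moderate,
    3 = strong) to the number of cells observed at that level.
    """
    for level in cell_counts:
        if level not in (0, 1, 2, 3):
            raise ValueError(f"invalid intensity level: {level}")
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("no cells counted")
    # Sum of (intensity level x percentage of cells at that level); range 0..300.
    return sum(level * 100.0 * n / total for level, n in cell_counts.items())


if __name__ == "__main__":
    # 50% unstained, 20% weak, 20% moderate, 10% strong -> 20 + 40 + 30
    print(h_score({0: 50, 1: 20, 2: 20, 3: 10}))
```

Because the score reduces every cell to a single intensity level, two samples with very different tissue architecture can receive the same value, which is the limitation of intensity-only metrics noted above.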
Existing machine learning approaches have faced several challenges. First, there is a high degree of heterogeneity. FIG. 1A, FIG. 1B, FIG. 1C, and FIG. 1D show several tissue samples exemplifying this heterogeneity, which can occur both within and between tissue samples, and both within and between procedures. Second, the images contain a mix of local and global features that must be taken into account together. Third, image sizes are typically large (often 3 to 5 orders of magnitude larger than radiology images). Finally, labeling of the images can be imprecise: the global label of an image may be incorrect, or may not be representative of all regions of the image.
More accurate, reliable, and repeatable systems and methods for representing, and extracting features from, tissue images are needed, for example, to allow for a more in-depth understanding of disease progression and the generation of improved predictive models for diagnosing, treating, and/or predicting the occurrence of medical conditions. Furthermore, a robust apparatus and method for extracting small but representative image characteristics is needed.