Pathology imaging is one of the last fields in medical imaging yet to be digitized. Compared to other well-developed medical imaging modalities, such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), digitized pathology images are characterized by extremely high image resolution, non-uniform texture patterns, and densely structured segments. In addition, the diversity of cancer types leads to constantly changing image patterns, which makes it even more challenging to develop fully automatic image classification algorithms.
Digitized pathology images are created from tissue samples stained with different methods for different diagnostic purposes, such as H&E (hematoxylin and eosin) and IHC (immunohistochemical) staining. Both of these staining methods are widely used in pathology, and H&E staining is particularly common in biopsies of suspected cancerous tissue.
Conventional pathology image analysis methods rely on pathologists to individually examine and label the stained pathology images. This practice is labor-intensive and time-consuming, and its results depend on the subjective judgment of the individual pathologist.
To date, the digitization of pathology image analysis has seen only limited development. Some conventional techniques for analyzing digital pathology images involve classifying each pixel according to multiple features, each of which has multiple dimensions. These features are then concatenated to yield a high-dimensional data set which describes each pixel. The high-dimensional data set thus produced is then analyzed by a single layer model to produce a final classification recognition score for each analyzed pixel. Because each pixel may be described by hundreds of dimensions, in an image containing millions of pixels, the quantity of data rapidly becomes difficult or impossible to process. Requiring a computer to keep all of the features in memory at once leads to processing delays and high memory requirements. Conventional training techniques may take a long time and, because of processor requirements, may use only small subsets of the training data to train the models. Conventional classification techniques also have the drawback that they cannot be calibrated to individual images.
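The scale of the problem described above can be illustrated with a short calculation. The following is a minimal sketch only, not an implementation of any particular conventional technique: the feature names, their dimensionalities, the 4-byte value size, the 10-million-pixel image, and the linear form of the single layer model are all assumptions chosen for illustration.

```python
def concatenate_features(feature_vectors):
    """Join the per-feature vectors of one pixel into a single long vector."""
    combined = []
    for vec in feature_vectors:
        combined.extend(vec)
    return combined


def feature_memory_bytes(num_pixels, feature_dims, bytes_per_value=4):
    """Bytes needed to hold every pixel's concatenated vector in memory at once."""
    return num_pixels * sum(feature_dims) * bytes_per_value


def single_layer_score(pixel_vector, weights, bias=0.0):
    """Assumed single layer model: one weighted sum over all dimensions."""
    return sum(w * x for w, x in zip(weights, pixel_vector)) + bias


# Hypothetical feature families and dimensionalities (illustrative only).
FEATURE_DIMS = {"color": 32, "texture": 128, "shape": 64}

# One pixel: each feature contributes its own block of dimensions.
pixel = concatenate_features([[0.5] * d for d in FEATURE_DIMS.values()])
score = single_layer_score(pixel, [1.0] * len(pixel))

# Whole image: every pixel carries the full concatenated vector.
total_bytes = feature_memory_bytes(10_000_000, FEATURE_DIMS.values())
print(len(pixel), "dimensions per pixel")
print(total_bytes / 1e9, "GB for the whole image")
```

Under these assumed numbers, each pixel carries 224 values and the full feature set for a single image occupies roughly 9 GB, which is the kind of memory pressure the conventional approach runs into.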
It is therefore desirable to provide a faster and more efficient hierarchical image recognition framework. By reducing computing power and memory requirements, larger portions of training data may be used to train the hierarchical image recognition models proposed herein. In addition, image classification may be performed faster than with conventional techniques, which permits real-time model calibration to best classify unique individual images.
The multi-layer nature of the proposed hierarchical image recognition framework allows various features of pixels to be classified separately. In certain embodiments, it further prevents the classification of one feature from influencing the classification of another.
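The idea of classifying features separately can be sketched as follows. This is a hypothetical illustration, not the framework disclosed herein: it assumes one linear scorer per feature in a first layer and a weighted combination of the resulting scores in a second layer, and the function names, feature names, and weights are invented for the example.

```python
def linear_score(values, weights):
    """Weighted sum over the dimensions of a single feature."""
    return sum(w * v for w, v in zip(weights, values))


def hierarchical_score(pixel_features, first_layer_weights, second_layer_weights):
    """Score each feature on its own, then combine the per-feature scores.

    Each first-layer scorer sees only its own feature's dimensions, so the
    score for one feature cannot be influenced by the values of another.
    """
    per_feature = {
        name: linear_score(values, first_layer_weights[name])
        for name, values in pixel_features.items()
    }
    combined = sum(second_layer_weights[name] * s for name, s in per_feature.items())
    return per_feature, combined


# Hypothetical pixel with two features of different dimensionality.
pixel_features = {"color": [0.25, 0.5], "texture": [1.0, 0.0, 0.5]}
first_layer_weights = {"color": [1.0, 1.0], "texture": [0.5, 0.5, 1.0]}
second_layer_weights = {"color": 0.5, "texture": 0.5}

per_feature, combined = hierarchical_score(
    pixel_features, first_layer_weights, second_layer_weights
)
print(per_feature, combined)
```

In this arrangement the second layer operates on one score per feature rather than on the full concatenated vector, and each feature's dimensions can be processed and discarded independently.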