1. Field of the Invention
The invention generally relates to cancer detection and classification. The present disclosure includes quantitative analysis of and predictive modeling from digitized histopathology images. This disclosure includes methods that integrate multiple image description methods and classification methods into robust computer systems for accurate detection and classification or grading of cancerous tissue samples.
2. Description of the Relevant Art
The microscopic analysis of histopathology samples is widely used as a cancer diagnosis method. Histopathology is a labor-intensive process requiring the preparation of biopsy material, including fixation, freezing, sectioning, and staining, followed by analysis and interpretation. Together these process steps have traditionally led to turnaround times of days rather than hours. Furthermore, tissue assessment is routinely performed by human visual evaluation of the structural deviation of cellular and sub-cellular components to diagnose the malignancy and aggressiveness of various types of cancer. These judgments may not be reproducible due to factors including the pathologist's skill level, fatigue, and experience. Moreover, there may be variability in the interpretation and application of the grading criteria, particularly when grading or scoring complicated tissue samples. For example, in the case of prostate cancer diagnosis, it has been shown that intra-observer and inter-observer reproducibility of the Gleason grading system ranges from 60% to 90%. Additionally, comparisons between the grade assigned to a needle biopsy and the grade of the corresponding prostate gland reflect under-grading of the biopsy in 42% of cases and over-grading in 15% of cases.
Surgical oncologists require precise information about the findings on biopsy samples, including prognostic indicators such as tumor grade, score, localization, and extent. In recent years, the standards for the accuracy of this information have risen, creating additional work for clinicians. These issues create the need for a fast, automated, and reproducible classification system based on quantitative characterization of histopathology images. Such a system would likely result in significantly improved accuracy in cancer diagnosis.
The inherent complexity and non-homogeneity of histopathology images make quantitative analysis and classification a challenging task. Histopathology images vary widely but share some common characteristics. A few colors, defined by the applied stain, are present in each tissue sample. These colors highlight specific tissue structures, particular edges, and a variety of textures. Objects such as glands and nuclear structures may appear at random locations in the image, with varying sizes and shapes.
Due to the importance of imaging in cancer diagnosis and treatment and the advancement in digital pathology, computer-aided diagnosis has become an active research area in medical imaging, diagnostic radiology, and pathology. The goal of histopathology-based computer-aided diagnosis is to increase the productivity of pathologists by improving the accuracy and consistency of diagnoses, reducing image reading time, and providing computer-based tools for image visualization and annotation.
Several systems for cancer detection and grading have been developed in the past several years. While most researchers have focused on the development of segmentation and feature extraction methods, the present disclosure includes robust classification using biopsy-image normalization methods in conjunction with multi-classifier ensembles. Most existing works compare the performance of various classifiers on a feature set and select the one that exhibits the best performance in terms of accuracy or computational complexity.
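To illustrate the difference between selecting a single best classifier and combining several, the sketch below fuses the decisions of multiple base classifiers by plurality vote. This is a generic combination rule chosen for illustration, not the ensemble scheme of the present disclosure; the three threshold rules and the feature layout are hypothetical stand-ins for trained classifiers.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: a base classifier maps a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: Sequence[Classifier], features: Sequence[float]) -> str:
    """Combine base-classifier decisions by plurality vote.

    Ties go to the label that first reached the top count, in the order of the
    classifiers list (Counter.most_common orders equal counts by first appearance).
    """
    votes = [clf(features) for clf in classifiers]
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical base classifiers, each looking at a different (made-up) feature index.
def nuclear_density_rule(x: Sequence[float]) -> str:
    return "cancer" if x[0] > 0.5 else "benign"


def texture_rule(x: Sequence[float]) -> str:
    return "cancer" if x[1] > 0.3 else "benign"


def color_rule(x: Sequence[float]) -> str:
    return "cancer" if x[2] > 0.7 else "benign"


ensemble = [nuclear_density_rule, texture_rule, color_rule]
print(majority_vote(ensemble, [0.6, 0.2, 0.9]))  # cancer (2 of 3 votes)
```

Because each base classifier may rely on a different group of features, an ensemble of this kind lets specialized sub-systems contribute to one decision instead of discarding all but the best-scoring classifier.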
The present invention further provides improved classification methods and a novel method of creating multi-classifier ensembles. Most existing classification methods are based upon real numbers, and therefore process images in monochrome. When processing color image data, the standard technique is to process each color channel separately and to recombine the image after processing. However, using this method, correlations between color channels are lost. Since humans perceive thousands of colors but only ten to fifteen levels of grayscale, present classification methods do not necessarily provide optimal performance.
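A small, hypothetical example shows how processing color channels separately can discard the relationship between them. The sketch applies min-max contrast stretching to each channel independently and then recombines the planes; the operation and the pixel values are illustrative assumptions, not taken from the disclosure.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def stretch_per_channel(pixels: List[RGB]) -> List[RGB]:
    """Min-max normalize each color channel independently to [0, 1]."""
    out_channels = []
    for c in range(3):
        values = [p[c] for p in pixels]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for a flat channel
        out_channels.append([(v - lo) / span for v in values])
    # Recombine the separately processed planes into RGB pixels.
    return [(out_channels[0][i], out_channels[1][i], out_channels[2][i])
            for i in range(len(pixels))]


# Two pixels with the same hue (R:G:B = 4:2:1) at different brightness levels.
pixels = [(200.0, 100.0, 50.0), (100.0, 50.0, 25.0)]
print(stretch_per_channel(pixels))  # [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]
```

Each channel is treated as if it were an unrelated grayscale image, so the shared hue of the two input pixels does not survive: after recombination one pixel is white and the other black.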
One potential solution is the use of quaternion numbers for the representation of image data. Since each quaternion value is in the four-dimensional Hamiltonian space, three- and four-dimensional color data naturally lends itself to representation with quaternions. Using quaternion signal processing, all color channels may be processed simultaneously, potentially providing improved processing results.
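As a sketch of what processing all channels simultaneously can look like, the example below encodes an RGB pixel as a pure quaternion (zero real part, with R, G, and B on the i, j, and k axes), a common convention in quaternion color processing that is assumed here rather than taken from the disclosure. A single operation, q·p·q*, then rotates the whole color vector about the achromatic (gray) axis, changing the hue while acting on all three channels at once. The helper names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quaternion:
    w: float  # real (scalar) part
    x: float  # i component
    y: float  # j component
    z: float  # k component

    def __mul__(self, other: "Quaternion") -> "Quaternion":
        # Hamilton product (non-commutative).
        a1, b1, c1, d1 = self.w, self.x, self.y, self.z
        a2, b2, c2, d2 = other.w, other.x, other.y, other.z
        return Quaternion(
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        )

    def conj(self) -> "Quaternion":
        return Quaternion(self.w, -self.x, -self.y, -self.z)


def rgb_to_quaternion(r: float, g: float, b: float) -> Quaternion:
    # Pure-quaternion encoding: one value carries all three channels.
    return Quaternion(0.0, float(r), float(g), float(b))


def rotate_color(pixel: Quaternion, axis, angle_rad: float) -> Quaternion:
    # Build the unit rotation quaternion q = cos(t/2) + sin(t/2) * u, then compute q p q*.
    norm = math.sqrt(sum(a * a for a in axis))
    s = math.sin(angle_rad / 2) / norm
    q = Quaternion(math.cos(angle_rad / 2), axis[0] * s, axis[1] * s, axis[2] * s)
    return q * pixel * q.conj()


pixel = rgb_to_quaternion(200, 100, 50)
rotated = rotate_color(pixel, (1, 1, 1), 2 * math.pi / 3)  # 120 degrees about the gray axis
print(tuple(round(v, 6) for v in (rotated.x, rotated.y, rotated.z)))  # (50.0, 200.0, 100.0)
```

A 120 degree rotation about the gray axis cyclically permutes the channels, so (R, G, B) becomes (B, R, G); a per-channel pipeline cannot express this mixing of channels as one operation.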
Real-valued neural networks and the associated backpropagation training algorithms, independently introduced as a multi-layer perceptron structure by a variety of researchers, have been used for classification in many applications. However, real-valued neural networks process images in monochrome, therefore leading to the aforementioned problems with recombining the color planes. Therefore, a quaternion neural network may improve classification performance when utilizing color images.
A major drawback of quaternion neural networks is their computational cost: the multiplication of two quaternions requires sixteen real multiplications, instead of one multiplication for real numbers. A high-speed training algorithm is therefore needed to make quaternion neural networks practical, and one such algorithm is disclosed herein.
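The sixteen-multiplication figure follows from writing the Hamilton product q·p as a 4x4 real matrix L(q) applied to the four components of p: one real multiplication per matrix entry, plus twelve real additions. The sketch below makes that count explicit; the function names are hypothetical and the code is meant only to illustrate the arithmetic cost, not an efficient implementation.

```python
from typing import List, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z)


def left_mult_matrix(q: Quat) -> List[List[float]]:
    """4x4 real matrix L(q) such that the Hamilton product q*p equals L(q) @ p."""
    a, b, c, d = q
    return [
        [a, -b, -c, -d],
        [b, a, -d, c],
        [c, d, a, -b],
        [d, -c, b, a],
    ]


def hamilton_product_counted(q: Quat, p: Quat) -> Tuple[Quat, int]:
    """Return q*p together with the number of real multiplications performed."""
    mults = 0
    out = []
    for row in left_mult_matrix(q):
        acc = 0.0
        for coeff, comp in zip(row, p):
            acc += coeff * comp  # one real multiplication per matrix entry
            mults += 1
        out.append(acc)
    return (out[0], out[1], out[2], out[3]), mults


i, j = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)
print(hamilton_product_counted(i, j))  # ((0.0, 0.0, 0.0, 1.0), 16)
```

Every weight-input product in a quaternion neuron incurs this cost, which is why training speed becomes the bottleneck for quaternion networks.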
Additional background art includes those systems proposed in the following patents or patent applications:
U.S. Pat. No. 7,761,240 B2, authored by Saidi et al., discloses systems and methods for automated diagnosis and grading of tissue images based on morphometric data extracted from the images by a computer in a two-step procedure (detection and grading). These systems do not explore the interrelationships among the color channels, which could improve the feature extraction steps.
U.S. Pat. No. 8,139,831 B2, authored by Khamene et al., provides a method for unsupervised classification of histological images obtained from a slide simultaneously co-stained with NIR fluorescent and H&E stains. In this system, only gland centroids are used for graph construction; other important structures, such as cell nuclei, were not explored as nodes of Voronoi, Delaunay, and MST graphs. A significant drawback of this method is that an additional NIR stain must be used, whereas the stain most widely used in clinical practice is H&E.
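For readers unfamiliar with graph-based tissue descriptors, the sketch below builds a minimum spanning tree (MST) over a set of structure centroids with Prim's algorithm on the complete Euclidean graph. It is a generic illustration, not the construction used by Khamene et al.; the centroid coordinates are made up, and the same code would accept nuclei centroids as nodes just as readily as gland centroids.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mst_edges(centroids: Sequence[Point]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on the complete Euclidean graph of the centroids, O(n^2)."""
    n = len(centroids)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest node not yet in the tree.
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax edges from the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(centroids[u], centroids[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


centroids = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]  # hypothetical centroid positions
edges = mst_edges(centroids)
print(edges)  # [(0, 1, 3.0), (1, 2, 4.0)]
print(sum(w for _, _, w in edges) / len(edges))  # mean edge length: 3.5
```

Statistics of the MST edge lengths, such as the mean used in the last line, are one illustrative way to turn the graph into numeric features.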
U.S. Pat. App No. 2010/0098306 A1, authored by Madabhushi et al., relates to computer-aided diagnosis using content-based retrieval of histopathological (prostate/breast/renal) image features based on predetermine criteria and their analysis for malignancy determination. The maximum reported accuracy is substantially worse than that obtained by the embodiments disclosed herein.
U.S. Pat. App No. 2011/0243417 A1, authored by Madabhushi et al., discloses a method and a system for detecting biologically relevant structures in a hierarchical fashion, beginning at a low-resolution. The relevant structures are classified using pairwise Markov models. The maximum reported accuracy is substantially worse than that obtained by the embodiments disclosed herein.
U.S. Pat. App No. 2007/0019854 A1, authored by Gholap et al., discloses a method and a system that provides automated screening of prostate needle biopsy specimen in a digital image of prostatectomy specimen. Among other differences, the segmentation methodology taught by Gholap differs significantly from the present disclosure.
U.S. Pat. App No 2010/0312072, authored by Breskin et al., discloses a method of estimating the grade of a prostate cancer sample using zinc data associated with the prostate. The assessment of cancer severity is not made from histological quantitative analysis of tissue images.
U.S. Pat. No. 7,899,625 B2, authored by Bhanot et al., provides a classification method for cancer detection from mass spectrometry data. This system differs from the systems disclosed in the present invention in several respects: the data patterns come from a different domain, so all pre-processing and feature extraction methods are different. In addition, its multi-classifier ensemble scheme does not contain the grading classifiers disclosed herein and therefore lacks their attendant advantages, including reduced complexity of the combination algorithms.
U.S. Pat. App No US 20090297007 A1, authored by Cosatto et al., discloses systems and methods for analyzing biopsy images based solely on nuclear information. A drawback of the system is that it is not able to grade or score a detected cancer region.
International Pat. App. Pub. No. WO 2009006696 A1, authored by Louis Dagenais and Samantha Gallagher, discloses systems and methods for processing, visualizing, and analyzing anatomical pathology images. More specifically, the Dagenais and Gallagher invention provides a method for digital pathology comprising the steps of: (i) obtaining at least one spectrally distinct digital image of the tissue sample; and (ii) preparing a tissue map from said at least one image in order to characterize tissue features and thereby classify tissue type and condition. Unfortunately, while this level of automation has improved the physical handling of pathology images, it has provided no concrete assistance to pathologists in improving the accuracy of diagnosis [WO 2009006696 A1].
The systems and methods disclosed in the present invention offer several advantages in terms of accuracy and generalization ability over previously developed systems. The diversity of classifiers and the ensemble schemes described herein make the systems suitable for application to different cancer recognition problems, without constraints related to color variations, scale, image magnification, feature vectors, and so on. In addition, the use of multi-classifier systems favors the use of specialized sub-systems based on a group of features or tissue descriptors. Moreover, the disclosed computer-aided diagnosis systems have been designed to process particular biopsy regions of interest or whole slides of tissue samples from several body tissues. The methods and systems disclosed are completely general and independent of the process under consideration.