Field of the Invention
The present invention relates to systems and methods for classifying histology composition and delineating cellular regions, while remaining invariant to batch effects, by means of deep learning and sparse coding.
Description of the Related Art
Tissue sections are often stained with hematoxylin and eosin (H&E), which label DNA (e.g., nuclei) and protein contents, respectively, in various shades of color. Stained sections can provide a wealth of information about the tissue architecture (e.g., of a tumor). Even though there are inter- and intra-observer variations (Dalton et al, 2000), a trained pathologist always uses rich content (e.g., various cell types, cellular organization, cell state and health), in context, to characterize tumor architecture. At the macro level, tissue composition (e.g., stroma versus tumor) can be quantified. At the micro level, cellular features such as cell types, cell state, and cellular organization can be queried. Aberrations in the tissue architecture often reflect disease progression. However, outcome-based analysis requires a large cohort, and the performance of existing techniques is hindered by the large technical and biological variations that are always present in such a cohort.
The current state of the art relies on ad hoc models to (i) segment nuclear regions and (ii) classify distinct regions of histopathology. For example, intensity features may be used to identify cells, or features may be extracted from underlying local patches to classify distinct regions of histopathology. These techniques lack robustness as a result of the batch effect (e.g., technical variations in sample preparation) and biological heterogeneity. As a result, present techniques are not applicable to a large cohort of histology sections collected from different laboratories that do not adhere to an identical protocol. The significance of processing a large cohort of histology sections is that it will pave the way to developing new taxonomies for patient populations and their responses to therapies. The net effect is the realization of personalized medicine from simple histology sections.
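By way of illustration only, the sensitivity of a fixed intensity criterion to the batch effect can be sketched with synthetic data. The following minimal Python sketch is not taken from any of the cited techniques; the tile values, the fixed threshold of 128, and the helper names make_section and segment_by_intensity are hypothetical assumptions chosen solely for this example.

```python
import numpy as np


def make_section(nucleus_level, background_level, size=32):
    """Build a synthetic grayscale tile containing one square 'nucleus'.

    All values are hypothetical; real H&E images are color and far noisier.
    """
    tile = np.full((size, size), background_level, dtype=np.uint8)
    tile[8:16, 8:16] = nucleus_level  # 8 x 8 dark nuclear region
    return tile


def segment_by_intensity(tile, threshold=128):
    """Label every pixel darker than a fixed threshold as nuclear."""
    return tile < threshold


# Two assumed "batches" of the same structure, stained at different strengths.
batch_a = make_section(nucleus_level=80, background_level=200)   # strong stain
batch_b = make_section(nucleus_level=150, background_level=220)  # weak stain

pixels_a = int(segment_by_intensity(batch_a).sum())
pixels_b = int(segment_by_intensity(batch_b).sum())

print("nuclear pixels found in batch A:", pixels_a)
print("nuclear pixels found in batch B:", pixels_b)
```

Under these assumed values, the fixed threshold recovers all 64 nuclear pixels in the first tile and none in the second, although both tiles depict the same structure.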
Analysis of tumor histopathology is generally divided into three categories of research (Gurcan et al, 2009): nuclear segmentation and multidimensional representation of tumor cells as an imaging biomarker; patch-based analysis; and recruitment of lymphocytes. Currently, research is being conducted on analysis of whole slide imaging, tumor heterogeneity and composition, and integration with molecular data. The main strategies include fine-tuning human-engineered features and unsupervised feature learning. Fine-tuning of engineered features (FIG. 1) has been described by Chang et al, 2009; Han et al, 2011; Kong et al, 2010; and Kothari et al, 2012. Integration with molecular data has been described by Huang et al, 2011; Le et al, 2012; and Nayak et al, 2013. Examples of unsupervised feature learning include the Auto Encoder, which utilizes backpropagation to learn from unlabeled data (Mussa et al, 2005; Nelwamondo et al, 2007), the Restricted Boltzmann Machine (Hinton, 2006), Independent Subspace Analysis (Hyvärinen et al, 2009), and reconstruction independent subspace analysis (RISA) (V. Quoc, J. Han, J. Gray, P T Spellman, and B. Pavrin, IEEE ISBI 2012, 302-305). In addition, U.S. patent application Ser. No. 13/886,213, filed on May 2, 2013, relates to determining a prognosis or therapy for a patient by analyzing stained tissue samples.