Various pathological methods are used to analyze biological specimens for the detection of abnormal or cancerous cells. For example, standard histopathology involves visual analysis of stained tissue sections by a pathologist using a microscope. Typically, tissue sections are removed from a patient by biopsy, and the samples are either snap frozen and sectioned using a cryo-microtome, or they are formalin-fixed, paraffin embedded, and sectioned via a microtome. The tissue sections are then mounted onto a suitable substrate. Paraffin-embedded tissue sections are subsequently deparaffinized. The tissue sections are stained using, for example, a hematoxylin-eosin (H&E) stain and are coverslipped.
The tissue samples are then visually inspected at high resolution, for example, at 10× to 40× magnification. The magnified cells are compared with visual databases in the pathologist's memory. Visual analysis of a stained tissue section by a pathologist involves scrutinizing features such as nuclear and cellular morphology, tissue architecture, staining patterns, and the infiltration of immune response cells to detect the presence of abnormal or cancerous cells.
If early metastases or small clusters of cancerous cells measuring from less than 0.2 to 2 mm in size, known as micrometastases, are suspected, adjacent tissue sections may be stained with an immuno-histochemical (IHC) agent/counter stain such as cytokeratin-specific stains. Such methods increase the sensitivity of histopathology since normal tissue, such as lymph node tissue, does not respond to these stains. Thus, the contrast between unaffected and diseased tissue can be enhanced.
The primary method for detecting micrometastases has been standard histopathology. The detection of micrometastases in lymph nodes, for example, by standard histopathology is a formidable task owing to the small size and lack of distinguishing features of the abnormality within the tissue of a lymph node. Yet, the detection of these micrometastases is of prime importance to stage the spread of disease because if a lymph node is found to be free of metastatic cells, the spread of cancer may be contained. On the other hand, a false negative diagnosis resulting from a missed micrometastasis in a lymph node presents too optimistic a diagnosis, and a more aggressive treatment should have been recommended.
Although standard histopathology is well-established for diagnosing advanced diseases, it has numerous disadvantages. In particular, variations in the independent diagnoses of the same tissue section by different pathologists are common because the diagnosis and grading of disease by this method are based on a comparison of the specimen of interest with a database in the pathologist's memory, which is inherently subjective. Differences in diagnoses arise particularly when diagnosing rare cancers or disease in its very early stages. In addition, standard histopathology is time consuming and costly, and it relies on the human eye for detection, which makes the results hard to reproduce. Further, operator fatigue and varied levels of expertise of the pathologist may impact a diagnosis.
In addition, if a tumor is poorly differentiated, many immunohistochemical stains may be required to help differentiate the cancer type. Such staining may be performed on multiple parallel cell blocks. This staining process may be prohibitively expensive and cellular samples may only provide a few diagnostic cells in a single cell block.
To overcome the variability in diagnoses by standard histopathology, which relies primarily on cell morphology and tissue architectural features, spectroscopic methods have been used to capture a snapshot of the biochemical composition of cells and tissue. This makes it possible to detect variations in the biochemical composition of a biological specimen caused by a variety of conditions and diseases. By subjecting a tissue or cellular sample to spectroscopy, variations in the chemical composition in portions of the sample may be detected, which may indicate the presence of abnormal or cancerous cells. The application of infrared spectroscopy to cytopathology (the study of diseases of cells) is referred to as “spectral cytopathology” (SCP), and the application of infrared spectroscopy to histopathology (the study of diseases of tissue) is referred to as “spectral histopathology” (SHP).
SCP on individual urinary tract and cultured cells is discussed in B. Bird et al., Vibr. Spectrosc., 48, 10 (2008) and M. Romeo et al., Biochim Biophys Acta, 1758, 915 (2006). SCP based on imaging data sets and applied to oral mucosa and cervical cells is discussed in WO 2009/146425. Demonstration of disease progression via SCP in oral mucosal cells is discussed in K. Papamarkakis et al., Laboratory Investigations, 90, 589 (2010). Demonstration of sensitivity of SCP to detect cancer field effects and sensitivity to viral infection in cervical cells is discussed in K. Papamarkakis et al., Laboratory Investigations, 90, 589, (2010).
Demonstration of first unsupervised imaging of tissue using SHP of liver tissue via hierarchical cluster analysis (HCA) is discussed in M. Diem et al., Biopolymers, 57, 282 (2000). Detection of metastatic cancer in lymph nodes is discussed in M. J. Romeo et al., Vibrational Spectrosc., 38, 115 (2005) and M. Romeo et al., Vibrational Microspectroscopy of Cells and Tissues, Wiley-Interscience, Hoboken, N.J. (2008). Use of neural networks, trained on HCA-derived data, to diagnose cancer in colon tissue is discussed in P. Lasch et al., J. Chemometrics, 20, 209 (2007). Detection of micro-metastases and individual metastatic cancer cells in lymph nodes is discussed in B. Bird et al., The Analyst, 134, 1067 (2009), B. Bird et al., BMC J. Clin. Pathology, 8, 1 (2008), and B. Bird et al., Tech. Cancer Res. Treatment, 10, 135 (2011).
Spectroscopic methods are advantageous in that they alert a pathologist to slight changes in chemical composition in a biological sample, which may indicate an early stage of disease. In contrast, morphological changes in tissue evident from standard histopathology take longer to manifest, making early detection of disease more difficult. Additionally, spectroscopy allows a pathologist to review a larger sample of tissue or cellular material in a shorter amount of time than it would take the pathologist to visually inspect the same sample. Further, spectroscopy relies on instrument-based measurements that are objective, digitally recorded and stored, reproducible, and amenable to mathematical/statistical analysis. Thus, results derived from spectroscopic methods are more accurate and precise than those derived from standard histopathological methods.
Various techniques may be used to obtain spectral data. For example, Raman spectroscopy, which assesses the molecular vibrations of a system using a scattering effect, may be used to analyze a cellular or tissue sample. This method is described in N. Stone et al., Vibrational Spectroscopy for Medical Diagnosis, J. Wiley & Sons (2008), and C. Krafft, et al., Vibrational Spectrosc. (2011).
The Raman scattering effect is considered to be weak in that only about 1 in 10^10 incident photons undergoes Raman scattering. Accordingly, Raman spectroscopy works best using a tightly focused visible or near-IR laser beam for excitation. This, in turn, dictates the spot from which spectral information is being collected. This spot size may range from about 0.3 μm to 2 μm, depending on the numerical aperture of the microscope objective and the wavelength of the laser utilized. This small spot size precludes data collection of large tissue sections, since a data set could contain millions of spectra and would require long data acquisition times. Thus, SHP using Raman spectroscopy requires the operator to select small areas of interest. This approach negates the advantages of spectral imaging, such as the unbiased analysis of large areas of tissue.
SHP using infrared spectroscopy has also been used to detect abnormalities in tissue, including, but not limited to, brain, lung, oral mucosa, cervical mucosa, thyroid, colon, skin, breast, esophageal, prostate, and lymph node tissue. Infrared spectroscopy, like Raman spectroscopy, is based on molecular vibrations, but is an absorption effect, and between 1% and 50% of incident infrared photons are likely to be absorbed if certain criteria are fulfilled. As a result, data can be acquired by infrared spectroscopy more rapidly, and with excellent spectral quality, compared to Raman spectroscopy. In addition, infrared spectroscopy is extremely sensitive in detecting small compositional changes in tissue. Thus, SHP using infrared spectroscopy is particularly advantageous in the diagnosis, treatment and prognosis of cancers such as breast cancer, which frequently remains undetected until metastases have formed, because it can easily detect micro-metastases. It can also detect small clusters of metastatic cancer cells as small as a few individual cells. Further, the spatial resolution achievable using infrared spectroscopy is comparable to the size of a human cell, and commercial instruments incorporating large infrared array detectors may collect tens of thousands of pixel spectra in a few minutes.
A method of SHP using infrared spectroscopy is described in Bird et al., “Spectral detection of micro-metastases in lymph node histo-pathology”, J. Biophoton. 2, No. 1-2, 37-46 (2009), (hereinafter “Bird”). This method utilizes infrared micro-spectroscopy (IR-MSP) and multivariate analysis to pinpoint micro-metastases and individual metastatic cells in lymph nodes.
Bird studied raw hyperspectral imaging data sets including 25,600 spectra, each containing 1650 spectral intensity points between 700 and 4000 cm−1. These data sets, occupying about 400 MByte each, were imported and pre-processed. Data preprocessing included restriction of the wavenumber range to 900-1800 cm−1 and other processes. The “fingerprint” infrared spectral region was further divided into a “protein region” between 1700 and 1450 cm−1, which is dominated by the amide I and amide II vibrational bands of the peptide linkages of proteins. This region is highly sensitive to different protein secondary and tertiary structure and can be used to stage certain events in cell biology that depend on the abundance of different proteins. The lower wavenumber range, from 900 to 1350 cm−1, the “phosphate region”, contains several vibrations of the phosphodiester linkage found in phospholipids, as well as DNA and RNA.
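The wavenumber restriction and the division into a protein region and a phosphate region described above may be sketched as follows. This is a minimal illustration only, not Bird's implementation: the helper name restrict_range and the random stand-in data are assumptions, and only the region limits are taken from the description above.

```python
import numpy as np


def restrict_range(wavenumbers, spectra, low, high):
    """Keep only the spectral points whose wavenumber lies in [low, high] cm^-1."""
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], spectra[:, mask]


# Hypothetical stand-in data: 100 random "spectra" on a 1650-point axis
# spanning 700-4000 cm^-1 (a real data set would hold 25,600 pixel spectra).
rng = np.random.default_rng(0)
wn = np.linspace(700.0, 4000.0, 1650)
cube = rng.random((100, wn.size))

wn_fp, fingerprint = restrict_range(wn, cube, 900.0, 1800.0)    # fingerprint region
wn_prot, protein = restrict_range(wn, cube, 1450.0, 1700.0)     # protein region
wn_phos, phosphate = restrict_range(wn, cube, 900.0, 1350.0)    # phosphate region
```

Each returned array keeps one row per pixel spectrum, so later processing steps may be applied to any of the three regions in the same way.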
In Bird, a minimum intensity criterion for the integrated amide I band was imposed to eliminate pixels with no tissue coverage. Then, vector normalization and conversion of the spectral vectors to second derivatives were performed. Subsequently, data sets were subjected individually to hierarchical cluster analysis (HCA) using the Euclidean distance to define spectral similarity and Ward's algorithm for clustering. Pixel cluster membership was converted to pseudo-color spectral images.
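The sequence of steps described above (tissue-coverage filter, vector normalization, second derivative, HCA with Euclidean distance and Ward's algorithm) may be sketched as follows. This is an illustrative sketch under stated assumptions rather than Bird's code: the function name cluster_pixels, the 1600-1700 cm−1 amide I integration window, the Savitzky-Golay derivative settings, the threshold value, and the synthetic test spectra are all hypothetical choices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.signal import savgol_filter


def cluster_pixels(wn, spectra, n_clusters, amide_min):
    """Return one cluster label per pixel; label 0 marks rejected (empty) pixels."""
    # Integrated amide I intensity (rectangle rule); the window is an assumption.
    band = (wn >= 1600.0) & (wn <= 1700.0)
    amide_area = spectra[:, band].sum(axis=1) * np.mean(np.diff(wn))
    keep = amide_area >= amide_min

    # Vector (unit-length) normalization of the retained spectra.
    kept = spectra[keep]
    kept = kept / np.linalg.norm(kept, axis=1, keepdims=True)

    # Second derivative along the spectral axis (assumed Savitzky-Golay settings).
    d2 = savgol_filter(kept, window_length=9, polyorder=2, deriv=2, axis=1)

    # HCA: Euclidean distance with Ward's algorithm, cut into n_clusters groups.
    tree = linkage(d2, method="ward", metric="euclidean")
    labels = np.zeros(spectra.shape[0], dtype=int)
    labels[keep] = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels


# Synthetic example: two spectral classes plus five pixels without tissue.
wn = np.linspace(900.0, 1800.0, 451)


def gauss(center, width):
    return np.exp(-0.5 * ((wn - center) / width) ** 2)


rng = np.random.default_rng(0)
class_a = gauss(1655.0, 20.0) + 0.5 * gauss(1545.0, 20.0)
class_b = gauss(1630.0, 20.0) + 0.5 * gauss(1240.0, 25.0)
spectra = np.vstack([
    class_a + 0.001 * rng.standard_normal((20, wn.size)),
    class_b + 0.001 * rng.standard_normal((20, wn.size)),
    0.001 * rng.standard_normal((5, wn.size)),
])
labels = cluster_pixels(wn, spectra, n_clusters=2, amide_min=10.0)
```

Reshaping the label vector to the image dimensions and assigning one color per label would yield a pseudo-color image of the kind described above.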
According to Bird's method, marks are placed on slides with a stained tissue section to highlight areas that correspond to areas on the unstained adjacent tissue section that are to be subjected to spectral analysis. The resulting spectral and visual images are matched by a user who aligns specific features on the spectral image and the visual image to physically overlay the spectral and visual images.
By Bird's method, corresponding sections of the spectral image and the visual image are examined to determine any correlation between the visual observations and the spectral data. In particular, abnormal or cancerous cells observed by a pathologist in the stained visual image may also be observed when examining a corresponding portion of the spectral image that overlays the stained visual image. Thus, the outlines of the patterns in the pseudo-color spectral image may correspond to known abnormal or cancerous cells in the stained visual image. Potentially abnormal or cancerous cells that were observed by a pathologist in a stained visual image may be used to verify the accuracy of the pseudo-color spectral image.
Bird's method, however, is inexact because it relies on the skill of the user to visually match specific marks on the spectral and visual images, and the resulting alignment is often imprecise. In addition, Bird's method allows the visual and spectral images to be matched by physically overlaying them, but does not join the data from the two images to each other. Since the images are merely physically overlaid, the superimposed images are not stored together for future analysis.
Further, since different adjacent sections of tissue are subjected to spectral and visual imaging, Bird's overlaid images do not display the same tissue section. This makes it difficult to match the spectral and visual images, since there may be differences in the morphology of the visual image and the color patterns in the spectral image.
Another problem with Bird's overlaying method is that the visual image is not in the same spatial domain as the infrared spectral image. Thus, the spatial resolutions of Bird's visual image and spectral image are different. Typically, the spatial resolution of the infrared image is lower than that of the visual image. To account for this difference in resolution, the data used in the infrared domain may be expanded by selecting a region around the visual point of interest and diagnosing the region, and not a single point. For every point in the visual image, there is a region in the infrared image, larger than the point, that must be input to achieve diagnostic output. This process of accounting for the resolution differences is not performed by Bird. Instead, Bird assumes that a point selected in the visual image is the same point of information in the spectral image through the overlay, and accordingly a diagnostic match is reported. While the images may visually be the same, they are not the same diagnostically.
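The expansion from a single visual point to a surrounding infrared region may be illustrated by the following sketch. It assumes, hypothetically, that the two images are already registered with a common origin and orientation; the function name, the pixel sizes (0.5 μm for the visual image, 6.25 μm for the infrared image), and the one-pixel margin are illustrative values not taken from Bird.

```python
def ir_region_for_visual_point(x_vis, y_vis, vis_um_per_px, ir_um_per_px,
                               ir_shape, margin=1):
    """Return (row_slice, col_slice) of infrared pixels around a visual-image point.

    Assumes both images share the same origin and orientation (already registered).
    """
    # Visual pixel coordinates -> micrometers -> infrared pixel indices.
    col = int(x_vis * vis_um_per_px / ir_um_per_px)
    row = int(y_vis * vis_um_per_px / ir_um_per_px)
    rows, cols = ir_shape
    # Expand to a neighborhood and clip it to the infrared image bounds.
    r0, r1 = max(row - margin, 0), min(row + margin + 1, rows)
    c0, c1 = max(col - margin, 0), min(col + margin + 1, cols)
    return slice(r0, r1), slice(c0, c1)


# Hypothetical pixel sizes: 0.5 um (visual) and 6.25 um (infrared).
row_slice, col_slice = ir_region_for_visual_point(
    x_vis=1000, y_vis=500, vis_um_per_px=0.5, ir_um_per_px=6.25,
    ir_shape=(160, 160))
corner_rows, corner_cols = ir_region_for_visual_point(
    x_vis=0, y_vis=0, vis_um_per_px=0.5, ir_um_per_px=6.25,
    ir_shape=(160, 160))
```

The spectra inside the returned slices, rather than a single pixel spectrum, would then be passed to whatever diagnostic step is used.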
To claim a diagnostic match, the spectral image used must be output from a supervised diagnostic algorithm that is trained to recognize the diagnostic signature of interest. Thus, the spectral image cluster will be limited by the algorithm's classification scheme, which is driven by a biochemical classification, to create a diagnostic match, and not a user-selectable match. By contrast, Bird merely used an “unsupervised” HCA image to compare to a “supervised” stained visual image to make a diagnosis. The HCA image identifies regions of common spectral features that have not yet been determined to be diagnostic, based on rules and limits assigned for clustering, including manually cutting the dendrogram until a boundary (geometric) match is visually accepted by the pathologist to outline a cancer region. This method merely provides a visual comparison.
Other methods based on the analysis of fluorescence data exist that are generally based on the distribution of an external tag, such as a stain or label, or utilize changes in the inherent fluorescence, also known as auto-fluorescence. These methods are generally less diagnostic, in terms of recognizing biochemical composition and changes in composition. In addition, these methods lack the fingerprint sensitivity of techniques of vibrational spectroscopy, such as Raman and infrared.
A general problem with spectral acquisition techniques is that an enormous amount of spectral data is collected when testing a biological sample. As a result, the process of analyzing the data becomes computationally complicated and time consuming. Spectral data often contains confounding spectral features that are frequently observed in microscopically acquired infrared spectra of cells and tissue, such as scattering and baseline artifacts. Thus, it is helpful to subject the spectral data to pre-processing to isolate the cellular material of interest, and to remove confounding spectral features.
One type of confounding spectral feature is Mie scattering, which is a sample morphology-dependent effect. This effect interferes with infrared absorption or reflection measurements if the sample is non-uniform and includes particles the size of approximately the wavelength of the light interrogating the sample. Mie scattering is manifested by broad, undulating scattering features, onto which the infrared absorption features are superimposed.
Mie scattering may also mediate the mixing of absorptive and reflective line shapes. In principle, pure absorptive line shapes are those corresponding to the frequency-dependence of the absorptivity, and are usually Gaussian, Lorentzian or mixtures of both. The absorption curves correspond to the imaginary part of the complex refractive index. Reflective contributions correspond to the real part of the complex refractive index, and are dispersive in line shapes. The dispersive contributions may be obtained from absorptive line shapes by numeric KK-transform, or as the real part of the complex Fourier transform (FT).
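The relationship between an absorptive band and its dispersive counterpart may be illustrated numerically. The sketch below uses the discrete Hilbert transform from SciPy as a stand-in for a numeric KK-transform. The Lorentzian band position and width are hypothetical, and sign and scaling conventions for the dispersive component vary between treatments, so only the line shape should be read from it.

```python
import numpy as np
from scipy.signal import hilbert

# Hypothetical absorptive Lorentzian band centered at 1650 cm^-1.
wn = np.linspace(1200.0, 2100.0, 1801)
center, half_width = 1650.0, 10.0
absorptive = half_width**2 / ((wn - center) ** 2 + half_width**2)

# scipy.signal.hilbert returns the analytic signal; its imaginary part is the
# Hilbert transform of the input, used here as a numeric KK-type transform.
dispersive = np.imag(hilbert(absorptive))

# The dispersive curve crosses zero at the band center and has extrema of
# opposite sign about one half-width on either side of it.
upper = wn[np.argmax(dispersive)]
lower = wn[np.argmin(dispersive)]
```

Adding a fraction of such a dispersive curve to the absorptive band reproduces the kind of asymmetric, apparently shifted band profile that results from mixing the two line shapes.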
Resonance Mie (RMie) features result from the mixing of absorptive and reflective band shapes, which occurs because the refractive index undergoes anomalous dispersion when the absorptivity goes through a maximum (i.e., over the profile of an absorption band). Mie scattering, or any other optical effect that depends on the refractive index, will mix the reflective and absorptive line shapes, causing a distortion of the band profile, and an apparent frequency shift.
FIG. 1 illustrates the contamination of absorption patterns by dispersive band shapes observed in both SCP and SHP. The bottom trace in FIG. 1 depicts a regular absorption spectrum of biological tissue, whereas the top trace shows a spectrum strongly contaminated by a dispersive component via the RMie effect. The spectral distortions appear to be independent of the chemical composition and instead depend on the morphology of the sample. The resulting band intensity and frequency shifts complicate spectral analysis to the point that uncontaminated and contaminated spectra are classified into different groups due to the presence of the band shifts. Broad, undulating background features are shown in FIG. 2. When superimposed on the infrared micro-spectroscopy (IR-MSP) patterns of cells, these features are attributed to Mie scattering by spherical particles, such as cellular nuclei, or spherical cells.
The appearance of dispersive line shapes in FIG. 1 superimposed on IR-MSP spectra was reported along with a theoretical analysis in M. Romeo, et al., Vibrational Spectroscopy, 38, 129 (2005) (hereinafter “Romeo 2005”). Romeo 2005 identifies the distorted band shapes as arising from the superposition of dispersive (reflective) components onto the absorption features of an infrared spectrum. These effects were attributed to incorrect phase correction of the instrument control software. In particular, the acquired raw interferogram in FTIR spectroscopy frequently is “chirped” or asymmetric, and needs to be symmetrized before FT. This is accomplished by collecting a double sided interferogram over a shorter interferometer stroke, and calculating a phase correction to yield a symmetric interferogram.
In Romeo 2005, it was assumed that this procedure was not functioning properly, which causes it to yield distorted spectral features. An attempt was made to correct the distorted spectral features by calculating the phase between the real and imaginary parts of the distorted spectra, and reconstructing a power spectrum from the phase corrected real and imaginary parts. Romeo 2005 also reported the fact that in each absorption band of an observed infrared spectrum, the refractive index undergoes anomalous dispersion. Under certain circumstances, various amounts of the dispersive line shapes can be superimposed, or mixed in, with the absorptive spectra.
The mathematical relationship between absorptive and reflective band shapes is given by the Kramers-Kronig (KK) transformation, which relates the two physical phenomena. The mixing of dispersive (reflective) and absorptive effects in the observed spectra was identified, and a method to correct the effect via a procedure called “Phase Correction” (PC) is discussed in Romeo 2005. Although the cause of the mixing of dispersive and absorptive contributions was erroneously attributed to instrument software malfunction, the principle of the confounding effect was properly identified. Due to the incomplete understanding of the underlying physics, however, the proposed correction method did not work properly.
P. Bassan et al., Analyst, 134, 1586 (2009) and P. Bassan et al., Analyst, 134, 1171 (2009) demonstrated that dispersive and absorptive effects may mix via the “Resonance Mie Scattering” (RMieS) effect. An algorithm and method to correct spectral distortion is described in P. Bassan et al., “Resonant Mie Scattering (RMieS) correction of infrared spectra from highly scattering biological samples”, Analyst, 135, 268-277 (2010). This method is an extension of the “Extended Multiplicative Signal Correction” (EMSC) method reported in A. Kohler et al., Appl. Spectrosc., 59, 707 (2005) and A. Kohler et al., Appl. Spectrosc., 62, 259 (2008).
This method removes the non-resonant Mie scattering from infrared spectral datasets by including reflective components obtained via KK-transform of pure absorption spectra into a multiple linear regression model. The method utilizes the raw dataset and a “reference” spectrum as inputs, where the reference spectrum is used both to calculate the reflective contribution, and as a normalization feature in the EMSC scaling. Since the reference spectrum is not known a priori, Bassan et al. use the mean spectrum of the entire dataset, or an “artificial” spectrum, such as the spectrum of a pure protein matrix, as a “seed” reference spectrum. After the first pass through the algorithm, each corrected spectrum may be used in an iterative approach to correct all spectra in the subsequent pass. Thus, a dataset of 1000 spectra will produce 1000 RMieS-EMSC corrected spectra, each of which will be used as an independent new reference spectrum for the next pass, requiring 1,000,000 correction runs. Carrying out this algorithm, referred to as the “RMieS-EMSC” algorithm, to a stable level of corrected output spectra requires a number of passes (~10) and computation times that are measured in days.
Since the RMieS-EMSC algorithm requires hours or days of computation time, a fast, two-step method to perform the elimination of scattering and dispersive line shapes from spectra was developed, as discussed in B. Bird, M. Miljković and M. Diem, “Two step resonant Mie scattering correction of infrared micro-spectral data: human lymph node tissue”, J. Biophotonics, 3 (8-9) 597-608 (2010). This approach includes fitting multiple dispersive components, obtained from KK-transform of pure absorption spectra, as well as Mie scattering curves computed via the van de Hulst equation (see H. C. Van De Hulst, Light Scattering by Small Particles, Dover, Mineola, N.Y., (1981)), to all the spectra in a dataset via a procedure known as Extended Multiplicative Signal Correction (EMSC) (see A. Kohler et al., Appl. Spectrosc., 62, 259 (2008)) and reconstructing all spectra without these confounding components.
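The fitting and reconstruction step may be illustrated by a generic EMSC-style least-squares sketch. It is not the published two-step algorithm: the function names, the single dispersive and single Mie interference spectrum, the particle radius (4 μm), the relative refractive index (1.3), and the synthetic "measured" spectrum are all assumptions. The Mie curve uses the van de Hulst approximation for the extinction efficiency, Q = 2 − (4/ρ) sin ρ + (4/ρ²)(1 − cos ρ), with ρ = 4πr(n − 1)ν for wavenumber ν.

```python
import numpy as np
from scipy.signal import hilbert


def van_de_hulst_q(wn, radius_cm, n_rel):
    """Van de Hulst approximation to the Mie extinction efficiency."""
    rho = 4.0 * np.pi * radius_cm * (n_rel - 1.0) * wn
    return 2.0 - (4.0 / rho) * np.sin(rho) + (4.0 / rho**2) * (1.0 - np.cos(rho))


def emsc_correct(spectra, reference, interferents):
    """Fit each spectrum as scale*reference + interferents + offset, then
    subtract the interferents and offset and divide by the fitted scale."""
    design = np.column_stack([reference, *interferents, np.ones_like(reference)])
    coefs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    unwanted = design[:, 1:] @ coefs[1:]          # interferents plus offset
    return ((spectra.T - unwanted) / coefs[0]).T


# Synthetic example with hypothetical parameters.
wn = np.linspace(900.0, 1800.0, 451)
reference = np.exp(-0.5 * ((wn - 1650.0) / 20.0) ** 2)   # "pure" absorption band
dispersive = np.imag(hilbert(reference))                 # KK-type dispersive shape
mie = van_de_hulst_q(wn, radius_cm=4e-4, n_rel=1.3)      # broad scattering curve

measured = 0.8 * reference + 0.3 * dispersive + 0.1 * mie + 0.05
corrected = emsc_correct(measured[np.newaxis, :], reference, [dispersive, mie])
```

In this noise-free example the corrected spectrum coincides with the reference band. With real data, several dispersive and Mie interference spectra would typically be supplied, and the residual of the fit would carry the chemical differences between pixels.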
This algorithm avoids the iterative approach used in the RMieS-EMSC algorithm by using uncontaminated reference spectra from the dataset. These uncontaminated reference spectra were found by carrying out a preliminary cluster analysis of the dataset and selecting the spectra with the highest amide I frequencies in each cluster as the “uncontaminated” spectra. The spectra were converted to pure reflective spectra via numeric KK transform and used as interference spectra, along with compressed Mie curves for RMieS correction as described above. This approach is fast, but only works well for datasets containing a few spectral classes.
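The selection rule described above, in which the spectrum with the highest amide I frequency in each cluster is taken as the "uncontaminated" spectrum, may be sketched as follows. The function name, the 1600-1700 cm−1 search window for the amide I maximum, and the synthetic spectra are assumptions made for illustration, and the cluster labels are taken as given from a preliminary cluster analysis.

```python
import numpy as np


def pick_reference_spectra(wn, spectra, labels):
    """Map each cluster label to the index of its spectrum with the
    highest amide I peak position."""
    band = (wn >= 1600.0) & (wn <= 1700.0)        # assumed amide I search window
    peak_pos = wn[band][np.argmax(spectra[:, band], axis=1)]
    refs = {}
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)
        refs[int(lab)] = int(members[np.argmax(peak_pos[members])])
    return refs


# Synthetic example: five bands with different amide I positions in two clusters.
wn = np.linspace(900.0, 1800.0, 451)
centers = [1650.0, 1640.0, 1656.0, 1630.0, 1652.0]
spectra = np.array([np.exp(-0.5 * ((wn - c) / 20.0) ** 2) for c in centers])
labels = np.array([1, 1, 1, 2, 2])
refs = pick_reference_spectra(wn, spectra, labels)
```

The selected spectra could then be converted to dispersive interference spectra by a numeric KK-type transform before the EMSC fit.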
In the case of spectral datasets containing many tissue types, however, the extraction of uncontaminated spectra can become tedious. Furthermore, under these conditions, it is unclear whether fitting all spectra in the dataset to the most appropriate interference spectrum is guaranteed. In addition, this algorithm requires reference spectra for correction, and works best with large datasets.
In light of the above, there remains a need for improved methods of analyzing biological specimens by spectral imaging to provide a medical diagnosis. Further, there is a need for an improved pre-processing method that is based on a revised phase correction approach, does not require input data, is computationally fast, and takes into account many types of confounding spectral contributions that are frequently observed in microscopically acquired infrared spectra of cells and tissue.