1. Field of the Invention
This invention relates generally to the use of spectral data for categorizing materials. More particularly, the invention relates to the use of spectral data for classifying tissue by discriminant analysis.
2. Background of the Invention
Information obtained from spectral analysis is useful in such diverse fields as geology, chemistry, medicine, physics and biology. For example, it is known that cells and tissues emit characteristic spectra in response to light stimulation. The nature of those spectra is indicative of the health of the tissue. Thus, a cancerous tissue has associated with it an emission spectrum that is different from the emission spectrum of a corresponding healthy tissue. Similarly, emission spectra are used in attempts to distinguish the mineral content of two or more geologic samples.
In any spectral analysis of a sample whose condition is unknown, an attempt is made to associate the sample with a reference sample that represents a known state. Commonly, spectral values, such as the amplitude measured in response to excitation, are used to place a test sample within a range of spectral values obtained from samples of known condition. The object is to identify the best fit between the test sample and a sample whose condition is known. While such an analysis is straightforward in principle, a spectrum recorded from a test specimen often does not match any known spectrum, or appears to have features present in more than one known spectrum. In addition, a single spectrum often contains multiple features of interest or consequence, which makes any comparison problematic. Thus, it is not always clear where a test sample should be placed within a range of known samples. As a result, the spectra of many samples cannot be assigned a condition by comparison to known samples without significant ambiguity.
In spectral analysis, different types of spectra provide distinct and useful kinds of information. In conventional spectroscopic analysis, one may employ such diverse methods as optical spectroscopy spanning the infrared, visible and ultraviolet, based on the interactions of materials with light (electromagnetic radiation); spectroscopic methods based on magnetic resonance, such as electron spin resonance (ESR) and nuclear magnetic resonance (NMR) spectroscopy; x-ray spectroscopic methods based on crystal structure; Moessbauer spectroscopy based on nuclear resonance; and Raman spectroscopy based on molecular dynamics. Often, to characterize a specimen as completely as possible, all of the above methods are employed, irrespective of the results of any particular method, and the totality of the results is used to analyze or describe the specimen being examined. In the present inventive systems and methods, by contrast, a first spectral result is used to determine whether a specimen being tested can be categorized; if the initial result warrants further analysis, as when a categorization cannot be made definitively, a further specific analysis is performed in an attempt to obtain suitable categorization information. In some instances, no definitive conclusion can be reached.
The invention provides methods of comparing spectral data from a test sample with spectral data corresponding to known conditions. In particular, the invention provides methods for assigning at least one condition to a test sample based upon a relationship between at least two metrics, each metric representing the similarity of the test sample to samples having a known condition and each being derived from a comparison of spectral data from the test sample with a constellation of reference data from the samples having known conditions. For a known condition, one can record and analyze reference spectra from a number of specimens having that condition to provide a number of reference data points characteristic of the condition. N-dimensional data points obtained from reference spectra for a plurality of specimens exhibiting the same condition tend to fall in a cluster, or constellation.
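The constellation idea can be made concrete with a short sketch. The Python below (standard library only) treats each reference spectrum as a tuple of N feature values and reduces a constellation to a single characteristic point, either its centroid or a weighted average, the two options named later in this description. The function names, the toy numbers and the per-point weights are illustrative assumptions; the text does not say how weights would be chosen.

```python
from typing import Sequence

# Hypothetical representation: one reference spectrum -> one N-tuple of features.
Point = tuple[float, ...]


def _check_constellation(constellation: Sequence[Sequence[float]]) -> int:
    """Return the common dimension N, raising if the constellation is malformed."""
    if not constellation:
        raise ValueError("a constellation needs at least one reference point")
    n = len(constellation[0])
    if any(len(p) != n for p in constellation):
        raise ValueError("every reference point must have the same dimension N")
    return n


def centroid(constellation: Sequence[Sequence[float]]) -> Point:
    """Coordinate-wise arithmetic mean of the reference points."""
    n = _check_constellation(constellation)
    count = len(constellation)
    return tuple(sum(p[i] for p in constellation) / count for i in range(n))


def weighted_average(constellation: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> Point:
    """Coordinate-wise weighted mean; the weights are an illustrative assumption."""
    n = _check_constellation(constellation)
    if len(weights) != len(constellation):
        raise ValueError("one weight per reference point is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return tuple(
        sum(w * p[i] for p, w in zip(constellation, weights)) / total
        for i in range(n)
    )


# Toy two-feature constellation (made-up numbers, not measured spectra).
reference = [(1.0, 2.0), (3.0, 4.0)]
print(centroid(reference))                      # (2.0, 3.0)
print(weighted_average(reference, [1.0, 3.0]))  # (2.5, 3.5)
```

With equal weights the weighted average reduces to the centroid, so the centroid can be seen as the default choice.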
In one aspect, the invention provides a method for determining a condition of a test specimen. A preferred method comprises obtaining at least one optical spectrum from a test specimen, the optical spectrum being characterized by (N) quantitative features, each of the (N) quantitative features corresponding to one of (N) pre-selected wavelengths, wherein (N) is an integer greater than 1. Preferred methods further comprise determining a first metric between a point in N-dimensional space corresponding to the (N) quantitative features of the optical spectrum from the test specimen and a point in N-dimensional space characteristic of a first constellation of reference data that includes a first plurality of points in N-dimensional space, each of which corresponds to (N) quantitative features of a reference optical spectrum at the (N) pre-selected wavelengths, the first constellation of reference data being representative of a first known condition. Preferred methods further comprise determining a second metric between the point in N-dimensional space corresponding to the (N) quantitative features of the optical spectrum from the test specimen and a point in N-dimensional space characteristic of a second constellation of reference data that includes a second plurality of points in N-dimensional space, each of which corresponds to (N) quantitative features of a reference optical spectrum at the (N) pre-selected wavelengths, the second constellation of reference data being representative of a second known condition. Preferred methods further comprise assigning one of the first and the second conditions to the test specimen, based at least in part on a relationship between the first and the second metrics.
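A minimal sketch of the two-condition method, under stated assumptions: the characteristic point of each constellation is its centroid, both metrics are Euclidean distances, and the "relationship between the first and the second metrics" is taken to be "the smaller distance wins", with ties going to the first condition. The function names and the toy data are hypothetical and are not taken from the specification.

```python
import math
from typing import Sequence


def centroid(points: Sequence[Sequence[float]]) -> list[float]:
    """Coordinate-wise mean, used here as the characteristic point of a constellation."""
    count = len(points)
    return [sum(p[i] for p in points) / count for i in range(len(points[0]))]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_one_of_two(test_point, first_constellation, second_constellation,
                      first_label="first condition", second_label="second condition"):
    """Return (label, first_metric, second_metric).

    Assumed relationship between the metrics: the condition whose
    characteristic point is closer wins; ties go to the first condition.
    """
    first_metric = euclidean(test_point, centroid(first_constellation))
    second_metric = euclidean(test_point, centroid(second_constellation))
    label = first_label if first_metric <= second_metric else second_label
    return label, first_metric, second_metric


# Toy data: N = 2 features per spectrum, values invented for illustration.
first = [(0.0, 0.0), (2.0, 0.0)]       # centroid (1, 0)
second = [(10.0, 10.0), (12.0, 10.0)]  # centroid (11, 10)
label, d1, d2 = assign_one_of_two((2.0, 1.0), first, second)
print(label, round(d1, 3), round(d2, 3))  # first condition 1.414 12.728
```

Returning both metric values, not just the label, keeps them available for any further rule, such as an inconclusive-result check.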
In one embodiment, the first metric and the second metric are each selected from the group consisting of a square root of a sum of the squares of the differences in coordinates of points in the N-dimensional space, a Mahalanobis distance, a Bhattacharyya distance, and a probability.
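Of the listed metrics, the Mahalanobis distance is the least obvious to compute: it scales the offset from the constellation mean by the constellation's own covariance, $D_M = \sqrt{(x-\mu)^\top S^{-1}(x-\mu)}$. The sketch below is one way to compute it in plain Python, assuming the sample covariance (divisor $m-1$) of the reference points and solving $S y = x-\mu$ by Gaussian elimination instead of forming $S^{-1}$ explicitly. It needs enough reference points (at least N + 1, in general position) for $S$ to be invertible. The Bhattacharyya distance and the probability metric are not shown here.

```python
import math


def mean_vector(points):
    """Coordinate-wise mean of the reference points."""
    n = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(n)]


def covariance_matrix(points):
    """Sample covariance of the reference points (divides by m - 1)."""
    m = len(points)
    if m < 2:
        raise ValueError("need at least two reference points")
    mu = mean_vector(points)
    n = len(mu)
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (m - 1)
             for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance is singular; more reference points are needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def mahalanobis(test_point, constellation):
    """Mahalanobis distance from test_point to the constellation's mean."""
    mu = mean_vector(constellation)
    diff = [t - u for t, u in zip(test_point, mu)]
    y = solve(covariance_matrix(constellation), diff)
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))


# Toy constellation: mean (1, 1), covariance diag(4/3, 4/3).
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(round(mahalanobis((3.0, 1.0), reference), 6))  # 1.732051
```

Because the covariance in this toy case is isotropic, the result is just the Euclidean offset (2) divided by the standard deviation $\sqrt{4/3}$, i.e. $\sqrt{3}$; with correlated features the two metrics diverge.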
In another embodiment, methods of the invention further comprise analyzing biological tissue. Preferred methods comprise obtaining at least one optical spectrum from a tissue sample obtained from a patient; determining a first metric by comparing optical features of the spectrum obtained from the tissue sample to a first constellation of data points representing spectral characteristics of a first condition; determining a second metric by comparing optical features of the spectrum obtained from the tissue sample to a second constellation of data points representing spectral characteristics of a second condition; and assigning one of the two conditions to the test sample based upon a relationship between the first and second metrics. In a preferred embodiment, the optical spectra obtained are selected from the group consisting of a fluorescence spectrum, a reflectance spectrum and a Raman spectrum. In one embodiment, the tissue specimen is human cervical tissue or a sample prepared from human cervical tissue. In one embodiment, methods of the invention are conducted in vivo. In a preferred embodiment, the first condition is normal health and the second condition is selected from the group consisting of moderate cervical intraepithelial neoplasia (CIN II) and severe cervical intraepithelial neoplasia (CIN III).
In one embodiment, the method further includes reporting that the optical spectrum is inconclusive with regard to the first known condition and the second known condition in response to each of the first and the second metrics exceeding a pre-determined metric value. In one embodiment, at least one of the point in N-dimensional space characteristic of the first constellation of reference data and the point in N-dimensional space characteristic of the second constellation of reference data is selected from the group consisting of a centroid of the respective constellation of reference data and a weighted average of the respective constellation of reference data. In one embodiment, the quantitative features are selected from the group consisting of an amplitude, an average of a plurality of amplitudes, a function of a plurality of amplitudes, an average slope, a derivative, and an integral.
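The quantitative features listed above can each be computed from a sampled spectrum in a few lines. The sketch below assumes a spectrum stored as two parallel lists, sorted wavelengths and intensities, and uses linear interpolation between samples; the function names, the band limits and the choice of plain amplitudes for the feature vector are hypothetical. The average slope over a band is the mean of the derivative over that band, so a separate derivative function is not shown.

```python
from bisect import bisect_left
from typing import Sequence


def amplitude_at(wavelengths: Sequence[float], intensities: Sequence[float],
                 target: float) -> float:
    """Intensity at `target`, linearly interpolated (wavelengths sorted ascending)."""
    if not wavelengths[0] <= target <= wavelengths[-1]:
        raise ValueError("target wavelength lies outside the recorded range")
    i = bisect_left(wavelengths, target)
    if wavelengths[i] == target:
        return intensities[i]
    x0, x1 = wavelengths[i - 1], wavelengths[i]
    y0, y1 = intensities[i - 1], intensities[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


def band_average(wavelengths, intensities, lo, hi):
    """Average of the recorded amplitudes with lo <= wavelength <= hi."""
    values = [y for x, y in zip(wavelengths, intensities) if lo <= x <= hi]
    if not values:
        raise ValueError("no recorded samples fall in the band")
    return sum(values) / len(values)


def average_slope(wavelengths, intensities, lo, hi):
    """Mean derivative over [lo, hi]: rise over run between the band edges."""
    return (amplitude_at(wavelengths, intensities, hi)
            - amplitude_at(wavelengths, intensities, lo)) / (hi - lo)


def band_integral(wavelengths, intensities, lo, hi):
    """Trapezoidal integral of intensity over [lo, hi]."""
    xs = [lo] + [x for x in wavelengths if lo < x < hi] + [hi]
    ys = [amplitude_at(wavelengths, intensities, x) for x in xs]
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2
               for k in range(len(xs) - 1))


def feature_vector(wavelengths, intensities, preselected):
    """One amplitude per pre-selected wavelength: a point in N-dimensional space."""
    return tuple(amplitude_at(wavelengths, intensities, w) for w in preselected)


# Toy spectrum (invented, linear in wavelength) sampled every 10 nm.
wl = [400.0, 410.0, 420.0, 430.0]
inten = [1.0, 3.0, 5.0, 7.0]
print(feature_vector(wl, inten, [405.0, 425.0]))  # (2.0, 6.0)
print(band_average(wl, inten, 400.0, 430.0))      # 4.0
print(average_slope(wl, inten, 400.0, 430.0))     # 0.2
print(band_integral(wl, inten, 400.0, 430.0))     # 120.0
```

Any mix of these features could fill the N coordinates, provided the same recipe is applied to the reference spectra and to the test spectrum.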
In another aspect, the invention provides a method of determining that a condition of a test specimen belongs to one of a plurality of known conditions, each of the plurality of conditions being represented by a reference specimen constellation corresponding to a selected member of the plurality of known conditions. Preferred methods comprise obtaining at least one optical spectrum from a test specimen, and selecting (N) quantitative features from the spectrum, each of the (N) quantitative features being associated with one of (N) pre-selected wavelengths, wherein (N) is an integer greater than 1. Such methods further comprise determining a plurality of metric values, each metric value corresponding to a measure between a point in N-dimensional space corresponding to the (N) quantitative features of the optical spectrum from the test specimen and a point in N-dimensional space characteristic of a constellation of reference data representing one of the plurality of known conditions, each of the constellations including a plurality of points in the N-dimensional space. Finally, such methods comprise determining a condition to be ascribed to the test specimen based at least in part on the relationship among the plurality of metric values.
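Extending the comparison from two conditions to a plurality is mostly bookkeeping: compute one metric per constellation and compare them all. The sketch below assumes that conditions are stored in a dictionary mapping a label to its reference points, that the characteristic point is the centroid, and that the condition with the smallest metric is ascribed. The labels and data are placeholders, and any other metric function with the same two-argument signature could be passed in.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a constellation."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(len(points[0])))


def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_condition(test_point, constellations, metric=euclidean):
    """Return the best-matching condition and every metric value.

    `constellations` maps a condition label to its reference points.
    Assumed relationship: the condition with the smallest metric is ascribed.
    """
    metrics = {label: metric(test_point, centroid(points))
               for label, points in constellations.items()}
    best = min(metrics, key=metrics.get)
    return best, metrics


# Placeholder conditions with invented two-feature reference points.
constellations = {
    "condition A": [(0.0, 0.0), (0.0, 2.0)],    # centroid (0, 1)
    "condition B": [(5.0, 5.0), (7.0, 5.0)],    # centroid (6, 5)
    "condition C": [(10.0, 0.0), (10.0, 2.0)],  # centroid (10, 1)
}
best, metrics = assign_condition((9.0, 1.0), constellations)
print(best)  # condition C
```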
In one embodiment, at least one of the plurality of metrics described above is a distance, the distance being a square root of a sum of the squares of the differences in the corresponding coordinates in the N-dimensional space, a Mahalanobis distance, or a Bhattacharyya distance. In one embodiment, at least one of the plurality of metrics is a probability.
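The text does not specify how a probability metric would be computed. One common approach, shown here purely as an assumption, models each constellation as a Gaussian with independent coordinates (a diagonal covariance estimated from the reference points), evaluates the test point's likelihood under each model, and normalizes with equal prior probabilities to get a posterior per condition. The largest probability, rather than the smallest distance, then selects the condition. Log-likelihoods are combined with the usual log-sum-exp shift to avoid underflow.

```python
import math


def diag_gaussian_log_likelihood(x, points):
    """Log density of x under an axis-aligned Gaussian fitted to `points` (assumption)."""
    m = len(points)
    if m < 2:
        raise ValueError("need at least two reference points per condition")
    total = 0.0
    for i, xi in enumerate(x):
        values = [p[i] for p in points]
        mu = sum(values) / m
        var = sum((v - mu) ** 2 for v in values) / (m - 1)
        var = max(var, 1e-9)  # guard against a zero-variance coordinate
        total += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
    return total


def posterior_probabilities(x, constellations):
    """Per-condition probabilities, assuming equal priors for all conditions."""
    logs = {label: diag_gaussian_log_likelihood(x, pts)
            for label, pts in constellations.items()}
    top = max(logs.values())  # shift before exponentiating (log-sum-exp)
    weights = {label: math.exp(v - top) for label, v in logs.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


# Invented constellations with identical spread, centred at (1, 1) and (11, 11).
refs = {
    "A": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
    "B": [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)],
}
print(posterior_probabilities((6.0, 6.0), refs))  # {'A': 0.5, 'B': 0.5}
```

At the midpoint the two models are equally likely; near either centre the corresponding probability approaches 1.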
In one embodiment, methods of the invention further comprise obtaining at least one optical spectrum from a biological sample obtained from a patient, and deducing one of a plurality of states of health to be ascribed to the sample based upon a fluorescence spectrum, a reflectance spectrum or a Raman spectrum. In one embodiment, the sample is human cervical tissue. Also in a preferred embodiment, methods of the invention are conducted on cervical tissue in vivo. In the analysis of cervical tissue, the known conditions that may be assigned to the test sample are selected from the group consisting of normal squamous epithelium, columnar epithelium, immature metaplasia, mature metaplasia, Nabothian cysts, crypt openings, traumatically eroded tissue, benign polyps, mild cervical intraepithelial neoplasia (CIN I), moderate cervical intraepithelial neoplasia (CIN II), severe cervical intraepithelial neoplasia (CIN III) and cancer.
In one embodiment, the method further includes reporting that the optical spectrum is inconclusive with regard to the plurality of known conditions in response to each of the plurality of metrics exceeding a pre-determined metric value. In one embodiment, the point in N-dimensional space characteristic of a constellation of reference data representing one of the plurality of known conditions is selected from the group consisting of a centroid of the respective constellation of reference data and a weighted average of the respective constellation of reference data. In one embodiment, the quantitative features are selected from the group consisting of an amplitude, an average of a plurality of amplitudes, a function of a plurality of amplitudes, an average slope, a derivative, and an integral.
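The inconclusive rule adds one check before the comparison: if every metric exceeds the pre-determined value, the test point is far from all known constellations and no condition is reported. The sketch below assumes Euclidean distances to centroids (via the standard library's `math.dist`) and a hypothetical threshold of 2.0 in feature units; choosing a real threshold would depend on the reference data. A metric exactly equal to the threshold does not count as exceeding it.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a constellation."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(len(points[0])))


def classify_or_inconclusive(test_point, constellations, threshold):
    """Return ("inconclusive", metrics) if every metric exceeds `threshold`,
    otherwise (label of the nearest condition, metrics)."""
    metrics = {label: math.dist(test_point, centroid(points))
               for label, points in constellations.items()}
    if all(value > threshold for value in metrics.values()):
        return "inconclusive", metrics
    return min(metrics, key=metrics.get), metrics


# Invented constellations centred at (0, 0) and (10, 0).
constellations = {
    "condition A": [(-1.0, 0.0), (1.0, 0.0)],
    "condition B": [(9.0, 0.0), (11.0, 0.0)],
}
print(classify_or_inconclusive((0.5, 0.0), constellations, threshold=2.0)[0])  # condition A
print(classify_or_inconclusive((5.0, 0.0), constellations, threshold=2.0)[0])  # inconclusive
```

An inconclusive result of this kind is the natural trigger for the further, more specific analysis described in the background.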
In still another aspect, the invention relates to a system for determining a condition of a test specimen. The system comprises a computer that receives data characteristic of at least one optical spectrum recorded from a test specimen, the data comprising (N) quantitative features of the optical spectrum, each of the (N) quantitative features corresponding to one of (N) pre-selected wavelengths, wherein (N) is an integer greater than 1. The system further comprises a first memory in communication with the computer and containing a first constellation of reference data comprising a first plurality of points in N-dimensional space, each of the first plurality of points corresponding to (N) quantitative features of a reference optical spectrum at the (N) pre-selected wavelengths, the first constellation of reference data being representative of a first known condition. The system further comprises a second memory in communication with the computer that contains a second constellation of reference data comprising a second plurality of points in N-dimensional space, each of the second plurality of points corresponding to (N) quantitative features of a reference optical spectrum at the (N) pre-selected wavelengths, the second constellation of reference data being representative of a second known condition. The computer is adapted to determine a first metric between a point in N-dimensional space corresponding to the (N) quantitative features of the optical spectrum from the test specimen and a point in N-dimensional space characteristic of the first constellation of reference data. The computer is further adapted to determine a second metric between the point in N-dimensional space corresponding to the (N) quantitative features of the optical spectrum from the test specimen and a point in N-dimensional space characteristic of the second constellation of reference data.
Finally, the computer is adapted to assign one of the first and the second condition to the test specimen, based at least in part on a relationship between the first and the second metric values.
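The system claim maps loosely onto two small objects: a store for each constellation and a component that compares the test features against both. The sketch below is one hypothetical arrangement, not the claimed hardware: `ReferenceMemory` stands in for a memory holding a constellation and its condition label, and `ConditionComputer` checks that N is greater than 1 and that every point has N coordinates, then assigns the condition with the smaller Euclidean distance to the centroid. Class names, field names and data are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class ReferenceMemory:
    """Hypothetical stand-in for one memory holding a constellation of reference data."""
    condition: str
    points: list

    def characteristic_point(self) -> tuple:
        """Centroid of the stored constellation."""
        n = len(self.points[0])
        return tuple(sum(p[i] for p in self.points) / len(self.points) for i in range(n))


@dataclass
class ConditionComputer:
    """Hypothetical computer in communication with two reference memories."""
    first_memory: ReferenceMemory
    second_memory: ReferenceMemory
    n_features: int

    def __post_init__(self):
        if self.n_features < 2:
            raise ValueError("N must be an integer greater than 1")
        for memory in (self.first_memory, self.second_memory):
            if any(len(p) != self.n_features for p in memory.points):
                raise ValueError(f"{memory.condition!r} has points of the wrong dimension")

    def assign(self, features) -> str:
        """Assign the condition whose characteristic point is nearer (Euclidean)."""
        if len(features) != self.n_features:
            raise ValueError("expected one feature per pre-selected wavelength")
        first_metric = math.dist(features, self.first_memory.characteristic_point())
        second_metric = math.dist(features, self.second_memory.characteristic_point())
        return (self.first_memory.condition if first_metric <= second_metric
                else self.second_memory.condition)


computer = ConditionComputer(
    first_memory=ReferenceMemory("first known condition", [(0.0, 0.0), (2.0, 0.0)]),
    second_memory=ReferenceMemory("second known condition", [(10.0, 10.0), (12.0, 10.0)]),
    n_features=2,
)
print(computer.assign((1.0, 1.0)))  # first known condition
```

Keeping each constellation behind its own object mirrors the claim's separate first and second memories and makes it easy to swap in a different characteristic point, such as a weighted average.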
In one embodiment, the first and the second metrics are each selected from the group consisting of a square root of a sum of the squares of the differences in coordinates of points in the N-dimensional space, a Mahalanobis distance, and a Bhattacharyya distance. In one embodiment, the test specimen is a tissue sample and the first and the second conditions correspond to states of health. In one embodiment, the tissue sample is human cervical tissue, and the first state of health and the second state of health are selected from the group consisting of normal health (including normal squamous epithelium, columnar epithelium, immature metaplasia, mature metaplasia, Nabothian cysts, crypt openings, traumatically eroded tissue and benign polyps), mild cervical intraepithelial neoplasia (CIN I), moderate cervical intraepithelial neoplasia (CIN II), severe cervical intraepithelial neoplasia (CIN III) and cancer. In one embodiment, the computer is further adapted to report that the data characteristic of the optical spectrum recorded from the test specimen is inconclusive as regards the first known condition and the second known condition in response to the first and the second metrics both exceeding a pre-determined metric value. In another embodiment, the computer is further adapted to compute a centroid of a constellation of reference data, the centroid serving as the point in N-dimensional space characteristic of the first constellation of reference data or the point in N-dimensional space characteristic of the second constellation of reference data. Preferably, the at least one optical spectrum recorded from the test specimen includes an optical spectrum selected from a fluorescence spectrum, a reflectance spectrum and a Raman spectrum.
In one embodiment, the point in N-dimensional space characteristic of a constellation of reference data representing one of a plurality of known conditions is selected from the group consisting of a centroid of the constellation of reference data and a weighted average of the constellation of reference data. In one embodiment, the test specimen is human tissue. In one embodiment, the first and the second known conditions are selected from the group consisting of healthy tissue, inflammation of tissue, atrophic tissue, hypoxic tissue, diseased tissue, and cancerous tissue.
The foregoing and other objects, aspects, features, and advantages of the invention will become more apparent from the following description and from the claims.