Three types of structural components become organized within cells to regulate cell architecture: actin filaments of approximately 8-nm diameter, intermediate filaments of approximately 10-nm diameter, and microtubules (MTs) of approximately 24-nm diameter. Larger macromolecular assemblages are built from combinations of these smaller structures, which can aggregate with components like themselves, with adaptor molecules, or with other components to form an assemblage of cytoskeletal molecules. Such processes yield a metastable cell structure that is easily within the resolving power of the light microscope. Moreover, organelles within the cells are themselves positioned by one or more processes that regulate trafficking along the cytoskeletal components.
Phase microscopic techniques are used to amplify the weak contrast inherent in cells and make visible macromolecular assemblages such as those described above. The phase principle relies upon a small increment of contrast being introduced when the material of the cells, which has a refractive index different from that of the aqueous medium, shifts the phase of a light wave by about one-quarter of a wavelength. When the shifted waves are recombined with unshifted waves from the same optical source, cells and tissues can be visualized. However, phase microscopy has the drawback that only small variations in contrast are introduced by cell structure. Variations as great as or greater than those introduced by the cells may be introduced by inhomogeneities in the material substrate or by particles floating in the medium. Furthermore, the places where such artifacts contribute to image intensity are unpredictable. This limits the value of the phase technique for the purpose of acquiring images and quantifying variations in their intensity.
The weak contrast variations inherent in phase microscope images were enhanced by the method of video-enhanced differential interference contrast microscopy. This technique enabled investigators to view and interpret dynamic events in real time by using thin preparations of cells themselves or components extracted from cells. By exploiting analog video-based methods of amplifying contrast, the technique made apparent certain macromolecular assemblages that are otherwise not visible by light microscopy. A limit on the visibility of macromolecular assemblages inside the cell is set by the sample thickness. Thick samples give rise to images that are too complex to interpret, because their overlapping planes of structure are all projected onto the two-dimensional plane of the image. Therefore, the usefulness of analog, video-based methods for amplifying phase shift-induced contrast is limited to specimens thinner than a cell.
To overcome the limitation imposed by projecting many overlapping layers, each of which contains significant structural details, onto a two-dimensional image plane, principles of confocal microscopy were developed. These principles were reduced to practice by Egger and Davidovits at Yale, by Sheppard and Wilson in Oxford, and by Brakenhoff and coworkers in Amsterdam (Scanning 10: 128–138). By making optical sections through the whole thickness or a portion of the thickness of a specimen, confocal microscopes break down the three-dimensional (3D) information contained in a cell into a series of image planes. These planes may be spaced as little as a fraction of a micron apart. Collecting a number of such planes throughout the thickness of a cell enables the viewer to see the relationship between structures for which information may be found in two or more separate planes.
The relationship between structures in separate confocal planes is visually inspected by rotating the planes together as an object in a high-speed digital computer. Alternatively, a person can flip through the planes, which are thereby represented as a moving picture or “fly through” of the image. Nevertheless, one who wishes to automate the analysis of cell structures must apply algorithms that work on the 3D content of the image dataset. Except for those few examples where software has been developed to address specific problems, the availability of such algorithms is very restricted. Practitioners of the art in this field have not developed or refined algorithms for specification and relation of the content of multiple planes in such datasets to each other. To overcome this limitation, such practitioners would need to automate the recognition of different cell components in the datasets. They would need to understand many of the principles regulating the relationship among cell components and how such relationships are changed. As the state of the art does not include such an understanding, progress in this type of problem requires further research.
Some researchers have used a microscope-based scanner, controlled by a motor-driven stage, to provide a digitized signal from a sensor. Here, the digitized signals are processed in order to automatically recognize cells while the sample is being scanned. The types of features recognized include physical characteristics such as light scatter, refractive index, and optical density, and dimensions such as cell diameter and width. Mathematical transformations of the signal, such as fast Fourier transform, convolution, and correlation, can also be performed (U.S. Pat. No. 4,700,298, “Dynamic Microscope Image Processing Scanner”). Although images may be collected using a microscope-based scanner, the majority of reports issued in recent years have utilized other means of image acquisition. One group of investigators employed a television camera to acquire the images (U.S. Pat. No. 4,453,266, “Method and Apparatus for Measuring Mean Cell Volume of Red Blood Cells”), whereas a commercial image analysis system was used in another (U.S. Pat. No. 6,025,128, “Prediction of Prostate Cancer Progression by Analysis of Selected Predictive Parameters”).
While the numerous advances in imaging referred to above have been reduced to practice, they still fall short of being able to represent the complex matter composing a cell in a form that is useful for assaying the shape of cultured cells. The arrangement of any single ingredient of a cell may now be presented in the full 3D coordinate space representing the cell in a high-speed digital computer, but there are hundreds or thousands of such molecules mediating the shape of a cell, and so this increases the complexity of the analysis rather than reducing it. To develop methods that are amenable to interpretation and to make up an assay for studying shape properties relevant to problems of biomedical significance, practitioners of the art have ignored advances such as video-enhanced contrast and confocal microscopy. They have instead used a two-dimensional projection of the 3D structure of a cell in the light microscope. One group of investigators developed a system for classifying cells, using histological preparations of cells. A patient's cancerous tissue was used to classify cells, where the objective was to improve prognosis and predict whether the patient's disease would progress to malignant disease. Chromatin texture data were extracted from images of cell nuclei in tissue stained with Feulgen stain or with hematoxylin and eosin (U.S. Pat. No. 6,025,128). Shape factors were used to describe the shape of cancer cell nuclei. The inventors also claimed the use of training against samples from patients who were known progressors and non-progressors. The results were used as a basis for predicting prostate cancer progression in unknown samples from patients. Training was implemented by a neural network.
The cells in tissue samples are crowded together and cannot be readily distinguished from one another at the boundaries; thus, in the prostate example, variables for shape and entropy are calculated based solely on the shape of the cell nucleus. Since cells in cytological samples and cell cultures are sometimes found entirely separate on a flat substratum, each can be seen in its entirety. If such a sample is viewed with adequate contrast and resolution, some cell shape features and details at the cell edge can be resolved. Three shape features were disclosed by Bacus (U.S. Pat. No. 4,199,748, “Automated Method and Apparatus for Classification of Cells with Application to the Diagnosis of Anemia”). Here, a shape circularity factor was calculated by comparing the square of the number of pixels on the perimeter to the area, measured in the number of pixels enclosed by the boundary. Two additional shape factors, namely the number of “spicules” on the boundary and the comparison of orthogonal boundary chain code orientations, were also computed. Similar shape variables were disclosed in other patents, but these inventors disclosed only a small number of such variables.
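The circularity factor described above can be illustrated with a minimal Python sketch. It is not taken from the cited patent: the function name `circularity_factor`, the use of 4-connectivity to decide which pixels lie on the perimeter, and the choice to count perimeter pixels as part of the enclosed area are all assumptions made for illustration.

```python
def circularity_factor(mask):
    """Return (perimeter pixel count squared) / (area in pixels) for a binary mask.

    mask is a list of rows of 0/1 values. Assumption: a perimeter pixel is a
    filled pixel with at least one empty or out-of-image 4-connected neighbour.
    """
    rows, cols = len(mask), len(mask[0])

    def filled(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not filled(r, c):
                continue
            area += 1
            neighbours = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
            if not all(filled(nr, nc) for nr, nc in neighbours):
                perimeter += 1
    if area == 0:
        raise ValueError("mask contains no filled pixels")
    return perimeter ** 2 / area


# A compact 4 x 4 square and an elongated 2 x 8 bar with the same area.
square = [[1] * 4 for _ in range(4)]
bar = [[1] * 8 for _ in range(2)]
print(circularity_factor(square), circularity_factor(bar))
```

Under these assumptions the factor grows as a figure of fixed area becomes more elongated or more irregular at its edge, which is the property that makes it usable as a shape descriptor.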
Measurements based on the absorbance or optical density of a cell following staining, as disclosed in the above-claimed inventions, had the major drawback that the colors and intensity conferred by a stain could vary due to the irreproducibility in reagents. For example, compounds used to make up different lots of the stain can vary, as can the nature of trace elements and compounds in the water used to dilute the reagents and rinse the specimens. An additional drawback of such methods is that they project the 3D image of a cell into a two-dimensional plane, which means that it is impossible to tell whether any given portion of a cell is thicker than usual or molecules are merely more dense in that portion of the cytoplasm. Whereas it is apparent from the prior art that the shapes of cells or portions of cells can be detected and quantified, the prior work did not develop an analysis of cell shape in detail or at high resolution. To overcome this drawback, while still dealing with the difficulty that few algorithms exist which can relate cell components to one another, the inventor developed a method for analyzing shape features in more detail than previously done by other workers.
Methods for Shape Analysis: To improve contrast and resolution in cultured cell specimens, investigators have used anodic oxide interferometers. These substrates consist of a glass slide coated with a metal which is then anodized so as to form an oxide film insulating the metal. One can view selective interference on such anodic oxide interferometers by using reflected light on a light microscope equipped with appropriate optics. A multi-colored image of the cell is established, owing to the high refractive index of the oxide introducing destructive interference in certain wavelengths. In white light, which contains waves of different lengths, a long wave may travel through an oxide layer which is on the order of a fraction of the wavelength in thickness. Upon reflection off the metal, this wave combines with incoming waves to introduce partial destructive interference. A shorter wave, however, whose path through the thickness of oxide layer corresponds to the distance of one wavelength, shows a completely destructive interference effect. Thus, when a dielectric layer on the size order of a cell is adsorbed to the interferometer, the pattern of interference shows attenuation of the light in some wavelengths and cancellation of the light in others (72). Because the interference pattern is changed by even a thin layer of material, the margins of the cell can be visualized in high contrast. Moreover, the thickness of the cell causes a repetition of interference orders and provides information about the shape of the cells in the third dimension, namely height (73).
The interference method enhanced resolution to such an extent that the investigators could measure the values of numerous shape variables, calculate equations for several model figures, and render 35 of the shape variables dimensionless (74). Said variables were descriptive of the following mathematical values:
1. OCNT=Number of interference contours
2. SHPF=Perimeter squared/area of the contour
3. PTOM=Perimeter/2×major axis
4. AXRT=Length of major axis of ellipse/length of minor axis
5. ARAT=Area of ellipse of concentration normalized to area of the contour
6. AFRN=Area of contour/area of lowermost interference contour
7. DCNT=Distance between highest point and centroid of ellipse of concentration
8. ANGL=Angle formed between major axis and a line joining the DCNT points
9. CURV=(Perimeter/number of points)×summed curvature values
10. CSQD=(Number of points)^-1×(summed curvature values)^2
11. NONC=Number of negative curvature regions normalized to perimeter length
12. FRNC=Length of perimeter in negative curvature normalized to perimeter length
13. LNNC=Mean length of negative curvature normalized to length of major axis
14. SDNC=Standard deviation of LNNC values
15. BMPS=Number of minor projections on perimeter normalized to perimeter length
16. MEDN=Mean length of projection medians normalized to length of major axis
17. SDMD=Standard deviation of MEDN
18. ALTI=Mean altitude of projections normalized to length of major axis
19. SDAL=Standard deviation of ALTI values
20. WDTH=Mean width of projections at base normalized to length of major axis
21. SDWD=Standard deviation of WDTH values
22. MDAL=Ratio of median length/altitude of projections
23. CENT=Mean distance from centroid to perimeter normalized to length of major axis
24. SDCD=Standard deviation of CENT
25. FOCI=Mean distance from foci to perimeter normalized to length of major axis
26. SDFD=Standard deviation of FOCI
27. FINE=Area of contour included in ellipse normalized to area of the contour
28. MAXP=Area of polygon formed from local maxima normalized to area of the contour
29. MINP=Area of polygon formed from local minima normalized to area of the contour
30. ASHR=Area of minimum convex envelope normalized to area of the contour
31. PSHR=Perimeter of minimum convex envelope normalized to perimeter length
32. CAVS=Number of major concavities in perimeter normalized to perimeter length
33. ACAV=Mean area of the concavities normalized to area of minimum convex figure
34. CVSD=Standard deviation of ACAV
35. LCAV=Area of largest concavity normalized to area of minimum convex figure
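Three of the listed variables (SHPF, ASHR, and PSHR) can be computed directly from a closed contour, as the following Python sketch suggests. It is an illustration only: it assumes the contour is supplied as an ordered list of polygon vertices, it takes the "minimum convex envelope" to be the convex hull, and the function names are hypothetical rather than those of the investigators' software.

```python
import math


def polygon_area(vertices):
    """Absolute area of a simple polygon by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Sum of edge lengths around a closed polygon."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:] + vertices[:1]))


def convex_hull(points):
    """Convex hull by Andrew's monotone chain, returned counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shape_variables(contour):
    """Return SHPF, ASHR and PSHR for a contour given as (x, y) vertices."""
    area = polygon_area(contour)
    perimeter = polygon_perimeter(contour)
    hull = convex_hull(contour)
    return {
        "SHPF": perimeter ** 2 / area,               # perimeter squared / area
        "ASHR": polygon_area(hull) / area,           # hull area / contour area
        "PSHR": polygon_perimeter(hull) / perimeter, # hull perimeter / contour perimeter
    }


# An L-shaped contour has one concavity, so ASHR > 1 and PSHR < 1.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(shape_variables(l_shape))
```

Because each quantity is a ratio of two lengths squared, two areas, or two lengths, the results are dimensionless and do not depend on the pixel size of the image.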
To test the robustness of the information generated by these methods, the investigators computed values for the shape variables of cells from different cell lines, based on the interference images described above, and used them to form natural groupings. For three epithelial lines studied, there was a striking correspondence between these groups and the actual cell lines of origin (74). Clonally derived populations originating from the same cell line showed a great deal of overlap, whereas cell lines obtained by distinct techniques, albeit from the same tissues, showed far less overlap. When cells were classified by means of principal components, the results appeared superior to those obtained by other methods, such as hierarchical clustering. In principal components classification, only 6% of cells from one lineage overlapped with those from a cell line having a completely distinct origin (72). Finally, the investigators employed the values of shape variables to create a database representing typical samples from normal and oncogenically transformed rodent cells (7). The concept of training against known samples, disclosed by others as indicated above (U.S. Pat. No. 6,025,128), was also applied by the current investigators to predict the characteristics of an unknown sample of cells. To classify the unknowns, the investigators merely needed to compute the values of variables for shape features and compare them to corresponding values for the known samples that were stored in the database (9).
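The comparison of an unknown sample against stored values for known samples can be sketched as follows. The comparison rule actually used by the investigators is not stated here, so this Python example assumes a simple one: each shape variable is standardized over the database, and the unknown is assigned to the class whose mean vector lies nearest. The function names and the numeric values are hypothetical.

```python
import math
from statistics import mean, pstdev


def build_database(known):
    """known maps a class label to a list of shape-variable vectors."""
    vectors = [v for group in known.values() for v in group]
    n = len(vectors[0])
    centers = [mean([v[i] for v in vectors]) for i in range(n)]
    # Guard against a zero spread so that standardization never divides by zero.
    scales = [pstdev([v[i] for v in vectors]) or 1.0 for i in range(n)]

    def standardize(v):
        return [(v[i] - centers[i]) / scales[i] for i in range(n)]

    centroids = {}
    for label, group in known.items():
        scaled = [standardize(v) for v in group]
        centroids[label] = [mean([s[i] for s in scaled]) for i in range(n)]
    return {"standardize": standardize, "centroids": centroids}


def classify(database, unknown):
    """Return the label whose standardized centroid is closest to the unknown."""
    point = database["standardize"](unknown)
    return min(
        database["centroids"],
        key=lambda label: math.dist(point, database["centroids"][label]),
    )


# Hypothetical two-variable records (for example SHPF and ASHR) for known samples.
known = {
    "nontransformed": [[16.0, 1.0], [17.0, 1.2]],
    "transformed": [[30.0, 1.8], [32.0, 1.6]],
}
database = build_database(known)
print(classify(database, [29.0, 1.7]))
```

Standardizing first keeps a variable with a large numeric range from dominating the distance, which matters when dimensionless ratios and counts are mixed in one vector.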
One of the drawbacks of the classification method was that, although the shape variables were precise in the mathematical sense, they had little intuitive relationship to the shape features of a cell on a one-to-one basis. This meant that the investigators might recognize a sample of cells as having the overall signature of a transformed or a nontransformed population, but not be able to distinguish which of their morphological features conferred the signature values. Because some of the mechanisms of transformation involve GTPases which regulate formation of actin-based features such as filopodia, microspikes, ruffles, and stress fibers, a method of relating said morphological features to the overall phenotype was of particular interest. When values of shape variables were combined through a method called latent factor extraction, new variables were created which incorporate weighted representations of the values of the original variables. In applying it to the training sets, i.e., known transformed and nontransformed cells, the investigators showed that much of the information content of the database was retained (3). Cells sampled from populations could still be classified, based on the values of selected latent factors, by presenting them as unknowns in relation to cells from the database (75). Thus, while the advantages of attaching quantitative values to the cells were retained, this method had the additional advantage that many or most of the latent factors used in the classification could also be interpreted in terms of qualitative shape changes, which were recognizable by the investigators on an intuitive basis.
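The idea of a latent factor as a weighted combination of the original variables can be illustrated in Python. The extraction procedure used by the investigators is not specified in this passage, so the sketch below substitutes a common stand-in: the first principal component of the covariance matrix, found by power iteration. The function name and the sample values are hypothetical.

```python
import math


def first_latent_factor(rows, iterations=100):
    """Return (weights, scores) for the dominant direction of variation.

    rows holds one vector of shape-variable values per cell. The weights are
    the unit-length loadings of the original variables; each score is the
    weighted sum of a cell's mean-centered values.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
        for i in range(m)
    ]
    # Power iteration: repeated multiplication by the covariance matrix pulls
    # the vector toward the eigenvector with the largest eigenvalue.
    weights = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        product = [sum(cov[i][j] * weights[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0:
            break
        weights = [p / norm for p in product]
    scores = [sum(c[j] * weights[j] for j in range(m)) for c in centered]
    return weights, scores


# Two hypothetical variables that rise together collapse onto one factor.
weights, scores = first_latent_factor([[1, 2], [2, 4], [3, 6], [4, 8]])
print(weights, scores)
```

The weights are what make such a factor interpretable: variables with large loadings on the same factor change together, so the factor can be read as a single qualitative shape change.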
Automation: A partial automation of the above process was achieved, in order to obtain rapid throughput in processing a contour or multiple contours from cells. The work that fed into automation was based on a principle of spatial mapping. Although this technique is familiar to electron microscopists, it is less commonly applied in other scientific fields. With the recent advances in methods for recognizing and extracting regions of interest from digital images, however, it became feasible to extract patterns from the electron microscope image and overlay the original image with these representations from the data. This method, called data mining, was applied to make maps of the 30-nm fiber of chromatin and thus to solve the packing order of the next higher level structures (76, 77, 78). Similar software and techniques can be used to recover the boundaries of a contour or multiple contours from the image of a cell.
In the chromatin example, images were stored and accessed in Sun workstations, and image processing was performed with a commercial software package (Inovision Corp., Durham, N.C.) running under a SunView interface. Standard image processing routines were used. The edge details in the image were enhanced by taking the first derivative of intensity in two directions. This method of processing is called “Roberts cross” in the terminology of the field and results in an image that can be thresholded. Thresholding is an operation in which minimum and maximum gray levels are established and pixel intensities falling outside these limits are set to zero. Intensity values within the threshold range are retained.
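The two operations just described can be sketched in a few lines of Python. This is a generic illustration of the Roberts cross operator and of band thresholding, not a reproduction of the commercial package mentioned above; the function names are hypothetical, and the choice of the Euclidean magnitude to combine the two diagonal differences is an assumption.

```python
import math


def roberts_cross(image):
    """Edge magnitude from the two diagonal first differences of intensity.

    image is a list of rows of gray levels; the result has one fewer row
    and one fewer column than the input.
    """
    rows, cols = len(image), len(image[0])
    edges = []
    for r in range(rows - 1):
        out_row = []
        for c in range(cols - 1):
            gx = image[r][c] - image[r + 1][c + 1]  # one diagonal
            gy = image[r][c + 1] - image[r + 1][c]  # the other diagonal
            out_row.append(math.sqrt(gx * gx + gy * gy))
        edges.append(out_row)
    return edges


def threshold(image, low, high):
    """Keep values within [low, high]; set everything outside to zero."""
    return [[v if low <= v <= high else 0 for v in row] for row in image]


# A vertical step edge between a dark and a bright region.
step = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
edges = roberts_cross(step)
print(threshold(edges, 5, 20))
```

Running the threshold step at several different limits on the same edge image yields a series of thresholded images of the kind used in the procedure that follows.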
In the prior art, a series of images was made at different threshold values and these were subjected to the following data mining procedures. First, the investigators extracted patterns by employing a segmentation operation. The pattern specification in the chromatin problem was to report a pixel if it was found adjacent to any other filled pixel in the row being processed, unless the stream of coordinates reported was found to be too short or too long to represent a 30-nm structure. The program executing the algorithm (“tracit.p”) reported coordinates of continuously filled structures in the image that were on the size order of the structures the investigators wished to segment (76, 77, 78).
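The pattern specification above can be approximated by a row-by-row scan for runs of filled pixels, with runs rejected when they are too short or too long. The Python sketch below is an interpretation of that description and is not the source of “tracit.p”; the function name `trace_rows` and the length limits passed to it are hypothetical.

```python
def trace_rows(mask, min_len, max_len):
    """Report coordinate streams for runs of adjacent filled pixels in each row.

    mask is a list of rows of 0/1 values. A run is reported as a list of
    (row, column) pairs only if its length lies within [min_len, max_len].
    """
    streams = []
    for r, row in enumerate(mask):
        start = None
        # A trailing 0 acts as a sentinel that closes a run touching the edge.
        for c, value in enumerate(list(row) + [0]):
            if value and start is None:
                start = c
            elif not value and start is not None:
                if min_len <= c - start <= max_len:
                    streams.append([(r, k) for k in range(start, c)])
                start = None
    return streams


# One run of acceptable length, one isolated pixel, and one run that is too long.
row = [0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]
print(trace_rows([row], min_len=2, max_len=4))
```

In practice the two limits would be derived from the pixel size of the image and the expected dimensions of the structure being sought.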
A second program was designed to recognize and identify any area that might be a closed circular or elliptical figure. Working on coordinate streams output by the first program “tracit.p”, the second program found the boundary on a stream of contiguous filled pixels. An algorithm executed in this program enabled a gap between filled pixels to be “filled in” if it was two pixels or fewer in size. If, even after this gap-filling routine, the figure did not form a closed contour, it went unreported. The output files of vectors incorporated values of the perimeter, area, and the equation for the ellipse of concentration of the closed figure. When figures were extracted from the thresholded series, their mode areas were found to be in the interval 676–784 nm². Since this compared favorably to the hypothetical 707 nm² area of a 30-nm chromatin fiber in cross-section, the regions of interest were thought to consist largely of the outlines of 30-nm fibers (76). Dimensioned data on the figures from all of the thresholded images were accumulated into a file, which was processed for statistical analysis in a package of commercial software (78).
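Two details of this paragraph lend themselves to a short Python sketch: the filling of small gaps and the arithmetic behind the 707 nm² figure. The gap-filling routine below works on a single sequence of pixels rather than on a two-dimensional boundary, which is a simplification, and both function names are hypothetical. The area is that of a circle of 30-nm diameter, π × (15 nm)².

```python
import math


def fill_small_gaps(pixels, max_gap=2):
    """Fill runs of empty pixels no longer than max_gap that lie between filled pixels."""
    filled = list(pixels)
    last_filled = None
    for i, value in enumerate(pixels):
        if not value:
            continue
        if last_filled is not None:
            gap = i - last_filled - 1
            if 0 < gap <= max_gap:
                for k in range(last_filled + 1, i):
                    filled[k] = 1
        last_filled = i
    return filled


def fiber_cross_section_area(diameter_nm):
    """Area in square nanometers of a circular cross-section."""
    return math.pi * (diameter_nm / 2) ** 2


# The two-pixel gap is closed; the three-pixel gap is left open.
print(fill_small_gaps([1, 0, 0, 1, 0, 0, 0, 1, 0]))
print(round(fiber_cross_section_area(30)))
```

The computed area of about 707 nm² falls inside the 676–784 nm² interval reported for the mode areas, which is the comparison the paragraph relies on.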
The above method has been used analytically but never to assay for cell characteristics, until the present invention.
Substituting the interference principle for acquiring the image avoids the major drawback of acquiring a cell image via absorbance or optical density principles, which is that the colors and intensity conferred by the stain can vary due to irreproducibility in reagents such as the compounds used to make up different lots of the stain. Moreover, the stain-based methods are directed at the analysis of cell nuclei. They present a simpler subject for analysis, thereby overcoming some of the difficulties of developing algorithms for relating the different cell components when few of the principles regulating the relationship among such components are known. The drawback of such methods is that they mainly analyze a representation of only one portion of a cell. To overcome this limitation, while still getting around the difficulty of developing algorithms for relating different cell components whose relationship is incompletely understood, it is essential to reduce the complexity of the 3D image dataset. To this end, techniques for detecting and measuring the shape of contours in a cell have been developed.
Thus, by substituting a wide range of shape variables for the narrower range of variables extracted from stained in situ tissue or cytological samples, the method of the present invention gains greater power to distinguish transformed from nontransformed cells.