Image-analysis techniques are useful in various fields, for example, in identifying objects photographed by a satellite or telescope and then extracting the corresponding pixels from the rest of the image. Image analysis is also of crucial importance in medicine. For example, DNA analysis based on so-called “DNA chips” has been developed and is of increasing importance in the health industry; this market is expected to grow to more than $600 million by 2005. A crucial component of DNA chip use is the analysis of the images the chip produces.
According to one method developed at Stanford University, DNA chips are made by robotically depositing fragments of nucleic acid, or “probes,” in a matrix-like arrangement at defined areas on a surface, such as a microscope slide. Probes can also be synthesized in situ, directly on the slide or other surface. The matrix of spots is called an “array” or “microarray” and can contain from hundreds to hundreds of thousands of specific probes for diagnostic, drug-discovery, or toxicology uses. In the future, “nanoarrays” are expected to come into general use as detection sensitivity increases and as technology for printing such fine arrays is developed.
In diagnostic uses, for example, a sample is taken from the blood, urine, saliva, or other tissue of the individual. Very often the DNA in the sample is amplified and labeled with one or more fluorescent dyes. If mRNA is to be studied, it is first copied to cDNA and then amplified and labeled. Changes in the amount or sequence of particular nucleic acids in the sample can be detected through hybridization to the probes on the DNA chip. This is possible because conditions can be established that allow only perfectly complementary nucleic acids to hybridize to the probes on the chip. When the chip is excited by shining light on it, the probes or “spots” that contain a hybridized labeled sample fluoresce and can be detected. Hybridization is thus revealed by detecting the fluorescent label at the individual spots of the array.
In one particular application, a reference DNA and a test DNA are labeled with different dyes and analyzed simultaneously. For example, the reference DNA is labeled with one cyanine dye (e.g., Cy3, conventionally displayed as green) and the test DNA with another (e.g., Cy5, conventionally displayed as red). Both samples are then applied to a DNA chip and allowed to hybridize with their complementary probes. This dual-label analysis can be used in many applications, including detecting mutations or particular alleles in an individual and monitoring gene expression in healthy and diseased tissue types. More generally, multi-label schemes are used in DNA chip applications, limited only by the ability to collect and analyze signals at different wavelengths.
The DNA chip is then scanned with a confocal scanner two (or more) times, at wavelengths appropriate to the dyes employed. The resulting images are processed by a dedicated computer program that determines, from the fluorescence intensity, whether a labeled nucleic acid is present. The luminance (grey level) of the pixels of the luminous spots in each image is proportional to the number of dye molecules at the corresponding location of the array. By comparing (matching) the red and green images, it is possible to identify the samples that contain sequences complementary to the probe sequences.
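A minimal sketch of the comparison (matching) step, assuming each spot has already been reduced to one intensity per channel and that the intensities have been normalized as described in the next paragraph. It expresses each spot as the base-2 logarithm of the red/green ratio, a common way of reporting two-channel differences. The function names, the `min_intensity` detection floor, and the default `threshold` of 1.0 (a two-fold difference) are hypothetical choices for illustration.

```python
import math
from typing import List, Optional


def log_ratios(red: List[float], green: List[float],
               min_intensity: float = 1.0) -> List[Optional[float]]:
    """Return log2(red / green) per spot, or None when either channel is
    below the hypothetical detection floor `min_intensity`."""
    ratios: List[Optional[float]] = []
    for r, g in zip(red, green):
        if r < min_intensity or g < min_intensity:
            ratios.append(None)  # too dim in one channel to compare
        else:
            ratios.append(math.log2(r / g))
    return ratios


def classify(ratios: List[Optional[float]], threshold: float = 1.0) -> List[str]:
    """Label each spot by the channel that dominates by at least 2**threshold."""
    labels = []
    for q in ratios:
        if q is None:
            labels.append("undetected")
        elif q >= threshold:
            labels.append("red")
        elif q <= -threshold:
            labels.append("green")
        else:
            labels.append("balanced")
    return labels
```

For example, `classify(log_ratios([400.0, 100.0, 0.5], [100.0, 100.0, 50.0]))` labels the three spots `"red"`, `"balanced"`, and `"undetected"`. Working on the log scale makes two-fold increases and decreases symmetric around zero.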
It should be noted that any matching analysis is carried out only after the luminance (grey level) of the signal pixels of each luminous spot in both channels (images) has been normalized with respect to the luminance (grey level) of the corresponding background pixels. A further normalization between the images obtained from the two channels (different scanning wavelengths) is also necessary, because the mean luminance (grey level) of corresponding spots in the two images varies with the dye used.
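The two normalizations can be sketched as follows. The text does not say how the background level is removed or how the channels are brought to a common scale; this sketch assumes median background subtraction (clipped at zero) and a single global factor that equalizes the channel means. Both are common choices but by no means the only ones, and the function names are hypothetical.

```python
from statistics import mean, median
from typing import List


def background_corrected(signal_pixels: List[float],
                         background_pixels: List[float]) -> float:
    """Spot intensity minus local background (assumed: medians, clipped at 0)."""
    return max(median(signal_pixels) - median(background_pixels), 0.0)


def normalize_channel(reference: List[float], other: List[float]) -> List[float]:
    """Rescale `other` so that its mean equals the mean of `reference`
    (assumed global mean normalization; `other` must have a non-zero mean)."""
    factor = mean(reference) / mean(other)
    return [v * factor for v in other]
```

Applied in this order, the background is first removed spot by spot within each channel, and the per-spot values of one channel are then rescaled against the other before any ratio is formed.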
Processing images acquired from an array is complicated by several sources of error in the data. For example, sample nucleic acids may be differentially amplified or differentially labeled, or may hybridize to differing degrees under the particular conditions employed. The array spots themselves may also vary in quality. There may be errors in data acquisition, for example due to noise. Finally, errors may be introduced by operator intervention or by imprecise instruments. Because operator intervention is needed to analyze array images, the reproducibility of the results suffers: any matching operation could be inadvertently compromised from the outset by human error, leading to erroneous conclusions.
FIG. 1 shows 48 luminous spots of a good-quality image acquired from an array hybridized to a test DNA labeled with a single dye. Some characteristics typical of all array images can be seen; they are indicated on the filtered image of FIG. 2. The luminous spots on the left side of the figure are DNA probes that are rendered relatively neatly in the filtered image. These spots are small, substantially circular, and localized on the darker background. There is also occasional localized noise (see the two stripes and the random small bright pixels), which depends on the fabrication or hybridization process and is generally unpredictable. Such noise causes variations in grey level both in the darker background areas and within the luminous spots that represent the useful signal.
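The filter used to produce FIG. 2 is not specified. As one plausible choice, assumed here for illustration, a 3x3 median filter suppresses isolated bright pixels of the kind described above while preserving spots larger than its window. The sketch represents an image as a nested list of grey levels; `median_filter3` is a hypothetical name.

```python
from statistics import median
from typing import List


def median_filter3(image: List[List[float]]) -> List[List[float]]:
    """3x3 median filter; at the borders only in-image neighbours are used."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median(window)
    return out
```

A single bright pixel is outvoted by its eight dark neighbours and disappears, whereas a pixel inside a 3x3 or larger bright spot keeps its value.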
In general, the analysis of array images involves the following steps:
i) array localization, which comprises determining the location and shape of the luminous spots;
ii) spot extraction, which comprises isolating single luminous spots;
iii) intra-spot segmentation, which comprises examining each spot by distinguishing the signal pixels from the background and noise pixels; and
iv) spot quality measurement, which comprises deriving characteristic parameters of the spots and indexes indicative of their quality.
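The four steps can be chained as in the sketch below, where each stage is passed in as a function so that different localization, extraction, and segmentation techniques can be swapped in. The `SpotReport` type and all function names are hypothetical, and step iv is reduced here to the median signal and background grey levels.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]
Center = Tuple[int, int]  # (row, column)


@dataclass
class SpotReport:
    center: Center
    signal_median: float
    background_median: float


def analyze_array(image: Image,
                  localize: Callable[[Image], List[Center]],
                  extract: Callable[[Image, Center], Image],
                  segment: Callable[[Image], Mask]) -> List[SpotReport]:
    reports = []
    for center in localize(image):              # i) array localization
        window = extract(image, center)         # ii) spot extraction
        mask = segment(window)                  # iii) intra-spot segmentation
        pairs = [(v, m) for row, mrow in zip(window, mask)
                 for v, m in zip(row, mrow)]
        sig = [v for v, m in pairs if m]
        bkg = [v for v, m in pairs if not m]
        nan = float("nan")                      # used when a class is empty
        reports.append(SpotReport(              # iv) quality measurement
            center,
            median(sig) if sig else nan,
            median(bkg) if bkg else nan))
    return reports
```

For instance, with a localizer that returns `[(1, 1)]`, an extractor that cuts a 3x3 window around the center, and a segmenter that marks pixels above 50, a 3x3 image with a single bright pixel of 200 on a background of 10 yields one report with medians 200 and 10.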
According to present practice, array localization is the step that requires significant intervention by the operator, who must center each luminous spot within its own cell of a micro-grid. This operation is laborious, since the acquired images typically contain 10,000 or more spots distributed over several grids. In the present state of the art, the operation is semi-automatic: array localization techniques position the grid automatically, but the operator must always make a final adjustment to correct errors made by the positioning algorithm. This human intervention may be needed for precise tuning, but it may also introduce non-negligible human errors and reduce the comparability of results across experiments.
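To illustrate what automatic grid positioning involves, here is a sketch of one simple technique, projection profiles, which is not necessarily what any existing tool uses: summing grey levels along rows and along columns gives two one-dimensional profiles whose peaks mark the grid rows and columns. It assumes an axis-aligned, unrotated grid and a hypothetical minimum spot pitch `min_gap`; rotation, uneven spacing, and strong noise are exactly the cases where such a simple method fails and an operator must step in.

```python
from statistics import mean
from typing import List, Tuple


def profile_peaks(profile: List[float], min_gap: int) -> List[int]:
    """Indices of local maxima above the profile mean, at least min_gap apart."""
    level = mean(profile)
    n = len(profile)
    candidates = [i for i in range(n)
                  if profile[i] > level
                  and (i == 0 or profile[i] >= profile[i - 1])
                  and (i == n - 1 or profile[i] >= profile[i + 1])]
    accepted: List[int] = []
    # strongest peaks first, so weaker neighbours within min_gap are dropped
    for i in sorted(candidates, key=lambda k: profile[k], reverse=True):
        if all(abs(i - a) >= min_gap for a in accepted):
            accepted.append(i)
    return sorted(accepted)


def locate_grid(image: List[List[float]],
                min_gap: int = 2) -> List[Tuple[int, int]]:
    """Estimate spot centers (row, column) as crossings of profile peaks."""
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    rows = profile_peaks(row_profile, min_gap)
    cols = profile_peaks(col_profile, min_gap)
    return [(r, c) for r in rows for c in cols]
```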
Once the grid is correctly positioned, a binary map that defines the boundaries of the luminous spots on the background is generated. This map is used for isolating the luminous spots that are thereafter examined with a segmentation technique.
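A minimal sketch of building the binary map from the grid centers, assuming every spot is a disc of one fixed, hypothetical `radius` in pixels, and of using the map to isolate one spot's pixels within a square window of hypothetical half-size `half`. Real spots vary in size and shape, which is why the subsequent segmentation step is still needed.

```python
from typing import List, Tuple

Center = Tuple[int, int]  # (row, column)


def binary_spot_map(height: int, width: int,
                    centers: List[Center], radius: int) -> List[List[bool]]:
    """True where a pixel lies within `radius` of some grid center."""
    spot_map = [[False] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                    spot_map[y][x] = True
    return spot_map


def isolate_spot(image: List[List[float]], spot_map: List[List[bool]],
                 center: Center, half: int) -> List[float]:
    """Grey levels of the mapped pixels in the window around `center`."""
    cy, cx = center
    return [image[y][x]
            for y in range(max(0, cy - half), min(len(image), cy + half + 1))
            for x in range(max(0, cx - half), min(len(image[0]), cx + half + 1))
            if spot_map[y][x]]
```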
The best-known segmentation techniques for discriminating the signal pixels from the background pixels within a luminous spot are:
i) Pure Spatial Segmentation;
ii) Pure Intensity-based Segmentation;
iii) Mixed Spatial/Intensity Segmentation;
iv) Mixed Spatial/Statistics Segmentation; and
v) Mixed Spatial, Intensity, Statistics & Morphology Segmentation.
The “Pure Spatial Segmentation” technique assumes that all pixels within a circle of a size preselected by the operator are signal pixels (any geometric shape may be used, but for simplicity a substantially circular spot is assumed here), while all pixels in a neighboring area, whose shape and distance from the perimeter of the signal area are also selected by the operator, are background pixels. In this case, pixels are discriminated solely by their location.
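Pure spatial segmentation reduces to a distance test. In this sketch the operator-selected parameters are modeled as a signal radius `r_signal` and an annulus from `r_bg_inner` to `r_bg_outer` for the background; these names and the circular background region are assumptions, since the text allows any shape. Grey levels play no part in the decision.

```python
from typing import List, Tuple


def spatial_segmentation(window: List[List[float]], center: Tuple[int, int],
                         r_signal: float, r_bg_inner: float, r_bg_outer: float
                         ) -> Tuple[List[float], List[float]]:
    """Split pixels by position only: disc -> signal, annulus -> background."""
    cy, cx = center
    signal, background = [], []
    for y, row in enumerate(window):
        for x, v in enumerate(row):
            d2 = (y - cy) ** 2 + (x - cx) ** 2
            if d2 <= r_signal ** 2:
                signal.append(v)
            elif r_bg_inner ** 2 <= d2 <= r_bg_outer ** 2:
                background.append(v)
            # pixels in the gap between the two regions are ignored
    return signal, background
```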
The “Pure Intensity-based Segmentation” technique considers only the pixels of the area containing the spot and discriminates signal pixels from background pixels on the basis of their grey level. In this case, pixels are discriminated solely by their grey level.
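The text does not say how the grey-level split is chosen. One well-known, parameter-free option, used here as a stand-in, is Otsu's method, which picks the threshold that maximizes the between-class variance of the two resulting pixel groups. The sketch applies it to all pixels of the spot window, ignoring their positions; the function names are hypothetical.

```python
from typing import List


def otsu_threshold(values: List[float]) -> float:
    """Otsu's method: the grey level t that maximizes the between-class
    variance when pixels are split into v <= t and v > t."""
    levels = sorted(set(values))
    n = len(values)
    best_t, best_var = levels[0], -1.0
    for t in levels[:-1]:  # the top level would leave the upper class empty
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        w0, w1 = len(low) / n, len(high) / n
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def intensity_segmentation(window: List[List[float]]) -> List[List[bool]]:
    """Signal mask from grey levels alone; pixel positions are ignored."""
    flat = [v for row in window for v in row]
    t = otsu_threshold(flat)
    return [[v > t for v in row] for row in window]
```

Because position is ignored, a bright noise pixel anywhere in the window is classified as signal, which is the main weakness of a purely intensity-based approach.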
In the “Mixed Spatial/Intensity Segmentation” technique, luminance is the discriminant between signal pixels and background pixels, but it is applied in two different regions: the circular spot area and the surrounding area. This technique therefore relies on both spatial and grey-level characterization of the pixels.
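One reading of this technique, sketched under assumptions: inside the circular spot area, pixels brighter than a grey-level `threshold` are signal; in the surrounding area, pixels at or below it are background; dim pixels inside the circle and bright pixels outside it (for example, stray noise) are left unassigned. The fixed `threshold` and the function name are hypothetical.

```python
from typing import List, Tuple


def mixed_spatial_intensity(window: List[List[float]], center: Tuple[int, int],
                            radius: float, threshold: float
                            ) -> Tuple[List[float], List[float]]:
    """Luminance decides, but the rule differs inside and outside the disc."""
    cy, cx = center
    signal, background = [], []
    for y, row in enumerate(window):
        for x, v in enumerate(row):
            inside = (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
            if inside and v > threshold:
                signal.append(v)
            elif not inside and v <= threshold:
                background.append(v)
            # dim pixels inside the disc and bright pixels outside it
            # are assigned to neither class
    return signal, background
```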
In the “Mixed Spatial/Statistics Segmentation” technique, a threshold (grey level) that discriminates signal pixels from background pixels is calculated by statistical methods, and the luminance of the pixels within the circular spot area is compared with this threshold.
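A sketch of the statistical variant, assuming the threshold is the mean of the surrounding (background) pixels plus `k` population standard deviations. This mean-plus-k-sigma rule and the default `k = 2` are illustrative assumptions; other statistical rules are equally possible. Only pixels inside the circular spot area are compared with the threshold.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def statistical_threshold(background: List[float], k: float = 2.0) -> float:
    """Assumed rule: background mean plus k population standard deviations."""
    return mean(background) + k * pstdev(background)


def spatial_statistics_segmentation(window: List[List[float]],
                                    center: Tuple[int, int],
                                    radius: float,
                                    k: float = 2.0
                                    ) -> Tuple[List[Tuple[int, int]], float]:
    """Return (signal pixel coordinates, threshold) for one spot window."""
    cy, cx = center
    inside, outside = [], []
    for y, row in enumerate(window):
        for x, v in enumerate(row):
            if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                inside.append((y, x, v))
            else:
                outside.append(v)  # statistics come from the surroundings
    t = statistical_threshold(outside, k)
    return [(y, x) for y, x, v in inside if v > t], t
```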
The “Mixed Spatial, Intensity, Statistics & Morphology Segmentation” technique is based on prior statistical knowledge obtained from a local analysis of the spots, on the luminance distribution, and on the morphological characteristics of the spots.
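The text gives no algorithm for combining these cues. As a rough illustration only, the sketch below pairs an intensity threshold (which could be derived from prior statistics) with a simple morphological constraint: signal pixels must form one 4-connected region grown from the brightest pixel near the expected spot center, so isolated bright noise pixels are rejected even when they exceed the threshold. `threshold`, `search`, and the function name are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple


def connected_signal_region(window: List[List[float]],
                            center: Tuple[int, int],
                            threshold: float,
                            search: int = 1) -> Set[Tuple[int, int]]:
    """4-connected region of above-threshold pixels grown from the brightest
    pixel within `search` pixels of the expected center."""
    h, w = len(window), len(window[0])
    cy, cx = center
    near = [(y, x)
            for y in range(max(0, cy - search), min(h, cy + search + 1))
            for x in range(max(0, cx - search), min(w, cx + search + 1))]
    sy, sx = max(near, key=lambda p: window[p[0]][p[1]])
    if window[sy][sx] <= threshold:
        return set()  # nothing bright enough near the center: empty spot
    region = {(sy, sx)}
    queue = deque([(sy, sx)])
    while queue:  # breadth-first flood fill
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and window[ny][nx] > threshold):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region
```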
The main characteristics used as quality indexes and as parameters for comparing spots are the median luminance values (grey levels) of the signal pixels and of the background pixels, respectively. In general, known methods do not consider any morphological characteristics of the spots, even though these may be important in the final validation of the results.
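A sketch of a quality record that reports the two median grey levels and, in addition, one morphological index. The index, called `fill_ratio` here, is a hypothetical definition chosen for illustration: the number of signal pixels divided by the number of pixels in the smallest centroid-centered disc that contains them. A compact round spot scores 1.0, while elongated or fragmented shapes score lower.

```python
import math
from statistics import median
from typing import Dict, List, Set, Tuple


def spot_quality(window: List[List[float]],
                 region: Set[Tuple[int, int]]) -> Dict[str, float]:
    """Median signal/background grey levels plus a hypothetical shape index.

    `region` is the non-empty set of (row, column) signal pixels; every other
    pixel of the window is treated as background.
    """
    sig = [window[y][x] for y, x in region]
    bkg = [v for y, row in enumerate(window)
           for x, v in enumerate(row) if (y, x) not in region]
    cy = sum(y for y, _ in region) / len(region)
    cx = sum(x for _, x in region) / len(region)
    r2 = max((y - cy) ** 2 + (x - cx) ** 2 for y, x in region)
    r = math.sqrt(r2)
    # pixel count of the discrete disc centred on the centroid that just
    # encloses the region; a compact, round spot fills it completely
    disc = sum(1
               for y in range(math.floor(cy - r), math.ceil(cy + r) + 1)
               for x in range(math.floor(cx - r), math.ceil(cx + r) + 1)
               if (y - cy) ** 2 + (x - cx) ** 2 <= r2 + 1e-9)
    return {"signal_median": median(sig),
            "background_median": median(bkg) if bkg else float("nan"),
            "fill_ratio": len(region) / disc}
```

For example, a five-pixel cross fills its enclosing disc completely (ratio 1.0), whereas a three-pixel horizontal bar through the same center fills only three of the five disc pixels (ratio 0.6).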