DNA microarrays are known in which genetic probes are affixed to a substrate at discrete locations for binding with a sample containing labeled genetic material. Terms that have been used in the literature to describe this technology include biochip, DNA chip, DNA microarray, and gene array. DNA microarrays are fabricated by high-speed robotics, generally on glass but sometimes on nylon substrates. Probes of known identity are used to determine complementary binding, thus allowing massively parallel gene expression studies.
In one type of study employing DNA microarrays, the genetic composition of two samples is compared. A first sample, which may be a control sample having a known genetic composition, is labeled with a first detectable label such as the red-fluorescent dye Cy5. A second sample is labeled with a second detectable label such as the green-fluorescent dye Cy3. While Cy3 and Cy5 are exemplified, one of skill in the art is aware that a variety of different detectable labels are commercially available. The two samples are then mixed and applied to the microarray for hybridization with any complementary probes thereon. In the case of samples containing mRNA, for example, the microarray is provided with a suitable set of cDNA spots for binding. After hybridization, images of the microarray are obtained using a laser scanner at wavelengths of 635 nm (red) and 532 nm (green). In the resulting image, differences between the compositions of the two samples are indicated by the respective red and green intensities of the probe locations on the microarray, while the relative abundance of any particular mRNA sequence within the two samples is indicated by the red/green intensity ratio of each spot.
Due to the large number of probes which may be present on the microarray, it is desirable to apply an automated image analysis technique to the determination of the red and green intensities present on the array, and of the red/green ratio for each spot. Traditionally, this analysis has been performed by obtaining a digital image of the chip under fluorescent excitation, and then performing steps of addressing, segmentation, and reduction. In the addressing step, the image areas of the array are located. For example, using the known geometry of the array, the intensity data from portions of the array image corresponding to the probe locations are obtained. Then, in the segmentation step, the image portion for each probe location is segmented into background and foreground intensity values by a thresholding function. Finally, in the reduction step, scalar values of red intensity and green intensity are obtained, from which a value of the red/green ratio is calculated.
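By way of illustration only, the addressing, segmentation, and reduction steps described above may be sketched as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the function names, the square grid with a fixed pixel pitch, the fixed threshold, and the use of a background-corrected mean as the scalar intensity are all hypothetical choices made for the example.

```python
from statistics import mean


def address(image, row, col, pitch):
    """Addressing: cut out the square patch for the spot at grid (row, col).

    Assumes a regular grid in which every spot occupies a pitch x pitch
    block of pixels (a hypothetical array geometry).
    """
    top, left = row * pitch, col * pitch
    return [line[left:left + pitch] for line in image[top:top + pitch]]


def segment(patch, threshold):
    """Segmentation: split the patch pixels into foreground and background
    using a fixed intensity threshold (assumed thresholding function)."""
    pixels = [p for line in patch for p in line]
    foreground = [p for p in pixels if p > threshold]
    background = [p for p in pixels if p <= threshold]
    return foreground, background


def reduce_spot(foreground, background):
    """Reduction: collapse the spot to one scalar intensity.

    Mean foreground minus mean background is an assumption of this
    sketch; other summary statistics could be used instead.
    """
    if not foreground:
        return 0.0
    background_level = mean(background) if background else 0.0
    return mean(foreground) - background_level


def red_green_ratio(red_image, green_image, row, col, pitch, threshold):
    """Run all three steps on both channels and return the red/green ratio."""
    red = reduce_spot(*segment(address(red_image, row, col, pitch), threshold))
    green = reduce_spot(*segment(address(green_image, row, col, pitch), threshold))
    return red / green if green > 0 else float("inf")
```

Each image is assumed to be a list of rows of pixel intensities, one image per scanner wavelength. Because the patch boundaries and the threshold are fixed in advance, this sketch embodies the idealized assumptions of the traditional approach.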
Early automated analysis methods based on the sequence of steps described above were predicated on the assumption that each excited probe location would be a circular region corresponding to the presumed probe location, and that background and foreground intensities would be constant across the array. In practice, however, significant variations in background illumination and in spot size and shape occur, which adversely influence the results obtainable by methods based on these ideal assumptions. Variations occurring in practice include spots of variable size; variable contour, in which the spots have semicircular, toroidal, oval, or other unanticipated shapes; variable background intensities; and spatial artifacts such as smeared or incorrectly segmented probe locations.
Several methods have been developed to overcome the analysis difficulties presented by non-uniformities in images of hybridized samples on DNA microarrays. In a first set of techniques, known as adaptive shape segmentation, an initial starting point in the image is chosen, and the region around the starting point is then enlarged until a statistical criterion indicative of spot detection has been reached. In a second set of techniques, a histogram of pixel intensity values is produced, and the background and foreground pixels are then determined based upon percentile ranges of pixels falling within the lower and upper distributions of the histogram, respectively. These methods are susceptible to erroneous results arising from the presence of relatively few anomalous values within the image of each spot area.
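For illustration, the two families of techniques described above may be sketched as follows. Both functions are hypothetical simplifications rather than any particular published method: the stopping rule used for region growing (a neighboring pixel is accepted when it lies within a fixed tolerance of the running region mean) stands in for the unspecified statistical criterion, and the percentile bands chosen for background and foreground are assumed values.

```python
from collections import deque


def grow_region(image, seed, tolerance):
    """Adaptive shape segmentation by region growing from a seed pixel.

    A 4-connected neighbor joins the region when its intensity is within
    `tolerance` of the current region mean (assumed criterion).
    """
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = image[seed[0]][seed[1]]
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - total / len(region)) <= tolerance:
                    region.add((nr, nc))
                    total += image[nr][nc]
                    queue.append((nr, nc))
    return region


def percentile_segment(pixels, bg_range=(5, 20), fg_range=(80, 95)):
    """Histogram-based segmentation by percentile bands.

    Pixels are sorted by intensity; those between the bg_range percentiles
    are taken as background and those between the fg_range percentiles as
    foreground. The default bands are assumptions of this sketch.
    """
    ordered = sorted(pixels)
    n = len(ordered)

    def band(lo, hi):
        # Integer arithmetic avoids floating-point rounding of the indices.
        return ordered[lo * n // 100:hi * n // 100]

    return band(*bg_range), band(*fg_range)
```

In this sketch the grown region follows the actual spot outline rather than a presumed circle, and the percentile bands discard the most extreme pixel values at either end of the histogram before the foreground and background are summarized.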
It would be desirable to provide a method of automated DNA microarray analysis which is more accurate and less sensitive to variations of intensity and shape among the probe locations of a digitized fluorescence image of a DNA microarray.