The present invention relates to DNA array image analysis, and, in particular, to automatically segmenting DNA array images into individual DNA spot images for quantification.
Cellular behavior is primarily dictated by the selective expression of a subset of genes. Normal growth and differentiation depend on the appropriate genes being expressed in a desired context. Various disease states alter the normal expression of genes as compared to normal tissue. For example, malignant transformation of cancer tissues involves or induces altered gene expression. Through signal transduction cascades and transcriptional networks, alterations of one gene can impact a large number of genes and result in global effects on cell behavior. Regulation of translation and post-transcriptional modification play significant roles, but, invariably, signal transduction pathways lead to the nucleus and changes in gene transcription.
Therefore, there has been enormous interest in the development of techniques that allow the analysis of differential gene expression between different tissues or cell lines. One such technique includes use of ordered micro-arrays that allow two color fluorescence detection of hybridization signals. Individual DNA targets are arrayed on a small glass surface and hybridized with fluorescently labeled heterogeneous DNA probes derived from cDNA. The amount of fluorescence at each DNA spot correlates with the abundance of that DNA fragment in the probe mixture.
Using micro-arrays, gene expression levels can be quantitated at up to thousands of genes simultaneously. As hundreds of the same array can be printed, numerous tissues can be easily analyzed for relative expression levels. As such, the technique provides a powerful new tool for analyzing differential gene expression in numerous biologic problems. In addition to the determination of gene expression differences between tissues, genomic micro-arrays are useful for genomic mapping, genomic ploidy measurements and as hybridization targets for genomic mismatch scanning. Such techniques require rapid quantitative analysis of fluorescent hybridization for hundreds to tens of thousands of DNA spots. As such, there is a severe bottleneck in gene expression data collection due to inadequate methods for processing of individual DNA spot images for determining the quantitative fluorescent hybridization levels.
Some existing methods include manual processing of DNA spot images using a generic image processing tool, such as NIH Image. Using such a tool, a user visually locates each DNA spot image in a micro-array image, moves a display pointer to the spot image, and manually defines a small area around it. The image processing tool then reports image intensity values within the small area. The user then manually records the intensity values and continues this process for other visually located DNA spot images in the micro-array image.
However, such manual methods are impractical for micro-arrays with more than a handful of spot images. Further, such methods are tedious and repetitive, requiring considerable time and effort. For example, with a micro-array image having about 600 DNA image spots, such manual methods can take about 8 hours of work, and result in quantification of only a limited number of image spots which visually seem to have a “good” expression level. As the micro-array density increases and becomes more complex, use of such methods becomes even more prohibitive. For example, current micro-array sizes range from several hundred to 1,200 genes, arrayed in a 1.8×1.8 cm area. As tip fabrication has improved, arrays with greater than 50,000 genes are viable. Such methods are also prone to various errors, including errors in manually recording the intensity values. Further, such methods provide inconsistent quantification of intensity values, both for different spot images measured by a single individual, and for multiple individuals making measurements from the same micro-array image.
To alleviate the shortcomings of manual methods, some existing methods automate the process of locating DNA spot images from micro-array images and quantifying corresponding expression values. Such methods utilize a computer to manually position a cell grid on an area of the micro-array image containing an array of DNA spot images. The grid can be resized and individual columns and rows of the grid can be manually adjusted to better fit the arrayed pattern of DNA spot images. The grid position is then used by the computer to quantify the expression values using the intensity levels at each cell in the grid. However, such methods are inflexible since the grid placement requires extensive user interaction to fine-tune the grid. Further, the grid used in such methods is either completely fixed in shape, or has limited global flexibility (e.g., resizing and rotating the entire grid).
Such limitations cause a major handicap in most DNA array image analysis applications since DNA spots are never perfectly formed in a regular grid pattern in a micro-array such as shown in FIG. 1. Although a robot used in spotting DNA fragments on a glass surface has positional accuracy to within +/−5 μm, larger variations in the precise spacing of the arrayed DNA spots occur due to surface interactions of the solution with the silanized surface and tip variations. Moreover, printing tips are difficult to fabricate and many do not work uniformly. Therefore, as shown in FIG. 2, not only are DNA spots occasionally placed out of the regular grid pattern, but they also vary in size. It is therefore rare to have a fixed grid that can match exactly the pattern in the micro-array. Though in existing methods the grid can be manually resized and rotated, and a column or a row of the grid can be moved, the individual grid cells cannot be manipulated. Therefore, such methods are impractical for most DNA array image analysis applications, and especially for high density micro-arrays.
Further, DNA spot image signals derived from the micro-arrays are susceptible to surface noise and laser reflection, due to surface dust. In addition, nonspecific DNA binding to the silanized surface occurs in a non-uniform pattern, creating a varying background of fluorescence over the surface. Existing methods are unable to cope with irregular micro-array patterns, search for DNA image spots, and accurately quantify specific signals while accounting for the local background.
Other existing methods do not use a grid at all but apply a “spot” filter to detect locations in the micro-array image which “look like” DNA spot images. However, using such methods it is difficult to define what a spot should look like. Furthermore, extensive noise and variations in the spot shape, due to the processing and scanning mechanisms, significantly reduce the signal to noise ratio (SNR) of the spot images. Thus, the detection scheme misses many real spots and processes many false patches in the image as real DNA spot images.
Another disadvantage of existing systems is their inability to display micro-array image pixel intensities, corresponding to gene expression values in related DNA spots for example, in an intuitive manner. As such, the user cannot easily determine gene properties in such DNA spots.
There is, therefore, a need for a DNA array image analysis method for automatically segmenting DNA array images into individual DNA spot images for quantification. There is also a need for such a method to process irregular micro-array patterns, search for DNA image spots, and accurately quantify, and intuitively display, specific signals while accounting for the local background.
The present invention satisfies these needs. In one embodiment, the present invention provides a method for segmentation of a frame of image information including a plurality of spaced DNA spot images corresponding to a plurality of DNA spots, the image information including image intensity level and intra frame position information corresponding to said DNA spots. The method of the present invention comprises the steps of: (a) transferring the frame of image information into a memory device; (b) selecting a set of image information within said frame including a selected set of the DNA spot images; (c) generating a grid in said memory device, the grid including a plurality of spaced grid points corresponding to said selected DNA spot images, each grid point including position information indicating the position of the grid point within said frame; and (d) modifying a current position of at least one grid point corresponding to a spot image to shift the grid point toward the corresponding spot image. Step (d) can be repeated for said grid point and for all the grid points of the grid.
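By way of illustration only, step (c) can be sketched as generating a regularly spaced two dimensional array of grid points over the selected region. The function name `generate_grid`, its parameters, and the uniform spacing are assumptions made for this sketch and are not taken from the specification.

```python
def generate_grid(top_left, rows, cols, spacing):
    """Return a rows x cols array of (x, y) grid points.

    top_left: (x, y) position of the first grid point within the frame.
    spacing:  (dx, dy) initial distance between neighboring grid points.
    All names and the uniform spacing are hypothetical choices for this sketch.
    """
    x0, y0 = top_left
    dx, dy = spacing
    return [
        [(x0 + c * dx, y0 + r * dy) for c in range(cols)]
        for r in range(rows)
    ]


# Example: a 2 x 3 grid starting at (10, 20) with 5-pixel by 4-pixel spacing.
grid = generate_grid((10.0, 20.0), rows=2, cols=3, spacing=(5.0, 4.0))
```

Each grid point in such a sketch starts at its nominal position and would then be shifted toward its spot image in step (d).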
The step of modifying said current position includes: (i) selecting a first bounding area in the frame around the current position of the grid point; (ii) generating a first position update including position information for updating a current position of said grid point to a first new position within the first bounding area, the location of said first new position relative to said current position being a function of intensity level of at least a portion of the image within the first bounding area; (iii) generating a second position update including position information for updating said current position to a second new position in the frame, said second new position being in a geometric arrangement with the position of one or more grid points around said grid point; and (iv) updating said current position with the position information of the first and the second position updates, thereby shifting said grid point toward the corresponding spot image. The DNA spot images can be in a substantially two dimensional array arrangement, and generating the grid can include generating a two dimensional array of grid points spaced according to predetermined criteria.
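One possible reading of steps (i) through (iv) is sketched below. It assumes, purely for illustration, that the first position update is the intensity-weighted centroid of a square bounding area, that the second position update is the mean position of the neighboring grid points, and that the two are combined with a blending weight `alpha`. None of these specific choices, nor the function names, is stated in the specification.

```python
def intensity_update(image, pos, half_size):
    """First update (assumed form): intensity-weighted centroid of a
    square bounding area of side 2*half_size+1 around the current position."""
    x, y = pos
    height, width = len(image), len(image[0])
    cx, cy = int(round(x)), int(round(y))
    total = sx = sy = 0.0
    for row in range(max(0, cy - half_size), min(height, cy + half_size + 1)):
        for col in range(max(0, cx - half_size), min(width, cx + half_size + 1)):
            w = image[row][col]
            total += w
            sx += w * col
            sy += w * row
    if total == 0:
        return pos  # no signal in the bounding area: stay in place
    return (sx / total, sy / total)


def geometric_update(neighbors):
    """Second update (assumed form): mean position of neighboring grid points."""
    n = len(neighbors)
    return (sum(p[0] for p in neighbors) / n, sum(p[1] for p in neighbors) / n)


def update_grid_point(image, pos, neighbors, half_size=2, alpha=0.5):
    """Blend the two updates; alpha is a hypothetical weighting parameter."""
    first = intensity_update(image, pos, half_size)
    second = geometric_update(neighbors) if neighbors else pos
    return (alpha * first[0] + (1 - alpha) * second[0],
            alpha * first[1] + (1 - alpha) * second[1])


# Example: a single bright pixel one column to the right of the grid point.
image = [[0.0] * 7 for _ in range(7)]
image[3][4] = 100.0
neighbors = [(1.0, 3.0), (5.0, 3.0), (3.0, 1.0), (3.0, 5.0)]
new_pos = update_grid_point(image, (3.0, 3.0), neighbors)
```

In this sketch the intensity term pulls the grid point toward the spot while the geometric term keeps it consistent with its neighbors; repeating the update for every grid point deforms the grid gradually.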
The method can further include the step of segmenting the selected set of image information by selecting at least one image segment defining a segment area around a grid point and including a spot image with minimum distance from said grid point, said segment area being a function of the spacing between said grid point and one or more neighboring grid points. The selected set of image information can further be segmented into a plurality of image segments corresponding to the plurality of grid points in the grid, each image segment defining a segment area around a corresponding grid point and including a corresponding spot image with minimum distance from said grid point, said segment area being a function of the spacing between said grid point and one or more neighboring grid points, wherein each spot image is contained in a corresponding image segment.
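As a hedged illustration of a segment area that is a function of the spacing between a grid point and its neighbors, the sketch below bounds each segment by the midpoints to the adjacent grid points. The midpoint rule, the mirroring of the spacing at the grid border, and the name `segment_bounds` are assumptions of this sketch.

```python
def _half_gaps(before, here, after):
    """Half the distance to the previous and next grid point along one axis.
    At a grid border the available half gap is mirrored (an assumption)."""
    lo = (here - before) / 2 if before is not None else None
    hi = (after - here) / 2 if after is not None else None
    if lo is None and hi is None:
        raise ValueError("grid needs at least two points along each axis")
    return (hi if lo is None else lo, lo if hi is None else hi)


def segment_bounds(grid, r, c):
    """Return (x_min, y_min, x_max, y_max) of the segment around grid[r][c]."""
    rows, cols = len(grid), len(grid[0])
    x, y = grid[r][c]
    left, right = _half_gaps(
        grid[r][c - 1][0] if c > 0 else None,
        x,
        grid[r][c + 1][0] if c < cols - 1 else None,
    )
    top, bottom = _half_gaps(
        grid[r - 1][c][1] if r > 0 else None,
        y,
        grid[r + 1][c][1] if r < rows - 1 else None,
    )
    return (x - left, y - top, x + right, y + bottom)


# Example: an irregular 2 x 3 grid; the middle point of the top row.
grid = [[(0, 0), (10, 0), (22, 0)],
        [(0, 8), (10, 8), (22, 8)]]
bounds = segment_bounds(grid, 0, 1)
```

Because the bounds follow the deformed grid rather than a fixed cell size, each segment in this sketch adapts to local irregularities in spot placement.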
The method of the present invention can further include quantifying at least a portion of image information in said image segment to obtain image characteristic values for said image segment. The image characteristic values can include DNA information for a DNA spot corresponding to the DNA spot image in said image segment, said DNA information including gene expression values.
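A minimal sketch of such quantification follows. It assumes that the signal is taken as the mean intensity inside a circle around the grid point and that the local background is taken as the median of the remaining pixels in the segment. The circular spot model, the use of mean and median, and the names `quantify_segment` and `spot_radius` are illustrative assumptions only.

```python
from statistics import mean, median


def quantify_segment(image, bounds, center, spot_radius):
    """Return assumed signal and background levels for one image segment.

    bounds: (x_min, y_min, x_max, y_max) of the segment in pixel coordinates.
    center: (x, y) of the grid point after it has been shifted onto the spot.
    """
    x_min, y_min, x_max, y_max = bounds
    cx, cy = center
    height, width = len(image), len(image[0])
    signal, background = [], []
    for row in range(max(0, int(y_min)), min(height, int(y_max) + 1)):
        for col in range(max(0, int(x_min)), min(width, int(x_max) + 1)):
            inside = (col - cx) ** 2 + (row - cy) ** 2 <= spot_radius ** 2
            (signal if inside else background).append(image[row][col])
    return {
        "signal": mean(signal) if signal else 0.0,
        "background": median(background) if background else 0.0,
    }


# Example: a 5 x 5 segment with a small bright spot on a flat background.
image = [[10] * 5 for _ in range(5)]
for row, col in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    image[row][col] = 110
values = quantify_segment(image, (0, 0, 4, 4), (2, 2), spot_radius=1)
```

The difference between the two returned values gives a background-corrected intensity for the spot under these assumptions.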
In another aspect, the present invention provides a method of displaying image information corresponding to a plurality of DNA spot images of at least one DNA spot, the image information including image characteristic values including background and signal intensity levels. In one embodiment, the display method includes the steps of: (a) for each DNA spot image: (1) extracting said background and signal intensity levels from the image characteristic values for the spot image, and (2) determining difference values between the background intensity levels and signal intensity levels; and (b) for each DNA spot: (1) relating the corresponding difference values to a range of graphic values, (2) selecting a graphic value for each difference value, and (3) displaying the selected graphic values. The steps of relating and selecting can include associating each difference value to a segment of a pie chart having multiple segments, and the step of displaying the selected graphic values can include displaying said segments as a pie chart. The area of each segment of each pie chart can be a function of the magnitude of the associated difference value.
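The relating and selecting steps can be illustrated, under stated assumptions, by mapping each difference value to the angular extent of a pie chart segment. The sketch assumes that negative differences are clamped to zero and that segment area is made proportional to the difference by varying the sweep angle. Both choices, and the name `pie_segments`, are hypothetical, and the rendering of the chart itself is omitted.

```python
def pie_segments(spot_images):
    """Map (signal, background) pairs for one DNA spot to pie segments.

    Returns a list of (start_angle, end_angle) in degrees, one per spot image.
    Sweep proportional to the difference value is an assumed mapping.
    """
    diffs = [max(0.0, signal - background) for signal, background in spot_images]
    total = sum(diffs)
    if total == 0:
        return [(0.0, 0.0) for _ in diffs]
    segments = []
    start = 0.0
    for d in diffs:
        sweep = 360.0 * d / total
        segments.append((start, start + sweep))
        start += sweep
    return segments


# Example: three spot images of one DNA spot, each with background 10.
segments = pie_segments([(110, 10), (60, 10), (60, 10)])
```

A drawing routine could then render each angular range as one colored wedge, so that a larger difference value appears as a visibly larger segment.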
In another aspect, the present invention provides a software system for configuring a computer system comprising a processor, and a memory device, to perform the steps of the methods of the present invention described above. The present invention also provides a computer system including means for performing the steps of the method of the present invention.
As such, the present invention provides a method, software system and computer system for automatically deforming a grid to locate individual DNA spot images and to quantify the spot images for measuring the local signal and background intensity values for the spot images, and to display such values. The method and software system of the present invention automate data quantification and extraction in DNA array image analysis applications.