Rapid extraction of data from DNA microarrays can provide researchers with important information regarding biological processes. One type of DNA microarray used to obtain gene expression data is a high-density synthetic-oligonucleotide microarray (HDSM). One commercially available microarray is the GeneChip®, manufactured by Affymetrix, Inc. of Santa Clara, Calif.
Technology used to produce HDSMs has now miniaturized the surface area used to hybridize an RNA or DNA sample to DNA probes. For example, one HDSM may employ about 300,000–400,000 (or more) different DNA probe sequences for a single hybridization, all within a 1.28 cm×1.28 cm region (hence, the term “microarray”). Densely packed oligonucleotides of a given probe sequence are localized on the microarray within a region termed a “probe cell”. Thus, typical HDSMs contain about 300,000–400,000 probe cells with homogeneous probe sequences within each probe cell. See Lockhart et al., Expression monitoring by hybridization to high-density oligonucleotide arrays, 14 Nature Biotechnology, pp. 1675–1680 (1996); and Lipshutz et al., High-density synthetic oligonucleotide arrays, 21 Nature Genetics, pp. 20–24 (1999).
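The figures above imply a rough probe cell size, which the following sketch works out. It assumes, purely for illustration, that the probe cells form a square grid filling the entire 1.28 cm×1.28 cm region; the function name approx_cell_pitch_um is hypothetical, and real arrays may reserve part of the region for borders or controls.

```python
import math

def approx_cell_pitch_um(n_cells, side_cm=1.28):
    """Approximate centre-to-centre probe cell spacing in micrometres.

    Assumption (not from the source): the cells form a square grid that
    fills the whole side_cm x side_cm region.
    """
    cells_per_side = math.sqrt(n_cells)
    side_um = side_cm * 10_000  # 1 cm = 10,000 micrometres
    return side_um / cells_per_side

for n in (300_000, 400_000):
    print(n, round(approx_cell_pitch_um(n), 1))
```

Under that assumption, the spacing comes out at roughly 20 to 23 micrometres per probe cell.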
In operation, a sample of fluorescently labeled DNA or RNA is hybridized to DNA probes on an HDSM. The hybridization data is extracted by an imaging system that records the intensity of fluorescence at a discrete number of positions on the HDSM. These positions are laid out in a lattice that can be represented by an array of uniformly sized squares, and the intensities associated with these squares can be used to form an image constructed from pixels. These intensities of fluorescence represent photon counts and are intrinsically non-negative scalars. Typically, the intensities are recorded as a large array of 16-bit unsigned integers and the corresponding image is displayed using grayscale pixels.
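As a minimal sketch of this representation, the intensities can be held in an array of unsigned 16-bit values and mapped to gray levels for display. The linear scaling to the brightest pixel, the function name to_grayscale8, and the sample values are all assumptions made for illustration; the source does not specify how the grayscale display is derived.

```python
from array import array

def to_grayscale8(intensities, width):
    """Scale unsigned 16-bit counts to 8-bit gray levels, row by row.

    Assumption: a simple linear stretch to the brightest pixel; actual
    display software may use a different mapping.
    """
    peak = max(intensities) or 1  # avoid dividing by zero on an all-dark image
    rows = []
    for start in range(0, len(intensities), width):
        row = intensities[start:start + width]
        rows.append([value * 255 // peak for value in row])
    return rows

# Type code "H" stores unsigned short values (16 bits on common platforms),
# so negative counts are rejected when the array is built.
raw = array("H", [0, 1200, 65535, 300, 40000, 8])
image = to_grayscale8(raw, width=3)
```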
An example of an HDSM image at low resolution can be seen in FIG. 1. In FIG. 2, a 100×100 pixel region of the image shown in FIG. 1 is illustrated at a higher resolution. From the image shown in FIG. 2, it can be seen how the probe cells are regularly spaced in a rectangular grid. In the images shown, each probe cell occupies an area that is approximately 8×8 pixels.
The approximate number of pixels in a probe cell depends on the size of the probe cell on the physical HDSM as well as the resolution at which the HDSM surface was scanned when the hybridization data was extracted. It is not known prior to scanning which area of the physical HDSM surface a given pixel will represent. Allocation of pixel intensities (that is, photon counts) to probe cells can therefore be performed as a post-processing operation on the extracted image data. Operatively, an image-processing algorithm is used to estimate the location of each probe cell with respect to the grid of pixels. Using these estimated locations, it is possible to estimate probe cell boundaries and allocate the intensities of individual pixels to probe cells. Hence, to obtain reliable probe cell data from raw pixel data, probe cell locations should be estimated accurately.
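The allocation step can be sketched as follows, assuming the location estimate has already been reduced to an ideal square grid described by an origin and a cell spacing in pixel units. The function allocate_pixels and all of its parameters are hypothetical, and how the grid itself is estimated is outside the scope of the sketch.

```python
from collections import defaultdict

def allocate_pixels(image, origin_row, origin_col, pitch, n_rows, n_cols):
    """Group pixel intensities by the probe cell whose estimated square
    contains the pixel centre. Grid parameters are hypothetical estimates
    expressed in pixel units."""
    cells = defaultdict(list)
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            # The centre of pixel (r, c) sits at (r + 0.5, c + 0.5).
            i = int((r + 0.5 - origin_row) // pitch)
            j = int((c + 0.5 - origin_col) // pitch)
            if 0 <= i < n_rows and 0 <= j < n_cols:
                cells[(i, j)].append(value)
    return cells

# A synthetic 16x16 image covering a 2x2 block of 8x8-pixel probe cells.
image = [[r * 16 + c for c in range(16)] for r in range(16)]
aligned = allocate_pixels(image, 0.0, 0.0, 8.0, 2, 2)
shifted = allocate_pixels(image, 0.6, 0.6, 8.0, 2, 2)
print(len(aligned[(1, 1)]), len(shifted[(1, 1)]))
```

In this toy case, moving the estimated origin by 0.6 pixel changes which pixels are assigned to each cell, which illustrates why the location estimate matters for the resulting probe cell data.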
Unfortunately, the miniaturization of probe cells can complicate probe cell location estimation in the image. To obtain reliable data, the estimate of a probe cell's location is important as it impacts the numerical summary of intensity data for probe cells, which consequently impacts the inference on gene expression.
It is believed that the conventional method used to allocate pixel intensities to probe cells is to obtain a fixed estimate of probe cell locations. Then, for each probe cell, its fixed location is used to select pixels that are deemed to be interior to the probe cell. These interior pixels are allocated to the probe cell and their intensities are summarized.
In the past, to summarize a hybridization, it is believed that the image analysis methods of Affymetrix report three statistics for each probe cell: (1) the number of pixels belonging to the probe cell; (2) a number describing the probe cell response (the default choice of this number is believed to be the 75th percentile of the probe cell's pixel intensities); and (3) the standard deviation of the probe cell's pixel intensities.
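A minimal sketch of such a per-cell summary is given below. It is not the Affymetrix implementation: the rule for choosing interior pixels (discarding a one-pixel perimeter), the percentile convention, and the use of the sample standard deviation are all assumptions made for illustration, and summarize_cell is a hypothetical name.

```python
import statistics

def summarize_cell(pixels, trim=1):
    """Summarize one probe cell.

    pixels: 2-D list of intensities covering the cell's estimated square.
    trim:   number of perimeter rows/columns discarded before summarizing
            (a hypothetical rule for selecting "interior" pixels).
    """
    interior = [v
                for row in pixels[trim:len(pixels) - trim]
                for v in row[trim:len(row) - trim]]
    # quantiles(n=4) returns the three quartile cut points; the last one
    # is the 75th percentile under the chosen interpolation method.
    quartiles = statistics.quantiles(interior, n=4, method="inclusive")
    return {
        "n_pixels": len(interior),
        "response": quartiles[2],
        "std_dev": statistics.stdev(interior),  # sample standard deviation
    }

cell = [[r * 8 + c for c in range(8)] for r in range(8)]
summary = summarize_cell(cell)
```

For an 8×8 pixel cell, trimming one perimeter pixel on each side leaves a 6×6 interior, so the summary is based on 36 of the 64 pixels.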
To understand the relationship between a pixel intensity and the physical region of the HDSM it represents, recall that an image of an HDSM is distinct from the physical HDSM it represents. The physical HDSM is segmented: on the HDSM surface, neighboring probe cells do not overlap. The image of an HDSM, however, is not segmented. The region of the physical HDSM that a pixel represents may lie entirely within a probe cell, or it may straddle as many as four probe cells. A pixel could also represent a region partly or entirely in the border area surrounding the array of probe cells. Also evident in typical HDSM images is the effect of what can be described as a blurring process, in which each pixel can lose signal to nearby pixels. Intensities of pixels representing regions on or near the perimeter of probe cells can be the most affected by the blurring process and/or the lack of segmentation, in the sense that the signal captured in the intensity of such a pixel cannot be attributed almost entirely to a single probe cell. Moreover, even though the array of probe cells on the physical HDSM might be laid out on a near-perfect lattice, this lattice may be deformed in the scanned image. As a consequence of these disparities, any model or algorithm that does not recognize the distinction between an HDSM and its image may inaccurately attribute pixel intensities to probe cells, and may fail to account for the extent to which these phenomena can distort the resulting hybridization summaries.
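The straddling effect can be illustrated by computing how a single pixel's area divides among probe cells. The sketch assumes an ideal, undeformed square grid with a spacing of at least one pixel, and it ignores blurring entirely; overlap_fractions and its parameters are hypothetical names in pixel units. Cell indices that fall outside the array would correspond to the border area.

```python
def overlap_fractions(pixel_row, pixel_col, origin_row, origin_col, pitch):
    """Return {(i, j): fraction}: the share of a unit pixel's area that
    falls inside each probe cell of an ideal grid (requires pitch >= 1)."""
    def split(lo, origin):
        # Split the interval [lo, lo + 1) at the first cell boundary above lo.
        index = int((lo - origin) // pitch)
        boundary = origin + (index + 1) * pitch
        first = min(1.0, boundary - lo)
        parts = {index: first}
        if first < 1.0:
            parts[index + 1] = 1.0 - first
        return parts

    rows = split(pixel_row, origin_row)
    cols = split(pixel_col, origin_col)
    return {(i, j): fr * fc
            for i, fr in rows.items()
            for j, fc in cols.items()}

# Pixel (11, 11) covers the corner where four 8x8-pixel cells meet.
corner = overlap_fractions(11, 11, 3.5, 3.75, 8.0)
# Pixel (5, 5) lies wholly inside a single cell.
inside = overlap_fractions(5, 5, 3.5, 3.75, 8.0)
```

A pixel whose area is split in this way carries signal from several probe cells, so assigning its full intensity to any one of them misattributes part of that signal.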
In view of the above, there remains a need for improved image processing methods that can estimate or identify the probe cell locations on DNA or RNA microarrays.