Pharmaceutical, biotechnology, or genomics companies use polynucleotide arrays (such as DNA or RNA arrays) as, for example, diagnostic or screening tools. Such arrays or microarrays include designed, localized regions (sometimes referred to as spots or features), each of a specific sequence of polynucleotides, arranged in a predetermined configuration on a substrate such as a microchip. The arrays, when exposed to a sample, will exhibit a binding pattern. This binding pattern can be observed, for example, by labeling all polynucleotide targets (for example, DNA) in the sample with a suitable label (such as a fluorescent compound, radioisotope, molecular diode, or other known label), and accurately measuring all such labeled signals expressed on the array. Assuming that the different sequence polynucleotides were correctly deposited in accordance with the predetermined configuration, the observed binding pattern will be indicative of the presence and/or concentration of one or more polynucleotide components of the sample. Such arrays geometrically (i.e., spatially) separate different gene expressions.
Biopolymer arrays can be fabricated using either in situ synthesis methods or deposition of previously obtained biopolymers. “In situ” synthesis requires writing each component of the sequence at each probe location until the complete sequences are achieved, according to a set of commands/instructions (scripts) that specify the desired sequences. In situ synthesis may be carried out by a number of different processes, including, but not limited to, phosphoramidite processes or photolithographic methods. The deposition methods basically involve depositing biopolymers at predetermined locations on a substrate which are suitably activated such that the biopolymers can link thereto. Biopolymers of different sequence may be deposited at different regions of the substrate to yield the completed array. Washing or other additional steps may also be used. Procedures known in the art for deposition or writing of polynucleotides, particularly DNA such as whole oligomers or cDNA, include touching drop dispensers to a substrate or using an ink jet type head to fire drops onto the substrate.
Each deposition or in situ layer, performed by any of these techniques, is deposited within a designated localized area, i.e., a feature zone whose area is predetermined and which generally has a polygonal shape (rectangular, square, hexagonal, octagonal, or the like) of predetermined dimensions. These dimensions are designed to pack the probe features closely on the array, maximizing the number of gene probes that can be efficiently included on the array while still being effectively read from it.
Biological sample(s) (i.e., the “target”) are then prepared, labeled, and hybridized to the probes on the array, although other methods of detection without labels have previously been described and may alternatively be used.
Typically, radioactivity or some form of electromagnetic energy is used to measure responses at each probe. For example, a scanner may be used to read the fluorescence of these resultant surface bound molecules under illumination with suitable (most often laser) light. The scanner acts like a large field fluorescence microscope in which the fluorescent pattern caused by binding of labeled molecules is scanned on the chip. In particular, a laser induced fluorescence scanner provides for analyzing large numbers of different target molecules of interest, e.g., genes/mutations/alleles, in a biological sample.
The scanning equipment typically used for the evaluation of microarrays includes a scanning fluorometer. A number of different types of such devices are commercially available from different sources, such as Axon Instruments in Union City, Calif.; Perkin Elmer of Wellesley, Mass.; and Agilent Technologies, Inc. of Palo Alto, Calif. Analysis of the data (i.e., collection, reconstruction of image, comparison, and interpretation of data) is performed with associated computer systems and commercially available software, such as GenePix by Axon Instruments, QuantArray by Perkin Elmer, Feature Extraction by Agilent of Palo Alto, Calif., or Affy Scanner, available from Affymetrix, Santa Clara, Calif.
In such scanning devices, an array, or portion thereof, is simultaneously scanned and imaged, for example with a CCD sensor, and electronically read to interpret the signal intensities of the scan. Such intensities, as a function of position, are typically referred to in the art as “pixels” or “pixel values.” Collectively, the pixels make up a microarray scan image having a multiplicity of feature cells, wherein each probe feature cell is made up of a group of pixels. Commonly used feature sizes, at various resolutions, include features each made up of 100 pixels (10×10 pixel spot size) or 400 pixels (20×20 pixel spot size), for example, although such sizes may vary and are predetermined before manufacture of the array. Each pixel over a probe location contains the signals from at least many millions of sequences at the probe. Some of the sequences are distorted from their scripted design by noise factors. Some sequences are attached to labeled sequences from the target that are particularly noisy. However, there is generally a subpopulation of probe sequences that produce superior signal strength and low noise. Different pixels capture more or less of this subpopulation of high-quality signals. The present invention directly and efficiently identifies the set of pixels that best capture the high-quality subpopulation for each probe/feature on a microarray.
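The relationship between pixels and feature cells described above can be illustrated with a minimal Python sketch. It assumes an idealized scan in which the feature cells tile the image on a perfectly regular grid, which real scans only approximate. The function name `split_into_feature_cells`, the 20×30 pixel image, and its pixel values are hypothetical and chosen only for illustration.

```python
def split_into_feature_cells(image, cell_size):
    """Partition a scan image (a list of rows of pixel values) into square
    feature cells of cell_size x cell_size pixels.

    Returns a dict keyed by the (row, column) index of each cell.
    Assumes the image dimensions are exact multiples of cell_size.
    """
    cells = {}
    for r0 in range(0, len(image), cell_size):
        for c0 in range(0, len(image[0]), cell_size):
            cells[(r0 // cell_size, c0 // cell_size)] = [
                row[c0:c0 + cell_size] for row in image[r0:r0 + cell_size]
            ]
    return cells


# Hypothetical 20 x 30 pixel scan with 10 x 10 pixel feature cells.
image = [[float(r * 30 + c) for c in range(30)] for r in range(20)]
cells = split_into_feature_cells(image, cell_size=10)

# Each feature cell holds 100 pixel values (10 x 10 pixel spot size).
pixels_per_cell = {
    key: sum(len(row) for row in cell) for key, cell in cells.items()
}
```

With these inputs the image splits into six feature cells of 100 pixels each, matching the 10×10 pixel spot size mentioned above.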
Two-color (two-channel) systems are best suited to direct comparisons between two different biological samples, wherein one sample is encoded with a green fluorescing dye and the other is encoded with a red fluorescing dye, for example. The differential gene expression between the two samples is then given by the color at each probe, because the color is determined by how much red fluorescence and green fluorescence is present at each probe. With a one-color, or single-channel, system, absolute signals or intensities are measured. With a single-channel system, one biological sample may be measured on a microarray, and a second biological sample can be measured on a second microarray. The readings are then compared to determine ratios between the results of the two arrays.
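As a hedged illustration of the ratio calculation described above, the following Python sketch computes a log2 red/green ratio for each feature. The log2 transform is a common convention rather than something prescribed here. The function name `log2_ratio`, the `floor` guard, and the probe names and intensities are hypothetical.

```python
import math


def log2_ratio(red, green, floor=1.0):
    """Return log2(red / green) for a single feature.

    floor is a hypothetical guard that keeps zero or negative
    background-corrected signals from breaking the logarithm.
    """
    r = max(red, floor)
    g = max(green, floor)
    return math.log2(r / g)


# Hypothetical (red, green) feature intensities for three probes.
features = {
    "probe_A": (800.0, 200.0),  # red-dominant
    "probe_B": (150.0, 600.0),  # green-dominant
    "probe_C": (400.0, 400.0),  # balanced
}
ratios = {name: log2_ratio(r, g) for name, (r, g) in features.items()}
```

A positive value indicates more red than green signal at the probe, a negative value the reverse, and zero indicates equal signal in both channels. The same function could be applied to the readings of two single-channel arrays, with one array's value in place of each channel.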
The scanner output may be represented as an image file of ordered sequential signals (such as a TIFF file, for example). Image processing is then performed to organize signal patterns and quantitate the value at each feature (localized probe or “spot”), or to evaluate the values of red and green at each feature for a two-channel system. Once the feature values are determined, ratios can be calculated.
In array fabrication, the quantities of biochemicals or DNA available for the array are usually very small and expensive. Sample quantities available for testing are usually also very small and it is therefore desirable to simultaneously test the same sample against a large number of different probes on an array. These conditions require use of arrays with large numbers of very small, closely spaced spots.
The use of microarray technologies to conduct experiments that measure thousands of genes and proteins simultaneously and under different conditions is becoming the norm in both academia and pharmaceutical/biotech companies. Microarray technology is leading to greater feature density as well as to extremely high-resolution scanning. In their largest capacities, such as in a full human genome catalog array, there may be as many as three or four 25,000 to 100,000-feature cells. This results in increasingly large amounts of both image and feature analysis data, which can be problematic for several reasons. First, the higher the density of features on an array, the more difficult it becomes to accurately extract these features. Higher accuracy and precision of the scanning apparatus become necessary. Even more importantly, higher accuracy and precision of the manufacturing techniques, preparation techniques, and associated apparatus are required, so that the user can locate the information to be read and distinguish it from noise.
Currently, arrays from different sources and/or manufacturers vary greatly in quality. Variations in both signal and optical properties of probes on an array occur due to poor stability/quality or errors in the application of the features to the chip. Ideally, when the features are dots or spots, each should be well-formed (e.g., a substantially perfect circle) and uniformly spaced. As hybridized, a rim typically forms around a slightly indented center, producing a halo effect. With the wide variety of manufacturers now available, however, the feature images are not always so homogeneous. For example, fluorescent “doughnuts” (i.e., a dot filled only circumferentially along the perimeter, with at least a partial blank or hole, or even a spike, in the center) may be formed in some instances, rather than a fully filled circle with only a slight indentation. Other partially formed or malformed features or manufacturing defects may also occur, such as crescent-shaped features; “measled” spot images; irregular boundaries (perimeters) of the features; misaligned rows or columns of features; misalignment between consecutive features along a row and/or a column; variations in the size or circumference of the dots; and others.
Most quantitation methods are based on the intended design of the feature spatial pattern for an array as printed or written. That is, most quantitation methods “look for” the configuration of the signal spot (i.e., feature) as it was intended to look, given its predetermined geometry and dimensions. This technique is often referred to in the art as using a “cookie cutter”: a template of the predetermined shape and size is used to outline the feature. The template is positioned within each area of the microarray that is laid out to have a feature deposited or written thereon, at the location where the best-defined feature is determined to be represented. For example, suppose an array is divided into squares of predetermined equal geometries, within each of which it is intended to deposit or write a circular spot or feature (with each feature intended to have the same diameter and be clearly geometrically separated from all adjacent features). The cookie cutter is then used to define, within each square area of pixels, a circle of adjacent pixels having a predetermined minimum radius that outputs the highest quality ensemble signal. External pixels (outliers) are removed from consideration. This technique becomes problematic when malformed features occur, examples of which were mentioned above.
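The cookie cutter approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of any commercial package: it uses the mean pixel value inside the template as a stand-in for the "highest quality ensemble signal," and it tries every integer center at which the circular template fits inside the square cell. The function name `cookie_cutter` and the 7×7 pixel example cell are hypothetical.

```python
def cookie_cutter(cell, radius):
    """Slide a fixed-radius circular template over a square feature cell.

    Returns (best_mean, (row, col)) for the template center whose enclosed
    pixels have the highest mean value. The mean is an assumed proxy for
    signal quality. Pixels outside the chosen circle are the "outliers"
    that this kind of method discards.
    """
    n_rows, n_cols = len(cell), len(cell[0])
    best = None
    # Only consider centers where the whole template fits inside the cell.
    for cr in range(radius, n_rows - radius):
        for cc in range(radius, n_cols - radius):
            inside = [
                cell[r][c]
                for r in range(cr - radius, cr + radius + 1)
                for c in range(cc - radius, cc + radius + 1)
                if (r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2
            ]
            mean = sum(inside) / len(inside)
            if best is None or mean > best[0]:
                best = (mean, (cr, cc))
    return best


# Hypothetical 7 x 7 cell: background of 10.0 with a well-formed bright
# disc (value 100.0, radius 2) centered off the middle at row 3, column 4.
cell = [[10.0] * 7 for _ in range(7)]
for r in range(7):
    for c in range(7):
        if (r - 3) ** 2 + (c - 4) ** 2 <= 4:
            cell[r][c] = 100.0

best_mean, best_center = cookie_cutter(cell, radius=2)
```

For a well-formed spot such as this one, the template locks onto the disc. For a doughnut or crescent, no placement of a fixed circle matches the bright pixels, which is the weakness noted above.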
When malformed features are present (which is quite common), the cookie cutter methods may not find a generally uniform signal in the predetermined shape of the feature, and may have difficulty determining the best placement for the feature. Further, even if the cookie cutter succeeds in determining where to locate each feature, such locations tend to be less specific due to the noise and randomness introduced by the malformed features.
What is needed are better techniques for identifying and using the signals from those sequences on a microarray that are of good quality and that most closely (perhaps perfectly) match the sequence that was intended to be deposited. Further, improved techniques are needed for identifying and selecting the highest quality signals without the use of localized geometric patterns, such as cookie cutters, to better account for malformed features, which may include high quality signals in what a cookie cutter method would otherwise consider “outlier pixels” and thus discard.