Pharmaceutical, biotechnology, or genomics companies use polynucleotide arrays (such as DNA or RNA arrays), for example, as diagnostic or screening tools. Such arrays or microarrays include regions (sometimes referenced as spots or features) of usually different sequence polynucleotides arranged in a predetermined configuration on a substrate such as a microchip. The arrays, when exposed to a sample, will exhibit a binding pattern. This binding pattern can be observed, for example, by labeling all polynucleotide targets (for example, DNA) in the sample with a suitable label (such as a fluorescent compound), and accurately observing the fluorescent signal on the array. Assuming that the different sequence polynucleotides were correctly deposited in accordance with the predetermined configuration, then the observed binding pattern will be indicative of the presence and/or concentration of one or more polynucleotide components of the sample.
Biopolymer arrays can be fabricated using either in situ synthesis methods or deposition of the previously obtained biopolymers. The in situ synthesis methods include those described in U.S. Pat. No. 5,449,754 for synthesizing peptide arrays, as well as WO 98/41531 and the references cited therein for synthesizing polynucleotides (specifically, DNA). The deposition methods basically involve depositing biopolymers at predetermined locations on a substrate which are suitably activated such that the biopolymers can link thereto. Biopolymers of different sequence may be deposited at different regions of the substrate to yield the completed array. Washing or other additional steps may also be used. Procedures known in the art for deposition of polynucleotides, particularly DNA such as whole oligomers or cDNA, are described, for example, in U.S. Pat. No. 5,807,522 (touching drop dispensers to a substrate), and in PCT publications WO 95/25116 and WO 98/41531, and elsewhere (use of an ink jet type head to fire drops onto the substrate).
A scanner is then used to read the fluorescence of these resultant surface bound molecules under illumination with suitable (most often laser) light. The scanner acts like a large field fluorescence microscope in which the fluorescent pattern caused by binding of labeled molecules is scanned on the chip. In particular, a laser induced fluorescence scanner provides for analyzing large numbers of different target molecules of interest, e.g., genes/mutations/alleles, in a biological sample.
The scanning equipment typically used for the evaluation of microarrays includes a scanning fluorometer. A number of different types of such devices are commercially available from different sources, such as Axon Instruments in Union City, Calif. and Perkin Elmer of Wellesley, Mass. Analysis of the data (i.e., collection, reconstruction of image, comparison and interpretation of data) is performed with associated computer systems and commercially available software, such as GenePix by Axon Instruments, QuantArray by Perkin Elmer or Feature Extraction by Agilent of Palo Alto, Calif.
In such scanning devices, a laser light source generates a beam, most often a collimated one. The beam sequentially illuminates small surface regions of known location on an array substrate. The resulting fluorescence signals from the surface regions are collected confocally (employing the same lens used to focus the laser light onto the array), off-axis (using a separate lens positioned to one side of the lens used to focus the laser onto the array), or both. The collected signals are then transmitted through appropriate spectral filters to an optical detector. A recording device, such as a computer memory, records the detected signals and builds up a raster scan file of intensities as a function of position, or of time as it relates to position. Such intensities, as a function of position, are typically referred to in the art as “pixels” or “pixel values.” Collectively, the pixels make up a microarray scan image having a multiplicity of feature cells, wherein each feature cell is comprised of a group of pixels.
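By way of illustration only, the relationship between sequential detector readings, pixels, and feature cells described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the format used by any commercial scanner: the function names build_raster and feature_cell_means, the square cell size, the sample values, and the use of a simple mean as the per-cell summary are all hypothetical.

```python
def build_raster(samples, width):
    """Arrange sequential detector readings into rows of pixel values.

    Assumes (hypothetically) that the scanner emits one intensity per
    position, row by row, with a fixed number of positions per row.
    """
    if width <= 0 or len(samples) % width != 0:
        raise ValueError("sample count must be a multiple of the row width")
    return [list(samples[i:i + width]) for i in range(0, len(samples), width)]


def feature_cell_means(raster, cell_size):
    """Group pixels into square feature cells and average each group."""
    rows, cols = len(raster), len(raster[0])
    means = {}
    for top in range(0, rows, cell_size):
        for left in range(0, cols, cell_size):
            # Collect every pixel that falls inside this feature cell.
            pixels = [raster[r][c]
                      for r in range(top, min(top + cell_size, rows))
                      for c in range(left, min(left + cell_size, cols))]
            means[(top // cell_size, left // cell_size)] = sum(pixels) / len(pixels)
    return means


# Made-up intensities for a 4 x 4 pixel scan holding four 2 x 2 feature cells.
samples = [10, 12, 200, 210,
           11, 13, 205, 215,
           90, 95, 5, 6,
           92, 93, 4, 5]
raster = build_raster(samples, width=4)
cells = feature_cell_means(raster, cell_size=2)
```

In this sketch each key of cells is the (row, column) index of a feature cell and each value is the mean of the pixels in that cell; an actual feature extraction program would typically also locate the feature within its cell and correct for background.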
In array fabrication, the DNA available for the array is usually expensive and available only in very small quantities. Sample quantities available for testing are usually also very small and it is therefore desirable to simultaneously test the same sample against a large number of different probes on an array. These conditions require use of arrays with large numbers of very small, closely spaced spots.
The use of microarray technologies to conduct experiments that measure thousands of genes and proteins simultaneously and under different conditions is becoming the norm in both academia and pharmaceutical/biotech companies. Microarray technology is leading to greater feature density as well as to extremely high-resolution scanning. In their largest capacities, such as in a full human genome catalog array, there may be as many as three or four 25,000 to 50,000-feature cells. This results in increasingly large amounts of both image and feature analysis data, which can be problematic for several reasons. First, many, if not most, features on a typical catalog array may be inconsequential to the experiment being conducted. Second, the high density and large number of features on an array make it difficult or impossible to do effective visual feature-to-feature comparisons. Additionally, the more features and the greater complexity of an array, the more difficult it is to create a logical layout of probes that is meaningful to the experimenter. This is particularly problematic for catalog arrays, where the nature and purpose of the experiment may be unknown at the time of the array design.
While advancements in bioinformatics have been made which help scientists to extract, build and verify interpretations and hypotheses about microarray data, all of the features of the array must first be extracted before the results can be streamlined. There is still a need to further streamline the data in order to limit it to only that which is particularly relevant to the hypothesis or experiment at hand and to facilitate the visualization of high-density arrays. The present invention seeks to provide such streamlining of data prior to extraction of the features from the array in order to reduce the volume of data being produced as well as to facilitate visualization and inspection of high-density arrays.
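Purely as an illustration of the general idea of narrowing the data before feature extraction, and not as a description of how the invention accomplishes it, one possible approach is sketched below. The text above does not specify any mechanism; the layout dictionary, the placeholder gene names, the functions select_features and extract_selected, and the mean-intensity summary are all assumptions introduced for this example.

```python
def select_features(layout, genes_of_interest):
    """Keep only the layout entries whose probe targets a gene of interest."""
    wanted = set(genes_of_interest)
    return {fid: info for fid, info in layout.items() if info["gene"] in wanted}


def extract_selected(raster, selected, cell_size):
    """Compute a mean intensity only for the selected feature cells."""
    results = {}
    for fid, info in selected.items():
        top, left = info["row"] * cell_size, info["col"] * cell_size
        # Read only the pixels belonging to this selected feature cell.
        pixels = [raster[r][c]
                  for r in range(top, top + cell_size)
                  for c in range(left, left + cell_size)]
        results[fid] = sum(pixels) / len(pixels)
    return results


# Hypothetical array layout: feature id -> probe target and cell position.
layout = {
    "f1": {"gene": "TP53", "row": 0, "col": 0},
    "f2": {"gene": "BRCA1", "row": 0, "col": 1},
    "f3": {"gene": "ACTB", "row": 1, "col": 0},
    "f4": {"gene": "TP53", "row": 1, "col": 1},
}
# Made-up 4 x 4 pixel scan image holding four 2 x 2 feature cells.
raster = [[10, 12, 200, 210],
          [11, 13, 205, 215],
          [90, 95, 5, 6],
          [92, 93, 4, 5]]

selected = select_features(layout, ["TP53"])
results = extract_selected(raster, selected, cell_size=2)
```

Here only two of the four feature cells are ever read from the image, so the volume of extracted data scales with the number of features relevant to the experiment rather than with the size of the array.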