Biology and medicine have entered the age of genomics. The completion, or imminent completion, of the genome sequences of many important organisms (e.g., humans, mice, C. elegans, Arabidopsis, various crop plants, and other animals) promises to usher in advances in basic biological research and medical technology. In order to realize the enormous potential benefits to science offered by the raw data collected from genome sequencing efforts, researchers must now turn to conducting studies in functional genomics and proteomics.
The recently completed human genome project, for example, has identified the sequence of the roughly 3 billion base pairs that spell out the human genome in a four-letter alphabet (i.e., about 3 gigabases of paired A:T and G:C information). The practical applications of this sequence information include more detailed studies to determine where, when, and how particular genes are expressed in an organism, the sequences and functions of the proteins encoded by these genes, and how proteins interact with one another and with the gene sequences themselves (e.g., DNA binding proteins such as transcription factors, histones, and the like). These efforts will undoubtedly produce voluminous amounts of valuable data, which the developing techniques of the emerging discipline of bioinformatics (data mining) will seek to organize and exploit. Indeed, the rate at which genomics research generates data is ever accelerating as high-throughput methods are refined in areas like gene expression assays, protein interaction assays, in vitro or cell-based assays used in drug development, and a host of clinically related genetic tests.
Hybridization microarray technology has evolved to become an important tool in large-scale genomics studies. Briefly, microarrays derive their name from the small (e.g., about 20–750 micron) size of the analysis sites, which are typically arranged as a two-dimensional matrix of probe elements on the surface of a supporting substrate. The range of microarray samples is varied. In the majority of current applications, each probe element comprises numerous identical oligonucleotide (DNA) “probe” molecules. These probes are fixed to the substrate surface and may hybridize with complementary oligonucleotide “targets” from a sample. Typically, a label (e.g., a fluorescent molecule) is attached either to the target prior to the hybridization step or to the probe/target complex subsequent to hybridization. The microarrays are then observed for the presence of detectable labels (e.g., by fluorescence imaging). The presence of a label in the area encompassing a particular probe element indicates that a sequence complementary to the characteristic sequence of that element was present in the analyte.
Microarray production techniques continue to evolve to permit larger arrays and increasingly tight packing of probe elements, such that a single array substrate might allow the detection and quantitation of 100,000 or more target sequences at once. A number of microarray data acquisition technologies and methodologies are known in the art, the purpose of each of which is to acquire a collection of data reflecting the pattern of hybridization on the microarray substrate. In order to obtain meaningful data and discriminate individual array elements, current fluorescence imaging devices (e.g., confocal scanners) must be able to represent each microarray element with multiple pixels. Consequently, the analysis of such microarrays with current scanning devices generates large volumes of data. As an example, an array of 100,000 hybridization probes in the form of 25 μm squares would be represented by an image file of over 14 megabytes if scanned with a confocal fluorescence scanner with a 2 μm pixel size. Manipulation of image data of this size represents a significant data processing overhead. A common output format from fluorescence imaging is the 16-bit graphic (.tif) file. The 16-bit format provides 65,536 incremental signal intensity levels (0 to 65,535) per image pixel per microarray fluorescence wavelength detected. The image files obtained by current scanner technology must be further processed to correlate the data to particular sites on the array. Often, the algorithms used for this processing require manual intervention to set discrimination parameters or to identify data features that correspond to probe locations. Such methods are further complicated when a high-density microarray must be scanned piecemeal, with individual portions of the image subsequently fitted together. For large-scale analysis, such methods require substantial computer memory and data storage.
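By way of illustration only, the file-size figure quoted above can be checked with a short calculation. The sketch below is not taken from any scanner's software; the function name `scan_image_bytes` is hypothetical, and it assumes a pixel size of 2 μm, probe elements that tile the scanned area with no gaps, and an uncompressed single-channel image with no file header. Under those assumptions the quoted figure of just over 14 megabytes corresponds to one byte per pixel, and a 16-bit image of the same area would be roughly twice that size.

```python
def scan_image_bytes(n_probes, probe_side_um, pixel_um, bytes_per_pixel):
    """Estimate the raw image size for a scanned array of square probes.

    Assumptions (illustrative, not from the source): probes tile the
    scanned area with no gaps, and the image is a single uncompressed
    channel with no header.
    """
    area_um2 = n_probes * probe_side_um ** 2   # total probe area in square microns
    n_pixels = area_um2 / pixel_um ** 2        # square pixels covering that area
    return n_pixels * bytes_per_pixel

MIB = 1024 ** 2  # bytes per binary megabyte

# 100,000 probes, 25 um squares, 2 um pixels (assumed unit: microns)
size_8bit = scan_image_bytes(100_000, 25, 2, 1)    # one byte per pixel
size_16bit = scan_image_bytes(100_000, 25, 2, 2)   # two bytes per pixel (16-bit)
pixels_per_probe = (25 / 2) ** 2                   # pixels representing one probe

print(f"8-bit image:  {size_8bit / MIB:.1f} MiB")
print(f"16-bit image: {size_16bit / MIB:.1f} MiB")
print(f"pixels per probe element: {pixels_per_probe:.2f}")
```

Each additional fluorescence wavelength detected would add a further channel of the same size, so the estimate scales linearly with the number of wavelengths scanned.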
Furthermore, current microarray scanners are large, cumbersome, and expensive, making large-scale analysis time consuming, complex, and inefficient.
What is needed are systems and devices to more efficiently analyze microarrays. Preferably, such systems and devices minimize data storage requirements and minimize the costs and labor of working with microarrays.