Large-scale expression profiling has emerged as a leading technology in the systematic analysis of cell physiology. Expression profiling involves the hybridization of fluorescently labeled cDNA, prepared from cellular mRNA, to microarrays carrying up to 10⁵ unique sequences. Several types of microarrays have been developed, but microarrays printed using pin transfer are among the most popular. Typically, a set of target DNA samples representing different genes is prepared by PCR and transferred to a coated slide to form a 2-D array of spots with a center-to-center distance (pitch) of about 200 μm. In the budding yeast S. cerevisiae, for example, an array carrying about 6200 genes provides a pan-genomic profile in an area of 3 cm² or less. mRNA samples from experimental and control cells are copied into cDNA and labeled using fluors of different colors (the control is typically called green and the experiment red). Pools of labeled cDNAs are hybridized simultaneously to the microarray, and the relative level of mRNA for each gene is determined by comparing red and green signal intensities. An elegant feature of this procedure is its ability to measure relative mRNA levels for many genes at once using relatively simple technology.
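The comparison of red and green signal intensities described above can be illustrated with a minimal sketch. The function name `log2_ratio`, the optional background terms, the gene labels, and the intensity values are all hypothetical and chosen only for illustration. The use of a base-2 logarithm is a common convention assumed here, not something stated in the text.

```python
import math

def log2_ratio(red, green, background_red=0.0, background_green=0.0):
    """Return log2(red/green) after optional background subtraction.

    Returns None when either corrected intensity is not positive,
    because the ratio is undefined or meaningless in that case.
    """
    r = red - background_red
    g = green - background_green
    if r <= 0 or g <= 0:
        return None
    return math.log2(r / g)

# Hypothetical spot intensities: (red = experiment, green = control).
spots = {
    "GENE_A": (1200.0, 300.0),   # more abundant in the experiment
    "GENE_B": (250.0, 1000.0),   # less abundant in the experiment
    "GENE_C": (500.0, 500.0),    # unchanged
}

ratios = {gene: log2_ratio(r, g) for gene, (r, g) in spots.items()}

for gene, value in ratios.items():
    print(gene, value)
```

On a log scale, equal-fold increases and decreases are symmetric about zero, which makes induced and repressed genes easier to compare.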
Computation is required to extract meaningful information from the large amounts of data generated by expression profiling. The development of bioinformatics tools and their application to the analysis of cellular pathways are topics of great interest. Several databases of transcriptional profiles are accessible on-line, and proposals are pending for the development of large public repositories. However, relatively little attention has been paid to the computation required to obtain accurate intensity information from microarrays. The issue is important, however, because microarray signals are weak and biologically interesting results are usually obtained through the analysis of outliers. Pixel-by-pixel information present in microarray images can be used in the formulation of metrics that assess the accuracy with which an array has been sampled. Because measurement errors can be high in microarrays, a statistical analysis of errors, combined with well-established filtering algorithms, is needed to improve the reliability of databases containing information from multiple expression experiments.
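One way a pixel-by-pixel metric might be formed is sketched below. The text does not specify a particular metric, so the choice here (the coefficient of variation of per-pixel red/green ratios within one spot) is an assumption made for illustration, and the function name `ratio_cv` and the pixel values are hypothetical.

```python
import statistics

def ratio_cv(red_pixels, green_pixels):
    """Coefficient of variation of per-pixel red/green ratios in one spot.

    Pixels with a non-positive green value are skipped. Returns None when
    fewer than two usable pixels remain or the mean ratio is zero.
    """
    ratios = [r / g for r, g in zip(red_pixels, green_pixels) if g > 0]
    if len(ratios) < 2:
        return None
    mean = statistics.mean(ratios)
    if mean == 0:
        return None
    return statistics.stdev(ratios) / mean

# Hypothetical spot whose pixels all agree on a twofold ratio.
uniform_cv = ratio_cv([200.0, 400.0, 600.0], [100.0, 200.0, 300.0])

# Hypothetical spot whose pixels disagree strongly about the ratio.
noisy_cv = ratio_cv([100.0, 400.0, 900.0], [100.0, 100.0, 100.0])

print(uniform_cv, noisy_cv)
```

Under this assumed metric, a low value suggests that the pixels within a spot agree on the ratio, while a high value flags a spot whose measured ratio may be unreliable and could be filtered before further analysis.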
The foregoing and other features and advantages of the invention will become more apparent upon reading the following detailed description and upon reference to the accompanying drawings.