DNA microarrays are miniature arrays containing gene fragments that are either synthesized directly onto, or spotted onto, glass or other substrates. Thousands of genes may be represented in a single array. A typical gene expression microarray experiment involves the following steps: (1) preparation of fluorescently labeled target from RNA isolated from the biological specimens; (2) hybridization of the labeled target to the microarray; (3) washing, staining, and scanning of the array; (4) analysis of the scanned image; and (5) generation of gene expression profiles (see FIG. 1).
Two main types of DNA microarrays are currently in wide use in biomedical applications: oligonucleotide arrays (usually 25- to 70-mers) and gene expression arrays containing PCR products prepared from cDNAs. In forming an array, oligonucleotides can be either prefabricated and spotted onto the surface or synthesized directly on the surface (in situ). For example, the popular GeneChip arrays of Affymetrix (Santa Clara, Calif.) are fabricated by direct synthesis of oligonucleotides on a glass surface: the oligonucleotides, usually 25-mers, are synthesized directly onto a glass wafer by a combination of semiconductor-based photolithography and solid-phase chemical synthesis technologies. Each array contains up to 900,000 different oligonucleotides, and each oligonucleotide is present in millions of copies (FIG. 2). Because the oligonucleotide probes are synthesized at known locations on the array, the hybridization patterns and signal intensities can be interpreted in terms of gene identity and relative expression levels.
In principle, microarray experiments provide access to vast amounts of data at relatively high speed and low cost; in practice, however, a number of key technical issues still prevent microarray experiments from reaching their full potential. In particular, because interpreting microarray measurements requires computational methods to translate digital images into corresponding concentration readings, the accuracy of a microarray experiment hinges largely on the ability of available image analysis methods to translate image intensity values accurately into true concentration values. Many factors are known to affect this translation, including the physical construction of the array, the chemistry of the sample, and the optics of the array reader, among others.
In general, the measurement values obtained from each array carry a ‘block effect’ arising from variation in RNA extraction, labeling, fluorescent detection, and so on. Without statistical treatment, this block effect is confounded with true differences in expression. The process of applying statistical treatment to reduce the block effect is referred to as normalization, and it is usually performed at the probe level. Several normalization methods for oligonucleotide arrays have been proposed and practiced. One approach uses lowess (locally weighted scatterplot smoothing) normalization to correct the non-central and nonlinear bias observed in M-A plots, which plot the log ratio M of two intensities against their average log intensity A. Another class of approaches corrects the nonlinear bias seen in Q-Q plots. As Workman et al. and Bolstad et al. discussed (Workman et al. (2002), Genome Biol., 3, 1-16; Bolstad et al. (2003), Bioinformatics, 19, 185-193; the entire contents of both articles are incorporated herein by reference), methods using quantiles rely on several assumptions: first, most genes are not differentially regulated; second, the number of up-regulated genes roughly equals the number of down-regulated genes; and third, the first two assumptions hold across the signal-intensity range. These three assumptions are adopted by most normalization methods in use today. For the purpose of discussion, they are referred to herein as the standard assumptions.
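The lowess approach to M-A bias can be sketched as follows. This is a minimal illustration, assuming two arrays (or an array and a reference) of paired, positive probe intensities `x` and `y`; the function names `ma_values`, `lowess_fit`, and `lowess_normalize` are hypothetical, and `lowess_fit` is a simplified single-pass local linear smoother with tricube weights that omits the robustness iterations of full lowess. It fits the trend of M as a function of A and subtracts it, keeping A unchanged.

```python
import math


def ma_values(x, y):
    """Return (M, A) for paired positive intensities x and y.

    M = log2(x) - log2(y) is the log ratio; A = (log2(x) + log2(y)) / 2 is the
    average log intensity.
    """
    m = [math.log2(p) - math.log2(q) for p, q in zip(x, y)]
    a = [(math.log2(p) + math.log2(q)) / 2 for p, q in zip(x, y)]
    return m, a


def lowess_fit(xs, ys, frac=0.5):
    """Simplified lowess: local linear regression with tricube weights.

    Single pass only; the robustness reweighting of full lowess is omitted.
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local window
    fitted = []
    for x0 in xs:
        # Bandwidth h = distance to the k-th nearest point (x0 itself included).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)  # >= 1 because x0 has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def lowess_normalize(x, y, frac=0.5):
    """Remove the intensity-dependent trend in M, then map back to intensities."""
    m, a = ma_values(x, y)
    trend = lowess_fit(a, m, frac)
    m_corr = [mi - ti for mi, ti in zip(m, trend)]
    # log2(x) = A + M/2 and log2(y) = A - M/2, so A is preserved.
    x_norm = [2 ** (ai + mi / 2) for ai, mi in zip(a, m_corr)]
    y_norm = [2 ** (ai - mi / 2) for ai, mi in zip(a, m_corr)]
    return x_norm, y_norm


# Hypothetical intensities with an intensity-dependent bias on the second array.
x = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
y = [v ** 1.1 for v in x]
x_n, y_n = lowess_normalize(x, y, frac=0.7)
print([math.log2(p / q) for p, q in zip(x_n, y_n)])  # corrected M, all close to 0
```

Here the bias makes M a linear function of A, so the local linear fits remove it exactly; with real data the smoother removes only the smooth trend, and this is valid only if the standard assumptions hold, since any genuine intensity-dependent expression change is removed along with the bias.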
Unfortunately, the standard assumptions do not always hold. As a result, current normalization methods are all quite restrictive in their capabilities. Without a reliable and generally applicable method of normalizing array data, results obtained from microarray experiments will always require further validation. Therefore, there remains a need in the art for a better normalization method that is generally applicable.
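What happens when the standard assumptions fail can be illustrated with quantile normalization, the quantile-based approach discussed by Bolstad et al., in which every array is given the same distribution: the mean, across arrays, of the values at each rank. The following is a minimal sketch, assuming equal-length arrays with no tied values; the function name `quantile_normalize` and the four-gene data are hypothetical toy values chosen for illustration, not experimental results. In the toy data, two genes are up-regulated 4-fold and none are down-regulated, violating the second standard assumption.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays of probe intensities.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank, so every array ends up with the same distribution.
    Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(arr) for arr in arrays]
    # Reference distribution: mean of the i-th smallest value across arrays.
    reference = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for arr in arrays:
        order = sorted(range(n), key=arr.__getitem__)  # indices by ascending value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: genes 1-2 truly up 4-fold, genes 3-4 unchanged.
control = [100.0, 200.0, 300.0, 500.0]
treated = [400.0, 800.0, 300.0, 500.0]
norm_control, norm_treated = quantile_normalize([control, treated])
ratios = [t / c for t, c in zip(norm_treated, norm_control)]
print(ratios)  # about [1.5, 2.17, 0.5, 0.62]; the true ratios are [4, 4, 1, 1]
```

Because both arrays are forced onto one common distribution, part of the genuine up-regulation is treated as a block effect: the up-regulated genes appear only 1.5- to 2.2-fold up, and the unchanged genes appear down-regulated.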
Accordingly, it is one object of the present invention to provide improved methods for analyzing microarray data that are capable of handling experimental conditions beyond the standard assumptions.
It is also an object of the present invention to provide a better framework for processing microarray data that integrates normalization and summarization of the data, thereby improving the efficiency of microarray analysis.
These and other objects of the present invention are achieved through the methods and systems of the present invention. The principles and exemplary embodiments of the present invention will now be described in detail below.