DNA array technologies have made it possible to monitor the expression level of a large number of genetic transcripts at any one time (see, e.g., Schena et al., 1995, Science 270:467-470; Lockhart et al., 1996, Nature Biotechnology 14:1675-1680; Blanchard et al., 1996, Nature Biotechnology 14:1649; Ashby et al., U.S. Pat. No. 5,569,588, issued Oct. 29, 1996). Of the two main formats of DNA arrays, spotted cDNA arrays are prepared by depositing PCR products of cDNA fragments with sizes ranging from about 0.6 to 2.4 kb, from full length cDNAs, ESTs, etc., onto a suitable surface (see, e.g., DeRisi et al., 1996, Nature Genetics 14:457-460; Shalon et al., 1996, Genome Res. 6:689-645; Schena et al., 1995, Proc. Natl. Acad. Sci. U.S.A. 93:10539-11286; and Duggan et al., Nature Genetics Supplement 21:10-14). Alternatively, high-density oligonucleotide arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface are synthesized in situ on the surface by, for example, photolithographic techniques (see, e.g., Fodor et al., 1991, Science 251:767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026; Lockhart et al., 1996, Nature Biotechnology 14:1675; McGall et al., 1996, Proc. Natl. Acad. Sci. U.S.A. 93:13555-13560; U.S. Pat. Nos. 5,578,832; 5,556,752; 5,510,270; and 6,040,138). Methods for generating arrays using inkjet technology for in situ oligonucleotide synthesis are also known in the art (see, e.g., Blanchard, International Patent Publication WO 98/41531, published Sep. 24, 1998, Blanchard et al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard, 1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J. K. Setlow, Ed., Plenum Press, New York at pages 111-123). 
Efforts to increase the information capacity of DNA arrays range from reducing feature size, so as to increase the number of probes in a given surface area, to sensitivity- and specificity-based probe design and selection aimed at reducing the number of redundant probes needed to detect each target nucleic acid, thereby increasing the number of target nucleic acids monitored without increasing probe density (see, e.g., Friend et al., U.S. patent application Ser. No. 09/364,751, filed on Jul. 30, 1999; and Friend et al., U.S. patent application Ser. No. 09/561,487, filed on Apr. 28, 2000).
By simultaneously monitoring tens of thousands of genes, DNA array technologies have allowed, inter alia, genome-wide analysis of mRNA expression in a cell or a cell type or any biological sample. Aided by sophisticated data management and analysis methodologies, the transcriptional state of a cell or cell type as well as changes of the transcriptional state in response to external perturbations, including but not limited to drug perturbations, can be characterized on the mRNA level (see, e.g., Stoughton et al., International Publication No. WO 00/39336, published Jul. 6, 2000; Friend et al., International Publication No. WO 00/24936, published May 4, 2000). Applications of such technologies include, for example, identification of genes which are up regulated or down regulated in various physiological states, particularly diseased states. Additional exemplary uses for DNA arrays include the analyses of members of signaling pathways, and the identification of targets for various drugs. See, e.g., Friend and Hartwell, International Publication No. WO 98/38329 (published Sep. 3, 1998); Stoughton, International Publication No. WO 99/66067 (published Dec. 23, 1999); Stoughton and Friend, International Publication No. WO 99/58708 (published Nov. 18, 1999); Friend and Stoughton, International Publication No. WO 99/59037 (published Nov. 18, 1999); Friend et al., U.S. patent application Ser. No. 09/334,328 (filed on Jun. 16, 1999).
The various characteristics of this analytic method make it particularly useful for directly comparing the abundance of mRNAs present in two cell types. For example, an array of cDNAs was hybridized with a green fluor-tagged representation of mRNAs extracted from a tumorigenic melanoma cell line (UACC-903) and a red fluor-tagged representation of mRNAs extracted from a nontumorigenic derivative of the original cell line (UACC-903+6). Monochrome images of the fluorescent intensity observed for each of the fluors were then combined by placing each image in the appropriate color channel of a red-green-blue (RGB) image. In this composite image, one can see the differential expression of genes in the two cell lines. Intense red fluorescence at a spot indicates a high level of expression of that gene in the nontumorigenic cell line, with little expression of the same gene in the tumorigenic parent. Conversely, intense green fluorescence at a spot indicates high expression of that gene in the tumorigenic line, with little expression in the nontumorigenic daughter line. When both cell lines express a gene at similar levels, the observed array spot is yellow. Such a method is often termed “two-channel” measurement, as compared to a method in which only one color labeling is measured.
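The compositing step described above can be sketched in a few lines of Python. This is only an illustration: the function name `composite_rgb`, the list-of-rows image representation, and the 0-255 intensity scale are assumptions made for the example and are not taken from the cited work.

```python
def composite_rgb(red_img, green_img):
    """Combine two monochrome images into one RGB image.

    Each input is a list of rows of integer intensities (assumed 0-255).
    The red-fluor image fills the red channel, the green-fluor image fills
    the green channel, and the blue channel is left at zero.
    """
    if len(red_img) != len(green_img):
        raise ValueError("images must have the same number of rows")
    rgb = []
    for red_row, green_row in zip(red_img, green_img):
        if len(red_row) != len(green_row):
            raise ValueError("rows must have the same length")
        rgb.append([(r, g, 0) for r, g in zip(red_row, green_row)])
    return rgb


# One row of three spots: red-only, green-only, and similar in both channels.
pixels = composite_rgb([[255, 0, 200]], [[0, 255, 200]])
```

In this toy input the third spot has equal red and green values, so it appears yellow in the composite, which corresponds to similar expression in both cell lines.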
Any quantitative measurement method, if affected by measurement errors, will have uncertainties in the measurement results. DNA microarray technology is no exception. Differential expression ratios are typically derived from measured intensities in both single-channel and two-channel microarray technologies, so it is essential to understand the intensity measurement errors. Measurement errors are often described by error models (see, e.g., Supplementary material to Roberts et al., 2000, Science, 287:873-880; and Rocke et al., 2001, J. Computational Biology 8:557-569). In a two-term error model, the first error source is a low-level additive noise of constant variance, which comes from the background of the array chip. This constant noise is independent of the hybridization levels of individual feature spots on a microarray. It may come from the combination of the scanner electronics noise and the chip surface fluorescence due to nonspecific binding. This constant additive noise is typically assumed to be normally distributed with a mean background. After background level subtraction, which is typically carried out during microarray data processing, the additive mean background becomes zero. The second error source is a multiplicative error that is the combined result of the speckle noise inherent in the coherent laser scanner and the fluorescent dye related noise. The multiplicative error is also called fractional error because its level is directly proportional to the measured intensity level. It is the dominant error source at high intensity levels. Sometimes an extra square-root term is also included to describe the effect of variation in the number of available binding sites in a spot. This term is also called the Poisson term, because it is believed that the number of binding sites follows a Poisson distribution and has a variance which is proportional to the average number of binding sites.
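The error model described above can be illustrated with a short Python sketch. It assumes that the error sources are independent, so that their variances add; the text above does not state this explicitly. The function name and the parameter names `sigma_add`, `frac`, and `poisson` are hypothetical, as are the numerical values used.

```python
import math


def intensity_variance(intensity, sigma_add, frac, poisson=0.0):
    """Variance of a background-subtracted intensity (illustrative sketch).

    Assumes independent error sources whose variances add:
      additive term:       sigma_add**2            (constant)
      Poisson term:        poisson**2 * intensity  (SD grows as sqrt(I))
      multiplicative term: (frac * intensity)**2   (SD grows as I)
    Setting poisson=0 gives the two-term model.
    """
    level = max(intensity, 0.0)  # negative values can occur after subtraction
    return sigma_add ** 2 + poisson ** 2 * level + (frac * level) ** 2


def intensity_sd(intensity, sigma_add, frac, poisson=0.0):
    """Standard deviation corresponding to intensity_variance."""
    return math.sqrt(intensity_variance(intensity, sigma_add, frac, poisson))


# Hypothetical parameters: additive SD of 10 counts, 20% fractional error.
low = intensity_sd(0.0, sigma_add=10.0, frac=0.2)
high = intensity_sd(10000.0, sigma_add=10.0, frac=0.2)
```

With these assumed parameters the standard deviation is set by the additive term at zero intensity and is close to 20% of the signal at an intensity of 10000, which matches the statement that the multiplicative error dominates at high intensity levels.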
Many microarray data processing and statistical analysis methods require the variance of the measurement error to be constant. In other words, the measurement variance should not be related to the measurement level over a measurement range. For example, in the commonly used analysis of variance (ANOVA) method, the variables under investigation must have a constant variance. In another example, many data regression and parametric or non-parametric modeling methods used in microarray data normalization and detrending to remove the intensity dependent non-linearity rest on the underlying assumption that the data are not heteroskedastic (i.e., do not have a changing variance). However, due to the multiplicative and Poisson terms, the measured microarray intensities do not meet the constant-variance requirement. To overcome this problem, measured intensities are often transformed to a new domain where the variance becomes a constant. All analysis and data processing are then carried out in the transformed domain. A logarithmic conversion is commonly used to transform multiplicative error to constant variance, but it does not work properly at low intensities, where the original additive constant noise dominates. In a piecewise hybrid transformation method, a log transform is applied to high intensities and a linear transform is applied to low intensities. It has better error characteristics near the low intensity end than the simple logarithmic conversion, but the measurement variance of the hybrid-transformed intensity is still not close to a constant. The hybrid transform can also significantly distort intensity distributions.
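The behavior of the logarithmic conversion can be illustrated with a first-order (delta-method) approximation, under which the standard deviation of ln(I) is approximately the standard deviation of I divided by I. The Python sketch below applies this approximation to a two-term error model with independent additive and multiplicative errors. The function name, parameter names, and numerical values are hypothetical, and the approximation itself becomes unreliable at very low intensities.

```python
import math


def log_domain_sd(intensity, sigma_add, frac):
    """Approximate SD of ln(intensity) under a two-term error model.

    First-order (delta-method) approximation:
        SD[ln I] ~= SD[I] / I = sqrt(sigma_add**2 + (frac * I)**2) / I
    Only meaningful for positive intensities.
    """
    if intensity <= 0:
        raise ValueError("log transform requires a positive intensity")
    sd_linear = math.sqrt(sigma_add ** 2 + (frac * intensity) ** 2)
    return sd_linear / intensity


# Hypothetical parameters: additive SD of 10 counts, 20% fractional error.
sd_high = log_domain_sd(1e6, sigma_add=10.0, frac=0.2)  # multiplicative regime
sd_low = log_domain_sd(10.0, sigma_add=10.0, frac=0.2)  # additive regime
```

Under these assumptions the log-domain standard deviation approaches the fractional error (about 0.2) at high intensity but grows roughly as `sigma_add / intensity` near the low end, so the variance in the log domain is constant only where the multiplicative error dominates.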
There is therefore a need for more efficient methods that can be used to characterize measurement errors in measured signals. In particular, there is a need for methods that transform measured signals into a transformed domain that facilitates analysis of the signals and their errors. There is also a need for more efficient methods for analyzing measured signals as well as more efficient methods for processing measured signals, such as methods of obtaining differences of measured signals, methods of obtaining error-weighted averages, and methods of identifying and removing outliers.
Discussion or citation of a reference herein shall not be construed as an admission that such reference is prior art to the present invention.