Gene Expression Analysis—
Fundamental biological processes such as cell cycle progression, cell differentiation and cell death are associated with variations in gene expression patterns, which therefore provide a means of monitoring these processes at the molecular level. Gene expression patterns can be affected by exposure to therapeutic agents and are thus useful molecular indicators of the efficacy of new drugs and for the validation of drug targets. Gene expression analysis also plays an increasingly important role in target discovery.
Gene expression analysis also offers a systematic molecular approach to the analysis of multigenic traits. In plant molecular biology and molecular agriculture, the expression patterns of designated genes, and their temporal evolution, are finding increasing application in guiding the “breeding” of desirable properties such as the rate of growth or the ripening of fruits or vegetables.
Changes in expression levels are also indicators of the status and progression of pathogenesis. Thus, the under-expression of functional tumor suppressor genes and/or the over-expression of oncogenes or protooncogenes is known to be associated with the presence and progression of various cancers. Specific genes have been identified whose expression patterns undergo characteristic variations in the early stages of the immune response to inflammation or to exposure to pathogenic agents, including common viruses such as HSV or CMV as well as biological warfare agents such as anthrax. In contrast to changes in the levels of protein markers such as antibodies, changes in gene expression occur at the earliest stages of the immune response, thereby offering the possibility of early and specific therapeutic intervention.
Accordingly, the rapid quantitative analysis of the expression levels of specific genes (“messages”), and of their evolution in time following exposure to infectious agents or following treatment, holds significant promise as a tool for advancing the molecular diagnosis of disease. However, as elaborated herein, standard methods of quantitative gene expression analysis produce data of uncertain quality. Further, to serve as a reliable and practical tool of molecular diagnostics, gene expression analysis, and specifically multiplexed expression monitoring (herein also abbreviated “mEM”), must be simple in protocol, quick to complete, flexible in accommodating selected sets of genes, reliable in controlling cross-reactivity and ensuring specificity, and convenient to use; it must also attain the requisite sensitivity while quantifying message abundance over a dynamic range of three to four orders of magnitude.
Current methods generally lack these attributes. That is, while gene expression analysis has become a standard methodology of target discovery, its use in diagnostics has been limited by the lack of flexible assay designs ensuring rapid, reliable and quantitative multiplexed molecular diagnosis. This is particularly true of expression monitoring, which requires the quantitative determination of cDNA levels in the target mixture as a measure of the expression levels of the corresponding mRNAs.
Spatially Encoded Arrays: In-Situ Synthesis and “Spotting”—
The practical utility of gene expression analysis is greatly enhanced when it is implemented using parallel assay formats that permit the concurrent (“multiplexed”) analysis of multiple analytes in a single reaction. In a commonly practiced format (see, e.g., U. Maskos, E. M. Southern, Nucleic Acids Res. 20, 1679-1684 (1992); S. P. A. Fodor, et al., Science 251, 767-773 (1991)), gene expression levels are determined by providing an array of oligonucleotide capture probes (or, in some cases, cDNA molecules) disposed on a planar substrate, and contacting the array, under conditions permitting the formation of probe-target complexes, with a solution containing the nucleic acid samples of interest; these can include mRNAs extracted from a particular tissue, or cDNAs produced from the mRNAs by reverse transcription (RT). Following completion of complex formation (“hybridization”), unbound target molecules are removed, and an intensity is recorded from each position within the array, reflecting the amount of target captured there. The intensity pattern is analyzed to obtain information regarding the abundance of the mRNAs expressed in the sample. This “multiplexed” assay format is gaining increasing acceptance in the analysis of nucleic acids as well as proteins in molecular medicine and biomedical research.
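The readout step of a spatially encoded array, in which each position identifies its probe and the recorded intensity reflects captured target, can be illustrated with a minimal sketch. It assumes a single scalar background level, a linear intensity response, and a hypothetical `relative_abundance` helper; it yields only relative fractions of captured signal, not absolute message abundance, which would require calibration.

```python
def relative_abundance(spots, background):
    """Background-correct per-position intensities and normalize them.

    spots: dict mapping array position -> (probe_id, raw intensity).
    background: assumed scalar intensity of a position with no captured target.
    Returns dict probe_id -> fraction of the total corrected signal.
    """
    corrected = {}
    for _position, (probe_id, raw) in spots.items():
        # Clip at zero: a spot no brighter than background carries no captured target.
        # Replicate spots of the same probe are summed.
        corrected[probe_id] = corrected.get(probe_id, 0.0) + max(raw - background, 0.0)
    total = sum(corrected.values())
    if total == 0:
        return {probe_id: 0.0 for probe_id in corrected}
    return {probe_id: value / total for probe_id, value in corrected.items()}


# Hypothetical 2 x 2 array readout: position -> (probe, recorded intensity).
spots = {
    (0, 0): ("GENE_A", 900.0),
    (0, 1): ("GENE_B", 300.0),
    (1, 0): ("GENE_C", 50.0),
}
fractions = relative_abundance(spots, background=100.0)
# GENE_A ~0.8, GENE_B ~0.2, GENE_C 0.0 (below background)
```

Note that the position itself carries the probe identity; this is the "spatial encoding" that the following sections contrast with encoded microparticles.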
Lack of Flexibility, Reproducibility and Reliability—
However, spatially encoded probe arrays generally are not well suited to quantitative expression analysis of designated sets of genes. Thus, in-situ photochemical oligonucleotide synthesis does not provide a flexible, open design format given the time and cost involved in customizing arrays. As a result, “spotted”, or printed arrays, which provide flexibility in the selection of probes, have been preferred in applications requiring the use of only a limited gene set. However, “spotting” continues to face substantial technical challenges akin to those encountered by the standard “strip” assay format of clinical diagnostics, which generally is unsuitable for quantitative analysis. Poor reproducibility, relating to the non-uniformity of coverage, and uncertain configuration and accessibility of immobilized probes within individual spots, remains a significant concern. In addition, these arrays require expensive confocal laser scanning instrumentation to suppress substantial “background” intensities, and further require statistical analysis even at the early stages of subsequent data processing to account for non-uniform probe coverage and heterogeneity. Another concern is the comparatively large footprint of spotted arrays and the correspondingly large quantities of reagent consumed. Finally, scale-up of production to levels required for large-scale diagnostic use will be complex and economically unfavorable compared to batch processes such as those available for the preferred embodiment of the present invention in the form of planar arrays of encoded microparticles.
In addition to limited sensitivity, other problems with array-based diagnostics include a limited ability to detect genes expressed at widely varying copy numbers (from 1 or 2 copies per cell to approximately 10⁴ copies per cell). Thus, what is needed is an assay method that avoids these problems by maximizing detection sensitivity, minimizing cross-reactivity and permitting detection over a wide dynamic range of transcript copies.
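The required dynamic range follows directly from the copy numbers quoted for transcripts, from 1 or 2 copies per cell up to about ten thousand. The short sketch below expresses that span in decades (orders of magnitude); the helper name is illustrative only.

```python
import math


def orders_of_magnitude(low_copies, high_copies):
    """Dynamic range, in decades, spanned by two transcript abundances."""
    if low_copies <= 0 or high_copies < low_copies:
        raise ValueError("need 0 < low_copies <= high_copies")
    return math.log10(high_copies / low_copies)


span_from_one = orders_of_magnitude(1, 1e4)  # 4.0 decades
span_from_two = orders_of_magnitude(2, 1e4)  # ~3.7 decades
```

Both values fall in the range of three to four orders of magnitude over which, as stated earlier, message abundance must be quantified.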
Lack of Specificity—
The most prevalent methods of the prior art rely on multiplexed probe-target hybridization as the single step both for the quantitative determination of, and for discrimination between, multiple target sequences. Hybridization in a multiplexed format of analysis sometimes lacks specificity (see discussion in U.S. application Ser. No. 10/271,602, entitled: “Multiplexed Analysis of Polymorphic Loci by Concurrent Interrogation and Enzyme-Mediated Detection,” filed Oct. 15, 2002). To enhance specificity, some formats of multiplexed hybridization employ long probes in spotted arrays; e.g., Agilent EP 1207209 discloses probes of a preferred length of 10 to 30, most preferably about 25. Such probes may help to offset the random obstruction and limited accessibility of capture sequences in spotted probes: probe-target complex formation in spotted arrays generally involves not the full length of the probe but randomly accessible subsequences of it. However, as disclosed herein, the use of long probes in a solid phase format generally will be counterproductive. Furthermore, the lack of specificity remains a source of concern: as shown herein, cross-hybridization generally will distort intensity patterns, thereby precluding quantitative analysis unless careful primer and probe designs are employed, for example using the methods of a co-pending application (U.S. application Ser. No. 10/892,514, “Concurrent Optimization in Selection of Primer and Capture Probe Sets for Nucleic Acid Analysis,” filed Jul. 15, 2004), and unless the analysis takes into account the molecular interactions between non-cognate probes and targets.
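How cross-hybridization distorts an intensity pattern, and why the analysis must account for non-cognate interactions, can be illustrated with a deliberately simplified linear model. The sketch assumes each probe's intensity is a weighted sum of all target concentrations, I = K c, with a hypothetical cross-reactivity matrix K; it then recovers c by Gaussian elimination. It ignores saturation and other nonlinear binding effects, and it is a generic linear-unmixing illustration, not the method of the co-pending application.

```python
def unmix(cross_reactivity, intensities):
    """Solve intensities = K @ concentrations for concentrations.

    Uses Gaussian elimination with partial pivoting on an augmented copy of K.
    K[i][j] is the (assumed linear) response of probe i to target j.
    """
    n = len(intensities)
    a = [[float(v) for v in row] + [float(b)] for row, b in zip(cross_reactivity, intensities)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("cross-reactivity matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


# Hypothetical two-plex: probe 1 also binds target 2 (0.3), probe 2 binds target 1 (0.2).
K = [[1.0, 0.3],
     [0.2, 1.0]]
true_conc = [10.0, 1.0]
intensities = [sum(k * c for k, c in zip(row, true_conc)) for row in K]  # [10.3, 3.0]
naive = [intensities[i] / K[i][i] for i in range(len(K))]  # target 2 overestimated ~3x
corrected = unmix(K, intensities)  # ~[10.0, 1.0]
```

The naive per-probe reading inflates the rare target threefold because of cross-talk from the abundant one; this illustrates how cross-hybridization can preclude quantitative analysis when it is not modeled.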
Differential Gene Expression (“Transcript Profiling”)—
Given these difficulties of the standard methods of the art, and the potential for serious uncertainty and error in the quantitative determination of absolute expression levels, the format usually preferred in practice is differential expression analysis. This format characterizes differences in expression patterns between normal and diseased (or otherwise altered) tissue or cells, or between normal (“wild-type”) and transgenic plants. In a commonly practiced approach, a set of cDNA clones is “spotted” onto a planar substrate to form the probe array, which is then contacted with DNA from normal and altered sources. DNA from the two sources is differentially labeled so that the patterns formed by probe-target hybridization can be recorded in two color channels, permitting the determination of expression ratios between normal and altered samples (see, e.g., U.S. Pat. No. 6,110,426 (Stanford University)). This system of two-color fluorescent detection is cumbersome: it requires careful calibration of the laser scanning instrumentation generally needed to read spotted or other spatially encoded probe arrays, as well as separate scans for each of the two color channels. These disadvantages are overcome by the subtractive method of differential gene expression analysis disclosed herein, which requires only a single detection color.
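The two-color ratio computation, and its sensitivity to channel calibration, can be sketched as follows. The sketch assumes background-corrected intensities per gene in each channel; `channel_scale` is a hypothetical calibration factor standing in for the dye and scanner calibration the text describes, and the intensity floor is an illustrative choice.

```python
import math


def log2_ratios(normal, altered, channel_scale=1.0, floor=1.0):
    """Per-gene log2(altered / normal) from two background-corrected channels.

    channel_scale: hypothetical factor applied to the altered channel to
    correct for unequal dye brightness or scanner gain.
    floor: minimum intensity, to avoid dividing by zero or taking log(0).
    """
    ratios = {}
    for gene in sorted(normal.keys() & altered.keys()):
        n = max(normal[gene], floor)
        a = max(altered[gene] * channel_scale, floor)
        ratios[gene] = math.log2(a / n)
    return ratios


normal = {"GENE_A": 100.0, "GENE_B": 200.0}
altered = {"GENE_A": 400.0, "GENE_B": 200.0}
calibrated = log2_ratios(normal, altered)                        # A: 2.0, B: 0.0
miscalibrated = log2_ratios(normal, altered, channel_scale=0.5)  # A: 1.0, B: -1.0
```

A calibration error of a factor of two between channels shifts every ratio by one log2 unit, so an unchanged gene (GENE_B) appears down-regulated; this is why the two-channel format demands careful calibration.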
Complex Protocols—
In a commonly practiced approach to multiplexed expression profiling, mRNA molecules in a sample of interest are first reverse transcribed to produce the corresponding cDNAs, which are then placed in contact with an array of oligonucleotide capture probes formed by spotting or by in-situ synthesis. Lockhart et al. (U.S. Pat. No. 6,410,229) invoke a complex protocol to produce cRNA in which mRNA is reverse transcribed to cDNA, which is in turn transcribed to cRNA with heavy labeling (of one in eight nucleotides on average); the cRNA is then detected on an array of synthesized oligonucleotide probes using a secondary “decoration” step. Such a laborious, error-prone and expensive process not only greatly increases the complexity of the method but also contributes substantially to the uncertainty of final determinations of message abundance, for example by producing non-linear amplification.
A preferred method of the prior art for multiplexed expression analysis uses either randomly placed short reverse transcription (RT) primers to convert a set of RNAs into a heterogeneous population of cDNAs, or a universal RT primer directed against the polyA tail of the mRNA to produce full-length cDNAs. While these methods obviate the need to design sequence-specific RT primers, both have significant disadvantages in quantitative expression monitoring.
Randomly placed RT primers will produce a representative population of cDNAs, that is, one in which each cDNA is represented with equal frequency, only in the limit of infinitely long mRNA molecules. The analysis of a designated set of short mRNAs by random priming generally will produce cDNAs of widely varying lengths for each type of mRNA in the mixture, and this in turn will introduce potentially significant bias into the quantitative determination of cDNA concentration, given that short cDNAs anneal more readily to immobilized capture probes than do long cDNAs, as elaborated herein. Further, full-length cDNAs, if full-length RT is in fact successful, present a large sequence space for potential cross-reactivity between probes and primers, making the results inherently difficult to interpret and hence unreliable.
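The spread of cDNA lengths produced by random priming of a single short mRNA can be made concrete by enumerating priming sites. This sketch assumes hexamer primers, one primer per template, every site equally likely, and full extension by reverse transcriptase to the mRNA's 5' end; real RT processivity and multiple primers per template would shorten fragments further. The helper name is hypothetical.

```python
from statistics import mean


def random_primed_lengths(mrna_length, primer_length=6):
    """cDNA length for each possible priming site on one mRNA.

    A primer annealed at site p covers mRNA positions p .. p + primer_length - 1,
    counted from the 5' end. Extension from the primer's 3' end runs toward the
    mRNA 5' end, so the cDNA spans positions 0 .. p + primer_length - 1.
    """
    if primer_length > mrna_length:
        raise ValueError("primer longer than template")
    return [p + primer_length for p in range(mrna_length - primer_length + 1)]


lengths = random_primed_lengths(500)
print(min(lengths), max(lengths), mean(lengths))  # 6 500 253
```

Even under these idealized assumptions, one 500-nt message yields cDNAs ranging from 6 to 500 nt. Because capture efficiency depends on target length, the signal per message then depends on where the primers happened to land.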
The Role of Target and Probe Configurations—
DNA in solution has been shown to display the characteristics of polymers governed by chain entropy (see Larson et al., “Hydrodynamics of a DNA molecule in a flow field,” Physical Review E 55:1794-97 (1997)). Single-stranded (ss) DNA in particular is highly flexible, as manifested in a short persistence length, of the order of only a few nucleotides (nt) under most experimentally relevant conditions, which is considerably smaller than that of double-stranded DNA (Marko J F, Siggia E D, “Fluctuations and supercoiling of DNA,” Science 265:506-508 (1994)). Capture of ssDNA by immobilized probes thus involves a considerable restriction of the molecules' conformational freedom. At the same time, if duplex formation is to occur, immobilized probes used in solid phase formats of nucleic acid analysis must accommodate invading target strands by elastic deformation. Conformational adjustments in target and probe molecules, considered as polymers, heretofore have not been appreciated in designing assays for nucleic acid analysis.
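The contrast between flexible ssDNA and stiff dsDNA can be quantified with the standard worm-like chain (Kratky-Porod) expression for the mean-square end-to-end distance, ⟨R²⟩ = 2PL[1 − (P/L)(1 − e^(−L/P))], where P is the persistence length and L the contour length. The sketch below assumes P = 3 nt for ssDNA (one reading of "a few nucleotides") and the commonly cited P ≈ 150 bp (about 50 nm) for dsDNA. It works in monomer units, so only the dimensionless ratio R_rms/L is compared. It illustrates chain stiffness only and is not a model of probe-target capture.

```python
import math


def wlc_rms_end_to_end(contour, persistence):
    """Root-mean-square end-to-end distance of a worm-like chain (Kratky-Porod).

    contour and persistence must share a unit (here: nucleotides or base pairs).
    """
    x = contour / persistence
    mean_sq = 2.0 * persistence * contour * (1.0 - (1.0 - math.exp(-x)) / x)
    return math.sqrt(mean_sq)


LENGTH = 25  # a 25-mer segment, e.g. a capture probe or the captured part of a target
ss_extension = wlc_rms_end_to_end(LENGTH, 3.0) / LENGTH    # ~0.46: compact random coil
ds_extension = wlc_rms_end_to_end(LENGTH, 150.0) / LENGTH  # ~0.97: nearly a rigid rod
```

Under these assumptions, a 25-nt single strand spans less than half its contour length on average. The same segment in duplex form is almost fully extended, so duplex formation on a surface requires a substantial loss of conformational freedom.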
In view of the foregoing considerations, it is desirable to have flexible, rapid, sensitive and specific methods, compositions and assay protocols, particularly for diagnostic applications of gene expression analysis, herein also referred to as multiplexed expression monitoring (mEM). The present invention discloses such methods and compositions, specifically methods and compositions for rapid, customizable, multiplexed assay designs and protocols for mEM, preferably implemented in the format of random encoded array detection for multianalyte molecular analysis. A co-pending application discloses methods for selecting optimized sets of conversion probes (e.g., RT primers) and detection probes (e.g., probes for hybridization-mediated target capture) to further enhance reliability (see U.S. application Ser. No. 10/892,514, “Concurrent Optimization in Selection of Primer and Capture Probe Sets for Nucleic Acid Analysis,” filed Jul. 15, 2004).