The use of arrays to simultaneously quantify a large number of nucleic acid targets in a single experimental sample is an increasingly popular method. There are two areas where this method is most widely used. First is the generation of a mRNA profile to examine effects of different conditions (genetic or environmental) on mRNA expression. Second is the generation of a gene dosage profile to examine the presence of amplifications or deletions of portions of genomic DNA (comparative genome hybridization, CGH).
In the first area, labeled copies (cDNA) have been made from mRNA templates by reverse transcription, or less commonly the mRNA itself has been directly labeled. For examples of the latter, psoralen-biotin (Kumar et al., 2002 Nature Biotechnology 20; 58-63; incorporated by reference herein) and a ligation reaction (Kampa et al., 2002 Genome Research 14; 331-342; incorporated by reference herein) have been used to label purified poly A RNA in studies where there were concerns about avoiding potential artifacts caused by copying reactions.
In the second area involving CGH studies, labeled genomic DNA has been prepared through a nick translation or random primer reaction. An alternative method has been to directly label the genomic DNA itself with chemical reagents. Signals from a test sample can be compared to a standard to indicate the presence of increases or decreases in either genetic representation (CGH) or expression (RNA profiling) of various nucleic acid sequences. The standard can be run either simultaneously or in parallel with the test sample, or the standard can even comprise prior or archived data. In many cases, the standard will be a control sample, e.g., cells growing under “normal” conditions compared with cells exposed to some environmental factor, or an untransformed cell compared with a transformed cell. In other cases, the standard is of an arbitrary nature, as in the case where kidney cell expression is measured and compared to liver cell expression as a reference standard, thereby identifying genes that have differential expression in kidney versus liver. In another example, lung cancer cells can be compared to normal lung cells and breast cancer cells, and the latter two can serve as reference standards.
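The comparison of test signals against a standard can be illustrated with a minimal sketch. It is an illustration only and not a method taken from the cited references: the function names, the signal floor, the two-fold threshold and the example site names are all assumptions chosen for clarity.

```python
import math

def log2_ratios(test, reference, floor=1.0):
    """Return log2(test / reference) for every array site present in both data sets.

    `floor` is an assumed lower bound that avoids taking the log of zero.
    """
    ratios = {}
    for site, test_signal in test.items():
        if site not in reference:
            continue  # no standard available for this site
        t = max(test_signal, floor)
        r = max(reference[site], floor)
        ratios[site] = math.log2(t / r)
    return ratios

def classify(ratio, threshold=1.0):
    """Label a log2 ratio; the two-fold threshold (log2 = 1) is an assumption."""
    if ratio >= threshold:
        return "increase"
    if ratio <= -threshold:
        return "decrease"
    return "unchanged"

# Hypothetical signal strengths for three array sites.
test_signals = {"siteA": 4000.0, "siteB": 500.0, "siteC": 1000.0}
reference_signals = {"siteA": 1000.0, "siteB": 2000.0, "siteC": 1000.0}

ratios = log2_ratios(test_signals, reference_signals)
calls = {site: classify(r) for site, r in ratios.items()}
print(calls)
```

The same calculation applies whether the standard was hybridized together with the test sample, run in parallel, or taken from archived data; only the source of `reference_signals` changes.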
In either RNA profiling or CGH array applications, hybridization of the labeled products then takes place with complementary nucleic acids located at various sites on the array followed by quantification of the amount of signal strength at each location. The strands on each site of the array can be single strands comprising synthetic oligonucleotides or polynucleotides that represent a selected portion of the nucleic acid sequence of interest (a monophasic array), or the strands may be derived from denatured double-stranded sources such as bacterial artificial chromosomes (BACs), plasmids or PCR products (biphasic arrays). In the latter case, when labeled mRNA or cDNA were used as probes for mRNA profiling, only one strand has usually served as a target even though both strands are present at each site on the biphasic array.
There are numerous situations, however, where the sample size is insufficient to produce effective amounts of signal on an array and the amount of nucleic acids in the sample needs to be amplified. In the first method that was designed for global amplification of mRNA (and described as the Eberwine process by Van Gelder et al. (1990, Proc. Natl. Acad. Sci. USA 87; 1663-1667, incorporated by reference herein)), a primer with a T7 promoter attached to an oligo-T segment was used to prepare cDNA copies by extension from the poly A region of mRNA to generate a hybrid molecule with the cDNA bound to its complementary mRNA template. In a subsequent step, the method of Gubler and Hoffman (1983 Gene 25; 263-269, incorporated by reference herein) was used to allow portions of the original template mRNA to be used as primers, thereby transforming the original first strand cDNA copies into double-stranded DNA constructs. Because the T7 promoter sequence was included in the original oligo-T primer, the second strand synthesis step converted this primer segment into a functional double-stranded promoter that could be used in a transcription reaction for synthesis of a large number of RNA copies from each DNA template. Unlike the original mRNA which had a poly A segment at the 3′ end, the RNA copies made by this amplification method have a poly T sequence at the 5′ end, i.e., the RNA copies are the opposite strand of the original mRNA and are sometimes referred to as aRNA. As described previously for labeled cDNA, the labeled aRNA created from the Eberwine process has been used with arrays that have either both strands present (a biphasic array) or have targets with the original mRNA sequences (a monophasic array).
Although this orientation for constructs to make a labeled RNA library is the most common, other methods have been described where a bacteriophage promoter is incorporated into the other end, i.e., the transcription takes place in nucleic acid constructs from the end of the nucleic acid constructs that was derived from the original 5′ end of mRNAs, thereby generating sense RNA that is essentially similar to the original starting mRNA. Instead of using the endogenous mRNA templates as a source of primers as described by Eberwine et al., this other method uses an exogenous primer for second strand synthesis. As such, instead of having the promoter in the oligo-T primer, the promoter can now be included in the sequence of the primer for second strand synthesis, thereby reversing the direction of transcription. For examples of various means that have been described for producing a library of sense RNA as an amplified product, see U.S. Patent Application No. 20040161741; Goff et al., 2004 BMC Genomics 5; 76-84; and Marko et al., 2005 BMC Genomics 6; 27-39; the contents of all of which are incorporated by reference.
Labeling of this sense RNA does not produce, however, a product compatible with monophasic arrays that are exclusively designed to hybridize with labeled anti-sense nucleic acids. As such, it was suggested in the aforementioned U.S. Application No. 20040161741 that instead of using monophasic arrays that were complementary to antisense RNA products, the array could be designed for sequences complementary to the original sense mRNA. On the other hand, arrays designed for use with antisense RNA products have been used with amplification processes that generate sense oriented strands by the simple expedient of applying the same solution that was originally used with the unamplified mRNA, i.e., the sense amplification product was used as a template for synthesizing labeled cDNA. It should be pointed out that this reverse transcription step is not necessary when a biphasic array is used that has both strands present at each site, or when one of a small number of commercially available arrays is used that has oligonucleotide targets from one strand at some locations and targets from the other strand at other locations (Checklt arrays for example, available from Telechem International, Inc. Sunnyvale, Calif., product literature incorporated by reference herein). In these cases, some of the targets on the arrays are compatible with either labeled sense or anti-sense products.
It should also be pointed out that a monophasic array synthesized with oligonucleotides in the anti-sense orientation has recently become commercially available (the Human Exon 1.0 ST Array from Affymetrix, Inc. Santa Clara, Calif., product literature incorporated by reference herein). This array was designed by taking exon and EST sequences and using them to design complementary sequences for the array. Kits that have been designed to generate labeled products for these arrays are either designed to produce sense strand cDNA products (WT cDNA synthesis and amplification kit, Affymetrix, Santa Clara, Calif.; product literature incorporated by reference) or designed to synthesize both labeled sense and antisense products.
When carrying out RNA profiling studies, the limiting amount of nucleic acids in a sample is not the only concern. First, when carrying out studies on transformed cell lines or tumors, there will often be sufficient material for direct methods of CGH analysis to identify amplifications or deletions of chromosomal content. On the other hand, other specimens may be very small (biopsies or microdissected material) or of low quality (archival biopsy specimens). Second, for the purposes of prognostic diagnosis of cancer, it is often critical to identify chromosomal aberrations before a tumor becomes physically apparent. For instance, gross level changes in copy number of the human telomerase gene have been identified in Pap smears by comparative FISH analysis and correlated with predictions of development of cervical carcinoma (Heselmeyer-Haddad et al., 2005 Am J Path 166; 1229-1238, incorporated by reference herein). For the above reasons, numerous methods have been described in the literature for general amplification of chromosomal DNA sequences. For a review of a number of systems used for this purpose see Hughes et al., 2004 Progress in Biophysics and Molecular Biology 88; 173-189, the contents of which are incorporated by reference.
It is easily understood that when doing CGH studies, both strands are present in equal amounts. RNA profiling studies are often carried out, however, on a basic assumption of asymmetry, i.e., when the activity of a particular gene is being studied by means of an oligonucleotide array, it is sufficient to have sequences present from only one strand. What is sometimes if not often overlooked is that transcription is not completely relegated to one strand, even when a single gene is considered. A well-recognized natural phenomenon termed anti-sense regulation takes place in cells where transcription of sequences that are complementary to protein coding mRNA is used by cells to regulate the amount of gene products that are made from the mRNA transcripts. Recent studies that have involved more precise measurement of the extent of sense and anti-sense sequences being transcribed from the same gene have shown that it is possible that more than twenty percent (20%) of transcribed genes have anti-sense counterparts (Chen et al., 2004 Nucl. Acids Res. 32; 4812-4820, incorporated by reference herein).
For studies where both sense and anti-sense poly A mRNA are amplified in an asymmetric manner, the product will still consist of both [+] and [−] strands. For instance, when the Eberwine procedure is used, the mRNA transcript in a sample will generate complementary aRNA strands. In a similar fashion, anti-sense transcripts with polyA ends that may also be present in the sample will likewise be amplified and the complementary strands generated from these templates will comprise sense mRNA sequences. In studies that have used monophasic oligonucleotide arrays for RNA profiling, this duality has been for the most part ignored since only the labeled aRNA amplification products generated signals by hybridizing to the mRNA derived sequences on the array. Expression of anti-sense poly A sequences was not measured in such experiments due to a lack of complementary sequences on the arrays and only changes in mRNA transcription were recognized in these studies. On the other hand, separate assessments for amplified mRNA products and antisense RNA products can be achieved by providing arrays with oligonucleotides that are complementary to each orientation. The foregoing analytical techniques have been used for labeled unamplified RNA samples (Kumar et al., 2002 Nature Biotechnology 20; 58-63; Kampa et al., 2002 Genome Research 14; 331-342, both of which are incorporated by reference).
Even arrays that comprise a single orientation may be confounded by the presence of both sense and antisense sequences in a biological sample when methods of amplification are used that are symmetric in nature. An example of this is the SMART PCR method (Clontech, Mountain View, Calif., product literature incorporated by reference herein), where both mRNA and anti-sense transcripts serve as templates for PCR amplification as long as they have poly A tails. Labeled products made by this process will hybridize to a monophasic array regardless of whether the original template was a sense mRNA or an anti-sense transcript. Under these conditions, this process is aimed not at measuring mRNA transcription levels per se, but rather at measuring the overall gene activity, where both sense and anti-sense transcripts in a sample contribute to the ultimate signal. Similarly, in array systems where both strands are present at each site of the array, signals are generated not only by amplification products of mRNA transcript templates, but also by the amplification products from antisense transcript templates regardless of whether an asymmetric or symmetric amplification process is used. In essence, these arrays also provide a measurement of an overall gene activity without distinguishing whether the signal is derived from copying sense or anti-sense transcript templates.
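The strand relationships discussed above can be summarized in a small model. This is a simplified sketch under stated assumptions, not a description of any commercial product: transcripts and array strands are reduced to "+" (sense) and "-" (anti-sense), hybridization is all-or-nothing, and every function name and method label ("eberwine", "sense", "symmetric") is a hypothetical identifier introduced for illustration.

```python
def complement(strand):
    """Return the opposite orientation of a '+' or '-' strand."""
    return "-" if strand == "+" else "+"

def amplify(templates, method):
    """Return the set of product orientations for poly A templates of one gene.

    'eberwine'  : asymmetric, product is the complement of each template (aRNA)
    'sense'     : asymmetric, product has the same orientation as the template
    'symmetric' : PCR-like, both strands are produced from any template
    """
    if method == "eberwine":
        return {complement(s) for s in templates}
    if method == "sense":
        return set(templates)
    if method == "symmetric":
        return {"+", "-"} if templates else set()
    raise ValueError(f"unknown method: {method}")

def array_strands(array_type, strand_on_array="+"):
    """Strands present at a site: one for monophasic, both for biphasic."""
    if array_type == "biphasic":
        return {"+", "-"}
    if array_type == "monophasic":
        return {strand_on_array}
    raise ValueError(f"unknown array type: {array_type}")

def gives_signal(products, site_strands):
    """A labeled product hybridizes only to its complementary strand."""
    return any(complement(p) in site_strands for p in products)

# A gene for which only an anti-sense poly A transcript is present in the sample.
antisense_only = ["-"]
mono = array_strands("monophasic", "+")  # site carries mRNA-derived sequence
bi = array_strands("biphasic")

for method in ("eberwine", "symmetric"):
    products = amplify(antisense_only, method)
    print(method, "monophasic:", gives_signal(products, mono),
          "biphasic:", gives_signal(products, bi))
```

In this model the anti-sense transcript is invisible on a monophasic array after asymmetric amplification, but contributes signal after symmetric amplification or on a biphasic array, which corresponds to the measurement of overall gene activity described above.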
When there are broad changes in species that are represented in high numbers in a sample, effects are easily ascertained. Only a certain percentage of the population is, however, sufficiently represented such that the products are capable of generating a detectable signal, i.e., targets that may be present in small numbers cannot be reliably detected above the background levels of the array. The number of targets that can be detected as compared to the number of potential targets is frequently referred to as the “call rate.” As one gets closer to background levels, random fluctuations in signal strength become more problematic, even with detectable signals. Furthermore, when a promoter dependent amplification method is used, there are biases involved in having the promoter initiate transcription from sequences that were located at the 3′ end or the 5′ end of the original mRNA. Thus, when a promoter transcribes from the region originally derived from the 3′ end, there is a higher representation of sequences from the 3′ region compared to the 5′ end in many gene products. The converse also holds true when sequences from the 5′ region are used for the start of transcription. Thus there remains a critical need for methods that can increase the reliability of data generated from arrays as well as for methods that can increase the sensitivity of detection of fluctuations in copy numbers of low level target nucleic acids.
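As an illustration of the “call rate” described above, the following minimal sketch computes the fraction of potential targets whose signal is detectable. The detection criterion used here (signal at least a fixed multiple of background) is an assumption made for the example; the function name, the default multiple and the signal values are likewise hypothetical.

```python
def call_rate(signals, background, min_fold=2.0):
    """Fraction of array sites whose signal counts as detected.

    A site is counted as detected when its signal is at least `min_fold`
    times the background level (an assumed criterion for this sketch).
    """
    if not signals:
        raise ValueError("no array sites supplied")
    threshold = background * min_fold
    detected = [site for site, value in signals.items() if value >= threshold]
    return len(detected) / len(signals)

# Hypothetical signal strengths for four sites with a background level of 50.
site_signals = {"site1": 50.0, "site2": 120.0, "site3": 400.0, "site4": 90.0}
rate = call_rate(site_signals, background=50.0)
print(f"call rate: {rate:.2f}")
```

Raising `min_fold` makes calls more conservative near background, where random fluctuations in signal strength are most problematic, at the cost of a lower call rate.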