The present invention relates to an improved method for producing amplified heterogeneous populations of cDNA from limited quantities of RNA or other nucleic acids.
Selective amplification of cDNAs represents a major research goal for molecular biology, with particular importance in diagnostic and forensic applications, as well as in general manipulation of genetic material.
In many important areas of research, such as the study of gene regulation in complex biological systems (e.g., the brain) having multiple phenotypes, obtaining a sufficient amount of RNA for isolating, cloning, and characterizing specific regulated transcripts is problematic. Research has been hindered by, e.g., the high complexity of the mRNA, the relatively low abundance of many expressed messages, and the spatially limited expression of these messages. In particular, isolating sufficient RNA for micro-array analysis has been a challenge. Various labeling techniques have been developed for that purpose. These technologies can be divided into PCR-based and non-PCR-based labeling technologies. Two of the non-PCR-based methods, from NEN Life Science and Genisphere respectively, are based on a principle in which the detectable signal is amplified after the final hybridization.
For instance, NEN Life Science has developed a technology based on its Tyramide Signal Amplification™ (TSA) system (U.S. Pat. No. 5,196,306), which was originally developed for immunohistochemistry but has recently been adapted for micro-array analysis. Furthermore, Genisphere has developed its so-called “Dendrimer technology”, which is based on a labeled complex DNA structure that hybridizes to a target sequence. Both of these technologies have very complex protocols which require additional steps after the final hybridization of the probe to the target sequence.
Affymetrix has developed a method in which the RNA is amplified prior to labelling of the probe. This is accomplished by including a T7 promoter region in the oligo(dT) primer and using T7 RNA polymerase to generate multiple RNA copies by transcribing the double stranded cDNA (U.S. Pat. No. 5,545,522).
In order to simplify the method and to detect RNA at levels as low as the sub-microgram range, PCR is needed to amplify the probe signal. The polymerase chain reaction (PCR) is an extremely powerful technique for amplifying a specific nucleic acid. The use of PCR in gel-based gene profiling technologies was introduced several years ago with the invention of Differential Display and related technologies (Liang and Pardee, 1992). Recently, a differential display approach has been developed to synthesize very sensitive micro-array probes (Trenkle et al., 1999). This method, however, is limited, as it amplifies only about 5% of the RNA population at a time, and it is furthermore susceptible to the same reproducibility problems as the differential display method.
PCR typically requires that sequence information at the 5′ and 3′ termini be known for the synthesis of the primers. Homopolymeric tailing of the 3′ terminus (Frohman et al., 1988) and the synthesis of highly degenerate nucleotide primers (Gould et al., 1989) have been implemented to broaden the range of cDNAs that can be amplified and cloned with PCR.
A number of techniques have been developed to add a sequence tag to the 5′-end of single stranded cDNA. For instance, so-called ‘Ligation-anchored PCR’ (Troutt et al., 1992) was developed in 1992 for that purpose. It uses T4 RNA ligase to ligate a single stranded oligo to the 5′-end of a single stranded cDNA template. However, T4 RNA ligase has never been widely used for ligating single stranded templates, owing to the low efficiency of single stranded ligations compared with ligations using, e.g., T4 DNA ligase. In another example, a widely used PCR-based cDNA synthesis kit (Clontech Laboratories) is adapted from a technique developed by Chenchik et al. (U.S. Pat. No. 5,962,277). This technique makes use of the terminal transferase activity of MMLV reverse transcriptase. By use of a special oligonucleotide, the reverse transcriptase can switch template and extend the first strand synthesis into a specific sequence. However, only full length cDNAs will be tailed with the specific sequence anchor, and only sequences that have been tailed by the terminal transferase activity of the reverse transcriptase will be extended. In a more recent article, a fluorescently labeled primer was efficiently ligated to an unlabeled 5′-phosphorylated primer using T4 DNA ligase and a so-called bridging primer (Jang and Steffens, 1997). This method describes the ligation of small fluorescently labeled oligonucleotides of known sequence but is not applicable for adding sequence tags to the 5′-end of, e.g., cDNAs of unknown sequence.
In spite of such recent advances, including PCR and its various modifications noted above, there is a need for improved methods for amplifying RNA for cloning and micro-array experiments. In particular, a simple and reproducible method is sought that is able to amplify limited amounts of cDNA starting from heterogeneous populations of RNA.
The present invention provides novel processes for nucleic acid amplification, especially suitable for amplifying low-abundance cDNA originating from a source that contains only a very limited amount of the mRNA of interest.
The present invention describes a method in which a specially designed adaptor with a non-specific overhang can be efficiently ligated directly to a single stranded cDNA and amplified directly in a subsequent PCR reaction. This method is especially suitable for generating labeled probes for micro-array hybridization experiments when only limited amounts of RNA are available.
The overall methodology is capable of amplifying a broad range of messenger RNAs without prior cloning into vectors and, in some instances, without knowledge of the sequence. This is achieved by performing a first strand synthesis of cDNA from mRNA using a cDNA synthesis primer containing a polythymidylate region and an anchor region, said anchor region having a pre-defined nucleotide sequence (COM1) complementary to an amplification primer (primer #2). The pre-defined nucleotide sequence can be any desired nucleotide sequence, e.g., a gene of interest, a portion thereof, or a gene fragment of interest.
Thereafter, a specially designed adaptor fragment is ligated to the first strand cDNA. Said adaptor is easily ligated to the cDNA via a non-specific overhang comprising non-selective bases such as deoxyinosines, which keep the annealing temperature of the adaptor constant and also ensure a high equilibrium toward the specific single stranded cDNA during the ligation. Furthermore, the adaptor contains a pre-defined nucleotide sequence (COM3) complementary to an amplification primer (primer #1). The single stranded cDNA product with ligated adaptor can thereafter be subjected to standard nucleic acid amplification procedures, e.g. PCR, using two primers (primer #1 and primer #2), which are preferably single stranded oligonucleotides of sufficient length to act on the cDNA template for synthesis of extension products under suitable conditions.
This method of nucleic acid amplification can, for example, be applied in a process for detecting expression of a gene in a pre-selected cell population, wherein mRNA from said cell population is amplified according to the invention, thereby determining the presence or absence of mRNA corresponding to the gene of interest, portion thereof, or gene fragment of interest. The cell population may, e.g., be from a human tissue sample, such as brain tissue, or from an embryonic or fetal tissue. The cell population may be only a single cell, or may comprise up to 100 to 1,000,000 cells or even more, as desired. The cell(s) can be from any desired source.
In another embodiment, the present invention comprises a process for producing a subtractive hybridization probe.
Additionally, the present invention comprises methods for making cDNA libraries from a collection of mRNA molecules.
Importantly, the present invention can readily be provided in kit form for a variety of uses. In addition to instructions, a kit will typically comprise containers of reverse transcriptase, RNA polymerase, and nucleotides, which may be labeled, for example with radioactive labels (e.g. 14C, 3H, 32P, 33P, 35S, 125I) or fluorescent labels (e.g. Cy3, Cy5, fluorescein, rhodamine and Texas Red), and the like.
In the present invention, a method is provided for the amplification of broad classes of cDNAs. This method involves the ligation of an adaptor to the 3′-end of a single stranded first strand cDNA molecule and subsequent nucleic acid amplification, as shown in FIGS. 1 and 2 and described in detail in the steps below:
a) annealing a cDNA synthesis primer of the general formula I
5′-COM1-Tn-Vn1-3′    (I)
to an RNA molecule and synthesizing a first cDNA strand to form an RNA-cDNA intermediate, COM1 being a pre-selected nucleotide sequence whose length is greater than or equal to 0, preferably between 0 and 40 nucleotides, and n and n1 being integers such that 0 ≤ n and n1 = 0 or n1 = 1,
b) separating the cDNA strand from the RNA,
c) purifying the cDNA by removing the cDNA synthesis primer,
d) contacting said cDNA with an adaptor, said adaptor consisting of two oligonucleotides, hybridized to each other, having the general formulas II and III, respectively:
5′-PO-COM2-COM3-X-3′    (II)
5′-COM4-(N)n2-Zm-X-3′    (III)
e) ligating the adaptor via the 5′-phosphate group on strand II of the adaptor to the single stranded cDNA using a DNA ligase,
f) amplifying said ligated single stranded cDNA fragment obtained in step e) in a molecular amplification procedure so as to obtain amplified cDNA fragments, using at least one set of amplification primers that are partly or fully complementary to general formula II and to the pre-selected nucleotide sequence (COM1) of the cDNA synthesis primer, respectively.
The COM2 and COM4 sequences of formula II and III are complementary to each other and are therefore able to form the following complex: 
In formulas I, II and III, COM1, COM2, COM3 and COM4 represent predefined nucleotide sequences, and the length of each COM sequence in nucleotides is an integer. COM1 is greater than or equal to 0 (0 ≤ COM1), e.g., at least 1 or 3, preferably between 0, 1 or 3 and 40 nucleotides (0 ≤ COM1 ≤ 40, 1 ≤ COM1 ≤ 40 or 3 ≤ COM1 ≤ 40). COM2 and COM4 are of equal length and greater than or equal to 4 (4 ≤ COM2, 4 ≤ COM4), preferably between 6 and 25 nucleotides (6 ≤ COM2 ≤ 25, 6 ≤ COM4 ≤ 25). COM3 is greater than or equal to 0 (0 ≤ COM3), preferably between 0 and 40 nucleotides (0 ≤ COM3 ≤ 40). T is the nucleotide thymidine; V is a nucleotide selected from the group consisting of A, G and C; PO designates a phosphate group; N is any nucleotide; Z is selected from the group consisting of the natural analogues deoxyinosine and deoxyuridine or the synthetic analogues 1-(2′-deoxy-beta-D-ribofuranosyl)-5-nitroindole, 1-(2′-deoxy-beta-D-ribofuranosyl)-3-nitropyrrole or 1-(2′-deoxy-beta-D-ribofuranosyl)-imidazole-4-carboxamide; X is an extension protection group. n2 and m are integers: n2 is the number of N residues and satisfies 0 ≤ n2 ≤ 10, preferably 0 ≤ n2 ≤ 8, and m is the number of Z residues and satisfies 0 ≤ m ≤ 25, with the proviso that 4 ≤ n2 + m ≤ 25. A probe or primer can be any stretch of at least 7 or 8, preferably at least 10, more preferably at least 12, 13, 14, or 15, such as at least 20, e.g., at least 23 or 25, for instance at least 27 or 30 nucleotides; for instance, between 15 and 25 nucleotides. As to PCR or hybridization primers or probes and optimal lengths therefor, reference is also made to Kajimura et al., GATA 7(4):71-79 (1990).
These unique features of the adaptor ensure that ligation of the adaptor to a single stranded cDNA mimics a double stranded ligation reaction. Thus an efficient DNA ligase such as T4 DNA ligase can be used in the ligation reaction, as opposed to the single stranded RNA ligase reaction used in ‘Ligation-anchored PCR’ (Troutt et al., 1992). The double stranded overlap will be n2+m bp long and should preferably be at least 4 bp to mimic a double stranded ligation reaction, as also described in a recent paper (Jang and Steffens, 1997).
The adaptors developed by Jang et al. were used only to join small oligonucleotide sequences and were limited to only six base pairs, as they used degenerate base pairs (N) and not non-selective bases such as deoxyinosines. Furthermore, these adaptors were constructed without extension protection groups, which probably makes them unsuitable for PCR reactions owing to unspecific ligation events and the possible elongation of the adaptor sequences during the PCR.
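The integer ranges given for formulas I, II and III lend themselves to a mechanical check when designing a primer/adaptor pair. The following Python sketch is purely illustrative: the function name `check_lengths` and its parameters are hypothetical, and only the hard limits (not the narrower preferred ranges) are enforced.

```python
def check_lengths(com1: int, com2: int, com3: int, com4: int,
                  n2: int, m: int) -> list[str]:
    """Return the constraints of formulas I-III violated by the given lengths.

    All arguments are lengths in nucleotides (n2 = number of N residues,
    m = number of Z residues). An empty list means every hard limit is met;
    the narrower "preferred" ranges are deliberately not enforced here.
    """
    problems = []
    if com1 < 0:
        problems.append("COM1 must be >= 0")
    if com2 != com4:
        problems.append("COM2 and COM4 must be of equal length")
    if min(com2, com4) < 4:
        problems.append("COM2 and COM4 must each be >= 4")
    if com3 < 0:
        problems.append("COM3 must be >= 0")
    if not 0 <= n2 <= 10:
        problems.append("n2 must satisfy 0 <= n2 <= 10")
    if not 0 <= m <= 25:
        problems.append("m must satisfy 0 <= m <= 25")
    if not 4 <= n2 + m <= 25:
        problems.append("n2 + m must satisfy 4 <= n2 + m <= 25")
    return problems


# Example: 12-nt COM2/COM4 with 2 N and 8 Z residues passes all hard limits.
print(check_lengths(com1=20, com2=12, com3=20, com4=12, n2=2, m=8))  # []
```

Because the double stranded overlap is n2 + m bp, the last check is the one that encodes the "at least 4 bp" requirement for mimicking a double stranded ligation.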
The PCR amplification, which will typically yield at least about 20- to 40-fold amplification, such as typically about 50- to 100- or 250-fold amplification, but may even give 500- to 1000-fold or higher amplification, can be achieved from as little as a few nanograms of cDNA, and is simple to perform under standard molecular biology laboratory conditions (Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual).
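To relate the fold-amplification figures above to cycle numbers, a minimal Python sketch follows, assuming an idealized constant per-cycle efficiency E so that the yield after c cycles is (1 + E)^c. Real reactions fall below this and eventually plateau, and the efficiency value 0.9 used below is an illustrative assumption, not a condition stated in this description.

```python
import math


def fold_after(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold amplification (1 + E) ** cycles for per-cycle efficiency E."""
    return (1.0 + efficiency) ** cycles


def cycles_needed(fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles giving at least `fold`-fold amplification."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must lie in (0, 1]")
    if fold <= 1:
        return 0
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))


# Cycles for the fold ranges quoted above, at perfect and at 90 % efficiency.
for target in (40, 250, 1000):
    print(target, cycles_needed(target), cycles_needed(target, efficiency=0.9))
```

Under these idealized assumptions, even 1000-fold amplification corresponds to only about 10 perfect doublings, which is why the quoted range is reachable with a modest number of cycles.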
In particular, when the cDNA is amplified in the presence of labeled nucleotides, it can be used as a sensitive probe in micro-array hybridization experiments. The label may be present on any of the primers, the adaptor or the nucleotides used in the amplification process, and the label may be selected from the group consisting of fluorophores and radioactive isotopes, such as fluorescein, Cy3, Cy5, rhodamine and Texas Red; or 32P, 33P, 35S, 3H, 125I and 14C.
In one general embodiment of the present invention, cDNA strands are synthesized from a collection of mRNAs using a cDNA synthesis primer of the general formula I
5′-COM1-Tn-Vn1-3′    (I)
Said cDNA synthesis primer is an oligonucleotide primer comprising a primer region (TnVn1) and an anchor region (COM1). If the target mRNA is the entire mRNA population, then the primer region can be a polythymidylate region, e.g., about 7 to 30, preferably about 15 to 20, thymidine (T) residues, i.e. 7 ≤ n ≤ 30 and n1 = 0 or n1 = 1. When n1 = 0 the primer will anneal at an arbitrary position along the poly(A) tail, while when n1 = 1 the primer will preferentially anneal at the junction in the mRNA where the poly(A) tail begins.
The cDNA synthesis primer will thus hybridize, via its polythymidylate region, to the poly(A) tail present at the 3′ terminus of each mRNA. Alternatively, if only a pre-selected mRNA is to be amplified, then the primer, or a mixture thereof, will be designed to be substantially complementary to a section of the chosen mRNA, typically at the 3′ terminus upstream of the poly(A) tail, i.e. n = 0 and n1 = 0 and COM1 is designed specifically to anneal to a pre-defined part of the mRNA.
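For the whole-population case (7 ≤ n ≤ 30), the primer species of formula I can be written out directly. The Python sketch below is illustrative only: the function `oligo_dt_primers` and the anchor sequence `HYPOTHETICAL_COM1` are made up for the example and do not represent a sequence disclosed here.

```python
def oligo_dt_primers(com1: str, n: int, anchored: bool) -> list[str]:
    """Primer species of formula I, 5'-COM1-(T)n-(V)n1-3', for whole-population priming.

    anchored=False corresponds to n1 = 0 (a single species that can anneal anywhere
    along the poly(A) tail); anchored=True corresponds to n1 = 1, where the
    3'-terminal V (A, G or C) yields a mixture of three species that favour the
    junction where the poly(A) tail begins.
    """
    if not 7 <= n <= 30:
        raise ValueError("n should be 7-30 for a polythymidylate primer region")
    core = com1.upper() + "T" * n
    return [core + v for v in "AGC"] if anchored else [core]


HYPOTHETICAL_COM1 = "GACTAGCTGACGTCAG"  # illustrative anchor sequence only
for primer in oligo_dt_primers(HYPOTHETICAL_COM1, n=18, anchored=True):
    print(primer)
```

V excludes T because a terminal T could still pair anywhere within the poly(A) tail; requiring A, G or C at the 3′ end is what pushes the anchored primers toward the junction.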
The anchor region (COM1) of the cDNA synthesis primer serves as a specific hybridization site for one of the amplification primers (primer #2) in the subsequent amplification reaction of the cDNA.
Once the cDNA synthesis primer hybridizes to the mRNA, a first cDNA strand can be synthesized. This first strand of cDNA is preferably produced by reverse transcription, wherein DNA is made from RNA, utilizing reverse transcriptase following standard techniques. This enzyme, present in all retroviruses (e.g., avian myeloblastosis virus), adds deoxyribonucleotides to the 3′ terminus of the primer. Suitable reverse transcriptases include M-MLV Reverse Transcriptase, AMV Reverse Transcriptase or SUPERSCRIPT II Reverse Transcriptase (all available from Life Technologies) or displayTHERMO RT (Display Systems Biotech).
In order to purify the cDNA product of the first strand synthesis for the subsequent ligation step, a spin column is used. This can be any column that allows the separation of small oligonucleotides (less than 100 bp) from larger cDNA species. This step is introduced to remove excess primers used for the first strand synthesis, which would otherwise compete with the single stranded cDNA in the subsequent ligation step.
A specially designed adaptor fragment is ligated to the cDNA product of the first strand synthesis. The adaptor consists of two strands of the general formulas II and III, respectively, both strands having complementary overlap regions (COM2 and COM4) of more than 4 residues, such as 5 to 30 residues, preferably 6 to 25 residues. The first strand (general formula II) is modified with a phosphate group at the 5′-end and a so-called extension protection group at the 3′-end. The second strand (general formula III) is modified with an extension protection group at the 3′-end. Its specific region (COM4) is followed by 0 to 10 bp, such as 1 to 8 bp, preferably 1 to 4 bp, of degenerate residues (N) and finally 0 to 25 bp, such as 1 to 20, such as 2 to 15, e.g. 3 to 14, e.g. 4 to 13, such as 5 to 12, preferably 6 to 12 bp, of non-selective residues (Z). The non-selective residues are preferably deoxyinosine or 1-(2′-deoxy-beta-D-ribofuranosyl)-5-nitroindole.
5′-PO-COM2-COM3-X-3′    (II)
5′-COM4-(N)n2-Zm-X-3′    (III)
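The two adaptor strands can be written out from formulas II and III once COM2 and COM3 are chosen, taking COM4 as the reverse complement of COM2 so that the strands anneal over that region. In the Python sketch below, the sequences `GGATCCAT` and `CTGA` are hypothetical, and the one-letter markers ('p' for the 5′-phosphate, 'I' for deoxyinosine, 'X' for the extension protection group) are a notational assumption of the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def adaptor_strands(com2: str, com3: str, n2: int, m: int) -> tuple[str, str]:
    """Write out strands II and III of the adaptor as annotated 5'->3' strings.

    COM4 is derived as the reverse complement of COM2 so the strands anneal over
    COM2/COM4. 'p' marks the 5'-phosphate, 'N' a degenerate position, 'I' a
    deoxyinosine (non-selective) position and 'X' the 3' extension protection group.
    """
    if len(com2) < 4:
        raise ValueError("COM2/COM4 overlap must be at least 4 nt")
    if not 4 <= n2 + m <= 25:
        raise ValueError("n2 + m must lie between 4 and 25")
    com4 = revcomp(com2)
    strand_ii = f"5'-p-{com2.upper()}{com3.upper()}-X-3'"
    strand_iii = f"5'-{com4}{'N' * n2}{'I' * m}-X-3'"
    return strand_ii, strand_iii


# Hypothetical COM2/COM3 sequences, chosen only to illustrate the layout.
ii, iii = adaptor_strands("GGATCCAT", "CTGA", n2=2, m=6)
print(ii)   # 5'-p-GGATCCATCTGA-X-3'
print(iii)  # 5'-ATGGATCCNNIIIIII-X-3'
```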
The phosphate group of strand II of the adaptor fragment serves to covalently link this sequence to the 3′ end of the cDNA product of the first strand synthesis during the ligation reaction. The extension protection group can be any modification of the 3′ end of the two strands in the adaptor fragment that blocks extension of a nucleic acid sequence by polymerases and prevents ligation of the 3′ end to another nucleic acid sequence. The extension protection group has two main purposes. One is to avoid the formation of concatemeric adaptor sequences during ligation and to limit the ligation events to only one, namely the joining of the 5′ end of strand (II) to the 3′ end of the first strand cDNA. The other is to prevent the two strands (II) and (III) from participating in the downstream PCR reaction, where they will still be present. Because the two strands remain present during the PCR reaction, potentially at high concentration, they might anneal unspecifically to certain cDNA sequences. Without their extension protection groups, they might then serve as primers and contribute to unspecific PCR events. For these reasons, the extension protection group is important.
Preferably, the extension protection group is a dideoxynucleotide or a deoxynucleotide modified with an amine group at the 3′ position, such as aminopropyl (3′ amine-C3) or aminohexyl (3′ amine-C6).
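One way to read formulas II and III (our interpretation; the base pairing is not spelled out above) is that COM4 pairs with COM2, leaving the (N)n2-Zm part of strand III as an overhang beyond the 5′-phosphate of strand II. The 3′ end of the cDNA then pairs with that overhang, so its 3′-OH lies next to the phosphate and the nick can be sealed. The Python sketch below encodes this reading; the function names and the cDNA sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pairing_n_segment(cdna: str, n2: int) -> str:
    """(N)n2 segment of strand III (written 5'->3') that base-pairs with the
    3'-terminal n2 nucleotides of a single stranded cDNA.

    Under this reading, the first N after COM4 pairs with the cDNA's last base,
    so the matching segment is the reverse complement of cdna[-n2:]. The Zm
    (deoxyinosine) positions further along pair with the next m bases whatever
    they are, which is why they need no sequence of their own here.
    """
    if n2 == 0:
        return ""  # cdna[-0:] would return the whole string, so handle 0 explicitly
    return revcomp(cdna[-n2:])


def ligation_product(cdna: str, com2: str, com3: str) -> str:
    """cDNA 3'-OH joined to the 5'-phosphate of strand II: 5'-cDNA-COM2-COM3-3'.

    The 3' extension protection group X of strand II is omitted from the string.
    """
    return cdna.upper() + com2.upper() + com3.upper()


cdna = "ATGCCGTC"  # hypothetical 3' end of a first strand cDNA
print(pairing_n_segment(cdna, 2))                  # GA
print(ligation_product(cdna, "GGATCCAT", "CTGA"))  # ATGCCGTCGGATCCATCTGA
```

Only the adaptor species whose N positions match the cDNA end pair fully, while the inosine positions accept any base; this is the balance between selective and non-selective residues discussed in the following paragraphs.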
The non-selective residues of the adaptor fragment serve to provide a sufficient double stranded overlap between the adaptor and the single stranded cDNA to allow the ligation. As T4 DNA ligase is specific for double stranded DNA, this overlap must be long enough for T4 DNA ligase to work. It should be noted that the annealing temperature of the adaptor should be matched to the temperature at which the ligation is performed. Thus, when the annealing temperature of the adaptor is between 15° C. and 40° C., T4 DNA ligase or E. coli DNA ligase should preferably be used. If the adaptor is designed with a high annealing temperature (e.g. above 40° C.), it can be anticipated that a thermostable ligase such as Thermus aquaticus (Taq) DNA ligase can advantageously be used. This ligase is active at elevated temperatures (45° C. to 65° C.). This means that the ligation step of the method can be performed at temperatures from 16° C. to 70° C.
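The temperature guidance in the preceding paragraph amounts to a simple decision rule, sketched here in Python. The function `suggest_ligase` is hypothetical and is a rule of thumb that restates the thresholds above, not a validated ligation protocol.

```python
def suggest_ligase(adaptor_tm_c: float) -> str:
    """Map the adaptor annealing temperature (deg C) to the ligase class suggested
    in the preceding paragraph; a rule of thumb, not a validated protocol."""
    if adaptor_tm_c < 15:
        raise ValueError("annealing temperature below the 15-40 deg C range discussed")
    if adaptor_tm_c <= 40:
        return "mesophilic: T4 DNA ligase or E. coli DNA ligase"
    return "thermostable: e.g. Taq DNA ligase (active at about 45-65 deg C)"


for tm in (25.0, 55.0):
    print(tm, suggest_ligase(tm))
```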
The overlap comprised by the degenerate residues and the non-selective residues should be at least 4 bp. The non-selective residues are preceded by a few (preferably 1-4) selective degenerate residues. These selective degenerate residues are included to prevent the adaptor from forming overly strong self-annealing dimers, which would inhibit the ligation reaction.
However, the non-selective residues cannot be replaced entirely by selective degenerate residues, as that would result in a low equilibrium toward the specific single stranded cDNA. For instance, a stretch of eight degenerate residues (NNNNNNNN) would mean that only 1/4⁸ = 1/65,536 of the adaptor molecules would anneal to a specific sequence. Furthermore, a long stretch of N's results in adaptor mixtures whose members differ widely in annealing temperature. For instance, AAAAAAAA might differ by up to 16° C. from CCCCCCCC in annealing temperature toward its specific sequence. Thus, a non-selective base such as deoxyinosine is incorporated into the adaptor in order to keep the annealing temperature of the adaptor constant and also to ensure a high equilibrium toward the specific single stranded cDNA during the ligation.
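Both figures in the preceding paragraph can be reproduced with a few lines of Python. The matching fraction follows directly from each N being one of four bases. For the temperature spread we assume the Wallace rule (about 2° C. per A/T and 4° C. per G/C), a crude estimate for short oligonucleotides that is our choice here; the description does not say how the 16° C. figure was obtained, although the Wallace rule happens to give exactly that value.

```python
def matching_fraction(num_degenerate: int) -> float:
    """Fraction of a fully degenerate (N)k adaptor mixture whose k positions exactly
    match one given target sequence: each N is one of 4 bases, so 1 / 4**k."""
    return 1 / 4 ** num_degenerate


def wallace_tm(seq: str) -> int:
    """Wallace-rule estimate for short oligos: 2 deg C per A/T plus 4 deg C per G/C."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


print(matching_fraction(8))                              # 1.52587890625e-05
print(wallace_tm("CCCCCCCC") - wallace_tm("AAAAAAAA"))  # 16
```

By contrast, a deoxyinosine position contributes roughly the same pairing to every target base, so an overhang built mainly from Z residues avoids both the dilution and the temperature spread.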
Essentially any nucleic acid sequence, in purified or non-purified form, can be utilized as the starting nucleic acid(s) for the methods of the present invention, provided it comprises the desired specific nucleic acid sequence (i.e., a sequence complementary to the cDNA synthesis primer). It is only generally preferred that a sufficient number of bases at one end of the sequence be known in sufficient detail that a primer can be prepared which will hybridize to one of the strands of the desired sequence. A mixture of primers (including specific or degenerate sequences) may also be employed if more than one nucleic acid sequence is the target.
It is also not necessary that the sequence to be amplified is initially present in a pure form; it may be a minor fraction of a complex mixture, or a portion of a nucleic acid sequence, the existence of which is due to the presence of a particular microorganism. Therefore, the amplification process is useful not only for producing large amounts of one specific nucleic acid sequence, but also for simultaneously amplifying more than one different specific nucleic acid sequence located on the same or different nucleic acid molecule.
The nucleic acid(s) may be obtained from any source, for example, from plasmids such as pBR322, from cloned DNA or RNA, or from natural DNA or RNA from any source, including bacteria, yeast, viruses, organelles, and higher organisms, such as plants or animals. DNA or RNA may be extracted from blood, serum, plasma, cerebrospinal fluid, tissue material/biopsies or cells by a variety of techniques, such as those described by Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual.
As used herein, the term “cDNA synthesis primer” refers to an oligonucleotide having two components (general formula I): 1) a primer that may be synthetic or purified from a restriction digest of a nucleic acid and 2) an anchor region containing a specific sequence to be used as hybridization target for the amplification primers in the subsequent PCR amplification reaction. The primer component will be capable of acting as a point of initiation of synthesis, typically DNA polymerisation, when placed under conditions in which synthesis of a primer extension product that is complementary to a nucleic acid strand is induced, i.e., in the presence of appropriate nucleotides and a replicating agent (e.g., a DNA polymerase) under suitable conditions, which are described by Maniatis et al.
The primer is preferably a single stranded oligodeoxynucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the replicating agent. The exact lengths of the primers and the quantities used will depend on many factors, including temperature, degree of homology and other conditions. Preferably, the primer is between 15 and 25 nucleotides long, with an equal distribution of purines and pyrimidines, aiming at an annealing temperature between 40° C. and 70° C.
For example, when amplifying a specific sequence, the oligonucleotide primer typically contains between about 10 and 50 nucleotides, preferably 15-25 nucleotides. For other applications like differential display (Liang and Pardee, 1992), the oligonucleotide primer is typically, but not necessarily, shorter, e.g., 7-15 nucleotides. Such short primer molecules generally require lower hybridization temperatures to form sufficiently stable hybrid complexes with the template.
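The design guideline for 15 to 25 nucleotide primers (balanced purines and pyrimidines, annealing temperature of 40° C. to 70° C.) can be screened as in the Python sketch below. It assumes the commonly used "basic" melting-temperature formula for oligonucleotides longer than 13 nt, Tm = 64.9 + 41 × (nG + nC − 16.4) / N; the description does not prescribe any particular Tm method, and the 20-mer shown is hypothetical.

```python
def basic_tm(seq: str) -> float:
    """'Basic' Tm estimate (deg C) for oligos longer than 13 nt:
    Tm = 64.9 + 41 * (nG + nC - 16.4) / N."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(s)


def check_primer(seq: str) -> dict:
    """Screen a primer against the guideline above: 15-25 nt, roughly equal
    purines (A, G) and pyrimidines (C, T), annealing temperature 40-70 deg C."""
    s = seq.upper()
    tm = basic_tm(s)
    return {
        "length_ok": 15 <= len(s) <= 25,
        "purine_fraction": (s.count("A") + s.count("G")) / len(s),
        "tm": round(tm, 1),
        "tm_ok": 40 <= tm <= 70,
    }


print(check_primer("ACGTACGTACGTACGTACGT"))  # hypothetical 20-mer
```

Such estimates only give a starting point; as noted above, the final annealing conditions depend on temperature, degree of homology and other reaction conditions.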
The oligonucleotide primers may be prepared using any suitable method, such as, for example, the well known phosphotriester and phosphodiester methods, or automated embodiments thereof. One method for synthesizing oligonucleotides on a modified solid support is described in U.S. Pat. No. 4,458,066. It is also possible to use a primer which has been isolated from a biological source (such as a restriction endonuclease digest).
The primers herein are selected to be “substantially” complementary to the different strands of each specific sequence to be amplified, i.e., the primers should be sufficiently complementary to hybridize with their respective strands at an annealing temperature from 40° C. to 70° C. Therefore, the primer sequence need not reflect the exact sequence of the template, and can, in fact, be “degenerate.” Non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the strand to be amplified to permit hybridization and extension.
Generally, it is not necessary to know the sequence of a target mRNA, as the primer may be a poly(T) of sufficient length to hybridize with substantially all members of an entire population of mRNAs (i.e., poly(T)n, wherein n is typically from about 5 to 50 or more). Of course, when more sequence knowledge is available for a target RNA, the primer may be designed more specifically, which greatly increases the efficiency of the amplification. If a sequence specific primer is used in the first strand cDNA synthesis, the specific target RNA will preferentially be reverse transcribed. Thus, in the following PCR amplification using the sequence specific primer and a primer specific for the 5′-ligated anchor sequence, this sequence should preferentially be amplified.
Moreover, the primers may actually comprise a collection of primer sequences, such as where more than one target sequence exists. Also, if there is ambiguity in the sequence information, a number of primers should be prepared. In particular, when any of several possible nucleic acid sequences encoding a protein could be correct, based on a polypeptide sequence obtained from a fragment of the protein, a collection of primers containing sequences representing most or all of the possible codon variations (utilizing codon degeneracy) can be prepared.
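A collection of primers covering codon degeneracy is conveniently written as a single degenerate primer in IUPAC notation and expanded into its individual sequences. The Python sketch below uses the standard IUPAC nucleotide codes; the function name `expand` and the Lys-Phe example are illustrative.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(degenerate: str) -> list[str]:
    """Every concrete primer sequence encoded by a degenerate IUPAC primer."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in degenerate.upper()))]


# Lys-Phe: Lys codons AAA/AAG (AAR), Phe codons TTT/TTC (TTY).
print(expand("AARTTY"))  # ['AAATTC', 'AAATTT', 'AAGTTC', 'AAGTTT']
```

The size of the collection is the product of the per-position choices, so it grows quickly. This growth is one reason to prefer peptide stretches rich in amino acids with few codons when designing such primers.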
The techniques of the present invention also provide a number of additional genetic manipulation technologies. The amplified double stranded cDNA represents a useful intermediate for construction of complex cDNA libraries from extremely limited amounts of tissue, such as individual brain nuclei, tissue sections, and even single cells.
Accordingly, the present invention is, in one aspect, directed to the use of the amplification process in a method for detecting expression of a gene in a pre-selected cell population comprising steps of:
(a) synthesizing double-stranded cDNA by treating RNAs from cell populations with a cDNA synthesis primer comprising an oligonucleotide sequence complementary to one or more of the RNA sequences, followed by reverse transcription of the RNA and ligation of an adaptor to a single stranded cDNA molecule and subsequent PCR amplification as described in detail above,
(b) determining the presence or absence of cDNA complementary to the RNA corresponding to the gene.
The cell population may be, e.g., from a human tissue, such as blood, brain nuclei, liver, prostate, mammary, heart, kidney, lung, testis or pancreas. The tissue may be an embryonic or fetal tissue. The cell population may be a single cell, or up to 100 to 1,000,000 cells or more, as desired.
Further, the amplified double stranded cDNA can be used as a source for producing large amounts of single-stranded, anti-sense material for use as driver in subtractive hybridization. For example, two nucleic acid populations, one sense, and one anti-sense, can be allowed to mix together with one population present in molar excess (driver). Sequences present in both populations will form hybrids, whereas sequences present in only one population remain single-stranded (Duguid et al., 1988).
Accordingly, the single stranded cDNA amplification technology can be applied to improve methods of detecting and isolating nucleic acid sequences that vary in abundance among different populations, such as in comparing mRNA expression among different tissues or within the same tissue, according to physiologic state. Methods for examining differential expression typically involve subtractive hybridization, wherein two nucleic acid populations, one sense and one anti-sense, are allowed to mix with one another. One population is present in molar excess (“driver”) such that sequences represented in both populations form hybrids, whereas sequences present in only one population remain single-stranded. Thereafter, various well known techniques are used to separate the unhybridized molecules representing differentially expressed sequences.
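The logic of subtraction with a driver in molar excess can be captured in an idealized counting model, sketched in Python below. It assumes complete hybridization in which each driver molecule pairs with one tester molecule of the same sequence, and it ignores hybridization kinetics entirely; the gene names and copy numbers are hypothetical.

```python
from collections import Counter


def subtract(tester: Counter, driver: Counter) -> Counter:
    """Idealized subtractive hybridization.

    Each driver molecule is assumed to hybridize with one tester molecule of the
    same sequence; whatever tester remains is left single stranded. Counter
    subtraction does exactly this and drops sequences whose count falls to zero
    or below.
    """
    return tester - driver


tester = Counter({"geneA": 10, "geneB": 10, "geneX": 10})
driver = Counter({"geneA": 100, "geneB": 100})  # driver in molar excess
print(subtract(tester, driver))  # Counter({'geneX': 10})
```

If the driver were not in excess (say only 5 copies of geneA), part of geneA would remain single stranded and be mistaken for a differentially expressed sequence. This is why the driver is used in molar excess, and hence why large amounts of driver nucleic acid are needed.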
Most methods of subtractive hybridization require that large amounts (generally between 10 and 100 micrograms) of nucleic acid be available for use as “driver” in such experiments. This limits their usefulness in examining differential expression of mRNAs present in biological material that is available only in small supply. This problem has been addressed by cloning the nucleic acid populations of interest prior to subtraction, so that the cloning vector is used to amplify the amount of nucleic acid available for hybridization. However, because such subtraction requires prior cloning, it is complicated, suffers from under- and over-representation of sequences depending on differences in growth rates in the mixed population, and may risk recombination among sequences during propagation of the mixed population. The single stranded cDNA amplification technology of the present invention circumvents these problems by allowing production of large amounts of cDNA from limited amounts of nucleic acid, without the need for prior cloning.
Thus the present invention comprises a method for producing a subtractive hybridization probe comprising:
(a) synthesizing double-stranded cDNA by treating a first mRNA population with a cDNA synthesis primer comprising an oligonucleotide sequence complementary to one or more of the RNA sequences, followed by reverse transcription of the RNA and ligation of an adaptor to the single stranded cDNA molecule and subsequent PCR amplification as described in detail above, wherein primer #1 is modified with biotin at the 5′ end,
(b) isolating the biotin-containing single stranded cDNA (sense) by use of streptavidin coated magnetic beads,
(c) synthesizing double-stranded cDNA by treating a second mRNA population with a cDNA synthesis primer comprising an oligonucleotide sequence complementary to one or more of the RNA sequences, followed by reverse transcription of the RNA and ligation of an adaptor to the single stranded cDNA molecule and subsequent PCR amplification as described in detail above, wherein primer #1 is modified with biotin at the 5′ end,
(d) isolating the non-biotin-containing single stranded cDNA (anti-sense) by use of streptavidin coated magnetic beads,
(e) hybridizing the sense cDNA to the anti-sense cDNA, whereby a sub-population of the anti-sense cDNA remains unhybridized,
(f) isolating the unhybridized sub-population of the anti-sense cDNA by use of streptavidin coated magnetic beads, which retain the biotinylated sense cDNA together with any hybrids containing it,
(g) generating a second double-stranded cDNA collection from the unhybridized sub-population by PCR using primer #1 and primer #2.
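Steps (a) to (g) can be traced with a toy model that tracks strands only by the transcript they derive from and by whether they carry the biotin label from primer #1. This is a sketch under simplifying assumptions, not a model of the chemistry: hybridization is treated as all-or-none between strands derived from the same transcript, the sense population is assumed to be in sufficient excess to hybridize every complementary anti-sense strand, and bead capture is reduced to keeping or discarding biotinylated material. All function and transcript names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    transcript: str  # transcript the strand was copied from
    biotin: bool     # True if the strand was extended from 5'-biotinylated primer #1


def amplify(transcripts):
    """Steps (a)/(c)/(g): PCR with biotinylated primer #1 and primer #2.

    Each product is a duplex of a biotinylated strand and its unlabeled complement.
    """
    return [(Strand(t, True), Strand(t, False)) for t in transcripts]


def bead_capture(duplexes):
    """Steps (b)/(d): streptavidin beads retain the biotinylated strands;
    after denaturation the unlabeled strands are released into solution."""
    strands = [s for duplex in duplexes for s in duplex]
    bound = [s for s in strands if s.biotin]
    released = [s for s in strands if not s.biotin]
    return bound, released


def subtract(first_population, second_population):
    sense, _ = bead_capture(amplify(first_population))       # (a), (b)
    _, antisense = bead_capture(amplify(second_population))  # (c), (d)
    # (e): an anti-sense strand hybridizes if a sense strand of the same transcript exists.
    sense_transcripts = {s.transcript for s in sense}
    unhybridized = [a for a in antisense if a.transcript not in sense_transcripts]
    # (f): hybrids carry biotin through their sense strand, so streptavidin beads
    # remove them together with excess sense strands; unhybridized anti-sense remains.
    # (g): re-amplify the remaining anti-sense strands with primer #1 and primer #2.
    return amplify(sorted({a.transcript for a in unhybridized}))


products = subtract(["actin", "gapdh", "gene_x"], ["actin", "gapdh", "gene_y"])
print([duplex[1].transcript for duplex in products])  # ['gene_y']
```

In this model only transcripts present in the second population but absent from the first survive the subtraction, and they come back as fresh biotin-labeled duplexes, ready for another round if desired.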
Additionally, the present invention comprises methods for making cDNA libraries from a collection of mRNA molecules that has been reverse transcribed and amplified to double stranded cDNA with primer #1 and primer #2. The double stranded cDNA may be ligated directly into a vector, such as a TA cloning vector (Invitrogen), which is then transformed into E. coli.
Said method comprises the steps of:
(a) synthesizing double-stranded cDNA by treating a plurality of mRNAs from the cell populations with a cDNA synthesis primer comprising an oligonucleotide sequence complementary to one or more of the RNA sequences, followed by reverse transcription of the RNA and ligation of an adaptor to a single stranded cDNA molecule and subsequent PCR amplification as described in detail above,
(b) producing a collection of double-stranded cDNAs by PCR, by extending the primers of any hybridization duplexes formed between the cDNA and the primers, and
(c) preparing a cDNA library from the amplified cDNAs.
Another application of the technology is the detection of variant regions flanking a common sequence, such as in molecular diagnostics. Conventional PCR generally requires that shared sequences be known both 5′ and 3′ of the region of interest, and that these flanking regions be close enough together to allow efficient amplification. By designing an amplification primer that recognizes a commonly shared sequence, single stranded cDNA is produced that contains not only the common region recognized by the primer but also the 5′-anchor sequence that has been ligated to the single stranded cDNA. PCR can therefore be carried out even though only one region of shared sequence is known at the outset. For example, cDNA can be produced from limited amounts of clinical material so that pathogen-specific sequences (such as those distinguishing viral types) can be identified, genetic polymorphisms detected, or alternative splicing variants characterized, all in accordance with standard techniques.
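The principle of amplifying between one known sequence and a ligated anchor can be shown with a simple in-silico PCR sketch. It assumes an idealized template written 5′ to 3′ that contains the ligated anchor, exact primer matching, and a forward primer identical to the anchor; the reverse, gene-specific primer anneals to the common region, so its binding site on the template is the reverse complement of its own sequence. The sequences and function names are hypothetical. The point is that the variable region between the anchor and the common site is recovered without being known in advance.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def anchored_amplicon(template: str, anchor: str, gene_primer: str) -> Optional[str]:
    """Return the product of PCR between a ligated anchor and one gene-specific primer.

    template:    top strand, 5'->3', containing the ligated anchor sequence.
    anchor:      anchor/adaptor sequence; the forward primer is assumed identical to it.
    gene_primer: reverse primer, 5'->3'; it binds where the template contains
                 its reverse complement (the shared, known region).
    """
    start = template.find(anchor)
    if start == -1:
        return None
    site = reverse_complement(gene_primer)
    end = template.find(site, start + len(anchor))
    if end == -1:
        return None
    return template[start:end + len(site)]


ANCHOR = "GGCCAT"        # hypothetical ligated anchor
GENE_PRIMER = "TGTAATC"  # hypothetical primer; binds the common region GATTACA
for variable in ("TTTACG", "TTAACGCC"):  # unknown, variant regions
    template = ANCHOR + variable + "GATTACA" + "CCCC"
    print(anchored_amplicon(template, ANCHOR, GENE_PRIMER))
```

Both variants are amplified with the same primer pair, and each product carries its own variable region between the anchor and the common site, which is what allows differing viral types, polymorphisms or splice variants to be compared.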
Although the methods of the present invention will provide a useful adjunct to PCR in a wide variety of diagnostic and other studies, they especially facilitate studies of gene expression in essentially any mammalian cell or cell population. Although the cells may be from blood (e.g., lymphocytes, such as T or B cells), a typical source of cells, tissue, DNA or nucleotides will be a solid organ or tissue, such as brain, spleen, bone, heart, vascular tissue, lung, kidney, liver, pituitary, endocrine glands or lymph node, or dispersed primary cells, tumor cells, skin, hair, or the like. The cells or tissue may also be embryonic or fetal. In the neural research area, for example, identification of mRNAs that vary as a function of arousal state, behavior, drug treatment and development has been hindered both by the difficulty of constructing cDNA libraries from brain tissue and by the relative spatial insensitivity of subtractive hybridization techniques. Use of the single stranded cDNA amplification method to construct cDNA libraries from individual brain nuclei will provide greater representation of low-abundance mRNAs from these tissues than whole brain cDNA libraries provide, and will facilitate cloning of important low-abundance messages.
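Why a library from a small, defined region can represent rare messages better follows from a standard library-size estimate, the Clarke and Carbon formula: the number of clones N needed to include at least one clone of a transcript making up fraction f of the mRNA, with probability P, is N = ln(1 - P) / ln(1 - f). The sketch below assumes independent, unbiased sampling of clones; the two abundance values are hypothetical, standing for a message that is diluted in whole brain but enriched tenfold in a single nucleus.

```python
import math


def clones_needed(fraction: float, probability: float = 0.99) -> int:
    """Clones required so that a transcript at the given fractional abundance
    is represented at least once with the given probability (Clarke and Carbon)."""
    return math.ceil(math.log(1.0 - probability) / math.log1p(-fraction))


# Hypothetical abundances of the same message in two source materials.
for source, fraction in (("whole brain", 1e-5), ("single nucleus", 1e-4)):
    print(f"{source}: {clones_needed(fraction):,} clones")
```

With these assumed values, the tenfold enrichment cuts the number of clones that must be screened roughly tenfold, from about 460,000 to about 46,000.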
The materials for use in the methods of the present invention are ideally suited to the preparation of kits, produced in accordance with well known procedures, and are therefore readily provided in kit form for a variety of uses. Such a kit may comprise containers, each holding one or more of the reagents (typically in concentrated form) used in the methods, including, for example, buffers; the appropriate nucleoside triphosphates (e.g., dATP, dCTP, dGTP and dTTP; or rATP, rCTP, rGTP and UTP), which may be labeled, such as with fluorophores or radioactive labels (e.g., fluorescein, Cy3, Cy5, rhodamine and Texas Red; or 32P, 33P, 35S, 3H, 125I, 14C and the like); reverse transcriptase; DNA polymerase; T4 DNA ligase; the adaptor; and one or more primer complexes of the present invention (e.g., poly(T) or random primers of appropriate length linked to a promoter reactive with the RNA polymerase). A set of instructions will also typically be included.