This invention is directed to methods for simultaneous identification of differentially expressed mRNAs, as well as measurements of their relative concentrations.
An ultimate goal of biochemical research ought to be a complete characterization of the protein molecules that make up an organism. This would include their identification, sequence determination, demonstration of their anatomical sites of expression, elucidation of their biochemical activities, and understanding of how these activities determine organismic physiology. For medical applications, the description should also include information about how the concentration of each protein changes in response to pharmaceutical or toxic agents.
Let us consider the scope of the problem: How many genes are there? The issue of how many genes are expressed in a mammal is still unsettled after at least two decades of study. There are few direct studies that address patterns of gene expression in different tissues. Mutational load studies (J. O. Bishop, “The Gene Numbers Game,” Cell 2:81-86 (1974); T. Ohta and M. Kimura, “Functional Organization of Genetic Material as a Product of Molecular Evolution,” Nature 223:118-119 (1971)) have suggested that there are between 3×10⁴ and 10⁵ essential genes.
Before cDNA cloning techniques, information on gene expression came from RNA complexity studies: analog measurements (measurements in bulk) based on observations of mixed populations of RNA molecules with different specificities in abundances. To an unexpected extent, early analog complexity studies were distorted by hidden complications of the fact that the molecules in each tissue that make up most of its mRNA mass comprise only a small fraction of its total complexity. Later, cDNA cloning allowed digital measurements (i.e., sequence-specific measurements on individual species) to be made; hence, more recent concepts about mRNA expression are based upon actual observations of individual RNA species.
Brain, liver, and kidney are the mammalian tissues that have been most extensively studied by analog RNA complexity measurements. The lowest estimates of complexity are those of Hastie and Bishop (N. D. Hastie and J. O. Bishop, “The Expression of Three Abundance Classes of Messenger RNA in Mouse Tissues,” Cell 9:761-774 (1976)), who suggested that 26×10⁶ nucleotides of the 3×10⁹ base pair rodent genome were expressed in brain, 23×10⁶ in liver, and 22×10⁶ in kidney, with nearly complete overlap in RNA sets. This indicates a very minimal number of tissue-specific mRNAs. However, experience has shown that these values must clearly be underestimates, because many mRNA molecules, which were probably of abundances below the detection limits of this early study, have been shown to be expressed in brain but detectable in neither liver nor kidney. Many other researchers (J. A. Bantle and W. E. Hahn, “Complexity and Characterization of Polyadenylated RNA in the Mouse Brain,” Cell 8:139-150 (1976); D. M. Chikaraishi, “Complexity of Cytoplasmic Polyadenylated and Non-Adenylated Rat Brain Ribonucleic Acids,” Biochemistry 18:3249-3256 (1979)) have measured analog complexities of 100-200×10⁶ nucleotides in brain, and 2-to-3-fold lower estimates in liver and kidney. Of the brain mRNAs, 50-65% are detected in neither liver nor kidney. These values have been supported by digital cloning studies (R. J. Milner and J. G. Sutcliffe, “Gene Expression in Rat Brain,” Nucl. Acids Res. 11:5497-5520 (1983)).
Analog measurements on bulk mRNA suggested that the average mRNA length was between 1400-1900 nucleotides. In a systematic digital analysis of brain mRNA length using 200 randomly selected brain cDNAs to measure RNA size by northern blotting (Milner and Sutcliffe, supra), it was found that, when the mRNA size data were weighted for RNA prevalence, the average length was 1790 nucleotides, the same as that determined by analog measurements. However, the mRNAs that made up most of the brain mRNA complexity had an average length of 5000 nucleotides. Not only were the rarer brain RNAs longer, but they tended to be brain specific, while the more prevalent brain mRNAs were more ubiquitously expressed and were much shorter on average.
These concepts about mRNA lengths have been corroborated more recently from the lengths of brain mRNAs whose sequences have been determined (J. G. Sutcliffe, “mRNA in the Mammalian Central Nervous System,” Annu. Rev. Neurosci. 11:157-198 (1988)). Thus, a complexity of 1-2×10⁸ nucleotides and an average mRNA length of 5000 nucleotides yield an estimate of about 30,000 mRNAs expressed in the brain, of which about two-thirds are not detected in liver or kidney. Brain apparently accounts for a considerable portion of the tissue-specific genes of mammals. Most brain mRNAs are expressed at low concentration. There are no total-mammal mRNA complexity measurements, nor is it yet known whether 5000 nucleotides is a good mRNA-length estimate for non-neural tissues. A reasonable estimate of total gene number might be between 50,000 and 100,000.
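The estimate of about 30,000 brain mRNAs follows from dividing the sequence complexity by the average mRNA length. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and taking the midpoint of the two bounds is an illustrative assumption rather than a calculation stated in the text.

```python
def estimate_mrna_count(complexity_nt: float, mean_length_nt: float) -> float:
    """Number of distinct mRNAs implied by a total sequence complexity
    (in nucleotides) and an average mRNA length (in nucleotides)."""
    return complexity_nt / mean_length_nt


# Bounds quoted in the text: 1-2 x 10^8 nucleotides, 5000-nucleotide average.
low_estimate = estimate_mrna_count(1e8, 5000)
high_estimate = estimate_mrna_count(2e8, 5000)

# Midpoint of the range (an assumption made here for illustration).
midpoint_estimate = (low_estimate + high_estimate) / 2

print(low_estimate, high_estimate, midpoint_estimate)
```

The range of 20,000 to 40,000 brackets the figure of about 30,000 given above.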
What is most needed to advance a chemical understanding of physiological function is a menu of protein sequences encoded by the genome plus the cell types in which each is expressed. At present, protein sequences can be reliably deduced only from cDNAs, not from genes, because of the presence of the intervening sequences (introns) in the genomic sequences. Even the complete nucleotide sequence of a mammalian genome will not substitute for characterization of its expressed sequences. Therefore, a systematic strategy for collecting transcribed sequences and demonstrating their sites of expression is needed. Such a strategy would be of particular use in determining sequences expressed differentially within the brain. It is necessarily an eventual goal of such a study to achieve closure; that is, to identify all mRNAs. Closure can be difficult to obtain due to the differing prevalence of various mRNAs and the large number of distinct mRNAs expressed by many distinct tissues. The effort to obtain it allows one to obtain a progressively more reliable description of the dimensions of gene space.
Studies carried out in the laboratory of Craig Venter (M. D. Adams et al., “Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project,” Science 252:1651-1656 (1991); M. D. Adams et al., “Sequence Identification of 2,375 Human Brain Genes,” Nature 355:632-634 (1992)) have resulted in the isolation of randomly chosen cDNA clones of human brain mRNAs, the determination of short single-pass sequences of their 3′-ends, about 300 base pairs, and a compilation of some 2500 of these as a database of “expressed sequence tags.” This database, while useful, fails to provide any knowledge of differential expression. It is therefore important to be able to recognize genes based on their overall pattern of expression within regions of brain and other tissues and in response to various paradigms, such as various physiological or pathological states or the effects of drug treatment, rather than simply their expression in a single tissue.
Other work has focused on the use of the polymerase chain reaction (PCR) to establish a database. Williams et al. (J. G. K. Williams et al., “DNA Polymorphisms Amplified by Arbitrary Primers Are Useful as Genetic Markers,” Nucl. Acids Res. 18:6531-6535 (1990)) and Welsh and McClelland (J. Welsh and M. McClelland, “Genomic Fingerprinting Using Arbitrarily Primed PCR and a Matrix of Pairwise Combinations of Primers,” Nucl. Acids Res. 18:7213-7218 (1990)) showed that single 10-mer primers of arbitrarily chosen sequences, i.e., any 10-mer primer off the shelf, when used for PCR with complex DNA templates such as human, plant, yeast, or bacterial genomic DNA, gave rise to an array of PCR products. The priming events were demonstrated to involve incomplete complementarity between the primer and the template DNA. Presumably, partially mismatched primer-binding sites are randomly distributed through the genome. Occasionally, two of these sites in opposing orientation were located closely enough together to give rise to a PCR product band. There were on average 8-10 products, which varied in size from about 0.4 to about 4 kb and had different mobilities for each primer. The array of PCR products exhibited differences among individuals of the same species. These authors proposed that the single arbitrary primers could be used to produce restriction fragment length polymorphism (RFLP)-like information for genetic studies. Others have applied this technology (S. R. Woodward et al., “Random Sequence Oligonucleotide Primers Detect Polymorphic DNA Products Which Segregate in Inbred Strains of Mice,” Mamm. Genome 3:73-78 (1992); J. H. Nadeau et al., “Multilocus Markers for Mouse Genome Analysis: PCR Amplification Based on Single Primers of Arbitrary Nucleotide Sequence,” Mamm. Genome 3:55-64 (1992)).
Two groups (J. Welsh et al., “Arbitrarily Primed PCR Fingerprinting of RNA,” Nucl. Acids Res. 20:4965-4970 (1992); P. Liang and A. B. Pardee, “Differential Display of Eukaryotic Messenger RNA by Means of the Polymerase Chain Reaction,” Science 257:967-971 (1992)) adapted the method to compare mRNA populations. In the study of Liang and Pardee, this method, called mRNA differential display, was used to compare the population of mRNAs expressed by two related cell types, normal and tumorigenic mouse A31 cells. For each experiment, they used one arbitrary 10-mer as the 5′-primer and an oligonucleotide complementary to a subset of poly A tails as a 3′ anchor primer, performing PCR amplification in the presence of 35S-dNTPs on cDNAs prepared from the two cell types. The products were resolved on sequencing gels and 50-100 bands ranging from 100-500 nucleotides were observed. The bands presumably resulted from amplification of cDNAs corresponding to the 3′-ends of mRNAs that contain the complement of the 3′ anchor primer and a partially mismatched 5′ primer site, as had been observed on genomic DNA templates. For each primer pair, the pattern of bands amplified from the two cDNAs was similar, with the intensities of about 80% of the bands being indistinguishable. Some of the bands were more intense in one or the other of the PCR samples; a few were detected in only one of the two samples.
Further studies (P. Liang et al., “Distribution and Cloning of Eukaryotic mRNAs by Means of Differential Display: Refinements and Optimization,” Nucl. Acids Res. 21:3269-3275 (1993)) have demonstrated that the procedure works with low concentrations of input RNA (although it is not quantitative for rarer species), and the specificity resides primarily in the last nucleotide of the 3′ anchor primer. At least a third of identified differentially detected PCR products correspond to differentially expressed RNAs, with a false positive rate of at least 25%.
If all of the 50,000 to 100,000 mRNAs of the mammal were accessible to this arbitrary-primer PCR approach, then about 80-95 5′ arbitrary primers and 12 3′ anchor primers would be required in about 1000 PCR panels and gels to give a likelihood, calculated by the Poisson distribution, that about two-thirds of these mRNAs would be identified.
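The panel count and the two-thirds figure can be checked with a short calculation. This is a sketch only: the function names are hypothetical, and it assumes that detections are Poisson-distributed with a mean of roughly one detectable product per mRNA across the whole survey, an assumption that is consistent with, but not stated in, the figures above.

```python
import math


def panel_count(n_arbitrary_primers: int, n_anchor_primers: int) -> int:
    """One PCR panel per pairing of a 5' arbitrary primer with a 3' anchor primer."""
    return n_arbitrary_primers * n_anchor_primers


def fraction_detected(mean_hits_per_mrna: float) -> float:
    """Poisson probability that a given mRNA is detected at least once."""
    return 1.0 - math.exp(-mean_hits_per_mrna)


# 80-95 arbitrary primers paired with 12 anchor primers: about 1000 panels.
print(panel_count(80, 12), panel_count(95, 12))

# Assumed mean of one hit per mRNA gives a fraction close to two-thirds.
print(round(fraction_detected(1.0), 3))

# Mean number of hits needed to reach exactly two-thirds coverage.
print(round(-math.log(1.0 / 3.0), 3))
```

Under the same assumption, raising the expected coverage much beyond two-thirds requires disproportionately more panels, because the undetected fraction falls only exponentially with the mean number of hits.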
It is unlikely that all mRNAs are amenable to detection by this method for the following reasons. For an mRNA to surface in such a survey, it must be prevalent enough to produce a signal on the autoradiograph and contain a sequence in its 3′-terminal 500 nucleotides capable of serving as a site for mismatched primer binding and priming. The more prevalent an individual mRNA species, the more likely it would be to generate a product. Thus, prevalent species may give bands with many different arbitrary primers. Because this latter property would contain an unpredictable element of chance based on selection of the arbitrary primers, it would be difficult to approach closure by the arbitrary primer method. Also, for the information to be portable from one laboratory to another and reliable, the mismatched priming must be highly reproducible under different laboratory conditions using different PCR machines, with the resulting slight variation in reaction conditions. As the basis for mismatched priming is poorly understood, this is a drawback of building a database from data obtained by the Liang and Pardee differential display method.
There is therefore a need for an improved method of differential display of mRNA species that reduces the uncertain aspect of 5′-end generation and allows data to be absolutely reproducible in different settings. Preferably, such a method does not depend on potentially irreproducible mismatched priming. Preferably, such a method reduces the number of PCR panels and gels required for a complete survey and allows double-strand sequence data to be rapidly accumulated. Preferably, such an improved method also reduces, if not eliminates, the number of concurrent signals obtained from the same species of mRNA.
We have developed an improved method for the simultaneous sequence-specific identification of mRNAs in an mRNA population. In general, this method comprises:
(1) preparing double-stranded cDNAs from an mRNA population using a mixture of 48 anchor primers, the anchor primers each including: (i) a tract of from 7 to 40 T residues; (ii) a site for cleavage by a first restriction endonuclease that recognizes more than six bases, the site for cleavage being located to the 5′-side of the tract of T residues; (iii) a first stuffer segment of from 4 to 40 nucleotides, the first stuffer segment being located to the 5′-side of the site for cleavage by the first restriction endonuclease; and (iv) phasing residues -V-N-N located at the 3′ end of each of the anchor primers, wherein V is a deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a deoxyribonucleotide selected from the group consisting of A, C, G, and T, the mixture including anchor primers containing all possibilities for V and N;
(2) producing cloned inserts from a suitable host cell that has been transformed by a vector, the vector having the cDNA sample that has been cleaved with a first restriction endonuclease and a second restriction endonuclease inserted therein, the cleaved cDNA sample being inserted in the vector in an orientation that is antisense with respect to a bacteriophage-specific promoter within the vector, the second restriction endonuclease recognizing a four-nucleotide sequence and the first restriction endonuclease cleaving at a single site within each member of the mixture of anchor primers;
(3) generating linearized fragments of the cloned inserts by digestion with at least one restriction endonuclease that is different from the first and second restriction endonucleases;
(4) generating a cRNA preparation of antisense cRNA transcripts by incubation of the linearized fragments with a bacteriophage-specific RNA polymerase capable of initiating transcription from the bacteriophage-specific promoter;
(5) dividing the cRNA preparation into sixteen subpools and transcribing first-strand cDNA from each subpool, using a thermostable reverse transcriptase and one of sixteen 5′-RT primers whose 3′-terminus is -N-N, wherein N is one of the four deoxyribonucleotides A, C, G, or T, the 5′-RT primer being at least 15 nucleotides in length, corresponding in sequence to the 3′-end of the bacteriophage-specific promoter, and extending across into at least the first two nucleotides of the cRNA, the mixture including all possibilities for the 3′-terminal two nucleotides;
(6) using the product of transcription in each of the sixteen subpools as a template for a polymerase chain reaction with a 3′-PCR primer that corresponds in sequence to a sequence in the vector adjoining the site of insertion of the cDNA sample in the vector and a 5′-PCR primer selected from the group consisting of: (i) the 5′-RT primer from which first-strand cDNA was made for that subpool; (ii) the 5′-RT primer from which the first-strand cDNA was made for that subpool extended at its 3′-terminus by an additional residue -N, where N can be any of A, C, G, or T; and (iii) the 5′-RT primer used for the synthesis of first-strand cDNA for that subpool extended at its 3′-terminus by two additional residues -N-N, wherein N can be any of A, C, G, or T, to produce polymerase chain reaction amplified fragments; and
(7) resolving the polymerase chain reaction amplified fragments by electrophoresis to display bands representing the 3′-ends of mRNAs present in the sample.
In another preferred embodiment, the method comprises the steps of:
(a) preparing a double-stranded cDNA population from an mRNA population using a mixture of anchor primers, the anchor primers each including: (i) a tract of from 7 to 40 T residues; (ii) a site for cleavage by a first restriction endonuclease that recognizes more than six bases, the site for cleavage being located to the 5′-side of the tract of T residues; (iii) a first stuffer segment of from 4 to 40 nucleotides, the first stuffer segment being located to the 5′-side of the site for cleavage by the first restriction endonuclease; and (iv) phasing residues -V-N-N located at the 3′ end of each of the anchor primers, wherein V is a deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a deoxyribonucleotide selected from the group consisting of A, C, G, and T, the mixture including anchor primers containing all possibilities for V and N;
(b) cleaving the double-stranded cDNA population with the first restriction endonuclease and with a second restriction endonuclease, the second restriction endonuclease recognizing a four-nucleotide sequence, to form a population of double-stranded cDNA molecules having first and second termini, respectively;
(c) inserting the double-stranded cDNA molecules from step (b) each into a vector in an orientation that is antisense with respect to a bacteriophage-specific promoter within the vector to form a population of vectors containing the inserted cDNA molecules, said inserting defining 3′ and 5′ flanking vector sequences such that 5′ is upstream from the sense strand of the inserted cDNA and 3′ is downstream of the sense strand, and said vector having a 3′ flanking nucleotide sequence of at least 15 nucleotides in length between said first restriction endonuclease site and a site defining transcription initiation in said promoter;
(d) generating linearized fragments containing the inserted cDNA molecules by digestion of the vectors produced in step (c) with at least one restriction endonuclease that does not recognize sequences in the inserted cDNA molecules or in the bacteriophage-specific promoter, but does recognize sequences in the vector such that the resulting linearized fragments have a 5′ flanking vector sequence of at least 15 nucleotides 5′ to the site of insertion of the cDNA sample into the vector at the cDNA's second terminus;
(e) generating a cRNA preparation of antisense cRNA transcripts by incubation of the linearized fragments with a bacteriophage-specific RNA polymerase capable of initiating transcription from the bacteriophage-specific promoter;
(f) dividing the cRNA preparation into subpools and transcribing first-strand cDNA from each subpool, using a reverse transcriptase and one of the 5′-RT primers defined as having a 3′-terminus consisting of -Nx, wherein “N” is one of the four deoxyribonucleotides A, C, G, or T, and “x” is an integer from 1 to 5, the 5′-RT primer being 15 to 30 nucleotides in length and complementary to the 5′ flanking vector sequence with the 5′-RT primer's complementarity extending across into the insert-specific nucleotides of the cRNA in a number of nucleotides equal to “x”, wherein a different one of said 5′-RT primers is used in different subpools and wherein there are 4 subpools if “x”=1, 16 subpools if “x”=2, 64 subpools if “x”=3, 256 subpools if “x”=4, and 1,024 subpools if “x”=5;
(g) using the product of first-strand cDNA transcription in each of the subpools as a template for a polymerase chain reaction with a 3′-PCR primer of 15 to 30 nucleotides in length that is complementary to 3′ flanking vector sequences between said first restriction endonuclease site and the site defining transcription initiation by the bacteriophage-specific promoter and a 5′-PCR primer having a 3′-terminus consisting of -Nx-Ny, where “N” and “x” are as in step (f), -Nx is the same sequence as in the 5′-RT primer from which first-strand cDNA was made for that subpool, and “y” is a whole integer such that x+y equals an integer selected from the group consisting of 3, 4, 5 and 6, the 5′-PCR primer being 15 to 30 nucleotides in length and complementary to the 5′ flanking vector sequence with the 5′-PCR primer's complementarity extending across into the insert-specific nucleotides of the cRNA in a number of nucleotides equal to “x+y”, to produce polymerase chain reaction amplified fragments; and
(h) resolving the polymerase chain reaction amplified fragments to generate a display of sequence-specific products representing the 3′-ends of different mRNAs present in the mRNA population.
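The relationship between “x”, “y”, and the number of subpools in steps (f) and (g) can be illustrated with a short sketch. The function names below are hypothetical, and the sketch assumes that “whole integer” permits y = 0; it enumerates only the insert-specific 3′-terminal residues, not complete primers.

```python
from itertools import product

BASES = "ACGT"


def subpool_count(x: int) -> int:
    """Number of subpools in step (f): one per possible -Nx terminus."""
    return 4 ** x


def rt_primer_termini(x: int) -> list[str]:
    """All possible -Nx 3'-termini for the 5'-RT primers."""
    return ["".join(p) for p in product(BASES, repeat=x)]


def allowed_y(x: int) -> list[int]:
    """Values of y for which x + y is 3, 4, 5 or 6 (assuming y >= 0)."""
    return [total - x for total in (3, 4, 5, 6) if total - x >= 0]


def pcr_primer_termini(nx: str, y: int) -> list[str]:
    """All -Nx-Ny 3'-termini for 5'-PCR primers that extend a given -Nx."""
    return [nx + "".join(p) for p in product(BASES, repeat=y)]


for x in range(1, 6):
    print(x, subpool_count(x), allowed_y(x))
```

For x = 1 and y = 3, for example, each of the 4 subpools is amplified with 64 different 5′-PCR primers, so that 256 insert-specific four-nucleotide termini are sampled in total.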
Typically, the anchor primers each have 18 T residues in the tract of T residues, and the first stuffer segment of the anchor primers is 14 residues in length. A suitable sequence for the first stuffer segment is A-A-C-T-G-G-A-A-G-A-A-T-T-C (SEQ ID NO: 1). Typically, the site for cleavage by a first restriction endonuclease that recognizes more than six bases is the NotI cleavage site. Suitable anchor primers can also comprise a second stuffer segment interposed between the site for cleavage by a first restriction endonuclease that recognizes more than six bases and the tract of T residues. Phasing residues that are at the 3′ end of the anchor primer and 3′ to the tract of T residues have the sequence -V-N-N, where V is a deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a deoxyribonucleotide selected from the group consisting of A, C, G, and T.
In one preferred embodiment, the anchor primer has the sequence A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N-N (SEQ ID NO: 28), including a first stuffer segment of A-A-C-T-G-G-A-A-G-A-A-T-T-C that is 5′ to the NotI site G-C-G-G-C-C-G-C, a second stuffer sequence A-G-G-A-A interposed between the restriction endonuclease cleavage site and the tract of T residues, and phasing residues -V-N-N.
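As an illustration of how the anchor primer mixture is composed, the following sketch assembles the preferred anchor primer from the components named above and enumerates every combination of the phasing residues -V-N-N. The constant and function names are hypothetical; the component sequences are those given in the text.

```python
FIRST_STUFFER = "AACTGGAAGAATTC"   # first stuffer segment (SEQ ID NO: 1)
NOTI_SITE = "GCGGCCGC"             # NotI recognition site
SECOND_STUFFER = "AGGAA"           # second stuffer segment
T_TRACT = "T" * 18                 # tract of 18 T residues

V_RESIDUES = "ACG"                 # V: any base except T
N_RESIDUES = "ACGT"                # N: any base


def anchor_primers() -> list[str]:
    """Return every anchor primer in the mixture, written 5' to 3'."""
    prefix = FIRST_STUFFER + NOTI_SITE + SECOND_STUFFER + T_TRACT
    return [
        prefix + v + n1 + n2
        for v in V_RESIDUES
        for n1 in N_RESIDUES
        for n2 in N_RESIDUES
    ]


primers = anchor_primers()
print(len(primers))   # 3 x 4 x 4 combinations of phasing residues
print(primers[0])
```

Excluding T at position V keeps the phasing residues anchored at the junction between the poly(A) tail and the unique 3′-terminal sequence of each mRNA.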
Typically the first restriction endonuclease that recognizes more than six bases is selected from the group consisting of AscI, BaeI, FseI, NotI, PacI, PmeI, PpuMI, RsrII, SapI, SexAI, SfiI, SgfI, SgrAI, SrfI, Sse8387I and SwaI. Typically the second restriction endonuclease recognizing a four-nucleotide sequence is selected from the group consisting of MboI, DpnII, Sau3AI, Tsp509I, HpaII, BfaI, Csp6I, MseI, HhaI, NlaIII, TaqI, MspI, MaeII and HinP1I.
Typically the value of “x” in step (f) is 1 or 2. Typically the value of “y” in step (g) is 3 or 4. In a preferred embodiment, the phasing residues in step (a) are -V-N-N, the “x” in step (f) is 1, and the “y” in step (g) is 3. In another preferred embodiment, the phasing residues in step (a) are -V-N-N, the “x” in step (f) is 1, and the “y” in step (g) is 4.
Suitable vectors are pBC SK+ and pBS SK+ (Stratagene). In another aspect, the invention provides improved vectors based on pBS SK+ that are designed for the practice of the invention, such as pBS SK+/DGT1, pBS SK+/DGT2 and pBS SK+/DGT3, described in detail below. Such improved vectors can also be based on pBC SK+ or other suitable vectors well known to one skilled in the art.
Preferred vectors are improved vectors based on the plasmid vector pBluescript (pBS or pBC) SK+ (Stratagene) in which a portion of the nucleotide sequence from positions 656 to 764 was removed and replaced with a sequence of at least 110 nucleotides including a NotI restriction endonuclease site. This region, designated the multiple cloning site (MCS), spans the portion of the nucleotide sequence from the SacI site to the KpnI site.
The vector can be the plasmid pBC SK+ cleaved with ClaI and NotI, in which case the 3′-PCR primer in step (6) can be G-A-A-C-A-A-A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C-G-C (SEQ ID NO: 4). In a preferred embodiment, the vector is chosen from the group consisting of pBC SK+, pBS SK+ and pBS SK+/DGT1 and the 3′-PCR primer in step (g) is G-A-G-C-T-C-C-A-C-C-G-C-G-G-T (SEQ ID NO: 18).
Typically the restriction endonuclease used in step (d) has a nucleotide sequence recognition that includes the four-nucleotide sequence of the second restriction endonuclease used in step (b). In general, the sites for such restriction endonucleases must be in the vector sequence 5′ to the ClaI site as well as in the MCS between the ClaI site and the NotI site.
In one embodiment, the vector is the plasmid pBC SK+ and MspI is used both as the second restriction endonuclease and as the linearization restriction endonuclease used in step (d).
In another embodiment, the vector is the plasmid pBC SK+, the second restriction endonuclease is chosen from the group consisting of MspI, MaeII, TaqI and HinP1I, and the linearization in step (d) is accomplished by a first digestion with SmaI followed by a second digestion with a mixture of KpnI and ApaI.
In other embodiments the vector is chosen from the group consisting of pBS SK+/DGT1, pBS SK+/DGT2 and pBS SK+/DGT3. In such embodiments, one suitable enzyme combination is provided where the second restriction endonuclease is MspI and the restriction endonuclease used in step (d) is SmaI. Another suitable combination is provided where the second restriction endonuclease is TaqI and the restriction endonuclease used in step (d) is XhoI. A further suitable combination is provided where the second restriction endonuclease is HinP1I and the restriction endonuclease used in step (d) is NarI. Yet another suitable combination is provided where the second restriction endonuclease is MaeII and the restriction endonuclease used in step (d) is AatII.
Typically the bacteriophage-specific promoter is selected from the group consisting of T3 promoter, T7 promoter and SP6 promoter. Most typically it is the T3 promoter.
Typically, the sixteen 5′-RT primers for priming of transcription of cDNA from cRNA have the sequence A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3). In another preferred embodiment, the four 5′-RT primers for priming of transcription of cDNA from cRNA have the sequence G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N (SEQ ID NO: 9).
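The two sets of 5′-RT primers can be written out explicitly by expanding each N. The following sketch does so for the two sequences given above; the function and variable names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def expand_rt_primers(constant_region: str, n_variable: int) -> list[str]:
    """Append every combination of n_variable bases to the constant region."""
    return [
        constant_region + "".join(tail)
        for tail in product(BASES, repeat=n_variable)
    ]


# SEQ ID NO: 3: A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (sixteen primers)
sixteen_primers = expand_rt_primers("AGGTCGACGGTATCGG", 2)

# SEQ ID NO: 9: G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N (four primers)
four_primers = expand_rt_primers("GGTCGACGGTATCGG", 1)

print(len(sixteen_primers), len(four_primers))
```

Each primer defines one subpool, so the first set yields sixteen first-strand cDNA reactions and the second set yields four.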
The second restriction endonuclease recognizing a four-nucleotide sequence is typically MspI; alternatively, it can be TaqI, MaeII or HinP1I. The restriction endonuclease cleaving at a single site in each of the mixture of anchor primers is typically NotI.
Typically, the mRNA population has been enriched for polyadenylated mRNA species.
A typical host cell is a strain of Escherichia coli. 
The step of generating linearized fragments of the cloned inserts typically comprises:
(a) dividing the plasmid containing the insert into two fractions, a first fraction cleaved with the restriction endonuclease XhoI and a second fraction cleaved with the restriction endonuclease SalI;
(b) recombining the first and second fractions after cleavage;
(c) dividing the recombined fractions into thirds and cleaving the first third with the restriction endonuclease HindIII, the second third with the restriction endonuclease BamHI, and the third third with the restriction endonuclease EcoRI; and
(d) recombining the thirds after digestion in order to produce a population of linearized fragments of which about one-sixth of the population corresponds to the product of cleavage by each of the possible combinations of enzymes.
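The one-sixth figure in step (d) follows from combining two choices for the first cleavage with three for the second. This is a minimal sketch of that bookkeeping; the names are hypothetical, and it assumes that the fractions are split equally and that every digestion goes to completion.

```python
from fractions import Fraction
from itertools import product

FIRST_ENZYMES = ("XhoI", "SalI")               # step (a): two fractions
SECOND_ENZYMES = ("HindIII", "BamHI", "EcoRI")  # step (c): three fractions


def pooled_fractions() -> dict[tuple[str, str], Fraction]:
    """Share of the final pool produced by each pair of enzymes."""
    share = Fraction(1, len(FIRST_ENZYMES)) * Fraction(1, len(SECOND_ENZYMES))
    return {pair: share for pair in product(FIRST_ENZYMES, SECOND_ENZYMES)}


fractions_by_pair = pooled_fractions()
for pair, share in fractions_by_pair.items():
    print(pair, share)
```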
In another embodiment, wherein the vector is chosen from the group consisting of pBC SK+ and pBS SK+, and MspI is the second restriction endonuclease, MspI can be used as the linearization restriction endonuclease used in step (d). Alternatively, where the vector is the plasmid pBC SK+, linearization can be accomplished by a first digestion with SmaI followed by a second digestion with a mixture of KpnI and ApaI.
In other embodiments, the vector is chosen from the group consisting of pBS SK+/DGT1, pBS SK+/DGT2 and pBS SK+/DGT3 and the linearization restriction endonuclease used in step (d) is chosen from the group consisting of SmaI, XhoI, NarI and AatII.
Typically, the step of resolving the polymerase chain reaction amplified fragments by electrophoresis comprises electrophoresis of the fragments on at least two gels.
Each sequence-specific PCR product, or polymerase chain reaction amplified fragment, is identified by a digital address consisting of a sequence identifier, the length of the product in nucleotide residues and the intensity of labeling of the PCR product, defined as the area under the peak of the detector output for that PCR product.
The sequence identifier is defined by a 5′ component and a 3′ component. The 5′ component of the sequence identifier is the recognition site of the second restriction endonuclease used to cleave the double-stranded cDNA population prepared from the original mRNA population. Typically, the restriction endonuclease is MspI, and the 5′ component of the sequence identifier is -C-C-G-G. The 3′ component of the sequence identifier is the sequence defined by the 3′ terminus sequence of the 5′-PCR primer. For example, the 3′ component of the sequence identifier of the PCR product indicated as “111” in FIG. 2 is -C-T-G-C. Therefore, in this case, the sequence identifier would be -C-C-G-G-C-T-G-C.
Typically, a database comprising the digital address as defined above (the sequence identifier, the length of the sequence-specific PCR product in nucleotide residues, and the intensity of labeling of the PCR product, defined as the area under the peak of the detector output for that PCR product) is constructed and maintained using suitable computer hardware and computer software. Preferably, such a database further comprises data concerning sequence relationships, gene mapping, cellular distributions, experimental treatment conditions and any other information considered relevant to gene function.
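One possible in-memory representation of a digital address and a database keyed on it is sketched below. The class, field, and variable names are hypothetical and are not prescribed by the method. The example record uses the sequence identifier given above; its length of 111 residues assumes that the label “111” in FIG. 2 denotes product length, and its intensity value is a placeholder, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalAddress:
    five_prime_component: str   # recognition site of the second restriction endonuclease
    three_prime_component: str  # 3'-terminal residues of the 5'-PCR primer
    length_nt: int              # length of the PCR product in nucleotide residues
    intensity: float            # area under the detector peak for the product

    @property
    def sequence_identifier(self) -> str:
        """5' component followed by 3' component, written 5' to 3'."""
        return self.five_prime_component + self.three_prime_component


# Illustrative record: MspI site plus the -C-T-G-C primer terminus.
# Length and intensity are assumptions, as noted in the lead-in.
example = DigitalAddress("CCGG", "CTGC", length_nt=111, intensity=1.0)

# A simple database keyed by (sequence identifier, length).
database: dict[tuple[str, int], DigitalAddress] = {
    (example.sequence_identifier, example.length_nt): example
}

print(example.sequence_identifier)
```

Keying on the sequence identifier and length together lets the intensity of the same product be compared across samples, while annotations such as gene mapping or treatment conditions could be attached to each record as additional fields.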
The method can further comprise determining the sequence of the 3′-end of at least one of the mRNAs, such as by:
(1) eluting at least one cDNA corresponding to an mRNA from an electropherogram in which bands representing the 3′-ends of mRNAs present in the sample are displayed;
(2) amplifying the eluted cDNA in a polymerase chain reaction;
(3) cloning the amplified cDNA into a plasmid;
(4) producing DNA corresponding to the cloned DNA from the plasmid; and
(5) sequencing the cloned cDNA.
Another aspect of the invention is a method of simultaneous sequence-specific identification of mRNAs corresponding to members of an antisense cRNA pool representing the 3′-ends of a population of mRNAs, the antisense cRNAs that are members of the antisense cRNA pool being terminated at their 5′-end with a primer sequence corresponding to a bacteriophage-specific vector and at their 3′-end with a sequence corresponding in sequence to a sequence of the vector.
The method comprises:
(1) dividing the members of the antisense cRNA pool into sixteen subpools and transcribing first-strand cDNA from each subpool, using a thermostable reverse transcriptase and one of sixteen 5′-RT primers whose 3′-terminus is -N-N, wherein N is one of the four deoxyribonucleotides A, C, G, or T, the 5′-RT primer being at least 15 nucleotides in length, corresponding in sequence to the 3′-end of the bacteriophage-specific promoter, and extending across into at least the first two nucleotides of the cRNA, the mixture including all possibilities for the 3′-terminal two nucleotides;
(2) using the product of transcription in each of the sixteen subpools as a template for a polymerase chain reaction with a 3′-PCR primer that corresponds in sequence to a sequence in the vector adjoining the site of insertion of the cDNA sample in the vector and a 5′-PCR primer selected from the group consisting of: (i) the 5′-RT primer from which first-strand cDNA was made for that subpool; (ii) the 5′-RT primer from which the first-strand cDNA was made for that subpool extended at its 3′-terminus by an additional residue -N, where N can be any of A, C, G, or T; and (iii) the 5′-RT primer used for the synthesis of first-strand cDNA for that subpool extended at its 3′-terminus by two additional residues -N-N, wherein N can be any of A, C, G, or T, to produce polymerase chain reaction amplified fragments; and
(3) resolving the polymerase chain reaction amplified fragments by electrophoresis to display bands representing the 3′-ends of mRNAs present in the sample.
In another preferred embodiment, the method comprises:
(1) dividing the cRNA preparation into subpools and transcribing first-strand cDNA from each subpool, using a reverse transcriptase and one of the 5′-RT primers defined as having a 3′-terminus consisting of -Nx, wherein “N” is one of the four deoxyribonucleotides A, C, G, or T, and “x” is an integer from 1 to 5, the 5′-RT primer being 15 to 30 nucleotides in length and complementary to the 5′ flanking vector sequence with the 5′-RT primer's complementarity extending across into the insert-specific nucleotides of the cRNA in a number of nucleotides equal to “x”, wherein a different one of said 5′-RT primers is used in different subpools and wherein there are 4 subpools if “x”=1, 16 subpools if “x”=2, 64 subpools if “x”=3, 256 subpools if “x”=4, and 1,024 subpools if “x”=5;
(2) using the product of first-strand cDNA transcription in each of the subpools as a template for a polymerase chain reaction with a 3′-PCR primer of 15 to 30 nucleotides in length that is complementary to 3′ flanking vector sequences between said first restriction endonuclease site and the site defining transcription initiation by the bacteriophage-specific promoter and a 5′-PCR primer having a 3′-terminus consisting of -Nx-Ny, where “N” and “x” are as in step (1), -Nx is the same sequence as in the 5′-RT primer from which first-strand cDNA was made for that subpool, and “y” is a whole integer such that x+y equals an integer selected from the group consisting of 3, 4, 5 and 6, the 5′-PCR primer being 15 to 30 nucleotides in length and complementary to the 5′ flanking vector sequence with the primer's complementarity extending across into the insert-specific nucleotides of the cRNA in a number of nucleotides equal to “x+y”, to produce polymerase chain reaction amplified fragments; and
(3) resolving the polymerase chain reaction amplified fragments to generate a display of sequence-specific products representing the 3′-ends of different mRNAs present in the mRNA population.
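The subpool arithmetic of this embodiment can be checked with a short sketch. Python is used here purely for illustration; the function names `rt_extensions` and `pcr_extensions` are hypothetical, and only the insert-specific 3′-terminal nucleotides are modeled, not the vector-derived portion of each primer.

```python
from itertools import product

BASES = "ACGT"


def rt_extensions(x: int) -> list[str]:
    """All -Nx termini for the 5'-RT primers: one per subpool (4**x in total)."""
    if not 1 <= x <= 5:
        raise ValueError("x must be an integer from 1 to 5")
    return ["".join(p) for p in product(BASES, repeat=x)]


def pcr_extensions(nx: str, y: int) -> list[str]:
    """All -Nx-Ny termini of 5'-PCR primers for the subpool primed with nx."""
    total = len(nx) + y
    if y < 0 or not 3 <= total <= 6:
        raise ValueError("x + y must be 3, 4, 5 or 6")
    # Every PCR primer keeps the subpool's own -Nx and adds y further residues.
    return [nx + "".join(p) for p in product(BASES, repeat=y)]


# Number of subpools for each permitted value of x.
subpool_counts = {x: len(rt_extensions(x)) for x in range(1, 6)}
print(subpool_counts)

# Example with x = 2 and y = 2: the subpool primed with -A-G.
example = pcr_extensions("AG", 2)
print(len(example), example[:4])
```

With x = 2 and y = 2, each of the 16 subpools is paired with 16 possible 5′-PCR primers, so the full experiment spans 256 primer combinations.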
Yet another aspect of the present invention is a method for detecting a change in the pattern of mRNA expression in a tissue associated with a physiological or pathological change. This method comprises the steps of:
(1) obtaining a first sample of a tissue that is not subject to the physiological or pathological change;
(2) determining the pattern of mRNA expression in the first sample of the tissue by performing steps (1)-(3) of the method described above for simultaneous sequence-specific identification of mRNAs corresponding to members of an antisense cRNA pool representing the 3′-ends of a population of mRNAs to generate a first display of bands representing the 3′-ends of mRNAs present in the first sample;
(3) obtaining a second sample of the tissue that has been subject to the physiological or pathological change;
(4) determining the pattern of mRNA expression in the second sample of the tissue by performing steps (1)-(3) of the method described above for simultaneous sequence-specific identification of mRNAs corresponding to members of an antisense cRNA pool to generate a second display of bands representing the 3′-ends of mRNAs present in the second sample; and
(5) comparing the first and second displays to determine the effect of the physiological or pathological change on the pattern of mRNA expression in the tissue.
The comparison is typically made in adjacent lanes.
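Where each display is also available as a set of digital addresses, the comparison of step (5) could be sketched computationally. In the fragment below, each display is assumed to be a mapping from (sequence identifier, length) to peak area; the function name `compare_displays`, the two-fold threshold, and all numeric values are hypothetical choices made for illustration and are not prescribed by the method.

```python
def compare_displays(first, second, fold=2.0):
    """Return products whose peak area differs by at least `fold`.

    `first` and `second` map (sequence_identifier, length_nt) -> peak area.
    A product missing from one display is treated as having zero intensity.
    """
    changes = {}
    for key in sorted(set(first) | set(second)):
        a = first.get(key, 0.0)
        b = second.get(key, 0.0)
        if a == 0.0 and b == 0.0:
            continue  # nothing detected in either display
        if a == 0.0:
            changes[key] = "appeared"
        elif b == 0.0:
            changes[key] = "disappeared"
        elif b / a >= fold:
            changes[key] = "increased"
        elif a / b >= fold:
            changes[key] = "decreased"
    return changes


# Invented example data: tissue without and with the change.
unaffected = {
    ("CCGGCTGC", 111): 1500.0,
    ("CCGGAATG", 240): 800.0,
    ("CCGGTTAC", 95): 300.0,
}
affected = {
    ("CCGGCTGC", 111): 1450.0,
    ("CCGGAATG", 240): 2400.0,
    ("CCGGGGCA", 180): 500.0,
}

print(compare_displays(unaffected, affected))
```

A fixed fold threshold is only one way to call a difference; any such cutoff would need to be chosen with the reproducibility of the peak areas in mind.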
The tissue can be derived from the central nervous system or from particular structures within the central nervous system. The tissue can alternatively be derived from another organ or organ system.
Another aspect of the present invention is a method of screening for a side effect of a drug. The method can comprise the steps of:
(1) obtaining a first sample of tissue from an organism treated with a compound of known physiological function;
(2) determining the pattern of mRNA expression in the first sample of the tissue by performing steps (1)-(3) of the method described above for simultaneous sequence-specific identification of mRNAs corresponding to members of an antisense cRNA pool to generate a first display of bands representing the 3′-ends of mRNAs present in the first sample;
(3) obtaining a second sample of tissue from an organism treated with a drug to be screened for a side effect;
(4) determining the pattern of mRNA expression in the second sample of the tissue by performing steps (1)-(3) of the method described above for simultaneous sequence-specific identification of mRNAs corresponding to members of an antisense cRNA pool to generate a second display of bands representing the 3′-ends of mRNAs present in the second sample; and
(5) comparing the first and second displays in order to detect the presence of mRNA species whose expression is not affected by the known compound but is affected by the drug to be screened, thereby indicating a difference in action of the drug to be screened and the known compound and thus a side effect.
The drug to be screened can be a drug affecting the central nervous system, such as an antidepressant, a neuroleptic, a tranquilizer, an anticonvulsant, a monoamine oxidase inhibitor, or a stimulant. Alternatively, the drug can be another class of drug such as an anti-parkinsonism agent, a skeletal muscle relaxant, an analgesic, a local anesthetic, a cholinergic, an antispasmodic, a steroid, or a non-steroidal anti-inflammatory drug.
Another aspect of the present invention is panels of primers and degenerate mixtures of primers suitable for the practice of the present invention. These include:
(1) a panel of primers comprising 16 5′-RT primers of the sequence A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(2) a panel of primers comprising 64 5′-PCR primers of the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 5), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(3) a panel of primers comprising 256 5′-PCR primers of the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 6), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(4) a panel of primers comprising 1024 5′-PCR primers of the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N (SEQ ID NO: 24), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(5) a panel of primers comprising 4096 5′-PCR primers of the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N-N (SEQ ID NO: 25), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(6) a panel of primers comprising 48 anchor primers of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N-N (SEQ ID NO: 28), wherein V is a deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a deoxyribonucleotide selected from the group consisting of A, C, G, and T;
(7) a panel of 5′-RT primers comprising 4 different oligonucleotides each having the sequence G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N (SEQ ID NO: 9), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(8) a panel of 5′-RT primers comprising 16 different oligonucleotides each having the sequence G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 12), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(9) a panel of 5′-PCR primers comprising 64 different oligonucleotides each having the sequence T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 13), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(10) a panel of 5′-PCR primers comprising 256 different oligonucleotides each having the sequence C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 14), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(11) a panel of 5′-PCR primers comprising 1024 different oligonucleotides each having the sequence G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N (SEQ ID NO: 15), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(12) a panel of 5′-PCR primers comprising 4096 different oligonucleotides each having the sequence A-C-G-G-T-A-T-C-G-G-N-N-N-N-N-N (SEQ ID NO: 16), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(13) a panel of 5′-RT primers comprising 4 different oligonucleotides each having the sequence C-T-T-C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N (SEQ ID NO: 10), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(14) a panel of 5′-RT primers comprising 16 different oligonucleotides each having the sequence T-T-C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N (SEQ ID NO: 11), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(15) a panel of 5′-PCR primers comprising 64 different oligonucleotides each having the sequence T-C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N (SEQ ID NO: 12), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(16) a panel of 5′-PCR primers comprising 256 different oligonucleotides each having the sequence C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N-N (SEQ ID NO: 17), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(17) a panel of 5′-PCR primers comprising 1024 different oligonucleotides each having the sequence A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N-N-N (SEQ ID NO: 26), wherein N is one of the four deoxyribonucleotides A, C, G, or T;
(18) a panel of 5′-PCR primers comprising 4096 different oligonucleotides each having the sequence G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N-N-N-N (SEQ ID NO: 27), wherein N is one of the four deoxyribonucleotides A, C, G, or T; and
(19) a degenerate mixture of anchor primers comprising a mixture of 48 primers of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N-N (SEQ ID NO: 28), wherein V is a deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a deoxyribonucleotide selected from the group consisting of A, C, G, and T, each of the 48 primers being present in about an equimolar quantity.
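The panel sizes listed above follow from expanding each degenerate position. As an illustrative check, the sketch below expands the codes N (A, C, G, or T) and V (A, C, or G) in a primer template; the helper name `expand_primer` and the constant names are hypothetical, and the templates are transcribed from items (1), (5) and (19) with the hyphens removed.

```python
from itertools import product

# Degenerate codes used in the panels: N is any base, V is any base except T.
DEGENERATE = {"N": "ACGT", "V": "ACG"}


def expand_primer(template: str) -> list[str]:
    """List every concrete primer represented by a degenerate template."""
    choices = [DEGENERATE.get(base, base) for base in template]
    return ["".join(p) for p in product(*choices)]


RT_16 = "AGGTCGACGGTATCGGNN"        # item (1): two degenerate positions
PCR_4096 = "AGGTCGACGGTATCGGNNNNNN"  # item (5): six degenerate positions
# Item (19): vector-derived portion, 18 T residues as transcribed, then V-N-N.
ANCHOR_48 = "AACTGGAAGAATTCGCGGCCGCAGGAA" + "T" * 18 + "VNN"

for name, template in [("RT", RT_16), ("PCR", PCR_4096), ("anchor", ANCHOR_48)]:
    panel = expand_primer(template)
    print(name, len(panel), panel[0])
```

The anchor mixture contains 3 × 4 × 4 = 48 primers because V excludes T, so that the position adjoining the oligo(dT) tract is never another T.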