This invention relates to methods of detecting and cloning individual mRNAs.
The activities of genes in cells are reflected in the kinds and quantities of their mRNA and protein species. Gene expression is crucial for processes such as aging, development, differentiation, metabolite production, progression of the cell cycle, and infectious, genetic, or other disease states. Identification of the expressed mRNAs will be valuable for elucidating the molecular mechanisms of these processes and for applications to them.
Mammalian cells contain approximately 15,000 different mRNA sequences; however, each mRNA sequence is present at a different frequency within the cell. Generally, mRNAs are expressed at one of three levels. A few "abundant" mRNAs are present at about 10,000 copies per cell, about 3,000-4,000 "intermediate" mRNAs are present at 300-500 copies per cell, and about 11,000 "low-abundance" or "rare" mRNAs are present at approximately 15 copies per cell. The numerous genes that are represented by intermediate and low frequencies of their mRNAs can be cloned by a variety of well-established techniques (see for example Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Press, pp. 8.6-8.35).
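The abundance figures above can be turned into a rough estimate of how much of the cellular mRNA each class represents. The following is a minimal sketch, not part of the described methods. It assumes 10 abundant species (the text says only "a few") and uses midpoints for the intermediate class. The function `clones_needed` applies the standard library-size relation N = ln(1 - P) / ln(1 - f), which is a general cloning rule of thumb and is not stated in this text.

```python
import math

# Abundance classes as (number of distinct mRNA species, copies per cell).
# ASSUMPTION: "a few" abundant species is taken as 10 for illustration only.
# ASSUMPTION: intermediate class uses midpoints (3,500 species, 400 copies).
CLASSES = {
    "abundant": (10, 10_000),
    "intermediate": (3_500, 400),
    "rare": (11_000, 15),
}


def total_copies(classes):
    """Total mRNA molecules per cell summed over all abundance classes."""
    return sum(species * copies for species, copies in classes.values())


def class_fractions(classes):
    """Fraction of all mRNA molecules contributed by each abundance class."""
    total = total_copies(classes)
    return {
        name: species * copies / total
        for name, (species, copies) in classes.items()
    }


def clones_needed(copies_per_cell, total, probability=0.99):
    """Clones to screen so that one given mRNA is represented at least once
    with the stated probability: N = ln(1 - P) / ln(1 - f)."""
    f = copies_per_cell / total  # fractional abundance of the single mRNA
    return math.ceil(math.log(1 - probability) / math.log(1 - f))


if __name__ == "__main__":
    total = total_copies(CLASSES)
    print("mRNA molecules per cell:", total)
    for name, fraction in class_fractions(CLASSES).items():
        print(f"{name}: {fraction:.1%} of mRNA molecules")
    print("clones needed for one rare mRNA:", clones_needed(15, total))
```

Under these assumptions a single rare mRNA makes up on the order of one molecule in 100,000, so a library must contain several hundred thousand independent clones before such an mRNA is likely to be represented.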
If some knowledge of the gene sequence or protein is available, several direct cloning methods can be used. However, if the identity of the desired gene is unknown, one must be able to select or enrich for the desired gene product in order to identify the "unknown" gene without expending large amounts of time and resources.
The identification of unknown genes often involves the use of subtractive or differential hybridization techniques. Subtractive hybridization techniques rely upon the use of very closely related cell populations, such that differences in gene expression will primarily represent the gene(s) of interest. A key element of the subtractive hybridization technique is the construction of a comprehensive complementary DNA ("cDNA") library.
The construction of a comprehensive cDNA library is now a fairly routine procedure. PolyA mRNA is prepared from the desired cells and the first strand of the cDNA is synthesized using RNA-dependent DNA polymerase ("reverse transcriptase") and an oligodeoxynucleotide primer of 12 to 18 thymidine residues. The second strand of the cDNA is synthesized by one of several methods, the more efficient of which are commonly known as "replacement synthesis" and "primed synthesis".
Replacement synthesis involves the use of ribonuclease H ("RNAase H"), which cleaves the phosphodiester backbone of RNA that is in an RNA:DNA hybrid, leaving a 3' hydroxyl and a 5' phosphate. The enzyme thereby produces nicks and gaps in the mRNA strand, creating a series of RNA primers that are used by E. coli DNA polymerase I, or its "Klenow" fragment, to synthesize the second strand of the cDNA. This reaction is very efficient; however, the cDNAs produced most often lack the 5' terminus of the mRNA sequence.
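The loss of the 5' terminus can be pictured with a toy model. The sketch below assumes that the 5'-most RNA fragment left by RNAase H serves only as a primer and is never itself replaced by DNA, so the corresponding sequence is absent from the second strand. The function name, the example sequence, and the nick positions are hypothetical and chosen only for illustration.

```python
def replacement_synthesis(mrna, nick_positions):
    """Toy model of second-strand synthesis by replacement.

    mrna            -- mRNA sequence written 5'->3'
    nick_positions  -- indices at which RNAase H nicks the mRNA strand

    Returns (second_strand_dna, lost_5prime_rna).
    ASSUMPTION: the 5'-most RNA fragment acts only as a primer, so the
    sequence it covers is not represented in the second-strand DNA.
    """
    if not nick_positions:
        raise ValueError("at least one nick is needed to create a primer")
    first_nick = min(nick_positions)
    if not 0 < first_nick < len(mrna):
        raise ValueError("nick positions must fall inside the mRNA")
    lost_5prime = mrna[:first_nick]
    # DNA synthesized from the primers has the sense of the mRNA (U -> T).
    second_strand = mrna[first_nick:].replace("U", "T")
    return second_strand, lost_5prime


if __name__ == "__main__":
    second, lost = replacement_synthesis("GGAUCCAUGGCUAAAA", [4, 9])
    print("second strand:", second)
    print("5' sequence not recovered:", lost)
```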
Primed synthesis to generate the second cDNA strand is a general name for several methods which are more difficult than replacement synthesis yet clone the 5' terminal sequences with high efficiency. In general, after the synthesis of the first cDNA strand, the 3' end of the cDNA strand is extended with terminal transferase, an enzyme which adds a homopolymeric "tail" of deoxynucleotides, most commonly deoxycytidylate. This tail is then hybridized to a primer of oligodeoxyguanidylate or a synthetic fragment of DNA with a deoxyguanidylate tail, and the second strand of the cDNA is synthesized using a DNA-dependent DNA polymerase.
The primed synthesis method is effective, but the method is laborious, and all resultant cDNA clones have a tract of deoxyguanidylate immediately upstream of the mRNA sequence. This deoxyguanidylate tract can interfere with transcription of the DNA in vitro or in vivo and can interfere with the sequencing of the mRNA sequence by the Sanger dideoxynucleotide sequencing method.
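The origin of the deoxyguanidylate tract can be followed with simple string operations. The sketch below is an idealized illustration only. It treats each enzymatic step as complete and error-free, and the example mRNA and the tail length of 5 are hypothetical values.

```python
# Base-pairing table; U is included so mRNA can be copied directly into DNA.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement_dna(seq):
    """DNA reverse complement of an RNA or DNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def first_strand(mrna):
    """First-strand cDNA made by reverse transcriptase from an oligo(dT) primer."""
    return reverse_complement_dna(mrna)


def tail_with_dc(cdna, tail_length):
    """Terminal transferase adds a homopolymeric dC tail to the 3' end."""
    return cdna + "C" * tail_length


def second_strand(tailed_cdna):
    """Second strand primed from oligo(dG) annealed to the dC tail."""
    return reverse_complement_dna(tailed_cdna)


if __name__ == "__main__":
    mrna = "AUGGCUUAAAAAAAA"  # hypothetical short mRNA with a polyA tail
    tailed = tail_with_dc(first_strand(mrna), tail_length=5)
    sense = second_strand(tailed)
    # The sense strand begins with a G tract, followed by the mRNA sequence.
    print(sense)
```

In this model the second strand always starts with a run of G residues whose length equals that of the dC tail, which is the tract that sits immediately upstream of the mRNA sequence in every resulting clone.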
Once both cDNA strands have been synthesized, the cDNA library is constructed by cloning the cDNAs into an appropriate plasmid or viral vector. In practice this can be done by directly ligating the blunt ends of the cDNAs into a vector which has been digested by a restriction endonuclease to produce blunt ends. Blunt end ligations are very inefficient, however, so this approach is seldom the method of choice. A more generally used method involves adding synthetic linkers or adapters containing restriction endonuclease recognition sequences to the ends of the cDNAs. The cDNAs can then be cloned into the desired vector at a greater efficiency.
Once a comprehensive cDNA library is constructed from a cell line, desired genes can be identified with the assistance of subtractive hybridization (see for example Sargent T. D., 1987, Meth. Enzymol., Vol. 152, pp. 423-432; Lee et al., 1991, Proc. Natl. Acad. Sci., USA, Vol. 88, pp. 2825-2830). A general method for subtractive hybridization is as follows. The complementary strand of the cDNA is synthesized and radiolabelled. This single strand of cDNA can be made from polyA mRNA or from the existing cDNA library. The radiolabelled cDNA is hybridized to a large excess of mRNA from a closely related cell population. After hybridization the cDNA:mRNA hybrids are removed from the solution by chromatography on a hydroxylapatite column. The remaining "subtracted" radiolabelled cDNA can then be used to screen a cDNA or genomic DNA library of the same cell population.
Subtractive hybridization removes the majority of the genes expressed in both cell populations and thus enriches for genes which are present only in the desired cell population. However, if a particular mRNA sequence is only a few times more abundant in the desired cell population than in the subtractive population, it may not be possible to isolate the gene by subtractive hybridization.
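This limitation can be illustrated with a toy kinetic model. The sketch below assumes pseudo-first-order hybridization, in which the fraction of a cDNA species left single-stranded falls exponentially with the abundance of its complementary mRNA in the subtracting population. The `rate` parameter, the gene names, and the copy numbers are hypothetical and are not taken from the cited methods.

```python
import math


def fraction_unhybridized(driver_copies, rate=0.5):
    """Fraction of a cDNA species still single-stranded after hybridization.

    ASSUMPTION: pseudo-first-order kinetics with the subtracting mRNA in
    large excess; `rate` lumps rate constant, time, and excess together.
    """
    return math.exp(-rate * driver_copies)


def subtract(target, driver, rate=0.5):
    """Relative composition of the cDNA pool remaining after subtraction.

    target, driver -- dicts mapping gene name to mRNA copies per cell in
    the desired and the subtracting cell population, respectively.
    """
    remaining = {
        gene: copies * fraction_unhybridized(driver.get(gene, 0), rate)
        for gene, copies in target.items()
    }
    total = sum(remaining.values())
    return {gene: amount / total for gene, amount in remaining.items()}


if __name__ == "__main__":
    # Hypothetical copy numbers for three illustrative genes.
    target = {"housekeeping": 400, "specific": 15, "upregulated": 45}
    driver = {"housekeeping": 400, "upregulated": 15}
    for gene, share in subtract(target, driver).items():
        print(f"{gene}: {share:.4f} of subtracted cDNA")
```

In this model the cDNA for a gene expressed only in the desired population survives and dominates the subtracted pool. A gene that is three times more abundant in the desired population is still removed, because its mRNA is present in the subtracting population and hybridizes to the cDNA.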