1. Field of Invention
This invention relates to the fields of Molecular Biology and Molecular Genetics with specific reference to the identification and isolation of proteins and of the genes and transcripts that encode them.
2. Description of Prior Art
The primary area of the invention--the identification and tagging of genes and proteins--has received a great deal of attention, and many successful methods have been devised. None of these methods, however, has the feature of tagging gene, transcript and protein in a single event.
Linkage analysis. Genes have traditionally been identified by identifying mutations and then mapping them with respect to one another by means of genetic crosses. This kind of mapping, or linkage analysis, does not serve to isolate the genes themselves, nor does it indicate anything about the genes' molecular structure or function. In recent years a form of linkage analysis using restriction fragment length polymorphisms (RFLPs) has come into use (1). This method serves to identify DNA sequences that are linked to a gene of interest, and, having identified such a DNA sequence, it is possible in principle, and sometimes in practice, to identify and clone the gene itself by performing chromosome walks or jumps (2). It should be stressed that, even when successful, this strategy identifies the gene, not the protein encoded by the gene.
Transposon Tagging. Another technique for cloning genes, developed relatively recently, is known as transposon tagging. In this technique (3), mutations due to the insertion of transposable elements into new sites in the genome are identified, and the genes in which the transposons lie can then be cloned using transposon DNA as a molecular probe. Transposon tagging, like RFLP/linkage analysis, identifies genes, not proteins.
Enhancer trapping. Another method for identifying genes, enhancer trapping (4), involves the random insertion into a eucaryotic genome of a promoter-less foreign gene (the reporter) whose expression can be detected at the cellular level. Expression of the reporter gene indicates that it has been fused to an active transcription unit or that it has inserted into the genome in proximity to cis-acting elements that promote transcription. This approach has been important in identifying genes that are expressed in a cell type-specific or developmental stage-specific manner. Enhancer trapping, like RFLP/linkage analysis and transposon tagging, identifies genes, not proteins, and it does not directly reveal anything about the nature of the protein product of a gene.
Guest Peptides and Epitope tagging. A number of studies have been performed in which new peptides have been inserted into proteins at a variety of positions by modifying the genes encoding the proteins using recombinant DNA technology. The term "guest peptide" has been used to describe the foreign peptides in these cases. It is clear that in many cases the presence of such peptides is relatively innocuous and does not substantially compromise protein function--especially in those cases where the peptide is on the surface of the protein rather than in its hydrophobic core.
Epitope tagging (5) is a method that utilizes antibodies against guest peptides to study protein localization at the cellular and subcellular levels. Epitope tagging begins with a cloned gene and an antibody that recognizes a known peptide (the epitope). Using recombinant DNA technology, a sequence of nucleotides encoding the epitope is inserted into the coding region of the cloned gene, and the hybrid gene is introduced into a cell by a method such as transformation. When the hybrid gene is expressed, the result is a chimeric protein containing the epitope as a guest peptide. If the epitope is exposed on the surface of the protein, it is available for recognition by the epitope-specific antibody, allowing the investigator to observe the protein within the cell using immunofluorescence or other immunolocalization techniques. Epitope tagging serves to mark the proteins of already-cloned genes but does not serve to identify genes.
Isolating Genes Beginning with the Proteins they Encode. A number of procedures have been developed for isolating genes beginning with the proteins that they encode. Some, such as expression library screening (6), involve the use of specific antibodies that react to the protein of interest. Others involve sequencing all or part of the protein and designing oligonucleotide probes that can be used to identify the gene by DNA/DNA hybridization. In all of these cases, one must have specific knowledge about a protein before it is possible to take steps to clone and characterize the gene that encodes it.
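By way of illustration only, the probe-design step mentioned above can be sketched as follows. Because most amino acids are specified by more than one codon, an oligonucleotide probe derived from a peptide sequence is a mixture of sequences, and a stretch of the protein with low codon degeneracy is normally preferred. The function names, the six-residue default window and the example peptide are hypothetical and are not taken from the cited work; the codon counts are those of the standard genetic code.

```python
# Number of codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "M": 1, "W": 1,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2,
    "K": 2, "N": 2, "Q": 2, "Y": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}


def probe_degeneracy(peptide):
    """Count the distinct nucleotide sequences that could encode the peptide."""
    total = 1
    for residue in peptide.upper():
        total *= CODONS_PER_RESIDUE[residue]
    return total


def best_probe_window(protein, residues=6):
    """Return (degeneracy, offset, peptide) for the least degenerate window.

    The window length is an illustrative assumption; ties are broken in
    favor of the window nearest the amino terminus.
    """
    windows = [
        (probe_degeneracy(protein[i:i + residues]), i, protein[i:i + residues])
        for i in range(len(protein) - residues + 1)
    ]
    return min(windows)
```

For the hypothetical peptide LSRMWKDEL and a four-residue window, best_probe_window("LSRMWKDEL", residues=4) returns (4, 3, "MWKD"), i.e. a probe mixture of only four sequences, whereas the window LSRM would require 216.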
cDNA Cloning and Sequencing. A method of gene identification that has received a great deal of attention in the recent past is the cloning (and in many instances, sequencing) of so-called expressed sequence tags (ESTs) from cDNA libraries made from mRNA extracted from a given tissue or cell type (7). Information about the proteins encoded by the mRNAs can be derived from the cDNA sequences by identifying and analyzing their open reading frames. In many cases such cDNAs are not full length, however, and so information about the amino-terminal portion of the protein is lacking. And, more significantly, the method tags transcript sequences and not the proteins that the transcripts encode.
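The open reading frame analysis referred to above can be sketched, purely as an illustration, as a search for the longest stop-free stretch of codons in the three forward reading frames. The sketch deliberately does not require an ATG start codon, since a cDNA that is not full length may lack the amino-terminal coding region. The function name and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_frame(cdna):
    """Return (frame, start, end) of the longest stop-free codon run.

    Only the three forward frames are scanned. The run covers
    cdna[start:end] and excludes any terminating stop codon. Ties are
    resolved in favor of the lower-numbered frame.
    """
    seq = cdna.upper()
    best = (0, 0, 0)
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - run_start > best[2] - best[1]:
                    best = (frame, run_start, pos)
                run_start = pos + 3  # next run begins after the stop codon
            pos += 3
        # The end of the sequence closes the final run in this frame.
        if pos - run_start > best[2] - best[1]:
            best = (frame, run_start, pos)
    return best
```

For example, longest_open_frame("ATGGCTAAGTGA") returns (0, 0, 9), while the truncated fragment "GCTGCTGCTTAGG", which contains no ATG, returns (1, 1, 13). A practical analysis would also examine the reverse strand and weigh codon usage, which this sketch omits.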
RNA splicing. RNA splicing is the natural phenomenon, characteristic of all eucaryotic cells, whereby introns are removed from primary RNA transcripts. A large body of research has revealed that an intron is functionally defined by three components--a 5' donor site, a branch site and a 3' acceptor site (8). If these sites are present, and if the intron is not too large (it can be at least as large as 2 kb in many organisms), and if the distance between the branch and 3' acceptor sites is appropriate, the cellular splicing machinery is activated and the intron is removed from the transcript. Many different natural DNA sequences are known to have splice site function; consensus sites for mammalian splicing are indicated in FIG. 1 below. Thus not only have many active splice sites been cloned, but there is a large database that can be used to design synthetic functional splice site sequences.
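The three-component definition of an intron given above can be illustrated by a simple scan for candidate introns. The patterns used here are simplified assumptions and are not the consensus sequences of FIG. 1: the donor is reduced to the dinucleotide GT, the branch site to the motif YNYTRAY, and the acceptor to the dinucleotide AG. The permitted branch-to-acceptor distance (18 to 40 nucleotides) and the 2000-nucleotide length cap are likewise illustrative parameters, and the function name is hypothetical.

```python
import re

# Simplified, assumed site patterns (DNA alphabet); not taken from FIG. 1.
DONOR = re.compile(r"GT")
BRANCH = re.compile(r"[CT][ACGT][CT]T[AG]A[CT]")  # YNYTRAY
ACCEPTOR = "AG"


def candidate_introns(dna, max_len=2000, branch_gap=(18, 40)):
    """List (donor_start, branch_a, intron_end) for candidate introns.

    donor_start is the index of the G of the donor GT, branch_a is the
    index of the branch-point A, and intron_end is one past the acceptor
    AG, so the candidate intron is dna[donor_start:intron_end].
    """
    seq = dna.upper()
    lo, hi = branch_gap
    hits = []
    for donor in DONOR.finditer(seq):
        d = donor.start()
        for branch in BRANCH.finditer(seq, d + 2):
            branch_a = branch.start() + 5  # position of the A in YNYTRAY
            for a in range(branch_a + lo, branch_a + hi + 1):
                if seq[a:a + 2] == ACCEPTOR and a + 2 - d <= max_len:
                    hits.append((d, branch_a, a + 2))
    return hits
```

A scan of this kind only nominates sequences that satisfy the three structural requirements; whether a given candidate is actually spliced in a cell must be determined experimentally.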
Gene Trapping. Gene trapping is a method used to identify transcribed genes. Gene trapping vectors carry splice acceptor sites directly upstream of the coding sequence for a reporter protein such as β-galactosidase. When the vector inserts into an intron of an actively transcribed gene, the result is a protein fusion between an N-terminal fragment of the target gene product and the reporter protein, the activity of which is used as an indicator that integration into an active gene has occurred (9). Gene trapping seeks to identify transcribed genes--not to tag proteins, and to inactivate genes--not to produce an active tagged gene product.