1. Field of the Invention
This invention relates to epitope tagging, in particular, to improved epitope tags, the nucleotide sequences that encode them, methods for using the nucleotide sequences and tags, and the resulting cellular and multicellular products.
2. Background Art
The publications and other reference materials referred to herein to describe the background of the invention and to provide additional detail regarding its practice are hereby incorporated by reference. For convenience, the reference materials are numerically referenced and grouped in the appended bibliography.
Epitope tagging is a recombinant DNA method for introducing immunoreactive peptides into the products of cloned genes (1-7). In particular, a DNA sequence encoding a sequence of amino acids that comprises a continuous epitope is inserted into the coding sequence of a cloned gene with the result that when the gene is expressed the protein of the gene is tagged with the epitope. The protein can then be detected and/or purified by virtue of its interaction with an antibody specific to the epitope. Epitope tags are typically 5-20 amino acids in length. Nucleotide sequences encoding the epitope are produced either by cloning appropriate portions of natural genes or by synthesizing a polynucleotide that encodes the epitope.
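As a rough illustration of the "synthesizing a polynucleotide that encodes the epitope" route, the sketch below back-translates a peptide into DNA using one fixed codon per amino acid. The codon table `CODON_FOR` and the function `encode_epitope` are hypothetical. The codons are valid under the standard genetic code but chosen arbitrarily. Practical designs usually pick codons suited to the expression host and add restriction-site ends, and this sketch omits both.

```python
# Hypothetical codon choice: one valid (but arbitrary) codon per amino acid.
# Real designs usually favor codons preferred by the expression host.
CODON_FOR = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def encode_epitope(peptide: str) -> str:
    """Return a 5'->3' DNA sequence encoding `peptide` (one-letter codes)."""
    return "".join(CODON_FOR[aa] for aa in peptide.upper())


# FLAG epitope (Asp Tyr Lys Asp Asp Asp Asp Lys) in one-letter code.
flag_oligo = encode_epitope("DYKDDDDK")
print(flag_oligo)  # GACTACAAGGACGACGACGACAAG
```

Because every amino acid maps to exactly three bases, the oligonucleotide is always a multiple of three long. This keeps the host's downstream reading frame intact once the oligonucleotide is inserted in frame.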
Epitope tagging is widely used for detecting, characterizing, and purifying proteins. The technique offers several advantages over alternative methods of detecting and purifying proteins. Because the epitope tag is small, it generally has no effect on the biological function of the tagged protein. By contrast, fusions to larger peptide or protein labels have in many cases altered the activity or function of the fusion protein. Epitope tagging also saves considerable time compared with the traditional approach of raising an antibody against each specific protein under study.
Epitope tagging involves adding a unique epitope tag peptide sequence to the protein of interest by recombinant DNA techniques, creating a fusion protein. The resulting tagged protein can then be detected by and purified with an antibody specific for the epitope tag.
Epitope tagging methods have been used in a wide variety of applications, including western blot analysis, immunoprecipitation, immunofluorescence, and immunoaffinity purification of tagged proteins.
Epitope tagging was first described in 1984 by Munro and Pelham (1). A cDNA encoding the Drosophila melanogaster heat shock protein hsp70 was tagged at the 3' end of the coding sequence with a short oligonucleotide encoding either nine or fourteen amino acids of the peptide Substance P. After transfection of monkey COS cells, the tagged protein was detected with an anti-Substance P monoclonal antibody. Since that initial report, hundreds of investigations using epitope tagging have been reported in the scientific literature. Epitope tagging products and kits, which include various combinations of peptides, polynucleotides, and antibodies, are currently sold by a number of companies, including Boehringer-Mannheim, Indianapolis, Ind.; Berkeley Antibody Company, Berkeley, Calif.; MBL International Corporation, Watertown, Mass.; Novagen, Madison, Wis.; IBI, West Haven, Conn.; and Life Technologies, Gaithersburg, Md.
To epitope-tag a protein by conventional means, one begins with two DNA molecules: (1) a polynucleotide, cloned in a plasmid vector, that includes a sequence of nucleotides encoding the protein as well as the regulatory sequences (e.g., a promoter and a translation start site) needed to express the protein; and (2) an oligonucleotide encoding the epitope with which the protein is to be tagged. The oligonucleotide is designed to encode, in one of its reading frames, an epitope recognized by a known antibody. One chooses a site in the polynucleotide's protein coding sequence for insertion of the oligonucleotide. The site may be at or near the 3' or the 5' end of the coding sequence, or anywhere between them. The insertion site is typically a unique restriction site. The plasmid is linearized with the corresponding restriction endonuclease, and the oligonucleotide is ligated into the site. The tagged gene is then introduced into living cells. The epitope-tagged protein subsequently expressed from the tagged gene is detected and/or purified by immunochemical means.
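A minimal sketch of why the exact insertion point matters follows. It splices an epitope-encoding oligonucleotide into a toy coding sequence and translates the result. Several simplifications are assumed. The toy sequence is hypothetical. The oligonucleotide is spliced in as a plain insertion at a chosen nucleotide position, ignoring the overhangs a real restriction digest leaves. The function names `translate` and `is_tagged` are illustrative only. The genetic code used is the standard one.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate from the first base until the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_tagged(cds: str, position: int, oligo: str, epitope: str) -> bool:
    """Splice `oligo` into `cds` at `position`; report whether the epitope is expressed."""
    construct = cds[:position] + oligo + cds[position:]
    return epitope in translate(construct)


# Hypothetical coding sequence: M A S E F K G L, then a stop codon.
# GAATTC (an EcoRI site) begins at index 9.
cds = "ATGGCTAGCGAATTCAAAGGTCTGTAA"
flag_oligo = "GACTACAAGGACGACGACGACAAG"  # encodes DYKDDDDK in its first frame
site = cds.index("GAATTC")

for offset in range(3):
    pos = site + offset
    print(pos, is_tagged(cds, pos, flag_oligo, "DYKDDDDK"))
# Prints "9 True", "10 False", "11 False": only the insertion point that is
# a multiple of three bases from the ATG keeps the tag in frame.
```

Shifting the insertion point by one or two bases leaves the oligonucleotide intact but reads it in a different frame. Here that produces unrelated residues instead of the epitope.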
Using conventional epitope tagging techniques, hundreds of different proteins have been epitope-tagged with numerous distinct peptides. These include the ten amino acid c-myc epitope Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu (SEQ ID NO: 1), derived from the human c-myc protein (8); the nine amino acid HA epitope Tyr Pro Tyr Asp Val Pro Asp Tyr Ala (SEQ ID NO: 2), derived from influenza virus hemagglutinin (9, 10); the synthetic eight amino acid FLAG epitope Asp Tyr Lys Asp Asp Asp Asp Lys (SEQ ID NO: 3) (Castrucci et al., 1992. J. Virology 66: 4647-4653); and the twelve amino acid epsilon-tag epitope Lys Gly Phe Ser Tyr Phe Gly Glu Asp Leu Met Pro (SEQ ID NO: 4), derived from protein kinase C epsilon (Olah et al., 1994. Anal. Biochem. 221: 94-102). Indeed, there appears to be no practical limit to the number of possible epitope tags. Essentially any peptide can be used as an immunogen to raise antibodies that will recognize that same peptide when it is present within or at the termini of a protein (11, 12).
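Tag sequences in this field are often written in three-letter code. A stated tag length can be checked by converting to one-letter code and counting residues, as the sketch below does. The helper `to_one_letter` and the `TAGS` dictionary are illustrative only. The c-myc and HA entries use the widely used EQKLISEEDL and YPYDVPDYA forms of those tags.

```python
# Standard three-letter to one-letter amino acid abbreviations.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert space-separated three-letter codes (any capitalization)."""
    return "".join(THREE_TO_ONE[res.capitalize()] for res in three_letter.split())


TAGS = {
    "c-myc": "Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu",
    "HA": "Tyr Pro Tyr Asp Val Pro Asp Tyr Ala",
    "FLAG": "Asp Tyr Lys Asp Asp Asp Asp Lys",
    "epsilon": "Lys Gly Phe Ser Tyr Phe Gly Glu Asp Leu Met Pro",
}

for name, seq in TAGS.items():
    one_letter = to_one_letter(seq)
    print(f"{name}: {one_letter} ({len(one_letter)} residues)")
```

Normalizing capitalization with `str.capitalize` lets the helper accept inconsistently typed residues such as "leu" or "pro".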
It is common practice in molecular biology to obtain antibodies that recognize the protein product of a cloned and sequenced gene by (1) synthesizing a peptide, typically ten to twenty amino acids in length, that corresponds to a portion of the protein, (2) immunizing an animal with the peptide, and (3) using the resulting antiserum to immunodetect or immunopurify the protein in which the peptide is situated. An example of this approach can be found in Sawin (15). A particularly relevant example can be found in Sugii et al. (13). In that study, 23 overlapping peptides covering the entire amino acid sequence of bovine conglutinin were synthesized, and each was used individually to immunize rabbits. Every resulting antiserum cross-reacted with the intact conglutinin protein.
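A set of overlapping peptides that covers a whole protein, as in the study just described, can be generated with a sliding window. The source states only the number of peptides (23). The window length, step, and toy sequence below are hypothetical, and `overlapping_peptides` is an illustrative name.

```python
from typing import List


def overlapping_peptides(sequence: str, length: int, step: int) -> List[str]:
    """Windows of `length` residues, advancing by `step`, covering every residue."""
    if not 0 < step < length:
        raise ValueError("need 0 < step < length for overlapping windows")
    if len(sequence) <= length:
        return [sequence]
    starts = list(range(0, len(sequence) - length + 1, step))
    if starts[-1] + length < len(sequence):
        # Add a final window anchored at the C-terminus so no residue is missed.
        starts.append(len(sequence) - length)
    return [sequence[s:s + length] for s in starts]


toy = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical 22-residue sequence
for peptide in overlapping_peptides(toy, length=10, step=5):
    print(peptide)
```

Anchoring the last window at the C-terminus guarantees full coverage even when the sequence length is not an exact multiple of the step.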
A problem with conventional epitope tagging is the limited probability of successfully tagging the protein. Despite researchers' best efforts, not every insertion of an epitope-encoding oligonucleotide into a host polynucleotide places the epitope in a reading frame that allows it to be expressed. The probability of success with a conventional method depends, in part, on how much is known about the polynucleotide before construction begins. If the nucleotide sequence, and therefore the reading frame at the target restriction site, is known, an oligonucleotide that encodes the epitope in the correct reading frame can be chosen. In this case, the probability that a given insertion event will be the desired one is one in two. The experimenter cannot control the orientation of the oligonucleotide relative to the polynucleotide, and only one of the two orientations will serve. If, on the other hand, the reading frame at the target restriction site is not known (as is frequently the case), the probability of success drops to one in six. The oligonucleotide's reading frame matches that of the coding sequence in only one case out of three, and that one-in-three chance combines with the one-in-two chance of correct orientation. The reading frame problem could be addressed by using three different DNA fragments, each encoding the epitope tag in a different reading frame (16). However, that requires producing multiple constructs to ensure finding the one of interest, which is inefficient.
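The one-in-two and one-in-six figures follow from counting equally likely outcomes. The sketch below enumerates them under stated assumptions. Each of the three frame offsets and each of the two orientations is assumed equally likely. The epitope is assumed to be expressed only at frame offset 0 in the forward orientation. The name `success_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

FRAMES = (0, 1, 2)                     # oligo frame relative to the host coding frame
ORIENTATIONS = ("forward", "reverse")  # orientation after ligation


def success_probability(frame_known: bool) -> Fraction:
    """Fraction of equally likely insertion outcomes that yield the tag.

    If the frame at the site is known, the matching oligo is chosen, so the
    frame offset is always 0 and only orientation is left to chance.
    """
    frames = (0,) if frame_known else FRAMES
    outcomes = list(product(frames, ORIENTATIONS))
    good = sum(1 for frame, orient in outcomes if frame == 0 and orient == "forward")
    return Fraction(good, len(outcomes))


print(success_probability(frame_known=True))   # 1/2
print(success_probability(frame_known=False))  # 1/6
```

Using `Fraction` keeps the results exact, so they match the ratios stated in the text rather than rounded decimals.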
Accordingly, for known epitope tagging procedures to be effective, the added DNA must be (1) in the appropriate orientation, and (2) in the correct reading frame. There are thus two obstacles inherent in conventional epitope tagging: an orientation obstacle and a reading frame obstacle.
The reading frame obstacle can only be avoided if the reading frame around the target restriction site is known. Otherwise, three different DNA fragments, each encoding the epitope tag in a different reading frame, must be used. In particular, suppose the oligonucleotide is inserted at a random or arbitrarily selected site in the coding sequence, e.g., at a unique restriction site. Then, for a given epitope-encoding oligonucleotide, the likelihood of successfully epitope-tagging the gene product by insertion at that site is at most one in three, because of the reading frame obstacle. The experimenter is forced to isolate multiple insertions at the target site and test them individually to find the one of interest. The testing may be arduous. For example, if the gene of interest is to be assayed in transgenic animals, it would be necessary to make numerous transgenic constructs and examine each one individually.
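To give a rough sense of the screening burden, the sketch below estimates how many independent insertions must be isolated to have a chosen probability of recovering at least one correct one. This is a back-of-the-envelope estimate, not a figure from the source. It assumes insertions are independent and each succeeds with a fixed probability. The 95% threshold and the name `insertions_to_screen` are arbitrary, illustrative choices.

```python
import math


def insertions_to_screen(p_success: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one correct insertion among n) >= confidence.

    Assumes independent insertions, each correct with probability p_success.
    """
    if not 0 < p_success < 1:
        raise ValueError("p_success must lie strictly between 0 and 1")
    # 1 - (1 - p)^n >= confidence  <=>  n >= log(1 - confidence) / log(1 - p)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_success))


cases = [
    ("frame known, orientation random", 1 / 2),
    ("frame unknown, orientation correct", 1 / 3),
    ("frame unknown, orientation random", 1 / 6),
]
for label, p in cases:
    print(label, insertions_to_screen(p))
```

Under these assumptions the estimates are 5, 8, and 17 insertions respectively. Each one may require a separate construct and assay.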
In summary, when the reading frame at the target restriction site is not known, the likelihood that a particular insertion will successfully tag the protein is only one in six, because of both the reading frame obstacle and the orientation obstacle. In other words, the experimenter will fail in five tries out of six. In two cases out of three (those with the wrong reading frame), the insertion is destined to fail whatever its orientation.