The present invention relates to dual-tagged recombinant fusion proteins. The fusion proteins of the invention are conveniently isolated and quantified, particularly in tissues from genetically modified animals.
Methods for isolating and/or detecting recombinant proteins of interest are useful in a number of applications. For instance, sensitive detection of transgene products in genetically engineered animals is important in determining the tissues in which transgene expression occurs. The proteins can be detected using a binding ligand (e.g., an antibody) that specifically recognizes the desired protein. In most cases, this procedure requires raising antibodies that are specifically immunoreactive with the desired protein. To avoid this requirement, various tags which can be fused to the protein of interest have been developed. For instance, the tags may include a unique epitope for which antibodies are readily available. Other methods include use of tags which incorporate metal-chelating amino acids.
Single epitope tags and other related tags do not necessarily provide sufficient sensitivity to allow detection of transgene products in tissues of animals, however. Thus, what is needed in the art are more sensitive methods of detecting recombinant proteins in vivo and in vitro that are fast, cheap, and easy to carry out. The present invention provides these and other advantages.
The present invention provides fusion proteins and nucleic acids encoding them. The fusion proteins of the invention comprise a polypeptide sequence of interest, a capture tag sequence, and a detection tag sequence. The capture tag sequence and the detection tag sequence are each heterologous to the polypeptide sequence of interest. In some embodiments, either the capture tag sequence or the detection tag sequence can be an epitope tag. An exemplary capture tag sequence is DYKDDDDK (SEQ ID NO:1). An exemplary detection tag sequence is YPYDVPDYA (SEQ ID NO:2).
The two tags can be positioned in a number of ways with respect to each other and the polypeptide sequence of interest. For instance, the capture tag sequence and the detection tag sequence can be positioned at the C terminus of the polypeptide sequence of interest. In addition, the fusion proteins of the invention can comprise linkers between the various components. For instance, the capture tag sequence and the detection tag sequence can be linked to each other through an oligopeptide linker. The linker may consist of less than about 15 amino acid residues, usually between about 4 and about 10 amino acids. The linker may comprise alanine residues. In some embodiments, the capture tag sequence, the detection tag sequence, or a combination thereof, can be linked to the polypeptide of interest through an oligopeptide linker.
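As a minimal sketch of one such arrangement, the following Python snippet concatenates a polypeptide of interest, a linker, the capture tag, a second linker, and the detection tag at the C terminus. The tag order, the reuse of a single linker at both junctions, the five-alanine default linker, and the placeholder polypeptide string are assumptions made for illustration only; they are not taken from the description above.

```python
# Tag sequences quoted from the text (SEQ ID NO:1 and SEQ ID NO:2).
CAPTURE_TAG = "DYKDDDDK"
DETECTION_TAG = "YPYDVPDYA"


def build_fusion(polypeptide: str, linker: str = "AAAAA") -> str:
    """Return polypeptide-linker-capture-linker-detection in one-letter code.

    The tag order, the reuse of one linker at both junctions, and the default
    five-alanine linker are illustrative assumptions.
    """
    if len(linker) >= 15:
        # The text describes linkers of less than about 15 residues.
        raise ValueError("linker should be shorter than about 15 residues")
    return polypeptide + linker + CAPTURE_TAG + linker + DETECTION_TAG


# Hypothetical placeholder standing in for a real polypeptide of interest.
example = build_fusion("MKTAYIAK")
print(example)
```

Other arrangements described above (for example, no linker between the tags) could be modeled by passing an empty linker string.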
The particular protein detected in the invention is not critical. In some embodiments the polypeptide of interest can be angiostatin.
The fusion proteins of the invention can be detected in an animal that comprises a nucleic acid molecule encoding the fusion protein. Typically, the assays of the invention include capturing the fusion protein in a sample from the animal with a compound that specifically binds the capture tag sequence; and then detecting the fusion protein in the sample with a second compound that specifically binds the detection tag sequence. The step of capturing can be carried out by contacting the sample with an antibody that specifically binds the capture tag sequence. The step of detecting can be carried out by contacting the sample with an antibody that specifically binds the detection tag sequence. The sample may be a tissue sample, such as lung tissue. The animal can be a transgenic or genetically engineered mouse.
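The two-step logic of the assay can be illustrated with a toy model in Python. Here specific binding is approximated by the mere presence of the tag sequence in a protein string, which is a deliberate simplification and an assumption of this sketch; the sample sequences are hypothetical placeholders, not data from the invention.

```python
CAPTURE_TAG = "DYKDDDDK"     # SEQ ID NO:1
DETECTION_TAG = "YPYDVPDYA"  # SEQ ID NO:2


def capture(sample):
    """Keep only proteins that a capture-tag-specific ligand would bind."""
    return [protein for protein in sample if CAPTURE_TAG in protein]


def detect(captured):
    """Count captured proteins that a detection-tag-specific ligand would bind."""
    return sum(1 for protein in captured if DETECTION_TAG in protein)


# Hypothetical sample: only the first entry carries both tags.
sample = [
    "MKTAYIAK" + CAPTURE_TAG + "AAAA" + DETECTION_TAG,
    "MKTAYIAK" + CAPTURE_TAG,
    "MKTAYIAK" + DETECTION_TAG,
    "MKTAYIAK",
]
print(detect(capture(sample)))
```

In this model a protein contributes to the signal only if it carries both tags, which reflects why a capture step followed by a separate detection step can be more selective than a single-tag readout.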
As used herein, a “capture tag sequence” is a sequence of amino acid residues which can be used to isolate or remove a fusion protein of the invention from a complex mixture. Typically, a capture tag will be a sequence of amino acids that specifically binds a ligand (e.g., an antibody) and thus allows the fusion protein to be isolated from the mixture. Examples of various capture tags are set forth in detail below.
As used herein, a “detection tag sequence” is a sequence of amino acid residues which can be used to detect the presence of a fusion protein of the invention, once the protein is isolated using the capture tag sequence. Any of a number of means may be used to detect the detection tag. The detection tag sequence can be directly detected (e.g., by fluorescence) or indirectly detected using a detectable ligand that specifically binds the detection tag. Examples of various detection tags are set forth in detail below.
A “fusion protein” of the invention is a polypeptide sequence containing two different heterologous sequences (e.g., a capture tag sequence and a detection tag sequence). The various components may be linked through linker sequences.
A polynucleotide or polypeptide sequence is “heterologous to” a second polynucleotide or polypeptide sequence if it is entirely synthetic, originates from a foreign species, or, if from the same species, is modified from its original form.
The phrase “specifically binds” refers to a binding reaction between a capture or detection ligand and an amino acid sequence, which binding is determinative of the presence of the amino acid sequence in the presence of a heterogeneous population of proteins and other biologics. Thus, under designated conditions, the capture or detection ligand binds preferentially to the particular sequence and does not bind in a significant amount to other amino acid sequences present in the sample. This interaction may also be referred to as “specifically immunoreactive” when referring to a reaction between an epitope and an antibody (e.g., in the case of epitope tags).
The terms “nucleic acid” and “polynucleotide” refer to a single- or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases. The terms include cDNA, self-replicating plasmids, infectious polymers of DNA or RNA, and non-functional DNA or RNA.
Two nucleic acid sequences or polypeptides are said to be “identical” if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum correspondence as described below.
Sequence comparisons between two (or more) polynucleotides or polypeptides are typically performed by comparing the two sequences over a “comparison window” to identify and compare local regions of sequence similarity. A “comparison window”, as used herein, refers to a segment of at least about 20 contiguous positions, usually about 50 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.
Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. (U.S.A.) 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis.), or by inspection. “Percentage of sequence identity” is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
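The percentage calculation can be sketched in Python as follows. The sketch assumes that the two sequences have already been optimally aligned by one of the methods named above and are supplied as equal-length strings covering the comparison window, with a "-" character marking each gap; the function name and the gap symbol are choices made for this illustration. It does not perform the alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over an already-aligned window.

    Matched positions are those where both sequences carry the same base or
    residue; a gap never counts as a match. The denominator is the total
    number of positions in the window.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matched / len(aligned_a)


# Hypothetical six-position window: four matches, one mismatch, one gap.
print(round(percent_identity("ACGT-A", "ACCTGA"), 1))
```

A real comparison window would span at least about 20 positions; the short strings here are only meant to make the arithmetic easy to follow.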
The term “substantial identity” of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 60% sequence identity, preferably at least 80%, more preferably at least 90% and most preferably at least 95%, compared to a reference sequence using the programs described above (preferably BLAST) using standard parameters. One of skill will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 40%, preferably at least 60%, more preferably at least 90%, and most preferably at least 95%. Polypeptides which are “substantially similar” share sequences as noted above except that residue positions which are not identical may differ by conservative amino acid changes. Conservative amino acid substitutions refer to the interchangeability of residues having similar side chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. Preferred conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
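By way of illustration, the preferred conservative substitution groups listed above can be encoded as sets of one-letter amino acid codes and used to test whether a mismatch between two residues is conservative. The function name and the set-based representation are assumptions of this sketch; only the group memberships come from the text.

```python
# Preferred conservative substitution groups from the text, in one-letter code:
# valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine,
# alanine-valine, and asparagine-glutamine.
PREFERRED_GROUPS = [
    {"V", "L", "I"},
    {"F", "Y"},
    {"K", "R"},
    {"A", "V"},
    {"N", "Q"},
]


def is_conservative(residue_a: str, residue_b: str) -> bool:
    """True if two different residues fall within one preferred group."""
    if residue_a == residue_b:
        return False  # identical residues are matches, not substitutions
    return any(
        residue_a in group and residue_b in group for group in PREFERRED_GROUPS
    )


print(is_conservative("V", "L"), is_conservative("A", "L"))
```

Note that the groups are not transitive: alanine pairs with valine and valine pairs with leucine, but alanine-leucine is not itself among the preferred groups, so the check treats each group separately.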
Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other, or a third nucleic acid, under stringent conditions. Stringent conditions are sequence dependent and will be different in different circumstances. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Typically, stringent conditions will be those in which the salt concentration is about 0.02 molar at pH 7 and the temperature is at least about 60° C. Stringent conditions for a standard Southern hybridization will include at least one wash (usually 2) in 0.2× SSC at a temperature of at least about 50° C., usually about 55° C., for 20 minutes, or equivalent conditions.
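The numerical conditions above can be captured in a short Python sketch. It assumes the Tm has been determined separately and is supplied as an input; the names `stringent_temperature`, `Wash`, and `meets_southern_wash` are hypothetical, and the check treats a lower SSC concentration, a higher temperature, or a longer wash as at least as stringent as the stated values, which is an assumption of this sketch. The "or equivalent conditions" allowance in the text is not modeled.

```python
from dataclasses import dataclass


def stringent_temperature(tm_celsius: float, offset: float = 5.0) -> float:
    """Temperature about `offset` degrees C below the supplied Tm."""
    return tm_celsius - offset


@dataclass
class Wash:
    ssc: float           # SSC concentration as a multiple, e.g. 0.2 for 0.2x SSC
    temp_celsius: float  # wash temperature in degrees C
    minutes: float       # wash duration in minutes


def meets_southern_wash(wash: Wash) -> bool:
    """Check one wash against 0.2x SSC, at least about 50 C, for 20 minutes."""
    return (
        wash.ssc <= 0.2
        and wash.temp_celsius >= 50.0
        and wash.minutes >= 20.0
    )


# Hypothetical inputs chosen only to exercise the two helpers.
print(stringent_temperature(65.0))
print(meets_southern_wash(Wash(ssc=0.2, temp_celsius=55.0, minutes=20.0)))
```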