Historically, techniques such as Edman degradation have been used extensively for protein sequencing. See, Stark, in: Methods in Enzymology, 25:103-120 (1972); Niall, in: Methods in Enzymology, 27:942-1011 (1973); Gray, in: Methods in Enzymology, 25:121-137 (1972); Schroeder, in: Methods in Enzymology, 25:138-143 (1972); Creighton, Proteins: Structures and Molecular Principles (W. H. Freeman, NY, 1984); Niederwieser, in: Methods in Enzymology, 25:60-99 (1972); and Thiede, et al., FEBS Lett., 357:65-69 (1995). However, sequencing by collision-induced dissociation mass spectrometry (MS) methods (MS/MS sequencing) has evolved rapidly and has proved to be faster and to require less protein than Edman techniques. See, Shevchenko, A., et al., Proc. Natl. Acad. Sci. (USA), 93:14440-14445 (1996); Wilm, et al., Nature, 379:466-469 (1996); Mark, J., “Protein structure and identification with MS/MS,” paper presented at the PE/Sciex Seminar Series, Protein Characterization and Proteomics: Automated high throughput technologies for drug discovery, Foster City, Calif. (March, 1998); and Biemann, Methods in Enzymology, 193:455-479 (1990).
MS sequencing is accomplished either by using higher voltages in the ionization zone of the MS to randomly fragment a single peptide isolated from a protein digest or, more typically, by tandem MS using collision-induced dissociation in the ion trap. See, Biemann, ibid. Several techniques can be used to select the peptide fragment used for MS/MS sequencing, including accumulation of the parent peptide fragment ion in the quadrupole MS unit (see, Mark, J., ibid.; Mann, M., paper presented at the IBC Proteomics conference, Boston, Mass. (Nov. 10-11, 1997); and Biemann, Methods in Enzymology, 193:455-479 (1990)), capillary electrophoretic separation coupled to ES-TOF MS detection (see, Aebersold, R., “Proteome analysis: Biological assay or data archive?,” paper presented at the IBC Proteomics conference, Coronado, Calif. (Jun. 11-12, 1998); and Smith, et al., in: CRC Handbook of Capillary Electrophoresis: A Practical Approach, Chp. 8, pgs. 185-206 (CRC Press, Boca Raton, Fla., 1994)), or other liquid chromatographic separations (Niall, H. D., in: Methods in Enzymology, 27:942-1011 (1973); and Creighton, T. E., Proteins: Structures and Molecular Principles (W. H. Freeman, NY, 1984)). The amino acid sequence of the peptide is deduced from the mass differences observed in the resulting MS fragmentation pattern, using the published masses of the individual amino acid residues (Biemann, K., in: Methods in Enzymology, 193:888 (1990)); this procedure has been codified into a semi-autonomous peptide sequencing algorithm (Hines, et al., J. Am. Soc. Mass Spectrom., 3:326-336 (1992)).
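The mass-difference reasoning described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Hines et al. algorithm: it assumes the input masses already form one singly charged fragment-ion series (for example, all y ions), uses standard monoisotopic residue masses, and matches each gap between neighbouring ions to residues within a user-chosen tolerance. The names `read_ladder` and `residues_matching` are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_matching(delta, tol=0.1):
    """Return, sorted, every residue whose mass lies within tol Da of delta."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)


def read_ladder(ion_masses, tol=0.1):
    """Read residues from the gaps between consecutive ions of one series.

    The masses are sorted ascending and each gap is matched against the
    residue table. An empty list marks a gap that matches no single
    residue, e.g. where a fragment ion is missing from the spectrum.
    """
    ladder = sorted(ion_masses)
    return [residues_matching(hi - lo, tol) for lo, hi in zip(ladder, ladder[1:])]


if __name__ == "__main__":
    # The rounded masses from the HSDAVFTDNYTR example (1425.7 and 1288.7 Da).
    print(read_ladder([1288.7, 1425.7]))  # [['H']]
    # Leucine and isoleucine are isobaric, so a ladder cannot separate them.
    print(residues_matching(113.084))  # ['I', 'L']
```

With the 0.1 Da default tolerance, glutamine (128.059 Da) and lysine (128.095 Da) are both returned for a gap near 128.08, so the tolerance has to be tightened, or the mass accuracy improved, before those two can be told apart.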
For example, in the positive-ion MS/MS spectrum of a 1425.7 Da peptide (HSDAVFTDNYTR), the difference between the full peptide (1425.7 Da) and the next-largest fragment (y11, 1288.7 Da) is 137 Da. This corresponds to the expected mass of an N-terminal histidine residue lost by cleavage at the amide bond. For this peptide, complete sequencing is possible because high-abundance fragment ions are generated by cleavage at almost every residue along the peptide backbone. The generation of an essentially complete set of positively charged fragment ions that includes either end of this peptide results from the basicity of both the N- and C-terminal residues. When a basic residue is located at the N-terminus and/or C-terminus, most of the ions produced in the collision-induced dissociation (CID) spectrum will contain that residue, since positive charge is generally localized at the basic site (see, Zaia, J., in: Protein and Peptide Analysis by Mass Spectrometry, J. R. Chapman, ed., pp. 29-41, Humana Press, Totowa, N.J., 1996; and Johnson, R. S., et al., Mass Spectrom. Ion Processes, 86:137-154 (1988)). The presence of a basic residue typically simplifies the resulting spectrum, since the basic site directs fragmentation into a limited series of specific daughter ions. Peptides that lack basic residues tend to fragment into a more complex mixture of ions, which makes sequence determination more difficult.
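The arithmetic of this example can be checked with a short sketch that computes theoretical singly charged b and y ions from monoisotopic residue masses. It assumes an unmodified peptide and singly charged ions, and the helper names are hypothetical. The calculated monoisotopic values agree closely with the rounded figures quoted above, and their difference is exactly the histidine residue mass.

```python
# Monoisotopic masses (Da): residues, water, and a proton.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def mh_plus(peptide):
    """Singly protonated precursor mass [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def b_ions(peptide):
    """b1..b(n-1): N-terminal fragments, each = residue sum + proton."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def y_ions(peptide):
    """y1..y(n-1): C-terminal fragments, each = residue sum + water + proton."""
    ions, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


if __name__ == "__main__":
    peptide = "HSDAVFTDNYTR"
    precursor = mh_plus(peptide)
    y11 = y_ions(peptide)[-1]  # y ions are listed y1, y2, ..., y11
    print(round(precursor, 2), round(y11, 2), round(precursor - y11, 2))
    # prints: 1425.64 1288.58 137.06
```

Each b ion and its complementary y ion sum to [M+H]+ plus one extra proton, which is a convenient consistency check when assigning peaks.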
Extending this approach of simplifying the CID spectrum of a peptide by placing a charge-concentrating moiety at either terminus, others have demonstrated that attaching a hard positive charge to the N-terminus directs the production of a complete series of N-terminal fragment ions from a parent peptide in CID experiments, regardless of whether a basic residue is present at the N-terminus. See, Johnson, R. S., et al., Mass Spectrom. Ion Processes, 86:137-154 (1988); Vath, J. E., et al., Fresenius Z. Anal. Chem., 331:248-252 (1988); Stults, J. T., et al., Anal. Chem., 65:1703-1708 (1993); Zaia, J., et al., J. Am. Soc. Mass Spectrom., 6:423-436 (1995); Wagner, D. S., et al., Biol. Mass Spectrom., 20:419-425 (1991); and Huang, Z.-H., et al., Anal. Biochem., 268:305-317 (1999). In principle, all of these fragment ions are produced by charge-remote fragmentation directed by the fixed-charge group. See, Tomer, K. B., et al., J. Am. Chem. Soc., 105:5487-5488 (1983).
Peptides have been labeled with several classes of fixed-charge groups, including dimethylalkylammonium, substituted pyridinium, quaternary phosphonium, and sulfonium derivatives. Characteristics of useful labels include ease of synthesis, increased ionization efficiency of the labeled peptides, and formation of a specific fragment-ion series from the labeled peptide with minimal unfavorable fragmentation of the label itself. Zaia (in: Protein and Peptide Analysis by Mass Spectrometry, J. R. Chapman, ed., pp. 29-41, Humana Press, Totowa, N.J., 1996) reported that the labels satisfying these criteria include those of the dimethylalkylammonium class and quaternary phosphonium derivatives. Moreover, substituted pyridinium derivatives have been reported to be useful in high-energy CID. See, Bures, E. J., et al., Anal. Biochem., 224:364-372 (1995); and Aebersold, R., et al., in: Protein Science, pp. 494-503 (Cambridge University Press, 1992).
Despite some progress in analytical methodology, protein identification remains a major bottleneck in the field of proteomics. For example, generating a protein sequence tag long enough to identify a single purified protein from its predicted genomic sequence can take up to 18 hours. Shevchenko, A., et al., Proc. Natl. Acad. Sci. (USA), 93:14440-14445 (1996). Moreover, although unambiguous protein identification can be attained by generating a protein sequence tag (PST; see Clauser, K. R., et al., Proc. Natl. Acad. Sci. (USA), 92:5072-5076 (1995); and Li, G., et al., Electrophoresis, 18:391-402 (1997)), limitations on the ionization efficiency of larger peptides and proteins restrict the intrinsic detection sensitivity of MS techniques and hinder the use of MS for identifying low-abundance proteins. Furthermore, limitations on the mass accuracy of time-of-flight (TOF) detectors can also constrain the usefulness of current MS/MS sequencing methods, requiring that proteins be digested by proteolytic and/or chemolytic means into more manageable peptides prior to sequencing (see Ambler, R. P., in: Methods in Enzymology, 25:143-154 (1972); and Gross, E., in: Methods in Enzymol., 11:238-255 (1967)).
Two basic strategies have been proposed for the MS identification of proteins after their separation from a protein mixture: 1) mass profile fingerprinting (‘MS fingerprinting’) (see, James, P., et al., Biochem. Biophys. Res. Commun., 195:58-64 (1993); and Yates, J. R., et al., Anal. Biochem., 214:397-408 (1993)); and 2) sequencing of one or more peptide domains by MS/MS (‘MS/MS sequencing’) (see Mann, M., paper presented at the IBC Proteomics conference, Boston, Mass. (Nov. 10-11, 1997); Wilm, M., et al., Nature, 379:466-469 (1996); and Chait, B. T., et al., Science, 262:89-92 (1993)). MS fingerprinting is achieved by accurately measuring the masses of several peptides generated by proteolytic digestion of the intact protein and searching a database for a known protein with that peptide mass fingerprint. MS/MS sequencing involves actual determination of one or more PSTs of the protein by generation of sequence-specific fragment ions in the quadrupole of an MS/MS instrument.
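MS fingerprinting, as described above, can be sketched as an in-silico digest followed by mass matching. The sketch assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, no modifications, and singly protonated peptides; the scoring is a plain count of matched masses rather than any published scoring scheme. The toy sequences are made up for illustration, and the function names are hypothetical. Splitting on a zero-width regular expression requires Python 3.7 or later.

```python
import re

# Monoisotopic residue masses (Da), water, and proton.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein):
    """Simplified trypsin rule: cut after K or R unless the next residue is P.

    No missed cleavages or modifications are considered.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_plus(peptide):
    """Singly protonated peptide mass [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fingerprint_score(observed, protein, tol=0.05):
    """Number of observed [M+H]+ values matched by some theoretical tryptic peptide."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def rank_database(observed, database, tol=0.05):
    """Rank database entries (name -> sequence) by fingerprint score, best first."""
    scores = {name: fingerprint_score(observed, seq, tol) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up toy sequences for illustration only; toy_b is toy_a reversed.
    database = {"toy_a": "GASPVKTLLNDRQEMHFRYW", "toy_b": "WYRFHMEQRDNLLTKVPSAG"}
    observed = [mh_plus("GASPVK"), mh_plus("TLLNDR")]
    print(rank_database(observed, database))  # [('toy_a', 2), ('toy_b', 0)]
```

The toy data also show one source of ambiguity: peptides with the same composition, such as QEMHFR (from toy_a) and FHMEQR (from toy_b), have identical masses, so a mass fingerprint alone cannot distinguish them.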
Clauser et al., Proc. Natl. Acad. Sci. (USA), 92:5072-5076 (1995), have suggested that proteins can be unambiguously identified only through the determination of PSTs that allow reference to the theoretical sequences determined from genomic databases. Li et al., Electrophoresis, 18:391-402 (1997), appear to have confirmed this assertion, finding that the reliability of protein identification by MS fingerprinting degraded as the size of the comparative theoretical peptide mass database increased. Li et al., ibid., also reported that, because of the sensitivity limitations of the MS, they were able to obtain peptide maps only for the highest-abundance proteins in the gel, even though their matrix-assisted laser desorption/ionization (MALDI) methodology was shown to improve detection sensitivity over previously reported methods. Clearly, rapid and cost-effective protein sequencing techniques will improve the speed and lower the cost of proteomics research.
The present invention provides such methods.
The present invention overcomes many of the difficulties associated with current MS-based protein sequencing technologies, including, for example, ionization inefficiency and inaccuracies in fragment mass. Because the methods of the invention preferably eliminate the need for proteolytic or chemolytic digestion of the protein, they provide protein sequencing times significantly shorter than those obtainable using prior methods. Moreover, because the proteins being sequenced are highly fragmented by the present methods, the resulting fragments have higher ionization efficiency and volatility than the parent protein, which improves detection sensitivity over prior methods.
Thus, in one aspect, the present invention provides a method for sequencing a portion of a protein, comprising:
(a) contacting a protein with a C-terminus or N-terminus labeling moiety to covalently attach a label to the C- or N-terminus of the protein and form a labeled protein; and
(b) analyzing the labeled protein using a mass spectrometric fragmentation method to determine the sequence of at least the two C-terminal or two N-terminal residues.
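Step (b) can be sketched as a walk through the peak list that starts at the label and extends the series one residue at a time. The sketch assumes, for illustration only, that every label-containing terminal fragment is singly charged and appears at the label mass plus the summed masses of the residues it contains; the real offset depends on the label chemistry and fragment-ion type. The 500.0 Da label mass in the usage example is a hypothetical placeholder, and the function name is hypothetical. Picking the closest-matching peak at each step is a simplification; a practical implementation would also weigh intensities and alternative paths.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_labeled_series(peaks, label_mass, max_residues=5, tol=0.02):
    """Return one sorted residue list per step, starting at the labeled terminus.

    From the current mass (initially label_mass), find the peak closest to
    current + residue mass for any residue (within tol), record the residue(s)
    consistent with that step, and continue from the matched peak. Stops when
    no peak extends the series or max_residues steps have been taken.
    Assumption: label-containing fragments appear at label_mass + residue sum.
    """
    current = label_mass
    calls = []
    for _ in range(max_residues):
        best = None  # (error, peak) of the closest extending peak so far
        for peak in peaks:
            for mass in RESIDUE_MASS.values():
                error = abs(peak - (current + mass))
                if error <= tol and (best is None or error < best[0]):
                    best = (error, peak)
        if best is None:
            break
        step = best[1] - current
        calls.append(sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - step) <= tol))
        current = best[1]
    return calls


if __name__ == "__main__":
    LABEL = 500.0  # hypothetical label mass (Da)
    # Two label-containing fragments (label+M, label+M+A) plus two unrelated peaks.
    peaks = [420.1, LABEL + 131.04049, 650.3, LABEL + 131.04049 + 71.03711]
    print(read_labeled_series(peaks, LABEL))  # [['M'], ['A']]
```

Reading outward from a known label mass is what lets the series be anchored at the terminus, so the first two calls are the two terminal residues the step requires.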
In one group of embodiments, the method further comprises:
(c) identifying the protein by using the sequence of the at least two C-terminal or two N-terminal residues to search predicted protein sequences from a database of gene sequence data.
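Step (c) can be sketched as a prefix or suffix match against predicted protein sequences, treating isoleucine and leucine as equivalent because they have the same mass. The sketch also includes a rough estimate of chance matches that assumes uniform, independent residue usage, which real proteomes do not follow; the 20,000-entry database size is a hypothetical figure. The toy entries and function names are hypothetical.

```python
def normalize(seq):
    """Upper-case and merge isobaric I/L, which a mass ladder cannot separate."""
    return seq.upper().replace("I", "L")


def terminal_tag_hits(tag, database, terminus="N"):
    """Names of predicted proteins whose N- or C-terminus matches tag."""
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    t = normalize(tag)
    hits = []
    for name, seq in database.items():
        s = normalize(seq)
        if s.startswith(t) if terminus == "N" else s.endswith(t):
            hits.append(name)
    return sorted(hits)


def expected_random_hits(n_proteins, tag_length, alphabet=19):
    """Expected chance matches, assuming uniform, independent residue usage.

    alphabet=19 counts I and L as a single symbol.
    """
    return n_proteins / alphabet ** tag_length


if __name__ == "__main__":
    # Made-up toy entries, not real proteins.
    database = {"p1": "MASTKQR", "p2": "MLSRRGE", "p3": "GIVEQ"}
    print(terminal_tag_hits("MI", database))        # ['p2']
    print(terminal_tag_hits("EQ", database, "C"))   # ['p3']
    print(round(expected_random_hits(20000, 2), 1)) # 55.4
```

Under that crude assumption, a two-residue tag alone would still leave dozens of chance matches among 20,000 predicted proteins, so longer tags or additional constraints narrow the search considerably.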
In another aspect, the present invention provides a method for sequencing a portion of a protein in a protein mixture, the method comprising:
(a) contacting the protein mixture with a C-terminus or N-terminus labeling moiety to covalently attach a label to the C- or N-terminus of the protein and form a labeled protein mixture;
(b) separating individual labeled proteins in the labeled protein mixture; and
(c) analyzing the labeled proteins from step (b) by a mass spectrometric method to determine the sequence of at least two C-terminal or two N-terminal residues.
In one group of embodiments, the method further comprises:
(d) identifying the protein by using the sequence of the at least two C-terminal or two N-terminal residues, in combination with a separation coordinate of the labeled protein and the protein-terminus location of the sequence, to search predicted protein sequences from a database of gene sequence data.
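The separation coordinate is not specified further here. The sketch below assumes, hypothetically, that it is an apparent molecular weight (for example, from a gel) with a relative tolerance, and predicts each entry's molecular weight from average residue masses; a coordinate such as isoelectric point or retention time would need its own predictor. The function names and toy entries are hypothetical.

```python
# Average residue masses (Da), suitable for whole-protein molecular weights.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVERAGE_WATER = 18.0153


def predicted_mw(seq):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + AVERAGE_WATER


def matches_tag(seq, tag, terminus):
    """Terminal tag match with I and L treated as equivalent."""
    s, t = seq.replace("I", "L"), tag.replace("I", "L")
    return s.startswith(t) if terminus == "N" else s.endswith(t)


def candidates(tag, observed_mw, database, terminus="N", rel_tol=0.1):
    """Entries matching the terminal tag whose predicted MW is within rel_tol of observed_mw."""
    return sorted(
        name
        for name, seq in database.items()
        if matches_tag(seq, tag, terminus)
        and abs(predicted_mw(seq) - observed_mw) <= rel_tol * observed_mw
    )


if __name__ == "__main__":
    # Made-up toy entries; only the first two share the N-terminal tag "ML".
    database = {
        "ml_light": "ML" + "G" * 98,  # about 5.85 kDa
        "ml_heavy": "ML" + "W" * 98,  # about 18.5 kDa
        "ga_heavy": "GA" + "W" * 98,  # about 18.4 kDa, different tag
    }
    print(candidates("ML", 18500.0, database))  # ['ml_heavy']
```

Neither constraint alone isolates `ml_heavy`: the tag also matches `ml_light`, and the molecular-weight window also admits `ga_heavy`. Only their combination leaves a single candidate.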
In each of the methods above, nonproteolytic protein sequencing by in-source fragmentation provides advantages over conventional MS/MS sequencing approaches. One particular advantage is the time saved by eliminating protein digestion steps and the need to accumulate low-volatility peptide ions in the quadrupole. Another advantage is that fewer sequence ambiguities result from the improved absolute mass accuracy obtained by working at the low end of the mass spectrum. A further advantage is that better ionization efficiency, and correspondingly better detection sensitivity, result from using more energetic ionization conditions and from introducing a hard or ionizable charge onto the fragments through the label. Yet another advantage of introducing a charge through the label is the ability to determine partial protein sequences from regions of a protein that may not contain ionizable amino acid residues.
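The mass-accuracy advantage follows from simple arithmetic: a constant relative (ppm) accuracy gives an absolute error that grows in proportion to m/z. The sketch uses a hypothetical 50 ppm instrument and asks whether glutamine and lysine, whose residue masses differ by about 0.036 Da, could be told apart at a given m/z. The half-gap criterion is a deliberately crude assumption, and the function names are hypothetical.

```python
# Monoisotopic residue masses of glutamine and lysine (Da).
GLN, LYS = 128.05858, 128.09496


def absolute_error_da(mz, ppm):
    """Absolute mass error (Da) implied by a relative accuracy in ppm."""
    return mz * ppm * 1e-6


def separates_gln_lys(mz, ppm):
    """Crude criterion (an assumption): error must be under half the Gln/Lys gap."""
    return absolute_error_da(mz, ppm) < (LYS - GLN) / 2


if __name__ == "__main__":
    for mz in (300.0, 1000.0, 20000.0):
        print(mz, round(absolute_error_da(mz, 50), 3), separates_gln_lys(mz, 50))
    # prints:
    # 300.0 0.015 True
    # 1000.0 0.05 False
    # 20000.0 1.0 False
```

At the same 50 ppm, a fragment near m/z 300 is measured to about 0.015 Da, while an intact protein near 20,000 is uncertain by about a full dalton.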
Finally, this method provides a contiguous protein sequence tag (PST) that can be used both for unambiguous protein identification and to generate an N- or C-terminal nucleic acid probe useful for isolating the corresponding cDNA from native cell or tissue samples.
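Generating a nucleic acid probe from a PST amounts to back-translating the tag through the standard genetic code. The sketch below builds one degenerate oligonucleotide using IUPAC ambiguity codes and reports two counts: the number of sequences the degenerate probe represents, and the number of sequences that actually encode the tag. For six-codon residues such as leucine, the per-position union over-generates (YTN also covers the phenylalanine codons TTT and TTC), so the two counts differ. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order: codon index = 16*first + 4*second + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append(codon)

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset(bases): code
    for bases, code in [
        ("A", "A"), ("C", "C"), ("G", "G"), ("T", "T"),
        ("AG", "R"), ("CT", "Y"), ("CG", "S"), ("AT", "W"), ("GT", "K"), ("AC", "M"),
        ("CGT", "B"), ("AGT", "D"), ("ACT", "H"), ("ACG", "V"), ("ACGT", "N"),
    ]
}


def degenerate_probe(peptide):
    """Return (probe, probe_degeneracy, coding_sequences) for a peptide tag."""
    probe, probe_degeneracy, coding = [], 1, 1
    for aa in peptide:
        codons = CODONS[aa]
        coding *= len(codons)  # sequences that truly encode the tag
        for position in range(3):
            bases = frozenset(codon[position] for codon in codons)
            probe.append(IUPAC[bases])
            probe_degeneracy *= len(bases)  # sequences the probe represents
    return "".join(probe), probe_degeneracy, coding


if __name__ == "__main__":
    print(degenerate_probe("MFK"))  # ('ATGTTYAAR', 4, 4)
    print(degenerate_probe("L"))    # ('YTN', 8, 6)
```

Because degeneracy multiplies across positions, stretches rich in single-codon residues such as Met and Trp give the least degenerate probes.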