Proteins are essential for the control and execution of virtually every biological process. The rate of synthesis and the half-life of proteins and thus their expression level are also controlled post-transcriptionally. Furthermore, the activity of proteins is frequently modulated by post-translational modifications, in particular protein phosphorylation, and depends on the association of the protein with other molecules including DNA and proteins. Neither the level of expression nor the state of activity of proteins is directly apparent from the gene sequence or even from the expression level of the corresponding mRNA transcript. A complete description of a biological system must therefore include measurements that indicate the identity, quantity and the state of activity of the proteins which constitute the system. The large-scale (ultimately global) analysis of proteins expressed in a cell or tissue has been termed proteome analysis (Pennington, S. R., Wilkins, M. R., Hochstrasser, D. F., and Dunn, M. J. (1997), “Proteome analysis: From protein characterization to biological function,” Trends Cell Biol. 7:168-173).
At present no protein analytical technology approaches the throughput and level of automation of genomic technology. The most common implementation of proteome analysis is based on the separation of complex protein samples, most commonly by two-dimensional gel electrophoresis (2DE), and the subsequent sequential identification of the separated protein species (Ducret, A. et al. (1998), “High throughput protein characterization by automated reverse-phase chromatography/electrospray tandem mass spectrometry,” Prot. Sci. 7:706-719; Garrels, J. I. et al. (1997), “Proteome studies of Saccharomyces cerevisiae: identification and characterization of abundant proteins,” Electrophoresis 18:1347-1360; Link, A. J. et al. (1997), “Identifying the major proteome components of Haemophilus influenzae type-strain NCTC 8143,” Electrophoresis 18:1314-1334; Shevchenko, A. et al. (1996), “Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels,” Proc. Natl. Acad. Sci. U.S.A. 93:14440-14445; Gygi, S. P. et al. (1999), “Correlation between protein and mRNA abundance in yeast,” Mol. Cell. Biol. 19:1720-1730; Boucherie, H. et al. (1996), “Two-dimensional gel protein database of Saccharomyces cerevisiae,” Electrophoresis 17:1683-1699).
The 2DE approach has been revolutionized by the development of powerful mass spectrometric techniques and computer algorithms which correlate protein and peptide mass spectral data with sequence databases and, thus, rapidly and conclusively identify proteins (Eng, J., McCormack, A., and Yates, J. R. (1994), “An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database,” J. Am. Soc. Mass Spectrom. 5:976-989; Mann, M., and Wilm, M. (1994), “Error-tolerant identification of peptides in sequence databases by peptide sequence tags,” Anal. Chem. 66:4390-4399; Yates, J. R. et al. (1995), “Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database,” Anal. Chem. 67:1426-1436).
This technology has reached a level of sensitivity which now permits the identification of essentially any protein which is detectable by conventional protein staining methods including silver staining (Figeys, D., and Aebersold, R. (1998), “High sensitivity analysis of proteins and peptides by capillary electrophoresis tandem mass spectrometry: Recent developments in technology and applications,” Electrophoresis 19:885-892; Figeys, D. et al. (1998), “Electrophoresis combined with mass spectrometry techniques: Powerful tools for the analysis of proteins and proteomes,” Electrophoresis 19:1811-1818; Figeys, D. et al. (1997), “A microfabricated device for rapid protein identification by microelectrospray ion trap mass spectrometry,” Anal. Chem. 69:3153-3160; Figeys, D. et al. (1996), “Protein identification by solid phase microextraction-capillary zone electrophoresis-microelectrospray-tandem mass spectrometry,” Nature Biotech. 14:1579-1583; Shevchenko, A. et al. (1996), “Mass spectrometric sequencing of proteins from silver-stained polyacrylamide gels,” Anal. Chem. 68:850-858). However, the sequential manner in which samples are processed limits the sample throughput. The most sensitive methods have been difficult to automate, and low abundance proteins, such as regulatory proteins, escape detection without prior enrichment, thus effectively limiting the dynamic range of the technique. In the 2DE-based approach, proteins are quantified by densitometry of stained spots in the 2DE gels.
The development of methods and instrumentation for automated, data-dependent electrospray ionization (ESI) tandem mass spectrometry (MSn) in conjunction with microcapillary liquid chromatography (μLC) and database searching has significantly increased the sensitivity and speed of the identification of gel-separated proteins. As an alternative to the 2DE/MSn approach to proteome analysis, the direct analysis by tandem mass spectrometry of peptide mixtures generated by the digestion of complex protein mixtures has been proposed (Dongré, A. R. et al. (1997), “Emerging tandem-mass-spectrometry techniques for the rapid identification of proteins,” Trends Biotechnol. 15:418-425). μLC-MS/MS has also been used successfully for the large-scale identification of individual proteins directly from mixtures without gel electrophoretic separation (Link, J. et al. (1999), “Direct analysis of large protein complexes using mass spectrometry,” Nat. Biotech. 17:676-682; Opiteck, G. J. et al. (1997), “Comprehensive on-line LC/LC/MS of proteins,” Anal. Chem. 69:1518-1524).
While these approaches dramatically accelerate protein identification, the quantities of the analyzed proteins cannot be easily determined because mass spectrometers are inherently not quantitative devices. Direct mass spectrometric analysis of protein mixtures can be made quantitative by the application of stable isotope dilution theory, whereby two chemically identical analytes (one representing an internal standard and the other the sample to be measured) are labeled with stable isotope tags of identical chemical composition but different mass. This principle has been implemented in quantitative proteome analysis by the development of a class of chemical reagents termed isotope coded affinity tags (ICAT) (Gygi, S. P. et al. (1999), “Quantitative analysis of complex protein mixtures using isotope-coded affinity tags,” Nat. Biotechnol. 17, 994-999). ICAT reagents and their application to the analysis of complex protein mixtures have been shown to substantially alleviate the dynamic range problem encountered by the 2DE/MSn approach.
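The stable isotope dilution principle described above can be illustrated with a minimal sketch. Because the two labeled forms of a peptide are chemically identical and differ only in mass, the ratio of their signal intensities in the same spectrum estimates their relative abundance. The function names, the assignment of the light tag to the sample and the heavy tag to the standard, the use of a median to combine peptides, and the intensity values are all illustrative assumptions, not part of the cited method.

```python
from statistics import median


def peptide_ratio(light_intensity, heavy_intensity):
    """Relative abundance of one peptide: sample (light tag) over standard (heavy tag).

    Assumption: the light tag marks the sample and the heavy tag the internal standard.
    """
    if heavy_intensity <= 0:
        raise ValueError("heavy (standard) intensity must be positive")
    return light_intensity / heavy_intensity


def protein_ratio(peptide_pairs):
    """Combine per-peptide ratios for one protein.

    The median is an illustrative choice that tolerates an occasional outlier pair.
    """
    return median(peptide_ratio(light, heavy) for light, heavy in peptide_pairs)


def absolute_amount(ratio, standard_amount):
    """Scale the ratio by a known amount of internal standard (same units returned)."""
    return ratio * standard_amount


# Hypothetical (light, heavy) intensity pairs for three peptides of one protein.
pairs = [(200000, 100000), (420000, 200000), (90000, 50000)]
ratio = protein_ratio(pairs)
print(f"sample/standard ratio: {ratio:.2f}")
```

If the amount of internal standard is known, `absolute_amount(ratio, standard_amount)` converts the relative ratio into an estimate of the sample amount in the same units.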
Protein phosphorylation is one of the most important regulatory events in cells. The state of activity of numerous enzymes and processes and the association of specific proteins into functional complexes are frequently controlled by reversible protein phosphorylation (Graves, J. D. & Krebs, E. D. (1999), “Protein phosphorylation and signal transduction,” Pharmacol. Ther. 82, 111-121; Koch, C. A. et al. (1991), “SH2 and SH3 domains: elements that control interactions of cytoplasmic signaling proteins,” Science 252, 668-674; Hunter, T. (1994), “1001 protein kinases redux-towards 2000,” Semin. Cell Biol. 5, 367-376). The principal goals of studying protein phosphorylation are the identification, quantitation and determination of the biological function of phosphorylation site(s) in phosphoproteins. Much of the difficulty in such studies lies in the fact that many phosphoproteins exist only at very low abundance. Further, proteins are often phosphorylated at a low stoichiometry and at multiple sites. Therefore, it is usually difficult to obtain sufficient amounts of pure phosphoprotein for such analyses. All current methods for the analysis of the phosphorylation state of proteins focus on one purified phosphoprotein at a time (Verma, R. et al. (1997), “Phosphorylation of Sic1p by G1 Cdk required for its degradation and entry into S phase,” Science 278, 455-60; Watts, J. D. et al. (1994), “Identification by electrospray ionization mass spectrometry of the sites of tyrosine phosphorylation induced in activated Jurkat T cells on the protein tyrosine kinase ZAP-70,” J. Biol. Chem. 269, 29520-29529; Gingras, A. C. et al. (1999), “Regulation of 4E-BP1 phosphorylation: a novel two-step mechanism,” Genes Dev. 13, 1422-1437). Because cellular proteins are coordinately phosphorylated to control specific biological processes, the complex mechanisms that control biological systems by protein phosphorylation are difficult to investigate using current technology.
Because phosphopeptide(s) typically are infrequent and of low abundance in protein digests, highly purified or enriched phosphopeptide samples are needed for mass spectrometric analysis. The need to selectively enrich for phosphopeptides prior to MS analysis is particularly urgent if a protein mixture rather than a single purified phosphoprotein is being analyzed. In addition, no MS-based method to quantify protein phosphorylation directly is currently available. Quantitative study of protein phosphorylation often involves methods such as 32P radiolabeling (Oda, Y. et al. (1999), “Accurate quantitation of protein expression and site-specific phosphorylation,” Proc. Natl. Acad. Sci. USA 96:6591-6596). Therefore, an MS-based method that allows both the identification of the sites of phosphorylation from complex mixtures of proteins and their quantitation will be an essential part of proteome analysis.
Thus, there is a substantial need in the art for a more rapid and general method for the analysis of protein phosphorylation, particularly in complex protein mixtures, that does not require purification to homogeneity of individual phosphoproteins. The present invention provides such a method.