The activation of proteins by modification represents an important cellular mechanism for regulating most aspects of biological organization and control, including growth, development, homeostasis, and cellular communication. For example, protein phosphorylation plays a critical role in the etiology of many pathological conditions and diseases, including cancer, developmental disorders, autoimmune diseases, and diabetes. In spite of the importance of protein modification, it is not yet well understood at the molecular level. The reasons for this lack of understanding are, first, that the cellular modification system is extraordinarily complex, and second, that the technology necessary to unravel its complexity has not yet been fully developed.
The complexity of protein modification on a proteome-wide scale derives from three factors: the large number of modifying proteins (e.g., kinases) encoded in the genome; the much larger number of sites on substrate proteins that these enzymes modify; and the dynamic nature of protein expression during growth, development, disease states, and aging. The human genome encodes, for example, over 520 different protein kinases, making them the most abundant class of enzymes known. See Hunter, Nature 411: 355–65 (2001). Each of these kinases phosphorylates specific serine, threonine, or tyrosine residues located within distinct amino acid sequences, or motifs, contained within different protein substrates. Most kinases phosphorylate many different proteins: it is estimated that one-third of all proteins encoded by the human genome are phosphorylated, and many are phosphorylated at multiple sites by different kinases. See Graves et al., Pharmacol. Ther. 82: 111–21 (1999).
Many of these phosphorylation sites regulate critical biological processes and may prove to be important diagnostic or therapeutic targets for molecular medicine. For example, of the more than 100 dominant oncogenes identified to date, 46 are protein kinases. See Hunter, supra. Oncogenic kinases such as ErbB2 and Jak3, widely expressed in breast tumors and various leukemias, respectively, transform cells to the oncogenic phenotype at least in part because of their ability to phosphorylate cellular proteins. Identifying the proteins these kinases modify will greatly expand our understanding of the molecular mechanisms underlying, e.g., oncogenic transformation. Thus, the ability to selectively identify modification sites (e.g., phosphorylation sites) on a wide variety of cellular proteins represents an important new tool for understanding the key signaling proteins and pathways implicated in diseases such as cancer.
Although several methods for purifying phosphopeptides have been described, these methods have significant limitations that render them unsuitable for isolating or purifying modified peptides from complex peptide mixtures on a genome- or cell-wide basis. In one method, which employs reversed-phase HPLC, proteins are labeled in vivo or in vitro with radioactive phosphate, and the protein of interest is purified to near homogeneity (so that it represents at least 95% of the protein in the sample) before analysis. See, e.g., Wettenhall et al., Methods Enzymol. 201: 186–199 (1991). The highly purified protein is then digested with a proteolytic enzyme to produce peptides, and the radioactively labeled peptides containing a phosphorylation site of that single protein are purified by reversed-phase HPLC. Phosphorylated peptides are distinguished from nonphosphorylated peptides by measuring the radioactivity associated with each HPLC fraction and are then chemically sequenced.
The reversed-phase HPLC method has several important limitations that render it unsuitable for purifying modified peptides from complex peptide mixtures, e.g., cellular digests. The method cannot be applied to biological samples that cannot be radioactively labeled, such as tissue biopsy samples. Selective peptide loss during purification can introduce biases, so that the most prominent modified peptide before the HPLC step is not necessarily the most prominent one after it. This problem is addressed by first purifying the protein so that its level of radioactivity can be measured, and then rigorously accounting for sample recovery during all subsequent purification and analysis steps. Because this correction depends on prior purification of the protein, modified sites cannot be identified from complex peptide mixtures. The HPLC method is also often unsuccessful when applied to proteins that are modified at low levels, for example, where only a small percentage (less than 10%) of the protein is phosphorylated at one site. This problem results from the difficulty of purifying a phosphopeptide to homogeneity against a high background of nonphosphorylated peptides, combined with the need for a nearly homogeneous phosphopeptide during chemical sequencing. The method has additional shortcomings as well.
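The dilution behind the low-stoichiometry problem can be made concrete with a short calculation. The Python sketch below rests on simplifying assumptions that are illustrative only and not drawn from the text: the purified protein is cleaved into a fixed, hypothetical number of peptides that are recovered in equimolar amounts, and a single site is phosphorylated on a given fraction of the protein molecules. The function names are likewise hypothetical.

```python
def phosphopeptide_fraction(stoichiometry: float, peptides_per_protein: int) -> float:
    """Molar fraction of one phosphopeptide among all peptides in a digest.

    Idealized model (an assumption, not taken from the text): the purified
    protein is cleaved into `peptides_per_protein` peptides, each recovered
    in equimolar amounts, and a single site is phosphorylated on a fraction
    `stoichiometry` of the protein molecules.
    """
    if not 0.0 < stoichiometry <= 1.0:
        raise ValueError("stoichiometry must be in (0, 1]")
    if peptides_per_protein < 1:
        raise ValueError("peptides_per_protein must be at least 1")
    # Each protein molecule contributes peptides_per_protein peptide molecules,
    # but on average only `stoichiometry` of a phosphopeptide molecule.
    return stoichiometry / peptides_per_protein


def unmodified_per_modified(stoichiometry: float) -> float:
    """Copies of the unmodified site peptide per phosphorylated copy."""
    if not 0.0 < stoichiometry <= 1.0:
        raise ValueError("stoichiometry must be in (0, 1]")
    return (1.0 - stoichiometry) / stoichiometry


# 10% phosphorylation at one site; 50 peptides per protein is a hypothetical figure.
print(phosphopeptide_fraction(0.10, 50))
print(unmodified_per_modified(0.10))
```

Under these assumptions, 10% phosphorylation of a protein that yields 50 peptides leaves the phosphopeptide at about 0.2% of the peptide molecules in the digest, with roughly nine unmodified copies of the same sequence for every phosphorylated copy. Real digests depart from this idealization (missed cleavages, unequal recovery), but the scaling illustrates why purifying a phosphopeptide to the near-homogeneity required for chemical sequencing is difficult.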
Several researchers have employed immobilized phospho-specific antibodies, along with mass spectrometry (MS or MS/MS), to identify phosphorylation sites in proteins. Immobilized anti-phosphotyrosine antibodies have been used to purify phosphopeptides from digests of gelsolin, an actin-binding protein. See De Corte et al., Prot. Sci. 8: 234–241 (1999). However, the single protein of interest, gelsolin, was first purified and phosphorylated in vitro before being digested to yield gelsolin-specific phosphopeptides. Immobilized anti-phosphotyrosine antibodies have similarly been employed to identify EphB phosphopeptides from purified EphB digests (Kalo et al., Biochem. 38: 14396–408 (1999)) and to purify alpha-enolase phosphopeptides from a digest of purified human alpha-enolase (Marcus et al., Electrophoresis 21: 2622–2636 (2000)). In the latter attempt, however, the method failed, and the authors expressly concluded that the low binding affinity between the antibody and the phosphopeptides makes the detection of phosphorylation sites almost impossible (Id. at p. 2635). The prevailing view (enunciated by Marcus et al.) that phospho-specific antibodies are not generally suitable for isolating phosphopeptides has recently been reiterated in a review of protein phosphorylation analysis by recognized leaders in biological mass spectrometry. See Mann et al., Trends in Biotech. 20: 261–268 (2002).
The identification of Ty1 Gag protein epitopes in digested yeast cell extract using an immobilized epitope-specific antibody has also been reported. See Yu et al., J. Am. Soc. Mass. Spec. 9: 208–215 (1998). However, the immobilized antibody was specific for a Ty1 Gag epitope (i.e., it was not a general modification-specific antibody), was not phospho-specific, and recognized only peptides from a single protein, Ty1 Gag. None of these methodologies is suitable for the selective isolation of phosphopeptides from complex mixtures of peptides derived from multiple, unpurified proteins, and most require time-consuming pre-purification of the proteins of interest. Reviewed in Mann et al., Ann. Rev. Biochem. 70: 437–73 (2001).
Another widely used method for purifying modified peptides is immobilized metal affinity chromatography (IMAC). This pseudo-affinity purification method is based on the interaction between metal ions and negatively charged peptide moieties, such as phosphate. See, e.g., Posewitz et al., Anal. Chem. 71: 2883–2892 (1999). Pre-purified, phosphorylated proteins are digested to peptides, and the phosphorylated peptides are then purified by passing the digest through a miniaturized chromatography column containing a resin with a covalently attached metal chelator, e.g., iminodiacetic acid or nitrilotriacetic acid. A metal cation is non-covalently bound to the chelator by treating the resin with a salt of one of several metals, such as Fe3+, Ni2+, Ga3+, or Cu2+. When the protein digest is applied to the column, peptides with a sufficiently high negative charge density, such as that conferred by a phosphate group, can bind to the metal cation. Eluted peptides can then be analyzed by chemical sequencing or by mass spectrometry (MS or MS/MS) to assign phosphorylation sites.
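The charge-based retention rule described above can be illustrated with a deliberately crude model. In the Python sketch below, each peptide's net charge is estimated by summing approximate per-residue charges, and a peptide is treated as retained if its net negative charge per residue meets a threshold. The charge values, the use of lowercase s, t, and y to mark phosphorylated residues, the threshold of 0.1, and the example sequences are all hypothetical; real IMAC retention depends on the metal, pH, and buffer composition and is not predicted by this model.

```python
# Hypothetical per-residue charges, roughly the formal charges near neutral pH.
# Lowercase s/t/y marks a phosphorylated residue (an illustrative convention).
# Termini and histidine are ignored to keep the toy model simple.
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0,
          "s": -2.0, "t": -2.0, "y": -2.0}


def negative_charge_density(peptide: str) -> float:
    """Net negative charge per residue (0.0 if the peptide is net neutral or positive)."""
    if not peptide:
        raise ValueError("peptide must be non-empty")
    net = sum(CHARGE.get(residue, 0.0) for residue in peptide)
    return max(0.0, -net) / len(peptide)


def retained_by_toy_imac(peptides, threshold=0.1):
    """Keep peptides whose negative charge density meets the (hypothetical) threshold."""
    return [p for p in peptides if negative_charge_density(p) >= threshold]


# Hypothetical digest: a phosphopeptide, an acidic non-phosphopeptide, a basic peptide.
digest = ["GAsPVLR", "DEEDLEGK", "GAVLPLR"]
print(retained_by_toy_imac(digest))  # prints ['GAsPVLR', 'DEEDLEGK']
```

In this toy example the acidic, nonphosphorylated peptide DEEDLEGK passes the same filter as the phosphopeptide GAsPVLR, which mirrors the point that retention driven by negative charge alone is not specific for phosphate.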
As with the reversed-phase HPLC method, IMAC purification of modified peptides has several limitations that render it unsuitable for purifying modified peptides from complex peptide mixtures, such as cellular digests. The method must be adjusted for each sample because phosphopeptides, for example, are sensitive to the exact IMAC conditions used. It is not unusual to test peptide binding to all four commonly used cations in combination with three different pH conditions (12 test conditions altogether) to find the metal-pH combination best suited for purifying a single, specific phosphopeptide. Isolating a second, different phosphopeptide from the same or a different protein may require its own, different metal-pH combination. The IMAC method is also not specific for phosphopeptides: peptides that lack phosphate but carry several negatively charged residues (such as aspartic acid and glutamic acid) can bind to IMAC resins and contaminate the purified phosphopeptides. This drawback is especially problematic when only a small percentage of the protein sample is modified, e.g., a partially phosphorylated protein, because contaminating nonphosphorylated peptides can then greatly outnumber the phosphopeptides. For this reason, the IMAC method is not suitable for isolating desired modified peptides from complex peptide mixtures. Further, the method does not discriminate among types of modified residue: peptides containing phosphoserine, phosphothreonine, or phosphotyrosine all bind to and elute from IMAC resins.
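The size of the condition screen described above follows directly from crossing the candidate cations with the candidate pH values. The Python sketch below enumerates that grid and picks the condition with the highest measured recovery. The four cations are those named in the text; the three pH values and the function names are hypothetical placeholders, and the recovery measurements would come from the experimenter's own runs.

```python
from itertools import product

# Cations named in the text; the three pH values are hypothetical placeholders.
CATIONS = ("Fe3+", "Ni2+", "Ga3+", "Cu2+")
PH_VALUES = (3.0, 5.0, 7.0)


def screening_grid(cations=CATIONS, ph_values=PH_VALUES):
    """Every metal-pH combination to test for one phosphopeptide."""
    return list(product(cations, ph_values))


def best_condition(recovery):
    """Return the (cation, pH) pair with the highest measured recovery.

    `recovery` maps (cation, pH) -> a recovery value supplied by the
    experimenter; conditions that were not measured are simply absent.
    """
    if not recovery:
        raise ValueError("no measurements supplied")
    return max(recovery, key=recovery.get)


grid = screening_grid()
print(len(grid))  # prints 12 (4 cations x 3 pH values)
```

Because the optimum found this way applies to one phosphopeptide, a second target may require repeating the screen, so in the worst case the number of test runs grows as 12 times the number of target phosphopeptides rather than staying fixed.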
Accordingly, there remains a need in the art for simple peptide isolation and purification methods that are suitable for isolating modified peptides from complex peptide mixtures, e.g., digested cell extracts containing a wide variety of different modified proteins, yet do not require time-consuming or costly pre-purification steps. Simple peptide isolation methods that can be readily automated would, for example, enable rapid, genome-wide profiling of protein activation states and the identification of new diagnostic or therapeutic targets within cell signaling pathways, which are at the forefront of the proteomics era now underway. The unmet need for such high-throughput methods has recently been recognized. See, e.g., Mann, Nat. Biotech. 17: 954–55 (1999).