Proteomics has emerged as a complement to genomics: it involves the qualitative and quantitative analysis of gene activity by assessment of protein, rather than RNA, levels and/or activity. Proteomics includes the study of events such as post-translational modification of proteins, interactions between proteins, protein function and the location of proteins within the cell. Essentially, proteomics involves the study of part or all of the status of the total protein complement contained within or secreted by a cell, and thus offers a direct and promising view of the biological functions of a cell. In its simplest form, proteomics is an exercise in "mining" biological samples to identify which proteins are present in each one. The power of applied proteomics in drug discovery, however, lies in its ability to reveal key differences between the proteomes of, for example, normal and diseased cells. In principle, applied proteomics can reveal proteins, or patterns of protein expression or activity, that are unique to diseased cells relative to normal cells, and can thereby support the molecular diagnosis of a particular disease or disorder. This goal cannot be achieved, however, without massively parallel techniques for protein identification and characterization.
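The comparison of normal and diseased proteomes described above can be sketched as follows. This is a minimal illustration, assuming each proteome has already been reduced to a mapping from protein name to a measured abundance; the function name `compare_proteomes`, the two-fold threshold and the pseudocount are hypothetical choices, not part of the original text. Proteins detected in only one state are reported as unique, and shared proteins are flagged by their log2 fold change.

```python
import math
from typing import Dict, List, Tuple


def compare_proteomes(
    normal: Dict[str, float],
    diseased: Dict[str, float],
    min_fold: float = 2.0,      # hypothetical threshold for a "key difference"
    pseudocount: float = 1.0,   # avoids division by zero for low abundances
) -> Tuple[List[str], List[str], Dict[str, float]]:
    """Return (unique to diseased, unique to normal, log2 fold changes of shared proteins)."""
    unique_to_diseased = sorted(set(diseased) - set(normal))
    unique_to_normal = sorted(set(normal) - set(diseased))

    changed: Dict[str, float] = {}
    for protein in sorted(set(normal) & set(diseased)):
        # Positive values mean higher abundance in the diseased sample.
        lfc = math.log2((diseased[protein] + pseudocount) / (normal[protein] + pseudocount))
        if abs(lfc) >= math.log2(min_fold):
            changed[protein] = lfc
    return unique_to_diseased, unique_to_normal, changed


if __name__ == "__main__":
    normal = {"A": 10.0, "B": 100.0, "C": 50.0}
    diseased = {"B": 100.0, "C": 203.0, "D": 30.0}
    print(compare_proteomes(normal, diseased))
```

With these toy inputs, protein D is reported as unique to the diseased sample, protein A as unique to the normal sample, and protein C as roughly four-fold up-regulated (log2 fold change of 2.0).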
Current technologies for proteome analysis are based on a variety of protein separation techniques followed by identification of the separated proteins. The most popular method is based on two-dimensional gel electrophoresis (2DE); see for example Parekh et al., U.S. Pat. Nos. 6,064,754 and 6,278,794. This technique separates proteins on an acrylamide gel according to their isoelectric point (pI) and molecular weight. Several hundred proteins can typically be visualized by radioactive or fluorescent labeling or by silver staining. However, because the number of proteins in a sample can easily exceed 10,000, while the number of resolved polypeptides shown in published 2DE databases typically ranges from about 1,000 to 3,000 per gel (see for example the Julio Celis Database; http://biobase.dk/cgi-bin/celis), it soon became apparent that only the most abundant proteins in a crude protein mixture could be visualized by gel electrophoresis. This highlighted the need to reduce proteomic sample complexity and to improve proteomic detection methods.
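The first 2DE dimension separates proteins by isoelectric point: during isoelectric focusing a protein migrates until it reaches the pH at which its net charge is zero. The sketch below estimates that pH from sequence using the Henderson-Hasselbalch relation and bisection. The pKa values are an illustrative set assumed here; published pKa scales differ, so different tools report pI values that disagree by a few tenths of a pH unit. The names `net_charge` and `isoelectric_point` are hypothetical.

```python
# Illustrative pKa values (assumed; published scales differ between tools).
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6
POSITIVE_SIDE_CHAIN_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_SIDE_CHAIN_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a polypeptide's net charge at a given pH."""
    # Every chain contributes one free amino terminus and one free carboxyl terminus.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in seq:
        if aa in POSITIVE_SIDE_CHAIN_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_SIDE_CHAIN_PKA[aa]))
        elif aa in NEGATIVE_SIDE_CHAIN_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_SIDE_CHAIN_PKA[aa] - ph))
    return charge


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, iterations: int = 60) -> float:
    """Find the pH at which the net charge is zero by bisection."""
    # Net charge decreases monotonically with pH, so bisection converges to the root.
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for peptide in ("KKKK", "DDDD", "PEPTIDE"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

A lysine-rich sequence focuses at a high pH and an aspartate-rich one at a low pH; the second 2DE dimension then separates proteins with similar pI values by molecular weight.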
The need for more sensitive, more accurate and higher-throughput technologies for analyzing proteomic material obtained from a variety of biological sources has led to increasingly refined technologies for the identification of separated proteins. A significant breakthrough has been the mass spectrometric identification of gel-separated proteins: individual proteins (spots) may be excised from the gel for MS analysis. Identification strategies include peptide mapping, in which the masses of peptides produced by site-specific proteolysis are measured by mass spectrometry (MS) and correlated with unique mass patterns in protein databases. For example, a proteolytic enzyme such as trypsin (which cleaves polypeptides on the C-terminal side of arginine and lysine residues) can be used to fragment the extracted protein into two or more peptides. These peptides can then be analyzed by matrix-assisted laser desorption ionization (MALDI) or electrospray ionization (ESI) mass spectrometry to determine their masses. The measured masses can then be used to screen a database to determine the amino acid sequences of the peptides.
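The peptide-mapping strategy above (often called peptide mass fingerprinting) can be sketched as an in silico tryptic digest of each database sequence, followed by counting how many observed masses match a theoretical peptide mass within a tolerance. This is a minimal sketch under stated assumptions: observed values are taken to be neutral monoisotopic masses (real MALDI spectra usually report protonated ions), cysteines are unmodified, and the rule that trypsin does not cleave before proline is a common convention added here rather than something stated in the text. All function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Monoisotopic residue masses in daltons (unmodified residues).
MONO: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free termini


def tryptic_digest(seq: str) -> List[str]:
    """Cleave after K or R, but not before P (assumed convention); drop empty pieces."""
    # Zero-width splits with re.split require Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def count_matches(observed: List[float], protein_seq: str, tol: float = 0.02) -> int:
    """Number of observed masses explained by the protein's theoretical tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein_seq)]
    return sum(1 for m in observed if any(abs(m - t) <= tol for t in theoretical))


def rank_database(observed: List[float], database: Dict[str, str], tol: float = 0.02) -> List[Tuple[str, int]]:
    """Rank database entries by how many observed peptide masses they explain."""
    scores = {name: count_matches(observed, seq, tol) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    db = {"protA": "MKWVTFISLLLLFSSAYSRGVFRR", "protB": "GASPVTNDEK"}
    spectrum = [peptide_mass(p) for p in tryptic_digest(db["protA"])]
    print(rank_database(spectrum, db))
```

Counting matches is the simplest possible score; practical search engines weight matches by how likely they are to occur at random, but the ranking idea is the same.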
Alternatively, direct analysis by liquid chromatography (LC)-MS/MS of the highly complex peptide mixtures generated by digesting unseparated protein mixtures has provided an alternative to two-dimensional electrophoresis, obviating some of its limitations (e.g., poor detection of low-abundance proteins and limited resolution of the gel separation). In this approach, peptide amino acid sequence data are obtained by tandem mass spectrometry (MS/MS) and used to screen databases for unique protein sequences (see for example, Eng et al., J. Am. Soc. Mass Spectrom. (1994) 5: 976; Yates III et al., Anal. Chem. (1995) 67: 3202; Yates III et al., Anal. Chem. (1995) 67: 1426; Figeys et al., Anal. Chem. (1996) 68: 1822). Peptide ions of selected mass are isolated in the first stage of the spectrometer and subjected to collision-induced dissociation, and the masses of the resulting fragments are then analyzed in the second stage to deduce the amino acid sequence. However, this technique alone does not allow quantitative comparison between two similar proteomes (e.g., the proteome of a normal cell versus that of a diseased cell). Furthermore, a prominent problem inherent to proteomic analysis is sample complexity. As mentioned previously, the number of proteins in a given sample can easily exceed 10,000; after enzymatic digestion, the number of peptides in a proteomic sample can reach the hundreds of thousands. This level of complexity imposes an enormous burden on the analytical process, requiring sophisticated analytical techniques combined with computer-assisted data analysis to perform what would otherwise be a prohibitively time-consuming analysis.
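How fragment masses from the second MS stage reveal sequence can be sketched with the b-ion (N-terminal fragment) and y-ion (C-terminal fragment) ladders that peptide-bond cleavage typically produces: consecutive b ions differ by exactly one residue mass. The sketch below assumes singly charged fragments, unmodified residues and an ideal, complete b-ion series, which real spectra rarely provide. It reads residues from consecutive mass differences and reports isoleucine and leucine together because they have identical masses. All names are hypothetical.

```python
from typing import Dict, List

# Monoisotopic residue masses in daltons (unmodified residues).
MONO: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def neutral_mass(peptide: str) -> float:
    return sum(MONO[aa] for aa in peptide) + WATER


def b_ions(peptide: str) -> List[float]:
    """Singly charged b1..b(n-1): N-terminal residue sums plus a proton."""
    ions, running = [], 0.0
    for aa in peptide[:-1]:
        running += MONO[aa]
        ions.append(running + PROTON)
    return ions


def y_ions(peptide: str) -> List[float]:
    """Singly charged y1..y(n-1): C-terminal residue sums plus water and a proton."""
    ions, running = [], WATER
    for aa in reversed(peptide[1:]):
        running += MONO[aa]
        ions.append(running + PROTON)
    return ions


def residues_for(delta: float, tol: float) -> str:
    hits = sorted(aa for aa, m in MONO.items() if abs(m - delta) <= tol)
    return "/".join(hits) if hits else "?"


def read_sequence(b_series: List[float], precursor_neutral_mass: float, tol: float = 0.02) -> List[str]:
    """Call one residue per mass step of the b ladder; the precursor mass gives the last one."""
    prefix = [0.0] + [b - PROTON for b in b_series]
    calls = [residues_for(prefix[i + 1] - prefix[i], tol) for i in range(len(prefix) - 1)]
    calls.append(residues_for(precursor_neutral_mass - WATER - prefix[-1], tol))
    return calls


if __name__ == "__main__":
    print(read_sequence(b_ions("PEPTIDE"), neutral_mass("PEPTIDE")))
```

For the toy peptide PEPTIDE the ladder yields P, E, P, T, I/L, D, E. In practice, database search engines compare observed fragment spectra against fragments predicted from candidate sequences instead of reading the ladder directly.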
Methods of simplifying the analysis of complex peptide mixtures by isolating signature peptides containing specific residues have been proposed for proteomic analysis. These include the derivatization of cysteines in protein mixtures with thiol-specific biotin reagents and isolation of the biotinylated peptides from tryptic digests by binding to avidin (see Gygi et al., "Quantitative analysis of complex protein mixtures using isotope-coded affinity tags", Nature Biotechnology, 17(10): 994-999, 1999). Peptides containing histidine or glycosyl groups have also been isolated using immobilized metal affinity sorbents or lectin columns, respectively (Ji et al., "Strategy for qualitative and quantitative analysis in proteomics based on signature peptides", J. Chromatogr. B Biomed. Sci. Appl. 745(1): 197-210, 2000). These methods were used with isotopic labeling and MS analysis to identify and quantify specific proteins in complex mixtures. Database searching in these cases is limited to those peptides containing the target amino acid or modification. Moreover, these approaches are not necessarily comprehensive, as proteins that lack the target moiety are not represented in the isolated peptide mixture.
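The signature-peptide approach above can be sketched in two steps: keep only tryptic peptides that contain the target residue (cysteine here), then quantify each peptide as the intensity ratio of its heavy-labeled form to its light-labeled form, which appear in the spectrum separated by a fixed mass shift per label. This is a minimal illustration under assumed conditions: the per-label shift of 8.0502 Da corresponds to a tag pair differing by eight deuterium atoms and is used here only as an illustrative parameter, peaks are assumed to be given as (neutral mass, intensity) pairs, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed mass shift per labeled residue between heavy and light tag forms
# (eight 2H replacing eight 1H); illustrative, not a reagent specification.
ASSUMED_DELTA_PER_LABEL = 8.0502


def tryptic_peptides(seq: str) -> List[str]:
    """Cleave after K or R, not before P (assumed convention); requires Python 3.7+."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def signature_peptides(proteins: Dict[str, str], residue: str = "C") -> Dict[str, List[str]]:
    """Keep only peptides carrying the target residue; proteins lacking it get no peptides."""
    return {name: [p for p in tryptic_peptides(seq) if residue in p] for name, seq in proteins.items()}


def summed_intensity(peaks: List[Tuple[float, float]], target: float, tol: float) -> float:
    return sum(intensity for mass, intensity in peaks if abs(mass - target) <= tol)


def heavy_to_light_ratio(
    peaks: List[Tuple[float, float]],
    light_mass: float,
    n_labels: int,
    delta: float = ASSUMED_DELTA_PER_LABEL,
    tol: float = 0.02,
) -> Optional[float]:
    """Relative abundance of the heavy-labeled sample versus the light-labeled sample."""
    light = summed_intensity(peaks, light_mass, tol)
    heavy = summed_intensity(peaks, light_mass + n_labels * delta, tol)
    if light == 0:
        return None  # light form not observed; ratio undefined
    return heavy / light


if __name__ == "__main__":
    print(signature_peptides({"P1": "ACDKGGRCCPKLLR", "P2": "GGKLLR"}))
    print(heavy_to_light_ratio([(1000.0, 50.0), (1008.0502, 100.0)], 1000.0, n_labels=1))
```

The toy protein P2 contains no cysteine and therefore contributes no signature peptides, which illustrates the lack of comprehensiveness noted in the text.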
There remains a need for improved methods for efficiently and reliably identifying and quantifying proteins found in a proteomic sample, preferably methods that also reduce sample complexity.