Characterisation of the complement of expressed proteins from a single genome is a central focus of the evolving field of proteomics. A proteome is the protein complement of a cell or tissue. Since one genome produces many proteomes (multi-cellular organisms can have hundreds of proteomes) and the number of expressed genes in a cell is generally considered to exceed 10,000, the characterisation of thousands of proteins to evaluate proteomes can best be accomplished using a high-throughput, automated process.
Certain methods for analyzing peptides using mass spectrometry are known in the art. Peptide molecular weights and the masses of sequencing ions can be obtained routinely to an accuracy which enables mass distinction amongst most of the 20 amino acids in the genetic code. In tandem mass spectrometry, a peptide sample is introduced into the mass spectrometer and is subjected to analysis in two mass analyzers (denoted as MS1 and MS2). In MS1, a narrow mass-to-charge window (typically 2-4 Da), centered around the m/z value of the peptide to be analyzed, is selected. The ions within the selected mass window are then subjected to fragmentation via collision-induced dissociation (CID), which typically occurs in a collision cell by applying a voltage to the cell and introducing a gas to promote fragmentation. The process produces smaller peptide fragments derived from the precursor ion (termed the ‘product’ or ‘daughter’ ions). The product ions, in addition to any remaining intact precursor ions, are then passed through to the second mass analyzer (MS2) and detected to produce a fragmentation or tandem (MS/MS) spectrum. The MS/MS spectrum records the m/z values and the instrument-dependent detector response for all ions exiting from the collision cell. Fragmentation across the chemical bonds of the peptide backbone produces ions that are either charged on the C-terminal fragment (designated as x, y or z ions) or on the N-terminal fragment (a, b or c ions). Peptides are fragmented under two general sets of conditions, high energy and low energy CID. In low energy CID experiments, signals assigned to y and b ions and to losses of water and ammonia are usually the most intense. During high energy CID, peptide molecules with sufficient internal energy to cause cleavages of the amino acid side chains are produced. These side chain losses predominantly occur at the amino acid residue where the backbone cleavage occurs.
The general designations for these side chain cleavage ions are d for N-terminal and w for C-terminal charged fragments, respectively. Other useful sequencing ions result from a y-type cleavage at one residue and a b-type cleavage at another residue along the polypeptide backbone (internal fragment ions) (Biemann, K. (1990) Sequencing of peptides by tandem mass spectrometry and high-energy collision-induced dissociation. Methods Enzymol. 193, 455-479; Papayannopoulos, I. A. (1995) Mass Spectrometry Reviews 14, 49-73).
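The b and y ion nomenclature described above can be made concrete with a short calculation. The following Python sketch is illustrative only and is not taken from any of the cited references: it assumes singly charged fragments, standard monoisotopic residue masses rounded to five decimal places, and a hypothetical helper name `fragment_ions`.

```python
# Illustrative sketch only: singly charged b and y ion m/z values for a peptide.
# Standard monoisotopic residue masses (Da), rounded to five decimal places.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of the charge-carrying proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of m/z values for charge state 1+.

    b_i holds the first i residues (charge on the N-terminal fragment);
    y_i holds the last i residues plus water (charge on the C-terminal fragment).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("GAS")
    print("b ions:", [round(m, 4) for m in b])
    print("y ions:", [round(m, 4) for m in y])
```

Under these assumptions, complementary fragments b_i and y_(n-i) always sum to the neutral peptide mass plus two protons, which is one reason the two series are commonly interpreted together.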
Previous studies have attempted to determine chemical structures of unknown peptides using fragmentation spectra. Most often these studies have involved manual interpretation using prior knowledge derived from fragmentation spectra of known peptides. It is well recognized from these studies that multiple sequence interpretations are possible from the same fragmentation spectrum. The lack of a unique result is a major impediment to the development of accurate, high-throughput methods for sequencing unknown peptides using tandem mass spectrometry.
Various computer-mediated methods have been attempted for deducing the sequence of a peptide from an MS/MS spectrum. In one approach, ‘sub-sequencing’ strategies are used whereby portions of the total sequence (i.e., sub-sequences) are tested against the mass spectrum (Ishikawa et al. (1986) Biomed. Environ. Mass Spectrom. 13, 373-380; Siegel et al. (1988) Biomed. Environ. Mass Spectrom. 15, 333-343; Johnson et al. (1989) Biomed. Environ. Mass Spectrom. 18, 945-957, which are hereby incorporated by reference in their entirety). In this approach, sub-sequences that correlate to ions observed in the MS/MS spectrum are extended by one residue, and the whole process is repeated until the entire sequence is obtained. During each incremental extension of the sequence, the possibilities are reduced by comparing sub-sequences with the mass spectrum and only permitting continuation of the process for sub-sequences giving the most favorable spectral matches. Determination of amino acid composition has also been utilised to limit sequence possibilities (Zidarov et al. (1990) Biomed. Environ. Mass Spectrom. 19(1), 13-26, the contents of which are hereby incorporated by reference in their entirety).
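The incremental extension and pruning described above resembles what is now commonly called a beam search. The sketch below is a simplified illustration of that general idea and does not reproduce the algorithms of the cited references. The function name `subsequence_search`, the scoring rule (counting matched singly charged b ions only), the tolerance and the beam width are all assumptions chosen for brevity.

```python
# Illustrative sketch only: extend candidate sub-sequences one residue at a
# time and keep only those that best match the observed peaks.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def subsequence_search(peaks, residue_total, tol=0.02, beam=50):
    """Return finished (sequence, score) pairs, best score first.

    peaks         -- observed m/z values (assumed singly charged)
    residue_total -- sum of residue masses of the intact peptide
    Score = number of sub-sequence (prefix) b ions found among the peaks.
    """
    candidates = [("", 0.0, 0)]  # (sub-sequence, residue mass sum, score)
    finished = []
    while candidates:
        extended = []
        for prefix, mass, score in candidates:
            for aa, m in RESIDUE_MASS.items():
                new_mass = mass + m
                if new_mass > residue_total + tol:
                    continue  # heavier than the whole peptide: discard
                if abs(new_mass - residue_total) <= tol:
                    finished.append((prefix + aa, score))  # full length reached
                    continue
                hit = any(abs(new_mass + PROTON - p) <= tol for p in peaks)
                extended.append((prefix + aa, new_mass, score + int(hit)))
        # Permit continuation only for the most favorable sub-sequences.
        extended.sort(key=lambda c: c[2], reverse=True)
        candidates = extended[:beam]
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


if __name__ == "__main__":
    total = sum(RESIDUE_MASS[a] for a in "GAVK")
    b_peaks = [58.028736, 129.065846, 228.134256]  # b1..b3 of GAVK
    print(subsequence_search(b_peaks, total)[:3])
```

Lower-ranked entries in the returned list illustrate the ambiguity noted earlier: several sequences can explain a subset of the same peaks, for example where one residue has nearly the same mass as two others combined.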
An alternative approach has been to develop programs for de novo peptide sequencing from fragmentation spectra based on graph theory (Fernandez-de-Cossio, J. et al., (1995) CABIOS 11, 427-434; Hines, A. et al. (1995) J. Am. Soc. Mass Spectrom. 3, 326-336; Knapp, J. Am. Soc. Mass Spectrom. 6, 947-961, which are hereby incorporated by reference in their entirety). The basic method involves mathematically transforming an MS/MS spectrum into a form in which all fragment ions are converted to a single fragment ion type, each represented by a vertex on the spectrum graph (Bartels, (1990) Biomed. Environ. Mass Spectrom. 19, 363-368, the contents of which are hereby incorporated by reference in their entirety). Peptide sequences are then determined by finding the longest series of these transformed ions with mass differences corresponding to the mass of an amino acid.
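A minimal sketch of the spectrum graph idea follows. It is not the implementation of any cited program. It assumes singly charged b and y ions only, interprets every peak both ways (as a b ion and as a y ion) to obtain a single ion type expressed as a prefix residue mass, and searches for the longest chain of vertices whose consecutive mass differences match one amino acid. The name `spectrum_graph_sequence` and the tolerance are hypothetical.

```python
# Illustrative sketch only: a spectrum graph over prefix residue masses,
# solved for its longest start-to-end path by dynamic programming.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def spectrum_graph_sequence(peaks, mh, tol=0.02):
    """Return the longest residue chain from mass 0 to the full residue mass.

    peaks -- observed fragment m/z values (assumed singly charged)
    mh    -- m/z of the singly protonated intact peptide
    Returns None if no complete chain exists.
    """
    total = mh - WATER - PROTON          # sum of all residue masses
    vertices = {0.0, total}
    for p in peaks:
        # Transform each peak to a prefix mass under both interpretations:
        # as a b ion (p - PROTON) and as a y ion (mh - p).
        for prefix in (p - PROTON, mh - p):
            if 0.0 < prefix < total:
                vertices.add(prefix)
    nodes = sorted(vertices)

    best = [None] * len(nodes)           # best[j]: longest chain ending at j
    best[0] = ""
    for j in range(1, len(nodes)):
        for i in range(j):
            if best[i] is None:
                continue                 # vertex i is unreachable from 0
            diff = nodes[j] - nodes[i]
            for aa, m in RESIDUE_MASS.items():
                if abs(diff - m) <= tol:
                    chain = best[i] + aa
                    if best[j] is None or len(chain) > len(best[j]):
                        best[j] = chain
    return best[-1]


if __name__ == "__main__":
    # b1, b2, y1 and y2 of the peptide GAVK; its MH+ is 374.239781.
    observed = [58.028736, 129.065846, 147.112801, 246.181211]
    print(spectrum_graph_sequence(observed, 374.239781))
```

Because leucine and isoleucine have identical masses, a graph of this kind cannot distinguish them; the sketch simply reports whichever of the two it encounters first.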
Yet other methods attempt to match spectral information with sequences in protein and translated nucleotide sequence databases. An algorithm has been described for searching protein and nucleotide databases with mass and sequence information from fragmentation spectra of tryptic peptides (MS-TAG) (Mann and Wilm (1994) Anal. Chem. 66, 4390; Clauser, P. Baker and A. L. Burlingame, in Proceedings of the 44th ASMS Conference of Mass Spectrometry and Allied Topics. Portland, Oreg., 1996, pp. 365-366, which are hereby incorporated by reference in their entirety). These prior art algorithms require manual spectral interpretation and also suffer from well-recognized problems of inaccurate sequence determination (Perkins et al. (1999) Electrophoresis 20, 3551-3567, which is hereby incorporated by reference in its entirety). In an effort to mitigate these shortcomings, Mann and his colleagues have used comparison with the fragmentation spectra of the same peptide after methylation of the carboxyl groups or enzymatic digestion in the presence of 18O water to incorporate 18O into the C-terminal carboxyl groups (Shevchenko et al. (1997) J. of Protein Chemistry 16(5):481-90 and Shevchenko, A. (1997) Rapid Commun. Mass Spectrom. 11(9), 1015-1024, which are hereby incorporated by reference in their entirety). A similar approach has been extended to the analysis of intact proteins using laser fragmentation and Fourier-transform mass spectrometry (Mortz, E. et al. (1996) PNAS 93, 8264, which is hereby incorporated by reference in its entirety).
A different approach has been described for identifying peptide sequences from database interrogation by comparing the experimental fragmentation spectrum with theoretical spectra from a mass-constrained set of database sequences (SEQUEST) (Yates III et al. U.S. Pat. No. 5,538,897; Yates III, P. R. Griffin and L. E. Hood, in Techniques in Protein Chemistry, edited by J. J. Villafranca, Vol. 2, Academic Press, San Diego pp. 477-485 (1991), which are hereby incorporated by reference in their entirety). For each candidate sequence in the database, a theoretical fragmentation spectrum is formed according to a selected ion model of peptide fragmentation. The theoretically derived mass spectra are then compared to each of the experimentally derived fragmentation spectra, and each comparison is scored using a cross-correlation function.
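The database comparison described above can be sketched in a few lines. The example below is a heavily simplified illustration and is not the SEQUEST implementation: it assumes unit-width m/z bins, unit peak intensities, a b/y-only ion model, and a score equal to the zero-offset correlation minus the mean correlation over offsets of up to 75 bins. All function names and parameters are hypothetical.

```python
# Illustrative sketch only: mass-constrained candidate selection followed by
# cross-correlation scoring of theoretical against observed spectra.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide):
    """Theoretical singly charged b and y ion m/z values (assumed ion model)."""
    m = [RESIDUE_MASS[aa] for aa in peptide]
    b = [sum(m[:i]) + PROTON for i in range(1, len(m))]
    y = [sum(m[-i:]) + WATER + PROTON for i in range(1, len(m))]
    return b + y


def binned(mzs, size=2000):
    """Convert m/z values to a unit-intensity vector with 1 m/z-wide bins."""
    vec = [0.0] * size
    for mz in mzs:
        i = int(round(mz))
        if 0 <= i < size:
            vec[i] = 1.0
    return vec


def xcorr(obs, theo, max_shift=75):
    """Zero-offset correlation minus the mean correlation at nonzero offsets."""
    n = len(obs)

    def dot(shift):
        lo, hi = max(0, shift), min(n, n + shift)
        return sum(obs[i] * theo[i - shift] for i in range(lo, hi))

    shifts = [s for s in range(-max_shift, max_shift + 1) if s != 0]
    background = sum(dot(s) for s in shifts) / len(shifts)
    return dot(0) - background


def rank_candidates(observed_mzs, precursor_mass, database, mass_tol=0.5):
    """Score database peptides whose neutral mass matches the precursor."""
    obs = binned(observed_mzs)
    scored = []
    for peptide in database:
        mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
        if abs(mass - precursor_mass) <= mass_tol:   # mass constraint
            scored.append((peptide, xcorr(obs, binned(by_ions(peptide)))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    target = "GAVK"
    mass = sum(RESIDUE_MASS[aa] for aa in target) + WATER
    db = ["AGVK", "KVAG", "GAVK", "PEPTIDE"]
    for peptide, score in rank_candidates(by_ions(target), mass, db):
        print(peptide, round(score, 3))
```

In this toy database the three isobaric permutations all pass the mass constraint and share several nominal-mass fragments, so the output is a ranked list in which the lower-ranked candidates still receive substantial scores.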
Prior art methods for automated analysis of fragmentation mass spectra are capable of generating a ranked list of candidate peptide sequences in a sequence database; however, identification of a true match from amongst multiple candidate sequences has heretofore required subjective manual assessment by one skilled in spectral interpretation.