Progress in the genome projects is rapidly driving the development of new strategies in functional genomics that aim to elucidate the biological function of genes through global analyses of protein expression and modification. Proteomics is a key technology in this context, as it allows thousands of proteins to be analyzed in a single 1- or 2-dimensional gel electrophoresis. However, the identification of these proteins is still time-consuming and remains the rate-limiting step in functional genomics.
Mass spectrometry allows rapid, accurate and sensitive analysis of biomolecules and is increasingly used to obtain structural information on proteins and peptides. To date, proteins are identified by mass spectrometry mostly by protein fragment fingerprinting, which involves enzymatic fragmentation of the protein, determination of the masses of the resulting fragments, and comparison of their pattern with theoretical fragmentation patterns calculated from database sequences. With the increasing complexity of sequence databases, however, there is a growing need for sequence information from the proteins themselves. Accurate sequence information translates into faster and more accurate database searches and is a prerequisite for detecting mutations and for identifying homologues of known proteins from species whose genomes have not yet been sequenced. In addition, the identification of biologically active peptides such as MHC-bound T cell epitopes, some peptide hormones and peptide antibiotics requires complete de novo sequencing, i.e. deduction of the peptide sequence directly from the mass spectra.
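The fingerprinting workflow described above can be illustrated with a short Python sketch. It digests each database protein in silico with trypsin (cleavage after K or R unless followed by P, no missed cleavages), computes the monoisotopic [M+H]+ mass of every resulting peptide, and counts how many observed masses each protein explains. The function names, the 0.1 Da tolerance and the toy sequences are hypothetical and chosen only for illustration; practical search engines additionally handle missed cleavages, modifications and statistical scoring.

```python
"""Minimal peptide mass fingerprinting sketch (hypothetical helpers, toy data)."""

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-terminus)
PROTON = 1.007276   # MALDI mainly yields singly protonated ions


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def fingerprint_score(observed: list[float], protein: str, tol: float = 0.1) -> int:
    """Number of observed masses matching a theoretical tryptic peptide within tol."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def rank_proteins(observed: list[float], database: dict[str, str], tol: float = 0.1):
    """Rank database entries by how many observed masses they explain."""
    scores = {name: fingerprint_score(observed, seq, tol) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage with a hypothetical two-entry database.
database = {"toy1": "AGKPLEKDRFSK", "toy2": "MSTLVKEGHRY"}
observed = [mh_plus("AGKPLEK"), mh_plus("FSK")]
print(rank_proteins(observed, database))  # [('toy1', 2), ('toy2', 0)]
```

The sketch makes the dependency on the database explicit: a protein that is not in `database` can never score, no matter how good the spectrum is.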
MALDI-TOF mass spectrometry is widely preferred for the determination of peptide masses because MALDI generates predominantly singly charged ions. ESI-MS/MS techniques, on the other hand, are favoured for sequence determination. ESI-MS/MS spectra can be interpreted by several strategies, including database searches with MS/MS data or peptide sequence tags, and new programs for de novo sequencing. The combination of these two MS technologies is currently regarded as the best approach to proteome analysis. However, this combination has severe disadvantages: it requires two expensive instruments; despite attempts at automation, it is time-consuming and labor-intensive; and the often precious samples must be split and prepared separately for the two ionization techniques, which increases the workload and reduces the sensitivity of the analyses.
In principle, the sequence of a peptide can be deduced from MALDI-TOF post-source decay (MALDI-PSD) or collision-induced dissociation (MALDI-CID) fragmentation spectra. However, experimental spectra are very complex, often incomplete (not all possible fragments are produced), and marred by spurious signals that cannot be assigned to any known mode of peptide fragmentation. Moreover, the mass accuracy of MALDI-PSD and -CID measurements is relatively low. The most widely applied approaches for extracting peptide sequences from mass spectra compare the fragment masses with theoretical fragment masses calculated from database sequences (Perkins et al. Electrophoresis. 1999; 20: 3551; Creasy et al. Proteomics. 2002; 2: 1426; Field et al. Proteomics. 2002; 2: 36). The success of these approaches depends on the presence of the corresponding sequence in the database. New, modified or mutated proteins or peptides cannot be identified, or can be identified only in exceptional cases.
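A minimal sketch of the database-driven comparison, assuming that only singly charged b (N-terminal) and y (C-terminal) fragment ions are considered and that a candidate is scored simply by the number of observed peaks it explains. The cited programs use considerably more elaborate, probabilistic scoring; the functions `b_ions`, `y_ions`, `match_count` and `best_candidate`, the candidate sequences and the 0.5 Da tolerance (chosen to reflect the relatively low fragment mass accuracy) are hypothetical.

```python
"""Sketch: score candidate sequences against a PSD/CID fragment peak list."""

# Monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_ions(peptide: str) -> list[float]:
    """b1..b(n-1): sum of N-terminal residues plus a proton."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide: str) -> list[float]:
    """y1..y(n-1): sum of C-terminal residues plus water and a proton."""
    masses, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE[aa]
        masses.append(total + WATER + PROTON)
    return masses


def match_count(spectrum: list[float], peptide: str, tol: float = 0.5) -> int:
    """Number of observed peaks lying within tol of a theoretical b or y ion."""
    theoretical = b_ions(peptide) + y_ions(peptide)
    return sum(any(abs(p - t) <= tol for t in theoretical) for p in spectrum)


def best_candidate(spectrum: list[float], candidates: list[str], tol: float = 0.5) -> str:
    """Database sequence that explains the most peaks."""
    return max(candidates, key=lambda pep: match_count(spectrum, pep, tol))


# Usage: an idealized, noise-free spectrum of a hypothetical peptide.
spectrum = b_ions("PEPTIDEK") + y_ions("PEPTIDEK")
print(best_candidate(spectrum, ["GGGGGGGR", "PEPTIDEK"]))  # PEPTIDEK
```

Because the search space is exactly the candidate list, a peptide that is absent from it, or that carries an unexpected mutation or modification, can at best be matched partially.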
Fragmentation of peptides in the MALDI-TOF mass spectrometer produces daughter ions of different categories, as shown in FIG. 1 (Chaurand et al. J Am Soc Mass Spectrom. 1999; 10: 91). In both laser-induced and collision-induced dissociation, peptides fragment preferentially at main-chain bonds. The resulting fragments that contain the C-terminus of the original peptide are called y-series fragments, and those that contain the N-terminus are called b-series fragments. Both types of fragments may undergo further decomposition to yield additional fragment series. These fragment series extend to different positions in the sequence and, ideally, cover the entire sequence. The mass difference between two adjacent fragments of a series corresponds to the mass of the amino acid residue at the corresponding sequence position. In addition to the terminal fragments, internal fragments lacking both terminal amino acids, as well as immonium ions corresponding to single amino acids, are produced. Loss of ammonia is observed for all classes of ions except the immonium ions. Internal fragments occasionally lose carbon monoxide. Additional loss of water is observed for fragments containing serine or threonine, and other amino acids can also cause specific secondary fragmentations. Positively charged amino acids at the C-terminus, as in tryptic fragments of proteins, enhance the y series and facilitate its identification. Positively charged amino acids at or near the N-terminus lead to loss of carbon monoxide from the b but not from the y fragments. Side-chain fragmentations are rare but are observed for several amino acids, including arginine and the isobaric aliphatic residues leucine and isoleucine.
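The ladder principle and the secondary losses described above can be sketched in a few lines of Python. The example assumes that the peaks of one series (b or y) have already been picked out and are singly charged; it then translates consecutive mass differences into residue calls, computes immonium ion masses (residue mass minus CO plus a proton), and lists the ammonia and water loss companions expected from the rules in the text. All function names and the 0.02 Da tolerance are hypothetical, and the sketch does not attempt to separate series or reject spurious peaks.

```python
"""Sketch: de novo reading of one fragment ladder plus expected companion peaks."""

# Monoisotopic residue masses (Da); leucine and isoleucine are isobaric.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565     # H2O
AMMONIA = 17.026549   # NH3
CO = 27.994915        # carbon monoxide
PROTON = 1.007276


def residue_candidates(delta: float, tol: float) -> list[str]:
    """All amino acids whose residue mass lies within tol of a mass difference."""
    return sorted(aa for aa, mass in RESIDUE.items() if abs(mass - delta) <= tol)


def read_ladder(ions: list[float], tol: float = 0.02) -> list[str]:
    """Translate consecutive mass differences of ONE ion series into residues.

    For a y series the calls run from the C-terminus inward, for a b series
    from the N-terminus. 'I/L' marks the isobaric pair; '?' marks a difference
    that no single residue explains (for example a missing fragment).
    """
    ladder = sorted(ions)
    calls = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        hits = residue_candidates(heavier - lighter, tol)
        calls.append("/".join(hits) if hits else "?")
    return calls


def immonium(aa: str) -> float:
    """Immonium ion m/z of a single amino acid: residue - CO + H+."""
    return RESIDUE[aa] - CO + PROTON


def satellites(ion_mz: float, fragment: str) -> dict[str, float]:
    """Companion peaks of a terminal or internal fragment ion: NH3 loss always,
    additional H2O loss if the fragment contains serine or threonine."""
    peaks = {"-NH3": ion_mz - AMMONIA}
    if {"S", "T"} & set(fragment):
        peaks["-H2O"] = ion_mz - WATER
    return peaks


# Usage: y1..y4 of a hypothetical tryptic peptide ending in ...(I/L)DEK.
print(read_ladder([147.1128, 276.1554, 391.1823, 504.2664]))  # ['E', 'D', 'I/L']
```

Two caveats follow directly from the masses. A missing fragment can masquerade as a single residue, because G+G (114.04292 Da) is practically identical to N (114.04293 Da) and G+A (128.05857 Da) to Q (128.05858 Da). And at the lower mass accuracy typical of PSD spectra, a wider tolerance (for instance 0.5 Da) would also merge K and Q, which differ by only about 0.036 Da.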
Several strategies have been proposed for enhancing the daughter ion series that contain one of the terminal amino acids and thereby facilitating sequence determination. These include comparative analyses based on the exchange of hydrogen and oxygen isotopes at the terminal amino acid or on its selective chemical derivatization (Heller et al. J Am Soc Mass Spectrom. 2003; 14: 704; Mo et al. Rapid Commun Mass Spectrom. 1997; 11: 1829; Uttenweiler-Joseph et al. Proteomics. 2001; 1: 668). While these strategies help in special cases, they have not been widely adopted because in most cases they are laborious, can produce even more complex spectra, and lead to loss of sample material and thereby to reduced sensitivity.
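The comparative principle behind isotope-exchange strategies can be sketched as follows. The sketch assumes two spectra of the same peptide, one unlabelled and one carrying a C-terminal label that adds a fixed mass shift. As an illustrative assumption, both C-terminal carboxyl oxygens are taken to be exchanged for 18O, giving a shift of about 4.008 Da. Peaks that keep their position lack the C-terminus (b-type or internal fragments), and peaks that move by the shift contain it (y-type). This is a generic illustration, not the protocol of the cited studies. In practice incomplete exchange produces peak doublets, which the sketch ignores. The function names and the 0.05 Da tolerance are hypothetical.

```python
"""Sketch: assign fragment peaks to series by comparing unlabelled and
C-terminally isotope-labelled spectra of the same peptide."""

O18_MINUS_O16 = 2.004245          # mass difference 18O - 16O (Da)
LABEL_SHIFT = 2 * O18_MINUS_O16   # assumption: both carboxyl oxygens exchanged


def has_peak(spectrum: list[float], mz: float, tol: float) -> bool:
    """True if any peak of the spectrum lies within tol of mz."""
    return any(abs(p - mz) <= tol for p in spectrum)


def classify_by_label(unlabelled: list[float], labelled: list[float],
                      shift: float = LABEL_SHIFT, tol: float = 0.05) -> dict[float, str]:
    """Classify each peak of the unlabelled spectrum.

    Unchanged in the labelled spectrum -> 'lacks C-terminus' (b-type or internal)
    Shifted by `shift`                 -> 'contains C-terminus' (y-type)
    Neither                            -> 'unassigned' (noise or other ion types)
    """
    calls = {}
    for mz in unlabelled:
        if has_peak(labelled, mz, tol):
            calls[mz] = "lacks C-terminus"
        elif has_peak(labelled, mz + shift, tol):
            calls[mz] = "contains C-terminus"
        else:
            calls[mz] = "unassigned"
    return calls


# Usage with hypothetical peak lists: two b-type, two y-type and one noise peak.
unlabelled = [98.06, 227.10, 147.11, 276.16, 500.0]
labelled = [98.06, 227.10, 147.11 + LABEL_SHIFT, 276.16 + LABEL_SHIFT]
for mz, call in classify_by_label(unlabelled, labelled).items():
    print(mz, call)
```

The need for a second, separately prepared sample is visible even in this toy form: every assignment depends on comparing two spectra, which is one source of the additional labor and sample consumption mentioned above.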