Field of the Invention
The present invention relates to techniques for analyzing mass-spectrometry data. More specifically, the present invention relates to the analysis of mass-spectrometry data for peptides.
Related Art
In proteomics, proteins are often identified using mass spectrometry. A protein sample is typically digested into peptides that include one or more amino acids. For example, the protein sample can be digested using the enzyme trypsin. The resulting peptides can be ionized using matrix-assisted laser desorption/ionization or electrospray ionization and introduced into a mass spectrometer. Tandem mass spectrometry measures the mass-to-charge ratios of the peptides, and then fragments the peptides and measures the mass-to-charge ratios of the resulting fragments. Peptide identifications made from tandem-mass-spectrometry data can be aggregated to identify the proteins in the sample.
In principle, the peptides in the sample can be uniquely identified using the peaks in the resulting mass-spectrometry spectra (which are associated with the mass-to-charge ratios of the peptides and peptide fragments). For example, peptides may be identified by comparing the observed mass-spectrometry spectra to theoretical mass-spectrometry spectra of peptides predicted by gene sequences or to previously observed mass-spectrometry spectra for known peptides.
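As an illustration only, the comparison of an observed spectrum with theoretical spectra can be sketched as follows. The sketch assumes singly charged b- and y-type fragment ions, a simple shared-peak count as the score, and a fixed tolerance of 0.02 m/z units. The function names (`theoretical_peaks`, `shared_peak_count`, `best_match`) and the candidate list are hypothetical and are not taken from any particular search engine.

```python
# Monoisotopic residue masses in Da (standard values; small subset only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "R": 156.10111,
}
WATER = 18.01056   # mass of H2O in Da
PROTON = 1.00728   # mass of a proton in Da


def theoretical_peaks(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    peaks = []
    for i in range(1, len(masses)):
        peaks.append(sum(masses[:i]) + PROTON)          # b ion: first i residues
        peaks.append(sum(masses[i:]) + WATER + PROTON)  # y ion: remaining residues
    return sorted(peaks)


def shared_peak_count(observed, theoretical, tol=0.02):
    """Count theoretical peaks that have an observed peak within tol."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def best_match(observed, candidates, tol=0.02):
    """Return the candidate peptide whose theoretical peaks best match."""
    return max(
        candidates,
        key=lambda p: shared_peak_count(observed, theoretical_peaks(p), tol),
    )


# Assumed example: the "observed" peaks are simulated from a known peptide.
observed = theoretical_peaks("PEPTK")
print(best_match(observed, ["GASLK", "PEPTK", "MEAVR"]))  # PEPTK
```

Practical scoring functions also weigh peak intensities and noise; the shared-peak count is used here only because it is the simplest score that shows the idea.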
In practice, however, it is often difficult to identify the peptides. For example, there may be chemical modifications to the amino acids in the peptides. These chemical modifications may be in vivo post-translational modifications or simply chemical artifacts, such as modifications that occur when the protein sample is prepared for mass-spectrometry analysis. When present, the chemical modifications can lead to shifts in the peaks in the mass-spectrometry spectrum of a peptide, which can complicate or confound the identification of the peptide based on comparisons with the previously observed or theoretically predicted mass-spectrometry spectra for known peptides.
One existing analysis technique attempts to address this problem by shifting some or all of the peaks in the previously observed or theoretically predicted mass-spectrometry spectra, based on one or more chemical modifications that are anticipated (prior to the mass-spectrometry analysis) to occur in the protein sample. The mass-spectrometry spectra with shifted peaks can then be compared with the observed unknown mass-spectrometry spectrum in order to make an identification. Unfortunately, the chemical modifications in a protein sample are difficult to guess a priori. Moreover, there are more than 200 types of potential chemical modifications, and ten or more of these types may be present in a single protein sample, so it is often too computationally expensive to search for all combinations of all potential modifications. Consequently, this existing analysis technique may be too restrictive to properly analyze the observed mass-spectrometry spectra.
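The peak shifting and the growth in the number of candidates can be illustrated with a minimal sketch. It assumes a hypothetical list of three anticipated modifications (oxidation of methionine, +15.9949 Da, and phosphorylation of serine or threonine, +79.9663 Da), at most one modification per residue, and singly charged b-type ions only. The names `ANTICIPATED`, `modified_forms`, and `shifted_b_ions` are illustrative assumptions.

```python
from itertools import product

# Monoisotopic residue masses in Da (standard values; small subset only).
RESIDUE_MASS = {"M": 131.04049, "S": 87.03203, "T": 101.04768, "K": 128.09496}
PROTON = 1.00728  # mass of a proton in Da

# Hypothetical list of anticipated modifications: residue -> mass shift in Da.
ANTICIPATED = {"M": 15.9949, "S": 79.9663, "T": 79.9663}


def modified_forms(peptide, mods=ANTICIPATED):
    """All combinations of per-residue shifts (unmodified or modified)."""
    options = [(0.0, mods[aa]) if aa in mods else (0.0,) for aa in peptide]
    return list(product(*options))


def shifted_b_ions(peptide, shifts):
    """Singly charged b-ion m/z values with the given per-residue shifts."""
    total, peaks = PROTON, []
    for aa, shift in zip(peptide[:-1], shifts[:-1]):
        total += RESIDUE_MASS[aa] + shift  # shift moves this and all later b ions
        peaks.append(total)
    return peaks


forms = modified_forms("MSTK")
print(len(forms))  # 8
```

In this sketch a peptide with k modifiable residues yields 2^k candidate forms, each of which requires its own shifted spectrum. Allowing several modification types per residue makes the count grow faster still, which is the cost referred to above.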
Another existing analysis technique uses a so-called “blind modification search” to identify the peptide represented in an observed mass-spectrometry spectrum. In this existing analysis technique, peaks in the observed mass-spectrometry spectrum are fit without using any prior knowledge of likely mass shifts, apart from upper and lower bounds on the size of the shift. Blind modification search, however, is often too general because it does not take advantage of chemical knowledge, such as the propensity of methionine to oxidize, or the likelihood of chemical artifacts at the peptide N-terminus.
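A much-simplified sketch of a blind modification search is given below. It assumes a single unexplained mass shift per peptide, takes the shift from the difference between the observed and theoretical peptide masses, accepts any shift between assumed bounds of -200 Da and +200 Da, and tries the shift at every residue position. The function names and the shared-peak score are hypothetical, and published blind-search tools are considerably more elaborate.

```python
# Monoisotopic residue masses in Da (standard values; small subset only).
RESIDUE_MASS = {
    "P": 97.05276, "T": 101.04768, "K": 128.09496,
    "E": 129.04259, "M": 131.04049,
}
WATER = 18.01056   # mass of H2O in Da
PROTON = 1.00728   # mass of a proton in Da


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_peaks(peptide, pos=None, delta=0.0):
    """Singly charged b/y m/z values, with an optional shift at one residue."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    if pos is not None:
        masses[pos] += delta
    peaks = []
    for i in range(1, len(masses)):
        peaks.append(sum(masses[:i]) + PROTON)          # b ion
        peaks.append(sum(masses[i:]) + WATER + PROTON)  # y ion
    return peaks


def blind_search(observed_peaks, observed_mass, peptide,
                 lo=-200.0, hi=200.0, tol=0.02):
    """Best (score, position, delta) for one unexplained shift, or None."""
    delta = observed_mass - peptide_mass(peptide)
    if not lo <= delta <= hi:
        return None  # shift falls outside the only prior constraint used
    best = None
    for pos in range(len(peptide)):  # every residue is treated as equally likely
        theoretical = fragment_peaks(peptide, pos, delta)
        score = sum(any(abs(o - t) <= tol for o in observed_peaks)
                    for t in theoretical)
        if best is None or score > best[0]:
            best = (score, pos, delta)
    return best


# Assumed example: simulate a spectrum with +15.9949 Da on the methionine.
observed = fragment_peaks("PEMTK", pos=2, delta=15.9949)
result = blind_search(observed, peptide_mass("PEMTK") + 15.9949, "PEMTK")
print(result[1])  # 2
```

Every position and every shift within the bounds is weighted equally in this sketch, so nothing in it favors a +15.9949 Da shift on methionine over the same shift on any other residue. That absence of chemical knowledge is the limitation noted above.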
Hence, what is needed is a method and an apparatus that facilitate analysis of mass-spectrometry data for proteins without the problems listed above.