Mass spectrometry of a polypeptide mixture is a technique applied in protein studies, and the interpretation of the resulting mass spectra makes it possible to identify or quantify the proteins in the mixture. Disclosed herein is a method of determining the mass of polypeptides from mass spectra, which is the most fundamental step in interpreting mass spectra and can serve as the basis of more advanced spectral interpretation.
Mass spectral data are stored as a list of peaks. Each peak is defined by the mass-to-charge ratio (m/z) and the intensity of a polypeptide ion in the mixture. In the mass spectrometer, the polypeptides of the mixture become positively charged ions by combining with protons (H+), and the mass spectrum records the mass-to-charge ratio (m/z) and intensity of each polypeptide ion rather than its mass directly.
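The relation between the neutral mass of a polypeptide and the detected mass-to-charge ratio can be illustrated with a minimal sketch. It assumes that each unit of charge comes from one attached proton; the function names and the proton mass constant are illustrative and are not taken from the disclosure.

```python
PROTON_MASS = 1.007276466  # Da, mass of one proton (assumed charge carrier)

def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of an ion formed by attaching `charge` protons to a neutral molecule."""
    return (neutral_mass + charge * PROTON_MASS) / charge

def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass recovered from a detected m/z once the charge is known."""
    return mz * charge - charge * PROTON_MASS

# A 1000 Da polypeptide carrying two protons is detected near m/z 501.007.
mz = mz_from_mass(1000.0, 2)
recovered = mass_from_mz(mz, 2)
```

The sketch makes explicit why the charge must be known before a mass can be reported: the same m/z value corresponds to a different mass for each possible charge.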
The initial spectral data obtained from the mass spectrometer are either continuous waveform data, in which the definite locations of the peaks, that is, their mass-to-charge ratios, are not yet defined, or data in which the mass-to-charge ratios of the peaks have been determined through a suitable peak-picking procedure. The method provided herein is based on mass spectral data that have been subjected to such a peak-picking procedure.
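The disclosure does not specify which peak-picking procedure is used. Purely as an illustration of what such a procedure produces, the following sketch reduces continuous waveform data to a peak list by keeping local intensity maxima; the function name, the threshold parameter and the sample data are hypothetical.

```python
def pick_peaks(mz, intensity, min_intensity=0.0):
    """Return (m/z, intensity) pairs at local maxima of a sampled waveform.

    This is a deliberately simple stand-in for a real peak-picking
    procedure, which would typically also smooth and centroid the signal.
    """
    peaks = []
    for i in range(1, len(mz) - 1):
        rises = intensity[i] > intensity[i - 1]
        falls = intensity[i] >= intensity[i + 1]
        if rises and falls and intensity[i] >= min_intensity:
            peaks.append((mz[i], intensity[i]))
    return peaks

# Hypothetical waveform samples: two maxima, at m/z 500.2 and 500.5.
sample_mz = [500.0, 500.1, 500.2, 500.3, 500.4, 500.5, 500.6]
sample_intensity = [0, 2, 10, 3, 1, 6, 0]
picked = pick_peaks(sample_mz, sample_intensity)
```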
The mass of a polypeptide is defined, for example, as the sum of the masses of the carbon (C), hydrogen (H), nitrogen (N), oxygen (O) and sulfur (S) atoms in the relevant peptide, and the monoisotopic mass is used as its representative value. As used herein, the term “monoisotopic mass” refers to the sum of the masses of the atoms on the assumption that the atoms of the polypeptide are all present in their lightest isotopic forms. Elements present in nature have isotopes. For example, carbon exists as the 12C and 13C isotopes, and 13C is present at a rate of about 1.1%. Thus, for a given polypeptide, if any of its atoms are heavy isotopes, several peaks having different mass values can be detected in the spectrum. For this reason, the monoisotopic mass is used as the value representative of a polypeptide. However, it is difficult to find the peak corresponding to the monoisotopic mass directly in an actual spectrum, for two reasons: complicated and overlapping peaks can appear in the spectrum due to the differences in isotopic mass among several polypeptides, and the larger the mass of a polypeptide, the lower the likelihood that its atoms will all be the lightest isotopes.
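As a worked illustration of this definition, the sketch below sums the masses of the lightest isotopes for a given elemental composition and also estimates how quickly the all-12C form becomes less likely as the number of carbon atoms grows. The isotope masses and the 12C abundance are standard reference values entered by hand; the function names are illustrative.

```python
# Masses (Da) of the lightest isotope of each element: 12C, 1H, 14N, 16O, 32S.
LIGHTEST_ISOTOPE_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}
C12_ABUNDANCE = 0.9893  # approximate natural abundance of 12C

def monoisotopic_mass(composition: dict) -> float:
    """Sum of lightest-isotope masses for an elemental composition."""
    return sum(LIGHTEST_ISOTOPE_MASS[el] * n for el, n in composition.items())

def prob_all_carbon_light(n_carbon: int) -> float:
    """Probability that every carbon atom of the molecule is 12C."""
    return C12_ABUNDANCE ** n_carbon

# Glycine, C2H5NO2, has a monoisotopic mass of about 75.032 Da.
glycine = monoisotopic_mass({"C": 2, "H": 5, "N": 1, "O": 2})

# Roughly a third of molecules with 100 carbon atoms contain only 12C,
# so the monoisotopic peak is no longer the dominant one for large peptides.
p_small = prob_all_carbon_light(10)
p_large = prob_all_carbon_light(100)
```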
If polypeptides having the same elemental composition show different peaks in the spectrum due only to the difference in isotopic mass therebetween, the group of such peaks is defined as an isotopic cluster. The peaks of an isotopic cluster appear consecutively with a mass difference of approximately 1 Da (Dalton), and because of the charge (z) of the polypeptide ions, the peaks are spaced at a constant mass-to-charge ratio (m/z) interval of 1/z in the actual spectrum. If mass spectral data are interpreted so as to find an isotopic cluster consisting of the same polypeptide ions, the charge and the monoisotopic mass of the isotopic cluster can be determined.
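The way the charge and the monoisotopic mass follow from the peak spacing can be sketched as below. The sketch assumes a clean cluster whose first peak is the monoisotopic peak, protonated ions, and an isotope spacing of 1.00335 Da (the 13C minus 12C mass difference, a refinement of the approximate 1 Da interval); all names and sample values are illustrative.

```python
PROTON_MASS = 1.007276466   # Da
ISOTOPE_SPACING = 1.00335   # Da, assumed spacing between isotopic peaks

def infer_charge(cluster_mz):
    """Estimate the charge z from the mean m/z gap, which is about 1/z."""
    peaks = sorted(cluster_mz)
    if len(peaks) < 2:
        raise ValueError("at least two peaks are needed to measure a spacing")
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return round(ISOTOPE_SPACING / mean_gap)

def cluster_monoisotopic_mass(cluster_mz):
    """Neutral monoisotopic mass, taking the lowest m/z peak as monoisotopic."""
    z = infer_charge(cluster_mz)
    return min(cluster_mz) * z - z * PROTON_MASS

# Hypothetical cluster: peaks about 0.5 m/z apart, so the charge is 2.
cluster = [501.0073, 501.5090, 502.0106]
z = infer_charge(cluster)
mass = cluster_monoisotopic_mass(cluster)
```

In a real spectrum the lowest detected peak of a cluster is not guaranteed to be the monoisotopic peak, which is the difficulty addressed by cluster-finding methods.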
A typical prior program for finding such an isotopic cluster and determining its monoisotopic mass is ICR2LS, which employs a method known as the THRASH algorithm. This method comprises selecting, from a spectrum, peaks that are candidates for an isotopic cluster, and comparing the selected peaks with the peak shape of an isotopic cluster based on the averagine composition, that is, the elemental composition of a hypothetical average amino acid residue, in order to determine the isotopic cluster.
The concrete procedure of the THRASH algorithm is as follows. First, a window of about 1 m/z is taken in a suitable region of the spectrum, and the peak having the highest intensity in that window is selected. Peaks located at a constant interval around the selected peak are collected to form a candidate isotopic cluster, and the charge of the candidate isotopic cluster is calculated to obtain an approximate mass close to the monoisotopic mass. From the approximate mass, the peak intensities of the theoretical isotopic cluster based on the pre-calculated averagine composition can be obtained. The peak shape of the theoretical isotopic cluster is compared with the peak shape of the candidate isotopic cluster to calculate the error in the peak intensities, and if the error is sufficiently small, the candidate isotopic cluster is judged to be an isotopic cluster. If the error is too large for the candidate isotopic cluster to be accepted, the charge is changed, a candidate isotopic cluster is determined again, and the above-described procedure is repeated.
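The steps above can be sketched as a simplified THRASH-style loop. This is not the ICR2LS implementation: a Poisson pattern is swapped in for the pre-calculated averagine pattern, the charge is found by trying values from high to low rather than being calculated from the peaks, and the constants, tolerances and names (HEAVY_PER_DA, thrash_like, max_error and so on) are assumptions made for illustration.

```python
import math

PROTON_MASS = 1.007276466   # Da
ISOTOPE_SPACING = 1.00335   # Da, assumed spacing between isotopic peaks
HEAVY_PER_DA = 0.00062      # assumed mean heavy-isotope count per Da (averagine-like)

def theoretical_pattern(mass, n_peaks):
    """Poisson stand-in for the averagine isotopic pattern, normalized to sum 1."""
    lam = HEAVY_PER_DA * mass
    raw = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_peaks)]
    total = sum(raw)
    return [v / total for v in raw]

def find_peak(peaks, target_mz, tol):
    """Most intense (m/z, intensity) peak within tol of target_mz, or None."""
    near = [p for p in peaks if abs(p[0] - target_mz) <= tol]
    return max(near, key=lambda p: p[1]) if near else None

def thrash_like(peaks, window, max_charge=4, n_peaks=4, tol=0.01, max_error=0.1):
    lo, hi = window
    in_window = [p for p in peaks if lo <= p[0] <= hi]
    if not in_window:
        return None
    seed = max(in_window, key=lambda p: p[1])   # most intense peak in the window
    for z in range(max_charge, 0, -1):          # change the charge and retry
        step = ISOTOPE_SPACING / z
        cluster = [seed]
        # Collect peaks at a constant interval below and above the seed.
        while (p := find_peak(peaks, cluster[0][0] - step, tol)) is not None:
            cluster.insert(0, p)
        while (p := find_peak(peaks, cluster[-1][0] + step, tol)) is not None:
            cluster.append(p)
        if len(cluster) < 2:
            continue
        cluster = cluster[:n_peaks]
        approx_mass = cluster[0][0] * z - z * PROTON_MASS
        theory = theoretical_pattern(approx_mass, len(cluster))
        total = sum(i for _, i in cluster)
        observed = [i / total for _, i in cluster]
        # Error between candidate and theoretical peak shapes (sum of squares).
        error = sum((o - t) ** 2 for o, t in zip(observed, theory))
        if error <= max_error:
            return {"charge": z, "monoisotopic_mass": approx_mass, "error": error}
    return None

# Hypothetical peak list: a doubly charged cluster plus one unrelated peak.
peaks = [(501.00728, 100.0), (501.3, 5.0), (501.50895, 62.0),
         (502.01063, 19.0), (502.5123, 4.0)]
result = thrash_like(peaks, window=(500.6, 501.6))
```

Because the theoretical pattern in this sketch depends only on the approximate mass, it also shows where the approach is sensitive: a polypeptide whose composition departs from the assumed average yields a larger intensity error even when the cluster is genuine.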
In the case of THRASH, if the elemental composition of a polypeptide measured through mass spectrometry deviates from the averagine composition, a large error in the peak intensities can occur, because the peak shape of the actual isotopic cluster does not coincide well with the pre-calculated peak shape of the isotopic cluster. The peak locations and intensity distribution of the isotopic cluster based on the averagine composition are determined only by mass, whereas the actual isotopic peak distribution is determined by the number of atoms of each element in the polypeptide. Moreover, the observed distribution can differ greatly from the theoretical distribution due to the incompleteness of the procedures for processing ionic signals (e.g., a procedure for digitizing the superimposed current as a function of time, and signal amplification/modification procedures) and due to the non-probabilistic isotopic distribution that results when the actual number of ions is small. In such cases, THRASH cannot accurately determine the locations of the monoisotopic peaks. Another known problem is that the procedure of comparing the peak shapes of the isotopic clusters is significantly slow.