Mass spectrometry is an analytical tool that can be used to determine the molecular weights of chemical compounds by generating ions from the chemical compounds and separating these ions according to their mass-to-charge ratio (m/z). The ions are generated by inducing the chemical compounds to either lose or gain a charge, such as via electron ejection, protonation, or deprotonation. The ions are then separated according to their m/z values and detected. The resulting data are often presented as a spectrum, a two-dimensional (2-D) plot with m/z on the x-axis and abundance of ions on the y-axis. Thus, this spectrum shows the distribution of m/z values in the population of ions being analyzed. This distribution is characteristic for a given compound. Therefore, if the sample is a pure compound or contains only a few compounds, mass spectrometry can reveal the identity of the compound(s) in the sample.
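As an illustration of how m/z relates to molecular weight, the following minimal sketch computes the m/z values expected for a compound ionized by protonation. It assumes positive-mode ionization in which each charge comes from one added proton; the names `PROTON_MASS`, `mz_protonated`, and `charge_envelope` are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: assumes every charge is carried by one added proton.
PROTON_MASS = 1.007276  # approximate proton mass in Daltons


def mz_protonated(neutral_mass, charge):
    """Return the m/z of a molecule of the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_envelope(neutral_mass, charges):
    """Map each charge state to its expected m/z for one compound."""
    return {z: mz_protonated(neutral_mass, z) for z in charges}


if __name__ == "__main__":
    # One compound gives a different m/z value for each charge state it adopts.
    for z, mz in charge_envelope(1000.0, range(1, 4)).items():
        print(f"z={z}: m/z={mz:.4f}")
```

Under these assumptions, a single compound appears at several m/z values, one per charge state, which is part of why the spectrum of a compound is a characteristic distribution rather than a single peak.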
A complex sample usually contains too many chemical compounds to be analyzed meaningfully by mass spectrometry alone, because ionization of different chemical compounds may result in ions with the same m/z value. The more chemical compounds a sample contains, the more likely it is that ions with the same m/z value will be generated from different compounds. Therefore, a complex sample is typically resolved to some extent prior to mass spectrometry, such as by liquid chromatography, gas chromatography, or capillary electrophoresis. In this sample separation step, the chemical compounds in the sample are separated based on how long they stay in the sample separation medium. Once a chemical compound goes through the sample separation medium, it enters a mass spectrometer system, and the ionization/ion separation/detection process begins as described above. The resulting data for each ion thus have one more property, retention time, which is the time the chemical compound that gives rise to the ion stays in the sample separation medium. Thus, mass spectral data of a sample that is analyzed by a sample separation method before mass spectrometry can be presented as a three-dimensional (3-D) plot, with retention time, m/z value and ion abundance on the three axes of the plot.
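The 3-D data described above can be pictured as a collection of (retention time, m/z, abundance) triples. The sketch below is one possible representation, not a format prescribed by any instrument; the `Peak` type, the function names, and the tolerance values are assumptions made for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical record for one detected ion signal in a separation/MS run.
Peak = namedtuple("Peak", ["retention_time", "mz", "abundance"])


def spectrum_at(peaks, retention_time, tol=0.05):
    """Return the 2-D slice (m/z, abundance) of peaks near one retention time."""
    return sorted(
        (p.mz, p.abundance)
        for p in peaks
        if abs(p.retention_time - retention_time) <= tol
    )


def extracted_ion_chromatogram(peaks, mz, tol=0.01):
    """Return (retention time, summed abundance) pairs for ions near one m/z value."""
    trace = defaultdict(float)
    for p in peaks:
        if abs(p.mz - mz) <= tol:
            trace[p.retention_time] += p.abundance
    return sorted(trace.items())


if __name__ == "__main__":
    peaks = [
        Peak(1.0, 500.0, 10.0),
        Peak(1.0, 600.0, 5.0),
        Peak(2.0, 500.0, 20.0),
    ]
    print(spectrum_at(peaks, 1.0))
    print(extracted_ion_chromatogram(peaks, 500.0))
```

Fixing the retention time yields an ordinary mass spectrum, while fixing the m/z value yields an elution profile for ions of that m/z.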
Even with a sample separation method, it is still not an easy task to analyze mass spectral data from a complex sample due to the vast number of peaks. A method has been introduced to deconvolute mass spectral data based on compound properties such as isotopic clusters (see U.S. Patent Application Publication 2007-0176088). In this method, 3-D peaks that share the same retention time are examined, and isotopic clusters of the same compound are grouped together, thereby reducing the complexity of the mass spectral data significantly. This method, however, is most useful for analytes with relatively small molecular weights. Large molecules, such as most intact proteins, are often too large for their isotopomers to be resolved in a mass spectrometer. As a result, an accurate monoisotopic mass cannot be calculated for the given isotopic cluster using the charge state spacing of the isotopomers.
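To illustrate why isotopic clusters become hard to use for large molecules, the sketch below estimates a charge state from the m/z spacing between two adjacent isotopomer peaks. It assumes the spacing arises from the mass difference between carbon-13 and carbon-12 (about 1.003355 Da) divided by the charge; the function names are hypothetical, and this is not the method of the cited publication.

```python
# Illustrative sketch only: assumes adjacent isotopomers differ by one 13C-for-12C substitution.
ISOTOPE_MASS_DIFF = 1.003355  # approximate 13C - 12C mass difference in Daltons


def isotope_spacing(charge):
    """Return the expected m/z spacing between adjacent isotopomer peaks at a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return ISOTOPE_MASS_DIFF / charge


def charge_from_spacing(mz_a, mz_b):
    """Estimate the charge state from two adjacent, resolved isotopomer peaks."""
    spacing = abs(mz_b - mz_a)
    if spacing == 0:
        raise ValueError("peaks must have different m/z values")
    return round(ISOTOPE_MASS_DIFF / spacing)


if __name__ == "__main__":
    # The spacing shrinks as charge grows, so highly charged ions need far
    # higher resolution before their isotopomers can be told apart.
    for z in (1, 2, 10, 30):
        print(f"z={z}: spacing={isotope_spacing(z):.4f} m/z")
```

Because large molecules such as intact proteins typically carry many charges, the spacing between their isotopomer peaks is small, and an estimate like `charge_from_spacing` is only possible when the instrument resolves the individual peaks.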
Currently, the most common method for intact protein mass determination is the maximum entropy deconvolution method (Ferrige et al., 1991). This method transforms a mass spectrum in m/z units, usually obtained by averaging all the spectra across an LC or other elution profile for a protein, to a mass spectrum containing the zero-charge representation of intact proteins (in Dalton units) across a user-specified mass range. For simple averaged mass spectra with at most a few intact proteins, this method is quite reliable. However, more complex mass spectra produce false positive “overtone” peaks, which correspond to masses calculated from randomly dispersed peaks from the raw data. This problem can be partly overcome if the user specifies a very wide mass range, but the algorithm then requires significantly more time to complete. Since maximum entropy deconvolution works on a mass spectrum but most proteins are characterized by LC/MS, a conversion from 3-D data (m/z, retention time, abundance) to 2-D data (m/z, abundance) is critical for optimum performance of the algorithm. For simple data, the selection of the averaged spectrum is quite easy since each eluting protein should show an isolated peak in the LC chromatogram. However, for very complex mixtures, the selection of the optimal range of spectra to average is nearly impossible, since many proteins will be closely eluting or co-eluting. Finally, the abundance values in maximum entropy deconvoluted spectra are not reliable from run to run, making relative quantitation between experiments impossible.
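The 3-D to 2-D conversion mentioned above can be sketched as averaging all spectra that fall within a chosen retention time window. The example below shows only this averaging step under simple assumptions (fixed-width m/z bins and a user-supplied window); it does not implement maximum entropy deconvolution, and the function name, input layout, and `bin_width` parameter are hypothetical.

```python
from collections import defaultdict


def average_spectrum(scans, rt_start, rt_end, bin_width=0.1):
    """Average the spectra acquired between rt_start and rt_end (inclusive).

    `scans` is a list of (retention_time, [(mz, abundance), ...]) pairs.
    Peaks are grouped into m/z bins of width `bin_width`, and each bin's
    summed abundance is divided by the number of selected scans.
    """
    selected = [peaks for rt, peaks in scans if rt_start <= rt <= rt_end]
    if not selected:
        return []
    sums = defaultdict(float)
    for peaks in selected:
        for mz, abundance in peaks:
            sums[round(mz / bin_width)] += abundance
    count = len(selected)
    return [(index * bin_width, total / count) for index, total in sorted(sums.items())]


if __name__ == "__main__":
    scans = [
        (1.0, [(500.0, 10.0)]),
        (1.1, [(500.0, 30.0), (600.0, 4.0)]),
        (5.0, [(500.0, 100.0)]),  # outside the window below, so ignored
    ]
    print(average_spectrum(scans, 0.9, 1.2, bin_width=1.0))
```

The result depends entirely on the chosen window: when proteins co-elute, any window that captures one protein also captures signal from its neighbors, which is the selection difficulty described above.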
Therefore, it is desirable to have a better method for deconvoluting complex mass spectral data from samples comprising large molecules.