1. Field of the Invention
The present invention relates to mass spectrometry systems. More particularly, it relates to mass spectrometry systems that are useful for the analysis of complex mixtures of molecules, including large organic molecules such as proteins or peptides, environmental pollutants, pharmaceuticals and their metabolites, and petrochemical compounds, to methods of analysis used therein, and to a computer program product having computer code embodied therein for causing a computer, or a computer and a mass spectrometer in combination, to effect such analysis.
2. Prior Art
Liquid chromatography interfaced with tandem mass spectrometry (LC/MS/MS) has become a method of choice for protein sequencing (Yates Jr. et al., Anal. Chem. 67, 1426-1436 (1995)). This method involves several steps: digestion of proteins, LC separation of the peptide mixtures generated from the protein digests, MS/MS analysis of the resulting peptides, and a database search for protein identification. The key to effectively identifying proteins with LC/MS/MS is to produce as many high-quality MS/MS spectra as possible to allow for reliable matching during the database search. This is achieved by a data-dependent scanning technique in a quadrupole or an ion trap instrument. With this technique, the mass spectrometer checks the intensities and signal-to-noise ratios of the most abundant ion(s) in a full-scan MS spectrum and performs MS/MS experiments when the intensities and signal-to-noise ratios of the most abundant ions exceed a preset or predetermined threshold. Usually the three most abundant ions are selected for the product ion scans to maximize the sequence information and minimize the time required, since selecting more than three ions for MS/MS experiments could result in missing other qualified peptides currently eluting from the LC into the mass spectrometer.
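The data-dependent selection described above can be sketched as follows. This is a minimal illustration only, not any instrument vendor's acquisition logic: the function name, the single scalar noise estimate, the threshold values, and the peak list are all assumptions made for the example.

```python
from typing import List, Tuple


def select_precursors(
    peaks: List[Tuple[float, float]],
    noise_level: float,
    min_intensity: float,
    min_snr: float,
    top_n: int = 3,
) -> List[float]:
    """Return the m/z values of up to top_n ions that exceed both thresholds.

    peaks is a list of (m/z, intensity) pairs from one full-scan spectrum.
    noise_level is a hypothetical single noise estimate for the whole scan.
    """
    qualified = [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity > min_intensity and intensity / noise_level > min_snr
    ]
    # Most abundant ions first, then keep only the top_n for product ion scans.
    qualified.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in qualified[:top_n]]


# Hypothetical full-scan peak list: (m/z, intensity).
scan = [
    (445.12, 1.2e4),
    (512.30, 8.5e5),
    (623.84, 3.1e5),
    (701.40, 9.0e3),
    (850.45, 4.4e5),
    (933.51, 2.0e5),
]

precursors = select_precursors(
    scan, noise_level=1.0e3, min_intensity=1.0e5, min_snr=10.0
)
print(precursors)
```

In this example four ions pass the thresholds, but only the three most abundant are kept, which mirrors the trade-off between sequence coverage and scan time noted above.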
The success of LC/MS/MS for identification of proteins is largely due to its many outstanding analytical characteristics. Firstly, it is a robust technique with excellent reproducibility, and it has been demonstrated to be reliable for high-throughput protein identification. Secondly, when using nanospray ionization, the technique delivers quality MS/MS spectra of peptides at sub-femtomole levels. Thirdly, the MS/MS spectra carry sequence information from both C-terminal and N-terminal ions. This valuable information can be used not only for identification of proteins, but also for pinpointing which post-translational modifications (PTMs) have occurred on the protein and at which amino acid residues the PTMs take place.
Matrix-Assisted Laser Desorption Ionization (MALDI) utilizes a focused laser beam to irradiate the target sample that is co-crystallized with a matrix compound on a conductive sample plate. The ionized molecules are usually detected by a time-of-flight (TOF) mass spectrometer, because MALDI and TOF are both pulsed techniques.
MALDI/TOF is commonly used to detect 2DE-separated intact proteins because of its excellent speed, high sensitivity, wide mass range, high resolution, and tolerance of contaminants. MALDI/TOF with delayed extraction and reflecting ion optics can achieve impressive mass accuracy of 1-10 ppm and mass resolution (m/Δm) of 10000-15000 for the accurate analysis of peptides. However, the lack of MS/MS capability in MALDI/TOF is one of the major limitations for its use in proteomics applications. Post Source Decay (PSD) in MALDI/TOF does generate sequence-like MS/MS information for peptides, but the operation of PSD is often not as robust as that of a triple quadrupole or an ion trap mass spectrometer. Furthermore, PSD data acquisition and analysis are at times difficult to automate, as the fragmentation can be peptide or even sequence dependent.
A newly developed MALDI TOF/TOF system (T. Rejtar et al., J. Proteome Res. 1(2) 171-179 (2002)) delivers many attractive features. The system consists of two TOFs and a collision cell, which is similar to the configuration of a tandem quadrupole system. The first TOF is used to select precursor ions that undergo collision-induced dissociation (CID) in the cell to generate fragment ions. Subsequently, the fragment ions are detected by the second TOF. One of the attractive features is that TOF/TOF is able to perform as many data-dependent MS/MS experiments as necessary, while a typical LC/MS/MS system selects only a few abundant ions for the experiments. This development makes it possible for TOF/TOF to perform industry-scale proteomic analysis. The proposed solution is to collect fractions from 2D LC experiments and spot the fractions onto a MALDI plate for MS/MS. As a result, more MS/MS spectra can be acquired for more reliable protein identification by database search, as the quality of MS/MS spectra generated by high-energy CID in TOF/TOF is far better than that of PSD spectra.
It is well recognized that Fourier-Transform Ion-Cyclotron Resonance MS (FTICR-MS or more generally FTMS) is a powerful technique that can deliver high sensitivity, high mass resolution, wide mass range, and high mass accuracy. Recently, FTICR-MS coupled with LC showed impressive capabilities for proteomic analysis through Accurate Mass Tags (AMT) (Smith, R. D. et al., Proteomics 2, 513-523 (2002)). An AMT is an m/z value of a peptide that is accurate enough to be used to uniquely identify a protein. It has been demonstrated that, using the AMT approach, a single LC/FTICR-MS analysis can potentially identify more than 105 proteins with mass accuracy of better than 1 ppm. Nonetheless, AMT alone may not be sufficient to pinpoint amino acid residue specific post-translational modifications of peptides. In addition, the instrument is prohibitively expensive, at a typical cost of $650,000 or more, and has high maintenance requirements.
Thus, the past 100 years have witnessed tremendous strides in MS instrumentation, with many different types of instruments designed and built for high-throughput, high-resolution, and high-sensitivity work. The instrumentation has been developed to a stage where single ion detection can be routinely accomplished on most commercial MS systems with unit mass resolution, allowing for the observation of ion fragments coming from different isotopes. In stark contrast to the sophistication in hardware, very little has been done to systematically and effectively analyze the massive amount of MS data generated by modern MS instrumentation.
In a typical mass spectrometer, the user is usually supplied with a standard material having several known ions covering the mass spectral m/z range of interest. Subject to baseline effects, isotope interferences, mass resolution, and resolution dependence on m/z, peak positions of these standard ions are determined either in terms of centroids or peak maxima through a low-order polynomial fit at the peak top. These measured peak positions are then fit to the known peak positions through a first-order or higher-order polynomial fit to calibrate the mass (m/z) axis.
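The conventional calibration step just described can be sketched as a least-squares fit of known positions against measured positions. The sketch below covers only the first-order (linear) case in plain Python; the function names, the standard ion masses, and the simulated instrument distortion are assumptions chosen for illustration, and a higher-order fit would need a general polynomial least-squares routine instead.

```python
from typing import List, Tuple


def fit_linear_calibration(
    measured: List[float], known: List[float]
) -> Tuple[float, float]:
    """Least-squares fit of known = slope * measured + intercept."""
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(known) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, known))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(mz: float, slope: float, intercept: float) -> float:
    """Map a measured m/z value onto the calibrated mass axis."""
    return slope * mz + intercept


# Hypothetical standard ions (known m/z) and a simulated linear distortion
# standing in for the instrument's uncalibrated mass axis.
known_positions = [100.0, 300.0, 600.0, 1000.0]
measured_positions = [1.0005 * m + 0.12 for m in known_positions]

slope, intercept = fit_linear_calibration(measured_positions, known_positions)
calibrated = [apply_calibration(m, slope, intercept) for m in measured_positions]

for known, cal in zip(known_positions, calibrated):
    print(f"known {known:9.4f}  calibrated {cal:9.4f}")
```

Because the simulated distortion is exactly linear and noise-free, the fit recovers the known positions; with real data, errors in the measured peak positions carry straight through into the calibration.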
After the mass axis calibration, a typical mass spectral data trace would then be subjected to peak analysis where peaks (ions) are identified. This peak detection routine is a highly empirical and compounded process where peak shoulders, noise in data trace, baselines due to chemical backgrounds or contamination, isotope peak interferences, etc., are considered.
For the peaks identified, a process called centroiding is typically applied to attempt to calculate the integrated peak areas and peak positions. Due to the many interfering factors outlined above and the intrinsic difficulties in determining peak areas in the presence of other peaks and/or baselines, this is a process plagued by many adjustable parameters that can make an isotope peak appear or disappear with no objective measures of the centroiding quality.
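A simple form of centroiding can be sketched as an intensity-weighted mean position plus a trapezoidal area over a chosen window. This is an assumed textbook-style formulation, not the routine of any particular instrument, and the peak data and baseline offset are invented for illustration.

```python
from typing import List, Tuple


def centroid_peak(mz: List[float], intensity: List[float]) -> Tuple[float, float]:
    """Return (centroid position, integrated area) for one peak window.

    Position is the intensity-weighted mean m/z; area uses the trapezoidal rule.
    """
    total = sum(intensity)
    position = sum(m * i for m, i in zip(mz, intensity)) / total
    area = sum(
        0.5 * (intensity[k] + intensity[k + 1]) * (mz[k + 1] - mz[k])
        for k in range(len(mz) - 1)
    )
    return position, area


# Hypothetical symmetric peak sampled at 0.1 m/z steps.
mz_window = [499.8, 499.9, 500.0, 500.1, 500.2]
clean = [0.0, 50.0, 100.0, 50.0, 0.0]

# The same peak sitting on an uncorrected constant baseline of 10 counts.
with_baseline = [i + 10.0 for i in clean]

position, area = centroid_peak(mz_window, clean)
position_b, area_b = centroid_peak(mz_window, with_baseline)

print(f"clean:         position {position:.4f}, area {area:.2f}")
print(f"with baseline: position {position_b:.4f}, area {area_b:.2f}")
```

Here the uncorrected baseline inflates the area from 20 to 24, a 20% bias, and the size of that bias depends on the window width and baseline handling, which are exactly the kind of adjustable parameters referred to above.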
Thus, the current approaches have several pronounced disadvantages. These include:
Lack of Mass Accuracy. The mass calibration currently in use usually does not provide better than 0.1 amu (m/z unit) in mass determination accuracy on a conventional MS system with unit mass resolution (ability to visualize the presence or absence of a significant isotope peak). In order to achieve higher mass accuracy and reduce ambiguity in molecular fingerprinting such as peptide mapping for protein identification, one has to switch to an MS system with higher resolution, such as quadrupole TOF (qTOF) or FTMS, which comes at significantly higher cost.
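As a rough worked illustration of what this limit means in relative terms (the m/z value of 1000 is an assumed example, not a figure from the text), the relative mass error in parts per million is:

```latex
\[
\text{error (ppm)} = \frac{\Delta m}{m} \times 10^{6},
\qquad
\frac{0.1}{1000} \times 10^{6} = 100\ \text{ppm}
\]
```

Under that assumption, a 0.1 amu error is one to two orders of magnitude larger than the 1-10 ppm accuracy cited earlier for MALDI/TOF.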
Large Peak Integration Error. Due to the contribution of mass spectral peak shape, its variability, the isotope peaks, the baseline and other background signals, and random noise, current peak area integration has large errors (both systematic and random errors) for either strong or weak mass spectral peaks.
Difficulties with Isotope Peaks. The current approach does not provide a good way to separate the contributions from various isotopes, which usually have partially overlapped mass spectral peaks on conventional MS systems with unit mass resolution. The empirical approaches used either ignore the contributions from neighboring isotope peaks or over-estimate them, resulting in errors for dominating isotope peaks and large biases for weak isotope peaks, or even in the weaker peaks being ignored altogether. When multiply charged ions are concerned, the situation becomes even worse, due to the reduced separation in m/z units between neighboring isotope peaks.
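The effect of charge state on isotope peak separation can be illustrated with a short calculation. The sketch assumes that neighboring isotope peaks differ by one 13C-for-12C substitution (about 1.003355 Da), which is the dominant case for organic molecules; the function name is hypothetical.

```python
# Mass difference between 13C and 12C in Da (assumed dominant isotope spacing).
C13_C12_MASS_DIFF = 1.003355


def isotope_spacing(charge: int) -> float:
    """Spacing in m/z units between neighboring isotope peaks at a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return C13_C12_MASS_DIFF / charge


spacings = {z: isotope_spacing(z) for z in (1, 2, 3, 4)}

for z, spacing in spacings.items():
    print(f"z = {z}: isotope spacing = {spacing:.4f} m/z")
```

At charge 2 the isotope peaks are only about 0.5 m/z apart, and at charge 4 about 0.25 m/z apart, so on a unit mass resolution system they overlap far more heavily than for singly charged ions.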
Nonlinear Operation. The current approaches use a multi-stage disjointed process with many empirically adjustable parameters during each stage. Systematic errors (biases) are generated at each stage and propagated down to the later stages in an uncontrolled, unpredictable, and nonlinear manner, making it impossible for the algorithms to report meaningful statistics as measures of data processing quality and reliability.
Dominating Systematic Errors. In most MS applications, ranging from industrial process control and environmental monitoring to protein identification or biomarker discovery, instrument sensitivity or detection limit has always been a focus, and great efforts have been made in many instrument systems to minimize measurement error or noise contribution in the signal. Unfortunately, the peak processing approaches currently in use create a source of systematic error even larger than the random noise in the raw data, thus becoming the limiting factor in instrument sensitivity or reliability.
Mathematical and Statistical Inconsistency. The many empirical approaches used currently make the entire mass spectral peak processing inconsistent, either mathematically or statistically. The peak processing results can change dramatically on slightly different data without any random noise, or on the same synthetic data with slightly different noise. In other words, the results of peak processing are not robust and can be unstable depending on the particular experiment or data collection.
Instrument-To-Instrument Variations. It has usually been difficult to directly compare raw mass spectral data from different MS instruments due to variations in the mechanical, electromagnetic, or environmental tolerances. The current ad hoc peak processing applied to the raw data only adds to the difficulty of quantitatively comparing results from different MS instruments. On the other hand, there is an increasing need to compare either raw mass spectral data directly or peak processing results from different instruments or different types of instruments, for the purpose of impurity detection or protein identification through searches in established MS libraries.
In nearly all applications of mass spectrometry, it is the form of centroid mass spectral data that will be compared with known mass spectral centroid data, acquired separately, from a known database, or from theoretical isotope calculations, for the purpose of ion or ion fragment identification. When one form of acquired centroid data is compared with another form acquired earlier or on a different instrument, the above mentioned errors associated with mass determination and peak area integration (centroiding) appear twice (once for each instrument) before the actual comparison. Even when the acquired centroid data are compared to theoretically calculated accurate centroids, the actual comparison will have to be performed with a large enough tolerance (e.g., mass binning and/or de-isotoping within a nominal mass window) to reflect the large centroiding errors, especially on a lower resolution instrument such as a unit mass resolution system. The larger tolerance will undoubtedly degrade the quality of comparison/search (confidence level) and significantly slow down the computation due to the many more hits that must be evaluated (computational performance).
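The link between matching tolerance and the number of candidate hits can be sketched with a toy library lookup. The library values, query mass, and tolerances below are invented for illustration, and real search engines use more elaborate scoring than this simple window test.

```python
from typing import List


def count_hits(query_mz: float, library_mz: List[float], tolerance: float) -> int:
    """Count library entries within +/- tolerance (in m/z units) of the query."""
    return sum(1 for mz in library_mz if abs(mz - query_mz) <= tolerance)


# Hypothetical library of theoretical centroids near nominal mass 500.
library = [500.10, 500.22, 500.26, 500.31, 500.48, 500.70]
query = 500.25

tight_hits = count_hits(query, library, tolerance=0.02)
loose_hits = count_hits(query, library, tolerance=0.5)

print(f"tolerance 0.02: {tight_hits} hit(s)")
print(f"tolerance 0.50: {loose_hits} hit(s)")
```

With the tight window a single candidate remains, while the nominal-mass-sized window returns every entry, each of which must then be evaluated and none of which can be distinguished on mass alone.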
In many applications of mass spectrometry, such as with the use of MS/MS, electron impact (EI) ionization, electro-spray ionization (ESI), and post source decay (PSD), an ion in the sample can typically be observed at multiple m/z (or mass) positions due to the creation of many fragment ions or the same ion with different charge states, or both. Even with the poorly processed centroid data mentioned above, the added information from multiple fragments can typically reduce the number of hits during a search while increasing the search confidence. This has made possible some important applications of mass spectrometry:
        Compound identification based on actual GC/MS data and EI fragmentation database, e.g., a widely used library available from the National Institute of Standards and Technology (NIST) as described by S. E. Stein, J. Am. Soc. Mass Spectrom. 1999, 10, 770.
        Native protein identification through multiple charge deconvolution using ESI as disclosed in the U.S. Pat. Nos. 5,300,771 and 6,118,120.
        Protein or peptide database search with MS/MS data using, for example, Sequest algorithm disclosed in the U.S. Pat. No. 5,538,897.
        de novo protein or peptide sequencing with MS/MS data to determine the amino acid sequences of a protein or peptide without requiring a protein or peptide database, for example, as described by A. L. Yergey, in J. Am. Soc. Mass. Spectrom. 2002, 13, 784.
Unfortunately, while adding much needed identification information, the various fragment ions observed typically have vastly varying abundances, and some fragments may not even be observable. The varying abundances of fragment ions pose some unique challenges to the above mentioned and currently widely used “centroiding first and searching or comparison second” approach. The centroiding typically has large peak integration errors associated with it, an issue further compounded by the experimentally varying abundances. This typically leads to algorithms that ignore the peak area or signal intensities through some form of normalization, for example, as disclosed in the U.S. Pat. No. 5,538,897. While normalization provides an easy solution computationally, it inevitably results in the loss of valuable information regarding the likelihood of a particular ion fragment under consideration. Given that ion counting noise is typically the dominant source of noise in ion or fragment detection, a higher intensity or signal level directly translates to a higher probability for the presence of the particular ion fragment. To make matters worse, all intensity normalization schemes destroy the intrinsic statistical relationship between the ion and its multiple fragments, making it difficult (if not impossible) to statistically assess the presence or absence of an ion under consideration. As a result, heuristic assessment is used through the “training” of the search algorithm on hundreds or thousands of “typical” mass spectra, when in fact all statistical measures can be derived directly from the acquired mass spectrum itself.
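The information lost through normalization can be illustrated under the assumption that ion counting follows Poisson statistics, so that a peak of N counts has a standard deviation of sqrt(N) and a relative uncertainty of 1/sqrt(N). The two spectra below are invented, and base-peak normalization is used only as one representative scheme.

```python
import math
from typing import List


def normalize_to_base_peak(counts: List[float]) -> List[float]:
    """Scale intensities so that the most abundant fragment equals 1.0."""
    base = max(counts)
    return [c / base for c in counts]


def relative_uncertainty(counts: List[float]) -> List[float]:
    """Relative standard deviation of each peak, assuming Poisson ion counting."""
    return [1.0 / math.sqrt(c) for c in counts]


# Two hypothetical fragment spectra with the same pattern but 100x different signal.
weak_spectrum = [4.0, 16.0, 100.0]
strong_spectrum = [400.0, 1600.0, 10000.0]

weak_norm = normalize_to_base_peak(weak_spectrum)
strong_norm = normalize_to_base_peak(strong_spectrum)

weak_rsd = relative_uncertainty(weak_spectrum)
strong_rsd = relative_uncertainty(strong_spectrum)

print("normalized (weak):  ", weak_norm)
print("normalized (strong):", strong_norm)
print("relative uncertainty (weak):  ", weak_rsd)
print("relative uncertainty (strong):", strong_rsd)
```

After normalization the two spectra are indistinguishable, yet the weakest fragment is known only to within 50% in the first spectrum and to within 5% in the second, which is the statistical information a normalized search can no longer use.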
Thus, there exists a significant gap between what current mass spectral instrumentation can offer and what is being achieved at present using existing technologies for mass spectral analysis.