1. Field of the Invention
The present invention generally relates to the field of Mass Spectrometry (MS) and, more particularly, to methods for calibrating MS instrument systems and for processing MS data.
2. Background of the Invention
Mass Spectrometry (MS) is a 100-year-old technology that relies on the ionization and fragmentation of molecules, the dispersion of the fragment ions by their masses, and the proper detection of the ion fragments on the appropriate detectors. There are many ways to achieve each of these three key MS processes, which give rise to different types of MS instrumentation having distinct characteristics.
Four major types of ionization techniques are commonly used to both break apart a larger molecule into many smaller molecules and at the same time ionize them so that they can be properly charged before mass dispersion. These ionization schemes include Electrospray Ionization (ESI), Electron Impact Ionization (EI) through the impact of high-energy electrons, Chemical Ionization (CI) through the use of other reactive compounds, and Matrix-Assisted Laser Desorption and Ionization (MALDI). Both ESI and MALDI also serve as means for sample introduction.
Once the molecules in a sample are fragmented and charged through ionization, each fragment will have a corresponding mass-to-charge (m/z) ratio, which becomes the basis for mass dispersion. Based on the physical principles used, there are many different ways to achieve mass dispersion, resulting in mass spectral data similar in nature but different in details. A few of the commonly seen configurations include: magnetic sectors; quadrupoles; Time-Of-Flight (TOF); and Fourier Transform Ion-Cyclotron Resonance (FT ICR).
The magnetic sectors configuration is the most straightforward mass dispersion technique, in which ions with different m/z ratios separate in a magnetic field and exit this field at spatially separated locations, where they are detected with either a fixed array of detector elements or a movable set of small detectors that can be adjusted to detect different ions depending on the application. This is a simultaneous configuration where all ions from the sample are separated simultaneously in space rather than sequentially in time.
The quadrupoles configuration is perhaps the most popular MS configuration where ions of different m/z values will be filtered out of a set of (usually 4) parallel rods through the manipulation of RF/DC ratios applied to these rod pairs. Only ions of a certain m/z value will survive the trip through these rods at a given RF/DC ratio, resulting in the sequential separation and detection of fragment ions. Due to its sequential nature, only one detector element is required for detection. Another configuration that uses ion traps can be considered a special example of quadrupole MS.
The Time-Of-Flight (TOF) configuration is another sequential dispersion and detection scheme that lets the fragment ions accelerate under an electric field through a high vacuum flight tube before detection. Ions of different m/z values arrive at the detector at different times, and the arrival time can be related to the m/z values through the use of calibration standard(s).
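As an illustration only, the relation between arrival time and m/z described above can be sketched as follows. The sketch assumes the idealized model in which flight time grows linearly with the square root of m/z (t = a*sqrt(m/z) + b); the function names, the coefficients a and b, and the standard values are hypothetical and are not taken from any particular instrument.

```python
import numpy as np

def fit_tof_calibration(times, known_mz):
    """Fit t = a*sqrt(m/z) + b to calibration standards; return (a, b)."""
    x = np.sqrt(np.asarray(known_mz, dtype=float))
    a, b = np.polyfit(x, np.asarray(times, dtype=float), 1)
    return a, b

def time_to_mz(t, a, b):
    """Invert the fitted model to convert arrival time into m/z."""
    return ((np.asarray(t, dtype=float) - b) / a) ** 2

# Synthetic calibration standards (made-up numbers, arbitrary time units).
true_a, true_b = 2.0, 0.5
standards_mz = np.array([100.0, 400.0, 900.0])
standards_t = true_a * np.sqrt(standards_mz) + true_b

a, b = fit_tof_calibration(standards_t, standards_mz)

# An ion arriving at t = 30.5 maps to m/z of approximately 225 under this model.
mz_unknown = float(time_to_mz(30.5, a, b))
print(a, b, mz_unknown)
```

Real instruments may require additional correction terms; the two-parameter model is used here only to keep the example small.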
In Fourier Transform Ion-Cyclotron Resonance (FT ICR), after fragmentation and ionization, all ions can be introduced to an ion cyclotron where ions of different m/z ratios would be trapped and resonate at different frequencies. These ions can be pulsed out through the application of a Radio Frequency (RF) signal and the ion intensities measured as a function of time on a detector. Upon Fourier transformation of the time domain data measured, one gets back the frequency domain data where the frequency can be related back to m/z ratios through the use of calibration standard(s).
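The Fourier transformation step described above can be sketched, under simplifying assumptions, as follows. The sketch assumes the ideal cyclotron relation in which frequency is inversely proportional to m/z (m/z = k/f), with k standing in for a constant obtained from a calibration standard; the sampling rate, the value of k, and the two synthetic ions are hypothetical.

```python
import numpy as np

fs = 1_000_000.0   # assumed sampling rate in Hz
n = 100_000        # number of samples, giving a 10 Hz frequency resolution
t = np.arange(n) / fs

k = 1.0e8          # hypothetical calibration constant: (m/z) * frequency

# Synthetic time-domain signal from two ions at m/z 500 and m/z 1000.
true_freqs = k / np.array([500.0, 1000.0])   # 200 kHz and 100 kHz
signal = (np.sin(2 * np.pi * true_freqs[0] * t)
          + 0.5 * np.sin(2 * np.pi * true_freqs[1] * t))

# Fourier transform: time domain -> frequency domain (magnitude spectrum).
spectrum = np.abs(np.fft.rfft(signal))
freq_axis = np.fft.rfftfreq(n, d=1.0 / fs)

# Take the two strongest frequency bins and convert them back to m/z.
top2 = np.argsort(spectrum)[-2:]
detected_freqs = np.sort(freq_axis[top2])
detected_mz = np.sort(k / detected_freqs)
print(detected_mz)
```

The synthetic frequencies are chosen to fall exactly on frequency bins so that simple peak picking suffices; measured transients would generally need windowing and peak interpolation.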
Ions can be detected either directly by the use of Faraday cups or indirectly by the use of electron multiplier tubes (EMT)/plates (EMP) or photon multiplier tubes (PMT) after a converter that converts ions into light. FIGS. 1A, 1B, and 1C are diagrams illustrating a typical mass spectral data trace on different ion intensity scales 110, 120, and 130 respectively plotted as a function of m/z ratio, according to the prior art.
The past one hundred years have witnessed tremendous strides in MS instrumentation, with many different types of instruments designed and built for high throughput, high resolution, and high sensitivity work. The instrumentation has been developed to a stage where single ion detection can be routinely accomplished on most commercial MS systems with unit mass resolution, allowing for the observation of ion fragments coming from different isotopes. In stark contrast to the sophistication in hardware, very little has been done to systematically and effectively analyze the massive amount of MS data generated by modern MS instrumentation.
On a typical mass spectrometer, the user is usually required to supply, or is supplied with, a standard material having several fragment ions covering the mass spectral m/z range of interest. Subject to baseline effects, isotope interferences, mass resolution, and resolution dependence on mass, peak positions of a few ion fragments are determined either in terms of centroids or peak maxima through a low order polynomial fit at the peak top. These peak positions are then fit to the known peak positions for these ions through either 1st or other higher order polynomials to calibrate the mass (m/z) axis.
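The conventional calibration just described can be sketched as follows, by way of illustration only. The sketch assumes a three-point parabolic fit at the peak top and a first-order polynomial for the axis calibration; the function names and all numerical values are hypothetical.

```python
import numpy as np

def peak_top_position(x, y):
    """Locate a peak maximum by fitting a parabola through the highest
    sample and its two neighbours, and returning the vertex position."""
    i = int(np.argmax(y))
    i = min(max(i, 1), len(y) - 2)          # keep both neighbours in range
    xc = x[i - 1:i + 2] - x[i]              # centre x for numerical stability
    c2, c1, _ = np.polyfit(xc, y[i - 1:i + 2], 2)
    return x[i] - c1 / (2.0 * c2)

def calibrate_axis(measured, known, order=1):
    """Fit measured peak positions to known positions with a polynomial."""
    return np.poly1d(np.polyfit(measured, known, order))

# A synthetic, exactly parabolic peak top centred at 100.03.
x = np.linspace(99.8, 100.2, 5)
y = 10.0 - (x - 100.03) ** 2
top = peak_top_position(x, y)

# Made-up measured positions of three standard ions versus their known values.
measured = np.array([100.1, 200.2, 300.3])
known = np.array([100.0, 200.0, 300.0])
to_true_mz = calibrate_axis(measured, known, order=1)
corrected = float(to_true_mz(150.15))
print(top, corrected)
```

Because only a few samples near each peak top enter the fit, the result is sensitive to noise, baseline, and overlapping isotope peaks, which are the limitations discussed in this section.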
After the mass axis calibration, a typical mass spectral data trace would then be subjected to peak analysis where peaks (ions) are identified. This peak detection routine is a highly empirical and compounded process where peak shoulders, noise in data trace, baselines due to chemical backgrounds or contamination, isotope peak interferences, etc., are considered.
For the peaks identified, a process called centroiding is typically applied where an attempt at calculating the integrated peak areas and peak positions would be made. Due to the many interfering factors outlined above and the intrinsic difficulties in determining peak areas in the presence of other peaks and/or baselines, this is a process plagued by many adjustable parameters that can make an isotope peak appear or disappear with no objective measures of the centroiding quality.
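A minimal sketch of such a centroiding calculation is given below, assuming trapezoidal integration over a fixed window and a constant baseline supplied by the user. The baseline argument stands in for one of the adjustable parameters mentioned above; the function name and the synthetic peak are hypothetical.

```python
import numpy as np

def centroid_and_area(mz, intensity, baseline=0.0):
    """Return (centroid, area) of a peak after subtracting a constant baseline."""
    y = np.clip(np.asarray(intensity, dtype=float) - baseline, 0.0, None)
    dx = np.diff(mz)
    # Trapezoidal rule for the area and for the first moment of the peak.
    area = np.sum(0.5 * (y[1:] + y[:-1]) * dx)
    moment = np.sum(0.5 * (mz[1:] * y[1:] + mz[:-1] * y[:-1]) * dx)
    return moment / area, area

# Synthetic Gaussian peak at m/z 500.2 on top of a constant baseline of 5.
mz = np.linspace(499.0, 501.0, 201)
peak = 100.0 * np.exp(-0.5 * ((mz - 500.2) / 0.1) ** 2)
trace = peak + 5.0

c_raw, area_raw = centroid_and_area(mz, trace)                # baseline ignored
c_sub, area_sub = centroid_and_area(mz, trace, baseline=5.0)  # baseline removed
print(c_raw, area_raw, c_sub, area_sub)
```

In this synthetic case, ignoring the baseline pulls the centroid toward the middle of the integration window and inflates the area, illustrating how the reported position and area depend on a parameter chosen by the user.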
A description will now be given of some of the many disadvantages of the conventional approaches to processing mass spectrometry data.
One disadvantage is the lack of mass accuracy. The mass calibration currently in use usually does not provide better than 0.1 amu (m/z unit) in mass determination accuracy on a conventional MS system with unit mass resolution (ability to visualize the presence or absence of a significant isotope peak). In order to achieve higher mass accuracy and reduce ambiguity in molecular fingerprinting such as peptide mapping for protein identification, one has to switch to an MS system with higher resolution such as quadrupole TOF (qTOF) or FT ICR MS which comes at a significantly higher cost.
Another disadvantage is the large peak integration error. Due to the contribution of mass spectral peak shape, its variability, the isotope peaks, the baseline and other background signals, and the random noise, current peak area integration has large errors (both systematic and random errors) for either strong or weak mass spectral peaks.
Yet another disadvantage is the difficulty with isotope peaks. Current approaches do not have a good way to separate the contributions from various isotopes, which usually give rise to partially overlapped mass spectral peaks on conventional MS systems with unit mass resolution. The empirical approaches used either ignore the contributions from neighboring isotope peaks or over-estimate them, resulting in errors for dominating isotope peaks and large biases for weak isotope peaks, or even in the weaker peaks being missed altogether. When ions of multiple charges are concerned, the situation becomes even worse, due to the reduced separation in m/z units between neighboring isotope peaks.
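The reduced separation for multiply charged ions can be illustrated with a short calculation. The sketch assumes that neighboring isotope peaks differ by one carbon-13 for carbon-12 substitution (a mass difference of about 1.003355 Da), so that the spacing on the m/z axis is that difference divided by the charge; the function name is hypothetical.

```python
C13_C12_MASS_DIFF = 1.003355  # approximate mass difference in Da

def isotope_spacing(charge):
    """Spacing in m/z between neighbouring isotope peaks for a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return C13_C12_MASS_DIFF / charge

# Spacing shrinks in proportion to 1/z for charge states 1 through 4.
spacings = {z: isotope_spacing(z) for z in (1, 2, 3, 4)}
print(spacings)
```

On a system with unit mass resolution, peaks spaced by roughly half an m/z unit or less overlap substantially more than singly charged isotope peaks do.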
Yet still another disadvantage is nonlinear operation. The current approaches use a multi-stage disjointed process with many empirically adjustable parameters during each stage. Systematic errors (biases) are generated at each stage and propagated down to the later stages in an uncontrolled, unpredictable, and nonlinear manner, making it impossible for the algorithms to report meaningful statistics as measures of data processing quality and reliability.
A further disadvantage is the dominance of systematic errors. In most MS applications, ranging from industrial process control and environmental monitoring to protein identification or biomarker discovery, instrument sensitivity or detection limit has always been a focus, and great efforts have been made in many instrument systems to minimize measurement error or noise contribution in the signal. Unfortunately, the peak processing approaches currently in use create a source of systematic error even larger than the random noise in the raw data, thus becoming the limiting factor in instrument sensitivity.
An additional disadvantage is mathematical and statistical inconsistency. The many empirical approaches currently used make the whole mass spectral peak processing inconsistent either mathematically or statistically. The peak processing results can change dramatically on slightly different data without any random noise or on the same synthetic data with slightly different noise. In other words, the results of the peak processing are not robust and can be unstable depending on the particular experiment or data collection.
Moreover, another disadvantage is the instrument-to-instrument variations. It has usually been difficult to directly compare raw mass spectral data from different MS instruments due to variations in the mechanical, electromagnetic, or environmental tolerances. With the current ad hoc peak processing applied on the raw data, it only adds to the difficulty of quantitatively comparing results from different MS instruments. On the other hand, the need for comparing either raw mass spectral data directly or peak processing results from different instruments or different types of instruments has been increasingly heightened for the purpose of impurity detection or protein identification through computer searches in established MS libraries.
Accordingly, it would be desirable and highly advantageous to have methods for calibrating Mass Spectrometry (MS) instrument systems and for processing MS data that overcome the above-described deficiencies and disadvantages of the prior art.