Mass Spectrometry
Mass spectrometry is a widely used method for characterizing the composition of complex mixtures. The primary goal of mass spectrometry is to identify molecules by mass or the masses of their fragments. A secondary goal is to determine how much of each type of molecule is present in a mixture. The mass of a molecule is determined by first ionizing the intact molecule, placing it in a force field, and observing some property of its trajectory. Both electrostatic and electromagnetic forces depend linearly upon the ion's charge. Thus, its acceleration in such a field depends inversely on the mass-to-charge ratio (m/z).
Mass Spectrometry Performance Metrics
Metrics used to describe the performance of a mass spectrometry platform include mass accuracy, mass resolving power, sensitivity, and quantification accuracy. Mass accuracy is the most important metric because errors in mass may lead to misidentification of components in a sample. The ability to accurately determine the mass of a low-abundance species, whose signal power is not much greater than noise, is especially important in many applications, e.g., proteomic biomarker discovery. Mass resolving power is another metric, also important because the maximum complexity of a mixture that can be successfully analyzed is limited by the ability to distinguish species with very similar m/z values. Sensitivity limits the ability to observe low-abundance species, which is a particularly important issue when components in a given mixture have widely varying abundances. Quantification accuracy is important in many applications when relative abundances need to be determined. These four metrics are commonly used to assess the relative performance of instruments and data analysis methods.
FTMS
Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS or FTMS) is a well-known method that offers higher mass resolution, greater mass resolving power, and higher mass accuracy than other known mass analysis methods. The superior performance of FTMS makes it the method of choice for analyzing mixtures of very high complexity such as blood or oil. The principles of FT-ICR MS are described in A. Marshall, C. Hendrickson, G. Jackson, Fourier Transform Ion Cyclotron Resonance Mass Spectrometry: A Primer, Mass Spectrometry Reviews, Volume 17, 1998, pp. 1-35. In FTMS, a magnetic field induces ion cyclotron motion.
A magnetic field will induce an ion whose initial velocity is normal to the field to orbit in a plane normal to the field with a frequency that depends inversely upon the ion's m/z value. Thus, estimates of an ion's orbital frequency can be used to determine its m/z value. If the ion has velocity along the direction of the magnetic field, it would continue to move inertially in this direction. An electrostatic trapping potential that varies quadratically along the direction of the field is applied to confine the ion along this axis.
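As a rough numerical illustration of the inverse dependence described above, the following sketch uses the textbook expression for the unperturbed cyclotron frequency, f = zeB/(2πm). The function names and the 7 tesla field strength are illustrative assumptions, and the small frequency shift introduced by the trapping potential is ignored.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def cyclotron_frequency_hz(mz, b_tesla):
    """Ideal cyclotron frequency for an ion of mass-to-charge ratio mz
    (daltons per elementary charge) in a uniform field of b_tesla."""
    return ELEMENTARY_CHARGE * b_tesla / (2.0 * math.pi * mz * DALTON)


def mz_from_frequency(freq_hz, b_tesla):
    """Invert the same relation: a frequency estimate yields an m/z value."""
    return ELEMENTARY_CHARGE * b_tesla / (2.0 * math.pi * freq_hz * DALTON)


# Hypothetical example: m/z 1000 in an assumed 7 T magnet.
frequency = cyclotron_frequency_hz(1000.0, 7.0)
recovered_mz = mz_from_frequency(frequency, 7.0)
```

Halving m/z doubles the frequency, so an error in the frequency estimate translates directly into a proportional error in the inferred m/z value.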
Orbitrap
A related machine, the LTQ-Orbitrap™, manufactured by Thermo-Fisher Scientific, measures the frequency of oscillation induced by a trapping potential that varies harmonically in one direction; a central electrode, rather than a magnetic field, provides the centripetal force that induces orbital motion in a plane that is normal to the trapping forces. The orbital motion of the ion is used to trap the ion. From a data analysis standpoint, the Orbitrap is a type of FTMS machine, even though it is not always classified as such by mass spectrometrists. The inventive method described herein is equally applicable to Orbitrap data and to data from traditional FTMS instruments. The peak shapes for FTMS and Orbitrap signals are both accurately characterized by the same model function. Unless indicated otherwise, the two types of peak shapes can be considered interchangeable. The same estimator, i.e., with no modification, can determine ion packet parameters from data collected on either machine. The difference between the FTMS and Orbitrap signals emerges downstream from the inventive estimator in the mass calibration step, as the ion packet frequency has a different dependency on mass-to-charge ratio.
Determining m/z Values from FTMS Signal
As in other types of mass spectrometry, the FTMS signal does not yield a direct measurement of the m/z values of ions. The FTMS signal is a time-dependent voltage signal generated by the difference in the image charge induced by an ion on two parallel conducting detector plates. The voltage varies linearly with the ion's displacement along the line connecting the two plates. In the ideal case of a single ion in a circular orbit (e.g., in the xy-plane), the voltage between two parallel plates (e.g., lying in planes normal to the x-axis) has a sinusoidal time-dependence. To first order, the FTMS signal is a sum of sinusoidal signals, one signal per ion packet, and one ion packet for each distinct m/z value in the mixture. Application of the Fourier transform to a sum of sinusoids produces a frequency spectrum that contains one peak for each sinusoidal component. Because the (complex-valued) Fourier transform is informationally equivalent to the time-domain signal, it can be referred to as the frequency-domain representation of the signal.
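The separation of a sum of sinusoids into distinct spectral peaks can be sketched with a small discrete Fourier transform. This is a toy example under stated assumptions: two hypothetical ion packets, no decay, no noise, and frequencies chosen to complete a whole number of cycles in the record. The helper names are illustrative.

```python
import cmath
import math


def dft(samples):
    """Direct O(N^2) discrete Fourier transform; adequate for a small demo."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def synthesize(n, packets):
    """Sum of cosines; each packet is (amplitude, cycles_per_record, phase)."""
    return [
        sum(a * math.cos(2.0 * math.pi * c * t / n + p) for a, c, p in packets)
        for t in range(n)
    ]


N = 64
packets = [(1.0, 5, 0.3), (0.5, 12, 1.1)]  # hypothetical ion packets
spectrum = dft(synthesize(N, packets))

# Only the first half of the spectrum is unique for a real-valued signal.
magnitudes = [abs(v) for v in spectrum[: N // 2]]
strongest = sorted(range(N // 2), key=lambda k: magnitudes[k], reverse=True)[:2]
peak_bins = sorted(strongest)
print(peak_bins)  # prints [5, 12]
```

In this idealized case each peak sits at the packet's frequency bin, its magnitude is N/2 times the packet amplitude, and its complex phase equals the packet's initial phase.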
Because the time-domain and frequency-domain representations of the signal are equivalent, estimation can be performed in either domain. However, performing the estimation in the frequency domain is significantly easier. Most of the signal power from an ion packet is concentrated in a narrow band centered at its oscillation frequency. Although signals from various ion packets are completely overlapped in the time domain, signals in the frequency domain are essentially non-overlapped, except in relatively rare cases where two packets have very similar m/z. Nearly all of the information about an ion packet is contained in a relatively small window of frequency samples, allowing rapid computations with high accuracy.
Application of the Fourier transform to separate signals from ions with distinct m/z values into distinct peaks is the distinguishing property of FTMS. The position of each peak in the frequency spectrum (i.e., its frequency) indicates the m/z value of the ion, and the magnitude indicates its relative abundance. Signal processing is necessary to precisely determine the magnitude and frequency of each ion packet signal. The precise position of the peak is obscured by several factors, including the finite duration for which the signal is observed, the decay of the signal amplitude over time, and the electronic noise in the measurements. Accordingly, there is a need in the art to design an estimator to accurately determine values of the desired parameters.
Magnitude-Based Methods
Existing methods for extracting information from FTMS data do not make use of the complex-valued Fourier transform. These methods instead use the magnitude-mode spectra. A complex number, like an observed value of the Fourier-transform, can be characterized by the values of its real and imaginary components, or equivalently, by its magnitude and phase. The magnitude of a complex number is the square-root of the sum of the squares of the real and imaginary components. A magnitude-mode spectrum can be thought of as removing the phases from each Fourier-transform sample. Thus, the magnitude-mode spectrum contains exactly half the information of the complex-valued spectrum.
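The relationship between the two representations of a Fourier-transform sample can be sketched as follows. The sample values are hypothetical; the point is that two samples with different phases become indistinguishable once only the magnitude is kept.

```python
import cmath
import math


def magnitude_and_phase(z):
    """Polar description of a complex sample, written out explicitly."""
    magnitude = math.sqrt(z.real ** 2 + z.imag ** 2)
    phase = math.atan2(z.imag, z.real)
    return magnitude, phase


# Two hypothetical transform samples: same magnitude, different phase.
sample_a = cmath.rect(2.0, 0.4)
sample_b = cmath.rect(2.0, 2.5)

mag_a, phase_a = magnitude_and_phase(sample_a)
mag_b, phase_b = magnitude_and_phase(sample_b)
```

A magnitude-mode spectrum would report the same value for both samples, even though their real and imaginary components differ.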
The magnitude-mode spectrum is phase-invariant, meaning that it is independent of the initial phases of the ion packets, except for effects of signal overlaps, which are not directly modeled in these magnitude-based methods. Although phase-invariant analysis leads to simpler computations, removing the phase dependence destroys valuable information. For example, the phases of the ion packets could be used to compute absorption spectra, whose peaks are roughly half as wide as corresponding peaks in magnitude-mode spectra, resulting in a two-fold gain in mass resolving power.
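The width comparison can be checked numerically in one limiting case. The sketch below assumes the signal decays fully within the observation window, so that the absorption peak is a Lorentzian and the magnitude peak is its square root; the time constant is a hypothetical value. In this limit the magnitude peak is wider by a factor of the square root of three, which is consistent with the "roughly half as wide" statement above. Other observation regimes give a somewhat different ratio.

```python
import math


def absorption(delta_omega, tau):
    """Absorption-mode Lorentzian, normalized to 1 at the peak center."""
    return 1.0 / (1.0 + (delta_omega * tau) ** 2)


def magnitude(delta_omega, tau):
    """Magnitude-mode peak for the same decaying sinusoid, normalized to 1."""
    return 1.0 / math.sqrt(1.0 + (delta_omega * tau) ** 2)


tau = 0.5  # hypothetical decay time constant, in seconds

# Offsets (in rad/s) at which each peak falls to half of its maximum.
half_width_absorption = 1.0 / tau
half_width_magnitude = math.sqrt(3.0) / tau

width_ratio = half_width_magnitude / half_width_absorption
```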
Zero-padding is a computational trick used to recover the information lost by removing phases. Although phase information can be recovered in theory by zero-padding, removal of the phases ultimately diminishes all aspects of mass spectrometry performance. Zero-padding can be viewed in the time domain as appending N zeros to the end of N observed samples or, equivalently, as calculating the samples of the Fourier transform at intervals of 1/(2T) rather than 1/T. That is to say, magnitude values are calculated halfway in between observed transform values. The complex-valued samples halfway in between observed values are not independent; rather, they can be computed as linear combinations of the observed values. However, the magnitudes produced by this process are independent. It can be shown that the N Fourier transform magnitudes produced by zero-padding are informationally equivalent to the N/2 complex values of the unpadded Fourier transform. However, zero-padding has the undesirable property of introducing sidelobes to the tails of the peaks. That is, the magnitude samples no longer decrease monotonically as the distance from the peak centroid increases, but instead bob up and down every other sample.
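The two properties described above, half-interval sampling and alternating sidelobes, can be sketched with a small example. This is a toy case under stated assumptions: a single non-decaying, noise-free cosine that completes a whole number of cycles in the record. The names N, K0, and dft are illustrative.

```python
import cmath
import math


def dft(samples):
    """Direct O(N^2) discrete Fourier transform; adequate for a small demo."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


N = 64   # number of observed samples
K0 = 8   # hypothetical packet frequency, in cycles per record

signal = [math.cos(2.0 * math.pi * K0 * t / N) for t in range(N)]
unpadded = dft(signal)
padded = dft(signal + [0.0] * N)  # append N zeros

# Every second padded sample reproduces an unpadded sample exactly.
even_mismatch = max(abs(padded[2 * k] - unpadded[k]) for k in range(N))

# Magnitudes in the tail, at 0.5, 1.0, 1.5, 2.0, 2.5 bins above the peak.
tail = [abs(padded[2 * K0 + j]) for j in range(1, 6)]
```

In this example the in-between samples are nonzero while the original off-peak samples are essentially zero, so the tail alternates between high and low values instead of decreasing monotonically.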
The wiggling associated with each ion packet signal typically confounds peak detection algorithms by introducing numerous local maxima in the spectrum. Application of an apodization filter can reduce the wiggling artifact. Apodization filters can be designed to eliminate adjacent sidelobes, but they have the undesirable property of broadening the peak. Peak broadening reduces the mass resolving power of the mass spectrometer, as well as the mass accuracy.
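The trade-off between sidelobe suppression and peak broadening can be sketched with a Hann window, used here only as one common example of an apodization filter; the source does not name a specific filter. The signal is a hypothetical non-decaying cosine, and the 0.4 threshold used to count samples in the main peak is an arbitrary illustrative choice.

```python
import cmath
import math


def dft(samples):
    """Direct O(N^2) discrete Fourier transform; adequate for a small demo."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def padded_magnitudes(samples):
    """Magnitude spectrum after appending len(samples) zeros."""
    return [abs(v) for v in dft(list(samples) + [0.0] * len(samples))]


def samples_above(mags, fraction):
    """Count samples in the unique half-spectrum above a fraction of the peak."""
    half = mags[: len(mags) // 2]
    threshold = fraction * max(half)
    return sum(1 for m in half if m >= threshold)


N = 64
K0 = 8  # hypothetical packet frequency, in cycles per record
signal = [math.cos(2.0 * math.pi * K0 * t / N) for t in range(N)]
hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / N) for t in range(N)]

plain = padded_magnitudes(signal)
apodized = padded_magnitudes([w * x for w, x in zip(hann, signal)])

peak = 2 * K0  # peak index in the zero-padded spectrum
sidelobe_plain = plain[peak + 5] / plain[peak]           # 2.5 bins from center
sidelobe_apodized = apodized[peak + 5] / apodized[peak]

width_plain = samples_above(plain, 0.4)
width_apodized = samples_above(apodized, 0.4)
```

With the window applied, the relative sidelobe level drops while more samples lie within the main peak, which is the broadening described above.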
Furthermore, calculation of the magnitude-mode spectrum involves the application of non-linear operations to the Fourier transform. As a result, the analysis of noise becomes problematic: observed magnitudes are Rayleigh-distributed, while the Fourier-transform values are Gaussian-distributed. Analysis of Gaussian-distributed observations is conceptually and computationally much simpler.
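The effect of the non-linear magnitude operation on noise can be sketched by simulation. The example below considers noise-only samples, with independent Gaussian real and imaginary parts of an assumed unit standard deviation; the seed and sample count are arbitrary.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
SIGMA = 1.0             # assumed noise standard deviation per component
COUNT = 20000

noise = [complex(rng.gauss(0.0, SIGMA), rng.gauss(0.0, SIGMA)) for _ in range(COUNT)]

# Real parts are zero-mean Gaussian; magnitudes are biased upward.
mean_real = sum(z.real for z in noise) / COUNT
mean_magnitude = sum(abs(z) for z in noise) / COUNT

# Theoretical mean of a Rayleigh distribution with scale SIGMA.
rayleigh_mean = SIGMA * math.sqrt(math.pi / 2.0)
```

The complex samples average to approximately zero, whereas the magnitudes average to a strictly positive value near the Rayleigh mean, so a magnitude-based analysis has to account for a noise floor that is absent from the complex-valued data.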
An Alternative Model-Based Approach
A model-based approach for analyzing FTMS spectra has been described in the literature (Giancaspro and Comisarow, 1983). In this method, three parameters describing a magnitude-Lorentzian curve are fit (exactly) to the three samples of highest-magnitude in a magnitude-mode spectrum. In the absence of noise, the estimated parameters would give the exact ICR frequency and amplitude of the observed peak. However, the technique is not robust in the presence of noise. In fact, even a relatively small amount of noise can cause critical instability in the estimator. For example, it is possible for the estimated peak height to approach infinity or for there to be no Lorentzian curve that passes through a set of noisy observations.
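One way to carry out an exact three-point fit of this kind, and to see how it fails, is sketched below. It is not taken from the cited paper. It assumes a magnitude-Lorentzian of the form M(f) = height / sqrt(1 + ((f - f0)/width)^2), for which 1/M^2 is a quadratic function of f, so three points determine the three parameters in closed form. The function names and test values are hypothetical.

```python
import math


def lorentzian_magnitude(f, f0, height, width):
    """Assumed magnitude-Lorentzian peak shape."""
    return height / math.sqrt(1.0 + ((f - f0) / width) ** 2)


def fit_three_points(freqs, mags):
    """Exact fit through three (frequency, magnitude) points.

    Returns (f0, height, width), or None when no magnitude-Lorentzian
    passes through the points.
    """
    x0, x1, x2 = freqs
    y0, y1, y2 = (1.0 / m ** 2 for m in mags)
    # Quadratic c2*f^2 + c1*f + c0 through the points (f, 1/M^2).
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    c2 = (d12 - d01) / (x2 - x0)
    c1 = d01 - c2 * (x0 + x1)
    c0 = y0 - c1 * x0 - c2 * x0 ** 2
    if c2 <= 0.0:
        return None  # 1/M^2 does not curve upward: no peak
    f0 = -c1 / (2.0 * c2)
    q_min = c0 - c1 * c1 / (4.0 * c2)  # equals 1/height^2 at the vertex
    if q_min <= 0.0:
        return None  # implied height is infinite or imaginary
    return f0, 1.0 / math.sqrt(q_min), math.sqrt(q_min / c2)


freqs = (10.0, 10.5, 11.0)
clean = [lorentzian_magnitude(f, 10.3, 5.0, 0.8) for f in freqs]
clean_fit = fit_three_points(freqs, clean)

# Hypothetical noisy magnitudes for which the fit breaks down.
noisy_fit = fit_three_points((0.0, 1.0, 2.0), (0.5, 3.2, 1.0))
```

With noise-free samples the fit recovers the true parameters. As the vertex value of the quadratic approaches zero the implied height grows without bound, and once it is non-positive no curve of the assumed form passes through the three points.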
Giancaspro and Comisarow attempted to model absorption spectra also, recognizing the potential for additional performance gains. The authors observe, however, that the magnitude-Lorentzian peak cannot be used to fit an absorption spectrum. This result is not surprising: the two functions are different, and one would not be expected to fit the other. The differences between the functions decrease as the observation duration increases. However, typical observation durations are such that these differences between the models are substantial. As a result, as the paper points out, parabolic models achieve similar mass accuracy under typical conditions for FTMS data collection.
It is unlikely that any commercially available FTMS data analysis methods make use of the prior art method of Giancaspro and Comisarow or any other model-based method. Possibly as a result of this misleading paper, the prevailing view in the field is that estimating frequency by parabolic fit (see below) is as good as, or superior to, model-based approaches. Accordingly, there is a need in the art to correct the flaw in the above prior art method by using the theoretical absorption and dispersion spectra, rather than a magnitude Lorentzian, to model the real and imaginary components of the observed Fourier transform.
Heuristic or Model-Free Methods
The most prevalent method for determining ion frequencies is to fit a parabola to the three largest values in the zero-padded magnitude-mode spectrum in the region of a detected peak and then take the frequency coordinate of the parabola's vertex as the frequency estimate (FIG. 5). One can interpret the parabola as an implicit model for the peak shape in this method. For a small enough neighborhood, any maximum can be approximated by a parabola. However, the quality of the approximation is limited by the size of the region (1/T, where T denotes the observation duration). Even in such a small region, the approximation is significantly outperformed by a superior peak-shape model. Outside of this narrow band of frequencies, the parabolic model does not provide even a moderately accurate model of the peak shape. As a result, observations outside this band cannot be used in determining the ion frequency.
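The three-point parabolic estimate can be sketched as follows for equally spaced samples. The function names are illustrative, and the magnitude-Lorentzian used to show the model mismatch is an assumed peak shape with hypothetical parameters.

```python
import math


def parabolic_vertex(x_center, spacing, y_left, y_center, y_right):
    """Vertex of the parabola through three equally spaced samples."""
    curvature = y_left - 2.0 * y_center + y_right
    return x_center + 0.5 * spacing * (y_left - y_right) / curvature


def lorentzian_magnitude(f, f0, height, width):
    """Assumed magnitude-Lorentzian peak shape, used only for comparison."""
    return height / math.sqrt(1.0 + ((f - f0) / width) ** 2)


# The estimate is exact when the samples really come from a parabola.
parabola_samples = [5.0 - 2.0 * (x - 3.2) ** 2 for x in (2.5, 3.0, 3.5)]
exact = parabolic_vertex(3.0, 0.5, *parabola_samples)

# Applied to a non-parabolic peak, the vertex is a biased estimate.
true_center = 10.3
peak_samples = [lorentzian_magnitude(f, true_center, 5.0, 0.8) for f in (10.0, 10.5, 11.0)]
estimate = parabolic_vertex(10.5, 0.5, *peak_samples)
bias = estimate - true_center
```

Even without noise, the estimate for the non-parabolic peak is offset from the true center by a few percent of the sample spacing, which illustrates the limit of the parabola as an implicit peak-shape model.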
Because the parabola-based estimate uses three parameters to fit three points, it is highly sensitive to noise in the observations. It is also unable to detect anomalies in the observed peak shapes caused by false detection or overlap between adjacent signals. The magnitude (and thus the relative ion abundance) of the packet is not determined optimally using the parabolic model. Abundance estimation requires modeling of the peak shape over a larger band of frequencies, i.e., outside a small neighborhood around the frequency maximum, where the parabolic model does not apply.
In theory, the ion packet abundance can be estimated from the area under the peak in the absorption spectrum or equivalently in the complex-valued Fourier transform. In practice, this technique suffers from the coarse sampling of the peak, and accurate interpolation is not possible without a peak-shape model. Furthermore, the peak has long tails that are difficult to integrate in the presence of noise and adjacent peaks.
Accordingly, there is a need in the art to design a technique to accurately estimate the parameters that describe ion packet trajectories with very high accuracy. Accurately estimating these parameters leads to accurate identification and quantification in complex mixtures.