This invention relates to a method of processing mass spectrometry data, particularly but not exclusively data obtained from Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FTMS).
Spectrometry in general, and mass spectrometry in particular, produces extremely rich data sets. This is especially true for high-resolution mass spectrometry data such as those obtained using double focussing magnetic sector mass spectrometry, time-of-flight mass spectrometry and Fourier transform mass spectrometry (FTMS). For example, a standard acquisition from m/z 200-2000 in FTMS involves the measurement of one million data points. Measuring one scan per second (typical for liquid chromatography/mass spectrometry (LC/MS) applications) results in the generation of raw data at a rate of 7.2 GB/hour (approximately 170 GB/day).
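The data rate quoted above can be checked with a short calculation. The sketch below is illustrative only: the figure of 2 bytes per data point is an assumption chosen because it reproduces the stated 7.2 GB/hour, and the function name `raw_data_gb` is hypothetical.

```python
POINTS_PER_SCAN = 1_000_000   # from the text: one million data points per scan
SCANS_PER_SECOND = 1          # from the text: typical LC/MS scan rate
BYTES_PER_POINT = 2           # ASSUMPTION: not stated in the text


def raw_data_gb(hours, bytes_per_point=BYTES_PER_POINT):
    """Raw data volume in GB (10**9 bytes) generated over `hours` of acquisition."""
    seconds = hours * 3600
    total_bytes = POINTS_PER_SCAN * SCANS_PER_SECOND * seconds * bytes_per_point
    return total_bytes / 1e9


per_hour = raw_data_gb(1)    # about 7.2 GB
per_day = raw_data_gb(24)    # about 172.8 GB, i.e. approximately 170 GB
print(f"{per_hour:.1f} GB/hour, {per_day:.1f} GB/day")
```

A wider storage format would scale these figures proportionally; for example, 4 bytes per point would double them.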
Typically, these spectra are stored in a computer memory or on an alternative computer-readable medium, and a large amount of storage is thus required. The bulk of such spectrometry data (perhaps 99%) does not contain valuable information but instead mostly comprises noise, which is of no analytical value beyond its overall amplitude and standard deviation.
Currently, mass spectrometers either store the entire data set or attempt to reduce the size of the data set in one of two ways.
The first is merely to store a list of peaks found in a mass spectrum (i.e. to store the position and magnitude of each peak). This method has the disadvantage that it is impossible for a user or software to re-evaluate the data for further characteristics such as peak shape, background, signal-to-noise ratio or other information that cannot be generated without additional assumptions. Information about the non-peak part of a spectrum is very valuable when the data are processed further, either manually or automatically. The signal-to-noise ratio gives important hints about the significance of an event. In addition, groups of peaks are very helpful to the skilled user, who can evaluate spectra far more effectively than automatic processing that considers only the location and intensity of the peaks within a group.
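By way of illustration only, the peak-list approach might be sketched as follows. The text does not say how peaks are located, so the local-maximum criterion, the function name `peak_list` and the `min_height` parameter are all assumptions made for this example.

```python
def peak_list(intensities, min_height=0):
    """Return (index, intensity) pairs for each strict local maximum.

    ASSUMPTION: a peak is a point higher than both neighbours and at least
    `min_height`; real peak detection is usually more elaborate.
    """
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y >= min_height and intensities[i - 1] < y > intensities[i + 1]:
            peaks.append((i, y))
    return peaks


# Toy spectrum: low-level noise surrounding one peak centred on index 5.
spectrum = [2, 3, 2, 10, 40, 90, 40, 10, 3, 2, 3]

reduced = peak_list(spectrum, min_height=5)
print(reduced)  # only position and magnitude of the peak survive
```

The eleven data points reduce to a single pair. The flanking values (10, 40, 40, 10) that describe the peak shape, and the surrounding noise values, cannot be recovered from the stored list.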
A second method of reducing the size of the data file to be stored is for an operator to pre-select a threshold value and for software to store only those data points of a spectrum whose value is greater than this threshold. If the operator guesses the threshold value correctly, only data points belonging to peaks will be stored. This has the advantage of preserving information about peak shape. However, this method has the disadvantage that it relies upon the skill of an operator to set the threshold level correctly. If the threshold level is set too low, a large quantity of noise data points will typically be stored along with the peak data points; if the threshold is set too high, valuable information relating to peak shape will be lost because data points at the base of peaks will be missed. Accordingly, such software is difficult for anyone other than an experienced operator to use successfully. In addition, no information relating to noise is stored, so all such information is lost.
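The sensitivity of this second method to the operator's choice can be seen in a minimal sketch. The function name `threshold_reduce`, the toy spectrum and the three threshold values are hypothetical and chosen purely for illustration.

```python
def threshold_reduce(intensities, threshold):
    """Keep only (index, intensity) pairs whose intensity exceeds `threshold`."""
    return [(i, y) for i, y in enumerate(intensities) if y > threshold]


# Toy spectrum: low-level noise surrounding one peak centred on index 5.
spectrum = [2, 3, 2, 10, 40, 90, 40, 10, 3, 2, 3]

well_chosen = threshold_reduce(spectrum, 5)   # whole peak, no noise points
too_low = threshold_reduce(spectrum, 2)       # whole peak plus some noise points
too_high = threshold_reduce(spectrum, 20)     # base of the peak is cut off

for label, kept in [("well chosen", well_chosen),
                    ("too low", too_low),
                    ("too high", too_high)]:
    print(f"{label}: {len(kept)} points kept")
```

In every case the points at or below the threshold are simply discarded, so neither the amplitude nor the standard deviation of the noise can be reconstructed afterwards.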
An improvement to the analysis of noise in FTMS data is described by Hanna in “Advances in Mass Spectrometry 1985: proceedings of the 10th International Mass Spectrometry Conference”, Swansea, 9-13 Sep. 1985, John Wiley and Sons, and separately in the Proceedings of the ASMS 33rd Annual Conference on Mass Spectrometry and Allied Topics, May 26-31, 1985, San Diego, Calif., USA. The method that Hanna describes uses a statistical analysis of the noise present in an FTMS mass spectrum to obtain a threshold value that is used as a noise exclusion level for the spectrum. Peak lists are obtained from data above this threshold. Whilst the techniques described in the Hanna articles allow a better estimate of a suitable noise threshold to be achieved, they still suffer from several drawbacks. Firstly, the techniques result only in the determination of peak locations and their intensities, so the valuable information regarding both the peak shape and the spectral noise is lost. Secondly, the techniques are computationally relatively expensive since, to obtain the parameters of the noise distribution, several iterations are necessary until these parameters stabilise.
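To give a sense of why an iterative estimate of the noise parameters carries a computational cost, the following sketch uses a generic iterative clipping scheme. It is not taken from the Hanna articles and should not be read as their algorithm; the function name `iterative_noise_threshold`, the multiplier `k` and the stopping rule are assumptions made for this example.

```python
import statistics


def iterative_noise_threshold(intensities, k=3.0, max_iter=50):
    """Estimate a noise exclusion level as mean + k * sigma of the noise.

    ASSUMPTION: generic iterative clipping. Points above the current
    threshold are treated as signal and removed, and the mean and standard
    deviation are recomputed until no further points are removed.
    Returns (threshold, number_of_passes).
    """
    points = list(intensities)
    threshold = float("inf")
    for iteration in range(1, max_iter + 1):
        mean = statistics.fmean(points)
        sigma = statistics.pstdev(points)
        threshold = mean + k * sigma
        kept = [y for y in points if y <= threshold]
        if len(kept) == len(points):   # parameters have stabilised
            return threshold, iteration
        points = kept
    return threshold, max_iter


# Toy data: 40 noise points plus two strong signal points.
spectrum = [1, 2, 3, 2] * 10 + [100, 200]
threshold, passes = iterative_noise_threshold(spectrum)
print(f"threshold = {threshold:.3f} after {passes} passes")
```

Each pass requires a full traversal of the remaining data points, and the strong signal points inflate the early estimates, so several passes are needed even for this small example.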