1. Technical Field
The disclosed technology relates to the fields of bioinformatics and of computationally distinguishing valid data values from noise data values in a data set.
2. Background Art
There are many situations in which a data set contains a mixture of valid data values and noise data values, and it is often difficult to distinguish between the two within the data set. In addition, there are data sets for which the computational cost of analyzing the data set depends strongly on whether a noise data value is mistakenly identified as a valid data value, but only marginally on whether a valid data value is mistakenly identified as a noise data value. A further problem with identifying a noise data value as a valid data value is that subsequent analysis of the data set is likely to produce a wrong answer, whereas failing to identify a valid data value may create ambiguity but will not produce incorrect results.
One example of such a data set is spectral data produced by Fourier-transform mass spectrometers. Fourier-transform mass spectrometers (FTMS) are the most accurate class of contemporary mass spectrometers. FTMS uses the ion cyclotron resonance principle to determine the m/z (mass-to-charge ratio) of sample molecules. Ions are stored inside an analyzer cell situated within a homogeneous magnetic field, so that the ions move in orbits at their cyclotron frequencies, which are inversely proportional to their m/z. FTMS uses a resonance method to detect the image current signal generated by the ions in the cell. The orbits of the ions can be manipulated by adding energy to the ions, for example by applying a radio-frequency (RF) burst (a chirp) to the cell containing the ions, which increases the velocity of the ions. Furthermore, the cell is designed so that when ions pass close to an electrode, the moving ions induce a charge on the electrode. This induced charge generates a sinusoidal image current that can be measured and that decays as the ions return to their original orbits. A Fourier transform applied to the image current generates an amplitude/frequency representation of the image current. This amplitude/frequency representation can be transformed into an m/z spectrum that is similar to the spectrum produced by other classes of tandem mass spectrometers. One skilled in the art will understand that if the measured ion is singly charged, the m/z of a spectral peak equals the mass of the ion represented by that spectral peak.
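The path from decaying image current to m/z spectrum can be illustrated with a minimal sketch. It assumes an idealized, noise-free transient built from exponentially decaying sinusoids, the unperturbed cyclotron relation f = zeB/(2πm) for singly charged ions, and hypothetical instrument parameters (a 7 T field, 1 MHz sampling, 8192 samples, a 2 ms decay constant). All function names are illustrative, and real instruments apply calibration corrections that are omitted here.

```python
import cmath
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def cyclotron_frequency(mz, b_field):
    """Unperturbed cyclotron frequency (Hz) for an ion of the given m/z (Da per charge) in B (T)."""
    return E_CHARGE * b_field / (2 * math.pi * DALTON * mz)


def mz_from_frequency(freq, b_field):
    """Invert f = eB / (2*pi*m) to recover m/z from a measured frequency (Hz)."""
    return E_CHARGE * b_field / (2 * math.pi * DALTON * freq)


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def simulate_transient(ions, b_field, fs, n, tau):
    """Hypothetical image current: one decaying sinusoid per (m/z, amplitude) pair."""
    freqs = [(cyclotron_frequency(mz, b_field), amp) for mz, amp in ions]
    signal = []
    for i in range(n):
        t = i / fs
        decay = math.exp(-t / tau)
        signal.append(sum(amp * decay * math.sin(2 * math.pi * f * t) for f, amp in freqs))
    return signal


def mz_spectrum(signal, b_field, fs):
    """Amplitude/frequency representation via FFT, re-indexed by m/z (DC bin skipped)."""
    n = len(signal)
    spectrum = fft(signal)
    return [(mz_from_frequency(k * fs / n, b_field), abs(spectrum[k])) for k in range(1, n // 2)]


def strongest_peaks(pairs, count):
    """Return the m/z values of the `count` largest local maxima, strongest first."""
    maxima = [pairs[i] for i in range(1, len(pairs) - 1)
              if pairs[i][1] > pairs[i - 1][1] and pairs[i][1] >= pairs[i + 1][1]]
    maxima.sort(key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in maxima[:count]]


if __name__ == "__main__":
    B, FS, N, TAU = 7.0, 1.0e6, 8192, 2e-3          # assumed instrument parameters
    ions = [(500.0, 1.0), (800.0, 0.5)]              # (m/z, relative abundance)
    transient = simulate_transient(ions, B, FS, N, TAU)
    print(strongest_peaks(mz_spectrum(transient, B, FS), 2))
```

Because frequency bins are evenly spaced in frequency rather than in m/z, the recovered m/z values fall within roughly half a bin of the true values, and that bin is wider in m/z at higher m/z. In this sketch, a 7 T field places the m/z 500 ion near 215 kHz, comfortably below the 500 kHz Nyquist limit of the assumed sampling rate.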
FTMS spectra have two characteristics that distinguish them from the spectra produced by other classes of mass spectrometers. First, their resolution is higher than that of the other classes of mass spectrometers. Second, FTMS spectra have a carpet of noise data values throughout the data set comprising the spectra. This carpet of noise data values makes it difficult to reliably distinguish valid data values resulting from low-abundance compounds in the analyzer cell from the noise data values.
The technology disclosed herein can be applied to data sets other than spectral data so long as the data set includes sufficiently more noise data values than valid data values.
It would be advantageous to provide a reliable method of distinguishing valid data values from noise data values in a data set for which the computational cost of analysis depends on whether a noise data value is incorrectly identified as a valid data value. It would also be advantageous to be able to accurately distinguish noise data values from valid data values so that analysis of the data set will not produce incorrect results.
In the context of FTMS spectra, it would be advantageous to provide a reliable method of distinguishing valid data values representing signal peaks resulting from small numbers of ions in the measured sample from the noise data values that are characteristic of spectra generated by a Fourier-transform mass spectrometer.