Mass spectrometry can be applied to the search for significant signatures that characterize and diagnose diseases. These signatures can be useful for the clinical management of disease and/or the drug development process for novel therapeutics. Some areas of clinical management include detection, diagnosis and prognosis. More accurate diagnostics may be capable of detecting diseases at earlier stages.
A mass spectrometer can histogram a number of particles by mass. Time-of-flight mass spectrometers, which can include an ionization source, a mass analyzer, and a detector, can histogram gas-phase ions by mass-to-charge ratio. Time-of-flight instruments typically accelerate the ions through a uniform electric field over a fixed distance. Regardless of mass, all ions of a given charge pick up the same kinetic energy. The ions then drift through an electric-field-free region of a fixed length. Since lighter ions have higher velocities than heavier ions given the same kinetic energy, the different masses arrive at the detector at well-separated times. A histogram can be prepared of the times of flight of the ions through the field-free region, which are determined by mass-to-charge ratio.
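The time-of-flight relationship described above can be illustrated with a minimal sketch. It assumes an idealized instrument in which an ion of charge z gains kinetic energy z·e·V in the accelerating field and then drifts at constant velocity; the function names, the 20 kV accelerating voltage, and the 1 m drift length are hypothetical values chosen for illustration only.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, in coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit, in kilograms


def flight_time(mass_da, charge, accel_voltage, drift_length):
    """Drift time in seconds through the field-free region (idealized)."""
    # Kinetic energy gained in the accelerating field depends on charge only.
    kinetic_energy = charge * E_CHARGE * accel_voltage
    # Equal energy means lighter ions leave the field with higher velocity.
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))
    return drift_length / velocity


def histogram_by_arrival(times, bin_width):
    """Count ion arrivals per time bin; returns {bin_index: count}."""
    counts = {}
    for t in times:
        index = int(t // bin_width)
        counts[index] = counts.get(index, 0) + 1
    return counts


# Hypothetical instrument settings: 20 kV acceleration, 1 m drift region.
t_light = flight_time(1000.0, 1, 20000.0, 1.0)
t_heavy = flight_time(4000.0, 1, 20000.0, 1.0)
```

In this idealization the drift time scales with the square root of the mass-to-charge ratio, so the 4000 Da ion takes twice as long as the 1000 Da ion, and a doubly charged 2000 Da ion arrives together with the singly charged 1000 Da ion.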
Mass spectrometry of serum samples, with or without prior separations, produces large datasets. Analysis of these datasets can lead to biostate profiles, which are informative and accurate descriptions of biological state, and can be useful for clinical decision-making. Large biological datasets usually contain noise as well as many irrelevant data dimensions that may lead to the discovery of poor patterns.
When analyzing a complex mixture, such as serum, that probably contains many thousands of proteins, the resulting spectrum may show peaks for perhaps only a hundred proteins. Also, with a large number of molecular species and a mass spectrometer with a finite resolution, the signal peaks from different molecular species can overlap. Overlapping signal peaks make different molecular species harder to differentiate, or even indistinguishable. Typical mass spectrometers can measure approximately 5% of the ionized protein molecules in a sample.
Performing analysis on raw data can be problematic, leading to unprincipled analysis of both data points and peaks. Raw data analysis can treat each data point as an independent entity. However, the intensity at a data point may be due to overlapping peaks from several molecular species. Adjacent data points can have correlated intensities, rather than independent intensities. Ad hoc peak picking involves identifying peaks in a spectrum of raw data and collapsing each peak into a single data point.
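The ad hoc peak picking described above can be sketched as follows. This is one simple variant, assumed here for illustration: it treats every local maximum above a threshold as a peak and keeps only the apex point. The function name, the threshold parameter, and the sample values are hypothetical.

```python
def pick_peaks(mz, intensity, threshold=0.0):
    """Collapse each local maximum above threshold into one (m/z, intensity) point."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_apex = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if is_apex and intensity[i] > threshold:
            peaks.append((mz[i], intensity[i]))
    return peaks


# Hypothetical raw spectrum: nine data points, two visible peaks.
mz = [100.0, 100.1, 100.2, 100.3, 100.4, 100.5, 100.6, 100.7, 100.8]
intensity = [0.0, 2.0, 5.0, 2.0, 1.0, 3.0, 7.0, 3.0, 0.0]
picked = pick_peaks(mz, intensity)
```

Nine correlated data points are reduced to two, and everything about the shape of each peak, including any shoulder contributed by a second species, is discarded.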
Mass spectra of simple mixtures, such as some purified proteins, can be resolved relatively easily, and peak heights in such spectra can contain sufficient information to analyze the abundance of species detected by the mass spectrometer (which is proportional to the concentration of the species in the gas-phase ion mixture). However, the mass spectra of sera or other complex mixtures can be more problematic. A complex mixture can contain many species within a small mass-to-charge window. The intensity value at any given data point may have contributions from a number of overlapping peaks from different species. Overlapping peaks can cause difficulties with accurate mass measurements, and can hide differences in mass spectra from one sample to the next. Accurate modeling of the lineshapes, or shapes of the peaks, can enhance the reliability and accuracy of the analysis of mass spectra of complex biological mixtures. Lineshape models, or models of the peaks, can also be called modeled mass-to-charge distributions.
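The effect of overlap can be illustrated with a short sketch that models each species as a Gaussian lineshape and sums the contributions at every data point. The Gaussian form, the peak width, and the mass-to-charge values are assumptions chosen for illustration; real lineshapes depend on the instrument.

```python
import math


def gaussian(x, center, width, height):
    """Assumed Gaussian lineshape for one species (width = standard deviation)."""
    return height * math.exp(-0.5 * ((x - center) / width) ** 2)


def spectrum(x, species):
    """Intensity at x: the sum of contributions from every species."""
    return sum(gaussian(x, c, w, h) for c, w, h in species)


def count_maxima(values):
    """Number of local maxima, i.e. the peaks a peak picker would report."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    )


# Hypothetical m/z grid from 1000 to 1010 in steps of 0.01.
grid = [1000.0 + 0.01 * i for i in range(1001)]

# Two species far apart relative to the peak width, then two close together.
well_separated = [(1003.0, 0.5, 1.0), (1007.0, 0.5, 0.6)]
close_together = [(1004.5, 0.5, 1.0), (1005.2, 0.5, 0.6)]

resolved = [spectrum(x, well_separated) for x in grid]
overlapped = [spectrum(x, close_together) for x in grid]
```

With these values, count_maxima reports two peaks for the well-separated pair but only one for the close pair, and the intensity at 1004.5 in the overlapped spectrum exceeds the height of either species alone. Fitting an explicit lineshape model to the summed signal is one way such hidden contributions might be separated.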
Signal processing can aid the discovery of significant patterns in the large volume of data produced by separations-mass spectrometry. Mass spectral signal processing can address the resolution problem inherent in mass spectra of complex mixtures. Pattern discovery can be enhanced by signal processing techniques that remove noise, remove irrelevant information and/or reduce variance. In one application, these methods can discover preliminary biostate profiles from proteomics or other studies.
Therefore, it is desirable to reduce the noise and/or dimensionality of datasets, improve the sensitivity of mass spectrometry, and/or process the raw data generated by mass spectrometry to improve tasks such as pattern recognition.