In a number of fields, complex systems or states are characterized primarily by the large data sets that they generate. In analyzing protein expression from a cell to determine whether that cell is cancerous, for example, a large set of protein expression data from the sample cell is compared with reference data, which generally consists of a large set of protein expression data from one or more representative known cancerous cells and a large set of protein expression data from one or more representative known non-cancerous cells. In general, a key goal of systems biology is to investigate complex biological samples, for example at the protein level [1] or at the level of genes, DNA, or other biologic characteristics. Proteomics methods in general, and mass spectrometry in particular, show promise in the discovery of potential drug targets and biomarkers [2, 3, 4, 5, 6, 7] and in diagnostic applications [8, 9, 10, 11, 12]. Although the field is growing rapidly, many challenges remain on both the experimental and the data analysis fronts [13, 14].
Data analysis of biologic systems is challenging in part because of the large size of raw data sets and the exponential nature of the processing steps that many data analysis methods require to analyze such sets statistically. In some situations, methods useful for analyzing biologic data may also apply to other data analysis areas that require handling of large data sets.
A typical proteomics processing pipeline consists of the following steps:
1. Quantization of detector values
2. Amplitude normalization
3. Peak detection and quantification
4. m/z and time alignment
5. Classification and biomarker discovery.
The order of these steps can vary, but generally all methods place peak detection before the classification step, and most place it before the alignment and normalization steps. See Listgarten and Emili for a review of methods, challenges, and approaches to proteomics analysis [15].
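For illustration only, the first four steps listed above can be sketched on a one-dimensional intensity trace as follows. This is a minimal sketch under stated assumptions, not the method of any cited work: the function names, the sum-to-one normalization, the local-maximum peak rule, and the nearest-reference alignment are all hypothetical choices, and the steps are applied in the listed order rather than in the peak-detection-first order that most methods use. Step 5 is omitted because it depends on the classifier chosen.

```python
def quantize(values, step=1.0):
    """Step 1 (assumed form): snap raw detector values to a grid of width `step`."""
    return [round(v / step) * step for v in values]


def normalize(values):
    """Step 2 (assumed form): scale amplitudes so the total signal sums to 1."""
    total = sum(values)
    return [v / total for v in values] if total else list(values)


def detect_peaks(values, threshold):
    """Step 3 (assumed form): return (index, height) for local maxima above `threshold`."""
    return [
        (i, values[i])
        for i in range(1, len(values) - 1)
        if values[i] > threshold and values[i - 1] < values[i] >= values[i + 1]
    ]


def align(peaks, reference_positions, tolerance=1):
    """Step 4 (assumed form): map each peak to the nearest reference position.

    Peaks farther than `tolerance` from every reference position are dropped.
    If several peaks map to one position, the tallest is kept.
    """
    aligned = {}
    for idx, height in peaks:
        nearest = min(reference_positions, key=lambda r: abs(r - idx))
        if abs(nearest - idx) <= tolerance:
            aligned[nearest] = max(height, aligned.get(nearest, 0.0))
    return aligned


def run_pipeline(raw, reference_positions, threshold=0.2):
    """Apply steps 1 to 4 in the listed order and return aligned peak heights."""
    trace = normalize(quantize(raw))
    return align(detect_peaks(trace, threshold), reference_positions)


if __name__ == "__main__":
    # Hypothetical trace with two peaks, near positions 2 and 6.
    raw_trace = [0.2, 1.1, 5.9, 1.0, 0.1, 0.9, 4.1, 0.8]
    print(run_pipeline(raw_trace, reference_positions=[3, 6]))
```

The dictionary returned by `run_pipeline` maps reference positions to normalized peak heights, which is one plausible form of feature vector that a classification step could consume.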
More recently, a variety of strategies and techniques have been proposed for improving and/or automating research and/or diagnostic tests that use LC-MS data.
The discussion of any work, publications, sales, or activity anywhere in this submission, including in any documents submitted with this application, shall not be taken as an admission by the inventors that any such work constitutes prior art. The discussion of any activity, work, or publication herein is not an admission that such activity, work, or publication was known in any particular jurisdiction.