This invention relates to an apparatus and a method of aligning multiple liquid chromatography-mass spectrometry (LC-MS) runs to a common reference time frame to facilitate comparison among runs. The present invention can employ a sparse solver to align multiple runs simultaneously, make efficient use of the data, and/or provide a means of quality control.
The retention time alignment problem is frequently encountered in LC-MS. As the main workhorse technology for large-scale protein profiling of biomedical samples, LC-MS has clinical applications such as the discovery of biomarkers, e.g., markers that can predict disease states, sub-categories, or clinical outcomes. Due to the huge number of proteins and their fragments present in biological samples, an LC-MS run routinely collects a large number of peaks (e.g., greater than 10^4) within a couple of hours. Each peak has a mass, a retention time, and an intensity. While the mass shift across different runs is typically small, the shift in retention time across runs, i.e., the retention time shift, may be large and/or nonlinear. The retention times from each LC-MS run therefore require sophisticated alignment to a common reference time frame to enable matching of peaks from different LC-MS runs.
Multiple methods have been developed to solve the retention time alignment problem. For example, F. Suits, et al., “Two-dimensional method for time aligning liquid chromatography-mass spectrometry data,” Anal Chem, 80 (9), pp. 3095-104, (2008) discloses one such method, referred to as “Warp2D.” However, most methods, including Warp2D, provide only a pairwise alignment, i.e., alignment of a single LC-MS run to another LC-MS run.
To align multiple runs, most prior art methods align all runs to a common reference run, which is often arbitrarily chosen. An alternative approach is to align runs in a hierarchical fashion, in which the two most similar runs are merged in each step. The drawback of the hierarchical approach, as with most tree-based approaches, is that errors made in the early steps accumulate and are amplified in later steps.
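The hierarchical approach can be illustrated with a minimal Python sketch. It is not taken from any cited method and rests on simplifying assumptions: each run is reduced to the retention times of the same, already matched peaks; the shift between two runs is a single constant estimated as a median offset; and the names `shift`, `dissimilarity`, and `hierarchical_merge_order` are hypothetical.

```python
from itertools import combinations
from statistics import median


def shift(a, b):
    """Median retention-time offset of run b relative to run a.

    Assumes a[i] and b[i] are the same (pre-matched) peak in both runs.
    """
    return median(y - x for x, y in zip(a, b))


def dissimilarity(a, b):
    """Mean absolute residual left after removing the constant shift."""
    s = shift(a, b)
    return sum(abs(y - s - x) for x, y in zip(a, b)) / len(a)


def hierarchical_merge_order(runs):
    """Merge the two most similar runs at each step; return the merge order.

    `runs` maps a run name to its list of retention times.
    """
    clusters = dict(runs)
    order = []
    while len(clusters) > 1:
        # Pick the pair with the smallest residual after shift correction.
        p, q = min(
            combinations(clusters, 2),
            key=lambda pq: dissimilarity(clusters[pq[0]], clusters[pq[1]]),
        )
        a, b = clusters.pop(p), clusters.pop(q)
        s = shift(a, b)
        # The merged run is fixed here; any error in `s` is carried into
        # every later merge that uses this merged run.
        clusters[f"({p}+{q})"] = [(x + (y - s)) / 2 for x, y in zip(a, b)]
        order.append((p, q))
    return order


runs = {"A": [10, 20, 30], "B": [11, 21, 31], "C": [10, 25, 45]}
order = hierarchical_merge_order(runs)
```

In this toy input, runs A and B differ only by a constant offset and are merged first; run C is then aligned against the merged run rather than against the original data, which is how an early misalignment would propagate to later steps.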