Gas chromatography-mass spectrometry (GC-MS) experiments separate small molecules on a GC column coupled to an ionization source. After ionization, the molecules are mass analyzed. One typical ionization method is electron ionization (EI), which causes molecules to fragment in reproducible patterns that are useful for analyte identification. Typically, user-generated EI spectra are identified by spectral matching against databases of reference spectra, including several existing databases of EI spectra generated from pure compounds collected on unit-resolution mass spectrometers (i.e., ~1 Da reference libraries provided by NIST, Wiley, etc.).
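As an illustration of spectral matching at unit resolution, the following minimal Python sketch scores an observed spectrum against a small reference library using cosine similarity. The choice of cosine similarity, the function names, and the spectra themselves are assumptions made for illustration only; the text above does not specify a scoring function, and commercial library search programs may use different, often weighted, match factors.

```python
import math

def cosine_similarity(observed, reference):
    """Cosine similarity between two unit-resolution spectra.

    Each spectrum is a dict mapping nominal m/z (int) to intensity.
    Returns a value between 0.0 (no shared peaks) and 1.0 (identical pattern).
    """
    shared = set(observed) & set(reference)
    dot = sum(observed[mz] * reference[mz] for mz in shared)
    norm_obs = math.sqrt(sum(i * i for i in observed.values()))
    norm_ref = math.sqrt(sum(i * i for i in reference.values()))
    if norm_obs == 0 or norm_ref == 0:
        return 0.0
    return dot / (norm_obs * norm_ref)

def best_match(observed, library):
    """Return (name, score) of the highest-scoring library entry."""
    scored = {name: cosine_similarity(observed, ref) for name, ref in library.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]

# Hypothetical reference spectra; these are NOT real library entries.
library = {
    "compound_A": {43: 100, 58: 30, 71: 10},
    "compound_B": {43: 100, 57: 80, 85: 20},
}
# Hypothetical observed spectrum.
observed = {43: 95, 58: 35, 71: 8}

match_name, match_score = best_match(observed, library)
print(match_name, round(match_score, 3))
```

Because the score depends only on nominal m/z and relative intensity, two different compounds with similar fragmentation patterns can receive nearly identical scores.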
However, this method can lead to ambiguity in assigned identifications of analytes due to the poor specificity of unit-resolution spectra. There are many cases where distinct compounds generate similar EI spectra, leading to a high number of false identifications. Furthermore, the degree of spectral similarity between observed and reference spectra, the metric used to assign identification confidence, is ambiguous and subject to human judgment.
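The limited specificity of unit-resolution data can be illustrated with a short calculation. The sketch below computes the m/z of several singly charged ions that all share a nominal mass of 44 but differ in accurate mass by hundreds of parts per million. The candidate formulas and helper names are chosen purely for illustration and are not taken from any reference library.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307400, "O": 15.99491462}
ELECTRON_MASS = 0.00054858  # Da

def ion_mass(formula):
    """m/z of a singly charged cation for a formula given as {element: count}."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral - ELECTRON_MASS  # EI removes one electron

def ppm_difference(mz_a, mz_b):
    """Difference between two m/z values, in parts per million of mz_b."""
    return (mz_a - mz_b) / mz_b * 1e6

# Illustrative ions that are indistinguishable at unit resolution.
candidates = {
    "CO2": {"C": 1, "O": 2},
    "N2O": {"N": 2, "O": 1},
    "C2H4O": {"C": 2, "H": 4, "O": 1},
    "C3H8": {"C": 3, "H": 8},
}

masses = {name: ion_mass(formula) for name, formula in candidates.items()}
nominal = {name: round(mz) for name, mz in masses.items()}

for name, mz in masses.items():
    delta = ppm_difference(mz, masses["CO2"])
    print(f"{name:6s} nominal={nominal[name]} accurate={mz:.5f} ppm_vs_CO2={delta:.0f}")
```

All four ions fall in the same 1 Da bin, so a unit-resolution spectrum cannot tell them apart, whereas an instrument with accurate mass measurement at the level of a few ppm can.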
Previously, researchers have constructed a high-resolution GC-Orbitrap mass spectrometer capable of collecting high-resolution EI spectra (see, for example, Peterson et al., “Development and characterization of a GC-enabled QLT-Orbitrap for High-resolution and high-mass accuracy GC/MS,” Anal. Chem., 2010, 82(20):8618-28). However, currently available spectral libraries (such as those provided by NIST and Wiley) do not contain high-resolution spectra and instead remain unit-resolution libraries.
What is needed is a method of enabling high-resolution spectral matching using currently available unit-resolution reference libraries. These available databases contain hundreds of thousands of reference spectra which would be prohibitively costly to recreate using high-resolution GC-MS instruments. The invention presented herein provides a means to leverage high-resolution spectra to achieve superior spectral matching specificity with such existing resources. Using high-resolution accurate mass measurements would increase spectral match confidence without the need for high-resolution reference libraries.
Others have used predictive fragmentation models (i.e., theoretical high-resolution spectra generated by algorithms that carry out predictive in silico fragmentation) in an attempt to increase specificity in spectral matching. Using this approach, known molecular structures and bonding energies are used to develop algorithms that predict EI fragmentation. Very rarely, if ever, are these algorithms able to generate spectra that correlate exactly with experimentally measured spectra. Often the predicted spectra are extremely dissimilar to their measured analogs, leading to an increased possibility of false identifications. In contrast, an embodiment of the present method starts with experimentally observed patterns in measured reference data, maintaining important peak and intensity relationships that are not easily accounted for in predictive models.
The present invention provides methods and systems for analyzing data obtained from a high-resolution mass spectrometer using unit-resolution spectral data in combination with additional filtering and scoring steps. Moreover, the present invention enables high-resolution matching using currently available unit-resolution reference libraries. These available databases contain hundreds of thousands of reference spectra that would be cost prohibitive to recreate using high-resolution GC-MS instruments. Thus, the invention allows the use of newly obtained high-resolution spectra to achieve superior spectral matching specificity with existing resources.
The invention presented herein is a useful tool for improving compound identification from high-resolution mass spectra, such as spectra obtained during GC-MS. In an embodiment, for example, the methods of the present invention start with experimentally observed patterns in measured reference data, thereby maintaining important peak and intensity relationships that are not easily accounted for in predictive models. Accordingly, aspects of the methods and systems described herein are complementary, or superior, to spectral matching done against theoretical high-resolution spectra generated by certain conventional algorithms.