Methods for identifying one or more most likely elemental compositions of a species of molecules, and usually of various species of molecules, are generally available. These methods are preferably used to identify the most likely elemental composition of species of molecules such as herbicides, insecticides, other pesticides, lipids, soluble or suspended solids in leachates, metabolites, drugs, narcotics, and molecules in extracts, typically having a mass of up to 400 u, preferably up to 500 u and particularly preferably up to 600 u.
These methods are used to investigate samples. By these methods an elemental composition is identified for species of molecules contained in the investigated sample.
A species of molecules is defined as a class of molecules having the same molecular formula (e.g. water has the molecular formula H2O and benzene the molecular formula C6H6). The molecular formula of a species of molecules describes its elemental composition. The molecular formula lists all elements contained in the molecule by the symbol of each element according to the IUPAC periodic table of chemical elements and gives, as an index to the right of each symbol, the number of atoms of that element in the molecule. As a simple example, a benzene molecule, which has the molecular formula C6H6, consists of 6 carbon atoms (symbol C) and 6 hydrogen atoms (symbol H). Molecules having the same molecular formula may have different structural formulas due to different isomeric forms, which may have different enantiomeric structures resulting in different physical, chemical and biological properties.
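The notation described above can be read mechanically. The following is a minimal sketch, not part of the described methods, that turns a plain molecular formula into element counts. The function name `parse_formula` is hypothetical, and the sketch assumes formulas without brackets, charges or isotope labels.

```python
import re


def parse_formula(formula: str) -> dict[str, int]:
    """Return element counts for a plain formula such as 'C6H6'.

    Assumption: element symbols are one capital letter optionally
    followed by one lowercase letter; a missing index means 1.
    """
    counts: dict[str, int] = {}
    for symbol, index in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(index) if index else 1)
    return counts


print(parse_formula("H2O"))   # water
print(parse_formula("C6H6"))  # benzene
```

Two molecules belong to the same species of molecules, in the sense defined above, exactly when their formulas yield the same element counts.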
There are much more complicated molecules with larger molecular formulas, e.g. in organic matter. One example is the herbicide sulfentrazone, which has the molecular formula C11H10Cl2F2N4O3S. Pesticides like sulfentrazone are not allowed to be used in many countries. Sulfentrazone may pose, for example, a greater risk to aquatic species and honey bees.
Sometimes the investigated sample can be better understood through ions which originate from the sample by at least an ionization process, and through the elemental composition of those ions. The ions may preferably be generated by electrospray ionization (ESI), matrix-assisted laser desorption ionization (MALDI), plasma ionization, electron ionization (EI), chemical ionization (CI) and atmospheric pressure chemical ionization (APCI). The generated ions are charged particles, mostly having a molecular geometry and a corresponding molecular formula. In the context of this patent application the term “species of molecules originated from a sample by at least an ionization process” shall be understood as referring to the molecular formula of an ion which originates from a sample by at least an ionization process. The elemental composition of a species of molecules contained in a sample can thus be deduced from its ion, which originates from the sample by at least an ionization process ionizing the species of molecules: the elemental composition of the ion is determined, the charge of the ion is then reduced to zero, and the elemental composition is changed in accordance with the ionization process as described below.
The methods to identify a most likely elemental composition of one species of molecules can therefore also be used to identify the elemental composition of ions which originate from a sample by at least an ionization process.
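The step from an ion back to the neutral species can be illustrated with a minimal sketch. It assumes the simplest case, in which the ion was formed by attachment of a known adduct (for example one proton, as is common in electrospray ionization). The function name `neutral_composition` and the dictionary representation are hypothetical and are not taken from the described methods.

```python
def neutral_composition(ion: dict[str, int], adduct: dict[str, int]) -> dict[str, int]:
    """Remove the atoms of an attached adduct from an ion's element counts.

    Assumption: the ion is the neutral molecule plus the adduct, so
    subtracting the adduct atoms (and its charge) gives the neutral species.
    """
    neutral = dict(ion)
    for element, n in adduct.items():
        remaining = neutral.get(element, 0) - n
        if remaining < 0:
            raise ValueError(f"ion contains fewer than {n} atoms of {element}")
        neutral[element] = remaining
    # Drop elements whose count fell to zero.
    return {element: n for element, n in neutral.items() if n > 0}


# Protonated benzene has the composition C6H7 and charge +1;
# removing one H (the proton) gives the neutral species C6H6.
print(neutral_composition({"C": 6, "H": 7}, {"H": 1}))
```

Other ionization processes, such as electron loss or fragmentation, change the composition differently and would need their own correction.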
In a species of molecules all molecules have the same composition of atoms according to the molecular formula. However, most atoms of the molecule can occur as different isotopes. For example, carbon, the basic element of organic chemistry, occurs in two stable isotopes: the 12C isotope, with a natural probability of occurrence of 98.9% and an isotope mass of 12 u, and the 13C isotope (having one more neutron in its atomic nucleus), with a natural probability of occurrence of 1.1% and an isotope mass of 13.003355 u. Due to these probabilities of occurrence, particularly complex molecules of higher mass consisting of a higher number of atoms have many isotopomers, in which the atoms of the molecule exist as different isotopes. Throughout this patent application these isotopomers of a species of molecules are designated as the “isotopes of the species of molecule”. These isotopes have different masses, resulting in a mass distribution of the isotopes of the species of molecules, which in the context of this patent application is named the isotope distribution (short term: ID) of the species of molecules. The molecules of a species can therefore have different masses, but for better understanding and identification of a species of molecules a monoisotopic mass is assigned to each molecule. This is the mass of a molecule when each atom of the molecule exists as the most abundant naturally occurring stable isotope. For example, a methane molecule has the molecular formula CH4, and hydrogen has the isotopes 1H, having only a proton in its nucleus, with a natural probability of occurrence of 99.985% and an isotope mass of 1.007825 u, and 2H (deuterium), having an additional neutron in its nucleus, with a natural probability of occurrence of 0.015% and an isotope mass of 2.014102 u. The most abundant naturally occurring stable isotope of carbon is thus 12C, and the most abundant naturally occurring stable isotope of hydrogen is 1H.
Accordingly the monoisotopic mass of methane is 16.031300 u, which is the mass of the methane isotope consisting of one 12C isotope and four 1H isotopes. But there is a small probability of other methane isotopes having the masses 17.034655 u (comprising a 13C isotope) and 17.037577 u (comprising one 2H isotope), 18.040932 u (comprising a 13C isotope and one 2H isotope) and 18.043854 u (comprising two 2H isotopes), 19.047209 u (comprising a 13C isotope and two 2H isotopes) and 19.050131 u (comprising three 2H isotopes), 20.053486 u (comprising a 13C isotope and three 2H isotopes) and 20.056408 u (comprising four 2H isotopes) and 21.059763 u (consisting of a 13C isotope and four 2H isotopes). All these other isotopes belong to the isotope distribution of methane and can be visible in the mass spectrum of methane in a mass spectrometer.
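The methane masses listed above can be reproduced with a short sketch that enumerates every combination of carbon and hydrogen isotopes, using only the isotope masses and probabilities of occurrence given in the text. The function name `methane_isotopologues` is hypothetical. The probabilities assume that each atom takes its isotope independently at natural abundance, which is an assumption of this sketch.

```python
from math import comb

# Isotope masses (u) and natural probabilities of occurrence as given in the text.
MASS = {"12C": 12.0, "13C": 13.003355, "1H": 1.007825, "2H": 2.014102}
ABUNDANCE = {"12C": 0.989, "13C": 0.011, "1H": 0.99985, "2H": 0.00015}


def methane_isotopologues():
    """Return (mass, n_13C, n_2H, probability) for all isotope variants of CH4."""
    rows = []
    for n13c in (0, 1):          # the single carbon is either 12C or 13C
        for n2h in range(5):     # 0 to 4 of the hydrogens are 2H
            mass = (n13c * MASS["13C"] + (1 - n13c) * MASS["12C"]
                    + n2h * MASS["2H"] + (4 - n2h) * MASS["1H"])
            p_carbon = ABUNDANCE["13C"] if n13c else ABUNDANCE["12C"]
            # Binomial probability of exactly n2h deuterium atoms among 4 hydrogens.
            p_hydrogen = (comb(4, n2h) * ABUNDANCE["2H"] ** n2h
                          * ABUNDANCE["1H"] ** (4 - n2h))
            rows.append((round(mass, 6), n13c, n2h, p_carbon * p_hydrogen))
    return sorted(rows)


for mass, n13c, n2h, prob in methane_isotopologues():
    print(f"{mass:.6f} u  13C={n13c}  2H={n2h}  p={prob:.3e}")
```

The first row is the monoisotopic mass. The ten rows together form the isotope distribution of methane under the stated independence assumption.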
The identification of the most likely elemental composition of at least one species of molecules is possible in particular by measuring a mass spectrum of the investigated sample with a mass spectrometer. In general, any kind of mass spectrometer known to a person skilled in the art can be used to measure a mass spectrum of the sample. In particular it is preferred to use a high-resolution mass spectrometer, such as a mass spectrometer having an Orbitrap® mass analyzer or another electrostatic ion trap as mass analyzer, a Fourier transform (FT) mass spectrometer, an ion cyclotron resonance (ICR) mass spectrometer or a multi-reflection time-of-flight (MR-TOF) mass spectrometer. Other mass spectrometers to which the inventive method can be applied are in particular time-of-flight (TOF) mass spectrometers, magnetic sector mass spectrometers and mass spectrometers with a high-resolution (HR) quadrupole mass analyzer.
Molecules already present in the sample are set free, e.g. by evaporation and spraying, and charged, or are only charged, by the ionization process. The molecules may be charged e.g. by the reception and/or emission of electrons or the reception of ions to form an adduct ion. The method of the invention is able to assign to these species of molecules contained in the sample their most likely elemental composition on the basis of their ions which are detected in the mass spectrum of the mass spectrometer.
The ionization process can change the molecules contained in the sample by fragmentation into smaller particles which are charged by the process. An ionization process can also split the matrix of a sample into molecules which are charged. All these ions thus originate from the sample by a described ionization process. For these ions, the species of molecules originated from the sample therefore have to be investigated by a method for identification of the elemental composition of the species of molecules.
Ultra-high resolution mass spectrometry, such as is achievable using a Fourier transform ion cyclotron resonance mass spectrometer (FT-ICR-MS), or an Orbitrap™ mass spectrometer, enables the identification of thousands of different molecular formulas in organic matter. Coupled with liquid chromatography (LC), accurate mass determination of components of complex mixtures can be made on a routine basis. Applications include, amongst others, screening combinatorial chemistry libraries and identifying metabolites related to drug discovery, screening for anabolic steroids in illegal cocktails and fungal metabolites in culture extracts, and elucidating unknown compounds in environmental water.
The output from the mass spectrometer must be interpreted before samples can be characterised, and this presents technical problems. Molecular formula assignment from mass data is most critical and time-consuming. Accurate mass measurement by mass spectrometry is a common technique to determine elemental composition, facilitated by ultra high resolution mass spectrometers. Despite technological advances and improved mass accuracy, often the mass accuracy alone does not provide unequivocal identification. In many cases, several different structural formulae can be identified for the same molecular mass. The number of candidate formulae increases exponentially with mass, making high mass molecular determination particularly challenging. Therefore, automated procedures are required for an efficient exploitation of the extensive data sets produced by mass spectrometry, when characterising samples.
Typically the species of molecules for which the elemental composition has to be identified are composed of a specific set of elements. For each element it is defined how many atoms of the element might be contained in the species of molecules. For each element X the number of atoms contained in the species of molecules may be limited: there is a minimum number Min_X of atoms of the element X and a maximum number Max_X of atoms of the element X in the species of molecules.
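The role of the limits on the number of atoms of each element can be illustrated with a brute-force sketch that lists every elemental composition within those limits whose monoisotopic mass lies within a tolerance of a measured mass. This is only an illustration under stated assumptions, not one of the methods discussed here. The names `candidate_compositions` and `bounds` are hypothetical, the masses of nitrogen and oxygen are standard isotope masses not given in the text, and no chemical plausibility check is applied.

```python
from itertools import product

# Monoisotopic masses in u. C and H are taken from the text; N (14N) and
# O (16O) are standard values added for this illustration only.
MONO_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def candidate_compositions(target_mass, tolerance, bounds):
    """List element counts within per-element (min, max) bounds whose
    monoisotopic mass is within `tolerance` (in u) of `target_mass`."""
    elements = list(bounds)
    ranges = [range(low, high + 1) for low, high in bounds.values()]
    hits = []
    for counts in product(*ranges):
        mass = sum(n * MONO_MASS[el] for el, n in zip(elements, counts))
        if abs(mass - target_mass) <= tolerance:
            hits.append({el: n for el, n in zip(elements, counts) if n})
    return hits


# Per-element (minimum, maximum) atom counts.
bounds = {"C": (0, 2), "H": (0, 8), "N": (0, 2), "O": (0, 2)}
print(candidate_compositions(16.0313, 0.001, bounds))  # → [{'C': 1, 'H': 4}]
print(candidate_compositions(16.0313, 0.02, bounds))
```

With the wider tolerance a second composition (NH2) falls into the window. This shows on a small scale why mass accuracy alone may not single out one candidate, and why tighter element limits reduce the search space.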
Various methods of determining the elemental composition of species of molecules contained in a sample and/or originated from a sample by at least an ionization process have been proposed, yet there remains a need for a method that identifies the elemental composition with further improved correctness, taking into account all information made available by a measured mass spectrum.
A well-known method for identifying the most likely composition of species of molecules is the calculation of a pattern spectral distance (PSD), described in U.S. Pat. No. 8,831,888 B2. With this method a measured mass spectrum is compared with expected mass spectra of molecules belonging to a set of candidate molecules. Peaks in the measured spectrum and the expected spectrum are assigned to each other by calculating a spectral distance value SD. This value takes into account the positional difference and the intensity difference, and to each expected peak the measured peak with the smallest spectral distance is assigned. If no peak can be identified within an expected positional error and an expected intensity error, no peak identification is possible. When the pattern spectral distance value is calculated for the whole expected mass spectrum of a molecule according to its isotope distribution, any non-identified peak in the calculation receives a penalty value. Two modes are described for the pattern spectral distance value: in one mode the penalty is given for any non-identified expected peak, in the other mode the penalty is given for any non-identified measured peak.
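The peak assignment with penalties described above can be sketched as follows. This is not the formula of the cited patent, which is not reproduced in this text. The sketch assumes that the spectral distance is the Euclidean combination of the mass difference and the intensity difference, each scaled by its expected error, and that the pattern value is the plain sum over all expected peaks. The function name, the tolerances and the penalty value are hypothetical. Only the mode that penalizes non-identified expected peaks is shown.

```python
from math import hypot


def pattern_spectral_distance(expected, measured, mass_tol, int_tol, penalty=2.0):
    """Sum of per-peak spectral distances over all expected peaks.

    expected, measured: lists of (mass, relative intensity) pairs.
    Assumed distance: hypot(dm / mass_tol, di / int_tol); a measured peak
    is only eligible if both scaled differences are at most 1.
    """
    total = 0.0
    for exp_mass, exp_int in expected:
        best = None
        for meas_mass, meas_int in measured:
            dm = abs(meas_mass - exp_mass) / mass_tol
            di = abs(meas_int - exp_int) / int_tol
            if dm <= 1.0 and di <= 1.0:
                sd = hypot(dm, di)
                if best is None or sd < best:
                    best = sd  # keep the measured peak with the smallest distance
        # A non-identified expected peak contributes the penalty instead.
        total += penalty if best is None else best
    return total


# Illustrative values only: two expected peaks, the second has no measured
# peak within the mass tolerance and therefore receives the penalty.
expected = [(16.031300, 1.0), (17.034655, 0.011)]
measured = [(16.031400, 0.98), (17.040000, 0.012)]
print(pattern_spectral_distance(expected, measured, mass_tol=0.001, int_tol=0.05))
```

The default penalty is chosen larger than the largest distance an accepted match can have under this assumed metric (the square root of 2), so a missing peak always costs more than a poor match. The candidate with the smallest value would be the best match.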
Another method to identify the elemental composition of molecules is described by Pluskal et al., Anal. Chem. 2012, 84, 4396-4403. In this method a score is defined to identify the expected mass spectrum of a candidate species of molecules that best matches a measured mass spectrum; the score takes into account the intensity difference of measured and expected peaks within a mass tolerance range. Further, a comparison of a measured MS2 mass spectrum with the expected MS2 mass spectrum after fragmentation of the candidate species of molecules is used to exclude candidate species of molecules.
A further method to identify the elemental composition of molecules is described by Meringer et al., Commun. Math. Comput. Chem. 65, 259-290 (2011). In this method a score is defined to identify the expected mass spectrum of a candidate species of molecules that best matches a measured mass spectrum; the score takes into account the intensities of measured and expected peaks. Further, a second score is derived from a comparison of a measured MS2 mass spectrum with the expected MS2 mass spectrum after fragmentation of the candidate species of molecules, and both scores are then used in a combined score to identify an elemental composition of a molecule.
Also in Tenhosaari, Organic Mass Spectrometry, Vol. 23, 236-239 (1988) and Zhang et al., IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 2, No. 3, 217-230 (2005), two scores are derived from a comparison of a measured mass spectrum with an expected mass spectrum and a comparison of an MS2 mass spectrum with an expected MS2 mass spectrum, and are then used in a combined score to identify an elemental composition of a molecule.
It is the object of the invention to find a method of identification which is able to improve the correctness of the identified elemental composition of species of molecules further, when the elemental composition shall be identified based on measured mass spectra. It is one object of the invention to take into account as much information as possible of a measured mass spectrum. This is particularly important if mass spectra of high resolution or ultra high resolution are available which are increasing the amount of usable information. A further object of the invention is that the method of high correctness shall determine one or more most likely elemental compositions of the investigated species of molecules in a manner which does not need too much time and capacities. A further object is that the method shall be able to be adapted to any class of investigated molecules and shall be able to provide procedures to improve the correctness and/or reduce the effort of the method further.