The invention relates to a method for identifying chiefly unknown substances by mass spectroscopy to determine the structure and/or families and/or the chemical properties of said substances.
Mass spectrometry is currently one of the most common methods for analyzing chiefly unknown substances (for example J. H. Gross: Mass Spectrometry: A Textbook, Springer, Berlin, 2004).
Mass spectrometry allows precise determination of the molecular mass of the analyzed substance. Furthermore, it is possible to fragment a substance in the mass spectrometer once or several times, i.e. to break its chemical bonds. Subsequently, the masses of the fragments produced in this way are also measured. As a result, one or several fragmentation spectra (also called daughter ion spectra) are generated.
However, it is problematic, particularly for unknown chemical compounds, to identify the structure and/or families and/or chemical properties of these compounds because only masses can be determined by mass spectrometry.
The original form of a lot of pharmaceuticals and other chemical substances used in industry and research is produced by living beings and has been discovered by chance or by a very complex search. Most of the substances produced by living beings are still completely unknown in research.
The method described hereinbelow can considerably simplify the systematic search for potential active agents by, for example, identifying the substance families of all small substances (lighter than 1500 dalton) that are contained in a biological sample. Afterwards, only those compounds that belong to the families relevant for the field of application need to be analyzed more precisely.
The substance identification of pharmaceuticals and natural compounds is particularly interesting because of the high importance of these substances for medicine as well as pharmaceutical and biological research. Natural compounds are all substances that are contained in animate and inanimate nature, i.e. most of all in plants and animals but also in fossil deposits. Said natural compounds include, for example, all metabolites produced by chemical or enzymatic reactions, but also the decomposition products of substances that are added to nature by man, e.g. pharmaceuticals or environmental toxins. Even if natural compounds are probably the main field of application of the method described hereinbelow, the method is not restricted to them. The application of this method is also possible in other areas of chemistry, for example in materials science.
As natural compounds mainly exist as mixtures (e.g. cell extract, environmental sample), a separation procedure is often carried out before mass spectrometry in order to separate the substances to be identified for the mass spectrometric analysis. Usually, this separation process is gas or liquid chromatography or capillary electrophoresis (for example U. Roessner, C. Wagner, J. Kopka, R. Tretheway, L. Willmitzer: Technical advance: simultaneous analysis of metabolites in potato tuber by gas chromatography-mass spectrometry, Plant J, 2000, 23, 131-142).
It is known (for example R. Mistrik: Xcalibur HighChem: Mass Frontier Software. HighChem/ThermoFinnigan, Manual 2001) to compare fragmentation patterns, which are determined by mass spectrometric analysis, with idealized patterns, so-called rules, that have been manually obtained from reference data. Such a comparison could in principle be automated, but it requires that the corresponding rules for the analyzed substance have already been generated. Therefore, this method cannot be used at all for unknown substances. Moreover, these rule-based approaches cannot process error-containing data and consequently they are not useful in practical applications (K. Klagkou, F. Pullen, M. Harrison, A. Organ, A. Firth & G. J. Langley: Approaches towards the automated interpretation and prediction of electrospray tandem mass spectra of non-peptidic combinatorial compounds, Rapid Commun Mass Spectrom, 2003, 17, 1163-1168).
In the special case in which a fragmentation spectrum that has been generated under the same measurement conditions already has an identical counterpart in a reference database, it would be possible to find the analyzed substance by a computational comparison, i.e. by searching for the identical spectrum in the reference database, and to identify said substance in this manner (L. Vogt, T. Groeger & R. Zimmermann: Automated compound classification for ambient aerosol sample separations using comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry, J Chromatogr A, 2007, 1150, 2-12; DE 103 58 366 B4, U.S. Pat. No. 6,624,408 B1, US 2003 023 66 36 A1, U.S. Pat. No. 6,747,272 B2).
This method does not work for completely unknown substances because it requires a reference spectrum of the substance in the database. Furthermore, fragmentation spectra in some cases depend strongly on external parameters and therefore differ from lab to lab. Direct comparisons between spectra are not conclusive in this case. Therefore, the search for an existing identical reference spectrum obtained under comparable conditions is only possible in very few applications.
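The reference-database lookup discussed above can be illustrated by a minimal sketch. It is not taken from any of the cited documents. The names `same_peaks` and `lookup`, the mass tolerance `tol`, and all masses and substance names are hypothetical and chosen only for illustration; peak intensities are ignored for simplicity.

```python
from typing import Dict, List, Optional

# A spectrum is modelled here only as a list of fragment masses (assumption:
# intensities are ignored in this sketch).
Spectrum = List[float]


def same_peaks(query: Spectrum, reference: Spectrum, tol: float = 0.01) -> bool:
    """Return True if both spectra have the same peaks within the tolerance."""
    if len(query) != len(reference):
        return False
    # Comparing the sorted mass lists pairwise is sufficient for a
    # one-to-one match of peaks on a one-dimensional mass axis.
    return all(abs(q - r) <= tol for q, r in zip(sorted(query), sorted(reference)))


def lookup(query: Spectrum, library: Dict[str, Spectrum], tol: float = 0.01) -> Optional[str]:
    """Return the name of the first matching reference spectrum, or None."""
    for name, reference in library.items():
        if same_peaks(query, reference, tol):
            return name
    return None


# Hypothetical reference database with made-up masses.
library = {
    "substance A": [85.03, 127.04, 163.06],
    "substance B": [91.05, 119.05],
}

known = lookup([163.061, 85.029, 127.041], library)
unknown = lookup([100.0, 120.0], library)
```

In this sketch the second query returns `None`, which mirrors the limitation stated above: a substance without a reference spectrum in the database cannot be identified by such a lookup.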
To avoid the latter disadvantage, it is also known to search for fragment ions in a database where they are stored as defined fragmentation patterns (U.S. Pat. No. 7,197,402 B2). Either these ions must possess a known, clear structure or fragmentation spectra of these ions must be measured in an additional mass spectrometric analysis. These spectra produced by multiple fragmentation (MSn) should, as indicated, be more comparable than the ‘single’ fragmentation spectra mentioned before.
However, this procedure is also limited to the identification of known (and electronically saved) substances. Furthermore, multiple fragmentation can only be performed with very special types of mass spectrometers, which further increases the additional effort.
If substances are to be identified for which reference data or comparison or identification rules exist only incompletely or not at all, it is still necessary, at least in individual cases, to evaluate smaller molecules on the basis of their fragmentation pattern, i.e. intensive investigations must be carried out to find out whether similarities to known structures can be found that could allow or at least support the determination of a substance family, the chemical properties or even the molecular structure (P. Shi, Q. He, Y. Song, H. Qu and Y. Cheng: Characterization and identification of isomeric flavonoid O-diglycosides from genus Citrus in negative electrospray ionization by ion trap mass spectrometry and time-of-flight mass spectrometry, Anal. Chim. Acta, 2007, 598, 110-118). However, this evaluation is subjective and time-consuming, and it is based on human intuition. Therefore, it is not an objective and rapid substance identification but requires a high level of expert knowledge and extensive experience in this field. Nevertheless, the hit ratio even for smaller molecules is not very high in practical applications. Moreover, the method cannot be automated for the aforementioned reasons. The evaluation of larger molecules by means of the described method would not be useful in practice, particularly due to the high demands placed on the expert and the expected low hit ratio.
In 2008, Boecker and Rasche (S. Boecker & F. Rasche: Towards de novo identification of metabolites by analyzing tandem mass spectra, Bioinformatics, 2008, 24, 149-155) introduced a mathematical formalization of the concept of fragmentation patterns. In their method they used graphs to represent the fragmentation pattern of a substance. A graph is a set of objects, usually designated as nodes, together with a set of pairs of elements of this set, usually designated as edges. This set of pairs represents the relations between the objects. In this case, the fragments of the substance are represented as nodes and the fragmentation reactions are represented as edges. As the structure of the analyzed substance is not known, the nodes are labeled with the molecular formulas of the fragments and the edges are labeled with the molecular formulas of the neutral losses. These fragmentation graphs are used to determine the molecular formula of an unknown substance. However, molecular formulas alone are not sufficient to identify a substance and do not allow the family of the analyzed substance to be determined. A use of the proposed graphs of fragmentation patterns for identifying particularly unknown substances or for determining their family and/or chemical properties has not come to the attention of the experts either.
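The graph representation described above can be illustrated by a minimal sketch: nodes carry the formulas of the fragments, and each edge carries the formula of the neutral loss, i.e. the difference between the parent and the child formula. This is not the implementation of Boecker and Rasche. The class `FragmentationGraph`, the helper functions and the example formulas are hypothetical, and the parser handles only simple formulas without brackets or charges.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C6H12O6' into element counts."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def format_formula(counts: Counter) -> str:
    """Write element counts as a formula, elements in alphabetical order."""
    return "".join(
        f"{element}{n if n > 1 else ''}"
        for element, n in sorted(counts.items())
        if n > 0
    )


def neutral_loss(parent: str, child: str) -> str:
    """Return the formula lost when the parent fragments into the child."""
    difference = parse_formula(parent)
    difference.subtract(parse_formula(child))
    if any(n < 0 for n in difference.values()):
        raise ValueError(f"{child} is not a sub-formula of {parent}")
    return format_formula(difference)


@dataclass
class FragmentationGraph:
    nodes: Set[str] = field(default_factory=set)
    # (parent formula, child formula) -> neutral loss formula
    edges: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_fragmentation(self, parent: str, child: str) -> None:
        """Add one fragmentation reaction as a labeled edge."""
        self.nodes.update((parent, child))
        self.edges[(parent, child)] = neutral_loss(parent, child)


# Hypothetical example formulas, chosen only to show the labeling.
graph = FragmentationGraph()
graph.add_fragmentation("C6H12O6", "C6H10O5")
graph.add_fragmentation("C6H10O5", "C5H10O4")
```

In this example the two edges are labeled `H2O` and `CO`. As stated above, such labels describe formulas only and carry no structural information about the fragments.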
Furthermore, in a special biological or medical application, the alignment of trees is known for comparing RNA structures (T. Jiang, L. Wang & K. Zhang: Alignment of trees: an alternative to tree edit, Theor. Comput. Sci., Elsevier Science Publishers Ltd., 1995, 143, 137-148). In this method, the labeled nodes of the trees to be compared are positioned on top of each other in such a manner that the labels differ as little as possible from each other. The trees must be identical in their structure; only so-called gap nodes may be added in the branches of the tree representation, if required. Applications of this method, particularly to identify substances or their family and/or chemical properties in mass spectrometric analyses of said substances, are not known either.
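The overlaying step described above can be illustrated by a minimal sketch. It only scores two trees that have already been brought to an identical structure by inserting gap nodes; it does not search for the best placement of gap nodes, which is the actual optimization problem of tree alignment. The class `Node`, the function `overlay_cost`, the unit costs and the rejection of a gap positioned on a gap are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

GAP = None  # label used for an inserted gap node


@dataclass
class Node:
    label: Optional[str]
    children: List["Node"] = field(default_factory=list)


def overlay_cost(a: Node, b: Node, mismatch: int = 1, gap: int = 1) -> int:
    """Sum the label differences of two identically shaped trees."""
    if len(a.children) != len(b.children):
        raise ValueError("trees must be identical in structure after gap insertion")
    if a.label is GAP and b.label is GAP:
        # Assumption of this sketch: a gap placed on a gap carries no information.
        raise ValueError("a gap node must not be positioned on another gap node")
    if a.label is GAP or b.label is GAP:
        cost = gap
    elif a.label == b.label:
        cost = 0
    else:
        cost = mismatch
    # Children are compared pairwise in their given order.
    return cost + sum(
        overlay_cost(x, y, mismatch, gap) for x, y in zip(a.children, b.children)
    )


# First tree: A -> B -> C. Second tree: A -> C, padded with one gap node
# in the branch so that both trees have the same structure.
first = Node("A", [Node("B", [Node("C")])])
second = Node("A", [Node(GAP, [Node("C")])])

cost = overlay_cost(first, second)
```

Here only the pair of `B` and the gap node contributes, so the overlay cost equals the gap cost.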