This invention relates generally to mass spectrographic analysis, and more specifically to the identification of organic compounds in complex mixtures of organic compounds.
Mass spectrometry (MS) is a widely used technique for the identification of molecules, both in organic and inorganic chemistry. MS may be thought of as a weighing machine for molecules. The weight of a molecule is a crucial piece of information in the identification of unknown molecules, or in the identification of a known molecule in an unknown mixture of molecules. Examples of situations in which MS analysis may be used include drug development and manufacture, pollution control analysis, and chemical quality control.
MS is frequently used in conjunction with other analysis tools such as gas chromatography (GC) and liquid chromatography (LC), which help to simplify the analysis of MS spectra by essentially spreading out the timing of the arrival of the individual components of a chemical mixture to the MS system. Thus, the number of different molecular species in the mass spectrometer at any one time is reduced, and separation of mass spectrum peaks is simplified. This procedure works well for chemical samples that contain on the order of 10 to 20 different molecular species, but is inadequate for analyzing samples that contain thousands of different species.
Mass spectrometry operates by first ionizing the chemical material of interest in an ionization source. Many ionization sources are well known in the art, such as electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI). The above mentioned ionization methods generally produce what is known in the art as a protonated molecule, meaning a molecule to which a proton (a hydrogen nucleus) has been added, [M+H].sup.+, where M signifies the molecule of interest, and H signifies the hydrogen ion, which is the same as a proton.
Some ionization methods will also produce analogous ions. Analogous ions may arise by the addition of an alkali metal cation, rather than the proton discussed above. A typical species might be [M+Na].sup.+ or [M+K].sup.+. The analysis of the ionized molecules is similar irrespective of whether one is concerned with a protonated ion as discussed above or with an added alkali metal cation. The major difference is the mass that is added: one mass unit (typically called one Dalton) in the case of the hydrogen ion (i.e., proton), 23 Daltons in the case of sodium, or 39 Daltons in the case of potassium. These additional weights or masses are simply added to the molecular weight of the molecule of interest, and the MS peak occurs at the point for the molecular weight of the molecule of interest plus the weight of the ion that has been added.
These ionization methods can also produce negative ions. The most common molecular signal is the deprotonated molecule [M-H].sup.-. In this case the mass is one Dalton lower than the molecular weight of the molecule of interest. In addition, some ionization methods will produce multiply charged ions. These are of the general type [M+nH].sup.n+, where n identifies the number of protons that have been added.
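The arithmetic described above for protonated, alkali metal, deprotonated, and multiply charged ions can be illustrated with a short sketch. It is offered only as an illustration under stated assumptions: it uses the nominal (integer) masses quoted in the text, the function names are hypothetical, and the molecular weight of 500 is an arbitrary example value rather than a compound discussed herein.

```python
# Nominal (integer) adduct masses in Daltons, as quoted in the text.
# Assumption: nominal masses only; a real instrument measures exact masses.
ADDUCT_MASS = {"H": 1, "Na": 23, "K": 39}


def adduct_mz(molecular_weight, adduct="H", n=1):
    """m/z of the positive ion [M + n*adduct]^(n+) (hypothetical helper)."""
    return (molecular_weight + n * ADDUCT_MASS[adduct]) / n


def deprotonated_mz(molecular_weight):
    """m/z of the negative ion [M - H]^- (charge magnitude of one)."""
    return molecular_weight - ADDUCT_MASS["H"]


M = 500  # arbitrary example molecular weight, not taken from the text

print(adduct_mz(M, "H"))       # [M+H]+
print(adduct_mz(M, "Na"))      # [M+Na]+
print(adduct_mz(M, "K"))       # [M+K]+
print(deprotonated_mz(M))      # [M-H]-
print(adduct_mz(M, "H", n=2))  # [M+2H]2+
```

Note that the doubly protonated ion appears near half the molecular weight on the m/z axis, which is one reason multiply charged ions can be mistaken for lighter compounds.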
The ions produced in any of the ionization methods discussed above are passed through a mass separator, typically a magnetic field, a quadrupole electromagnet, or a time-of-flight mass separator, so that the mass of the ions may be distinguished, as well as the number of ions at each mass. These mass separated ions go into a detector and the number of ions is recorded. The mass spectrum is usually shown as a chart such as FIG. 1, which illustrates the case of ionized carbon. Note that in this case there are two significant peaks, each representing a different atomic isotope of carbon. In the figure the normalized intensity, or number of ions detected, is displayed on the vertical axis, and the mass to charge ratio (m/z, sometimes also known as Da/e) of the ion is recorded on the horizontal axis. In cases where the charge on the ion of interest is equal to one, as in the case of the singly protonated molecular ions, this mass to charge ratio (m/z) is equal to the mass of the molecule of interest plus the mass of the proton.
The situation is not always as simple as that shown in FIG. 1. FIGS. 17a-c show spectra for a single moderate sized organic molecular species containing 1-3 bromine atoms. Even though there is only a single molecular species represented in the spectrum, there are many significant large ion peaks. For example, the peaks at mass 553 indicate the base molecule of interest with all of the carbon atoms being C-12, and all of the bromine atoms being Br-79. The peak at 555 has one Br-79 replaced with the isotope Br-81, and the smaller peak between 553 and 555 is due to one C-12 being replaced by a C-13. The peaks at m/z 556 represent one Br-81 substitution and one C-13 substitution, and so on. In general there will also be lower m/z peaks that represent fragments of the original molecule and various isotope substitutions. Thus any molecule that contains carbon, bromine or a number of other well known elements having isotopes, will always have multiple peaks, making spectrum analysis difficult.
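The bromine isotope peaks described above can be estimated with a simple binomial sketch. This is an illustration only: the natural abundances used for Br-79 and Br-81 (about 50.7% and 49.3%) are assumed values that do not appear in the text, the function name is hypothetical, and carbon isotopes and fragment ions are ignored.

```python
from math import comb

# Assumed natural abundances of the two stable bromine isotopes.
BR79, BR81 = 0.5069, 0.4931


def bromine_pattern(n_br):
    """Relative peak heights for 0..n_br substitutions of Br-81 for Br-79.

    Successive peaks are two mass units apart. Heights are normalized so
    that the tallest peak equals 1.0.
    """
    raw = [comb(n_br, k) * BR79 ** (n_br - k) * BR81 ** k
           for k in range(n_br + 1)]
    tallest = max(raw)
    return [round(height / tallest, 2) for height in raw]


for n in (1, 2, 3):
    print(n, bromine_pattern(n))
```

Because the two bromine isotopes are nearly equally abundant, a molecule with one bromine atom gives two peaks of nearly equal height, and each additional bromine atom adds another peak two mass units higher.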
It is often possible to identify the specific molecular species generating an MS signal by discerning its molecular weight, since different chemicals typically have different molecular weights. MS is a powerful tool in the analysis of unknown pure organic compounds because it can identify the molecular weight or mass of the compound, thus helping to identify the specific compound by limiting the number of possible compounds. MS is a useful tool, but as just demonstrated there are many ways to incorrectly identify a peak, and the analysis can be time consuming and expensive.
Furthermore, if the sample of interest contains more than one compound (i.e., it is a mixture of different materials), then the mass spectrum may become even more difficult to interpret. It may not be easy to identify which particular peak in the spectrum corresponds to a specific compound in the sample introduced. Therefore, as was previously noted, to help analyze complex mixtures it is known in the prior art to do some preliminary separation of the mixture prior to introduction into the mass spectrometer by the use of gas chromatography (GC) or liquid chromatography (LC). For example, LC/MS (meaning liquid chromatography/mass spectrometry) is frequently employed in the analysis of drug metabolites in drug discovery laboratories, where it is used to identify which compound has a specific action in living creatures. It is also known to use GC/MS in environmental pollution analysis. This is typically done in cases involving volatile materials, for example dioxins or polychlorinated biphenyls. It is possible to identify a specific material of interest, such as dioxin, by looking for the known mass spectrographic characteristics of a dioxin, i.e., its weight, its isotope distribution, and its chromatographic retention time. In the above noted examples, the LC and GC methods are used to allow the sample of the unknown mixture of chemicals to enter the mass spectrometer in a known sequence. Preferably only one compound will enter the MS system at a time. By knowing how long it takes the material of interest to move through a gas chromatograph, it is then possible to know at what time the material will enter the mass spectrometer. Looking at the mass spectrometer output during the expected time for dioxin gives a fairly good chance of identifying the dioxin signature without having the signal cluttered by other materials whose mass spectrum may overlap that of dioxin.
Thus, it is known in the art to use MS for analyzing sets of chemical compounds with the addition of gas chromatographic or liquid chromatographic separation ahead of the mass spectrometer. Such systems produce what are known as total ion chromatograms (TICs), which show the number of ions as a function of time. A typical TIC is shown in FIG. 3 for an LC/MS analysis of a mixture containing 5,000 different compounds. There is a signal peak at almost every possible time point, and thus analyzing TIC data is difficult because of the large number of data points.
To help solve the data problem, it is known in the prior art to analyze GC/MS or LC/MS spectra by generating what are known as extracted ion chromatograms (XICs), in which each mass point in the TIC data set is examined over the total sample time for an ion signal which corresponds to the mass of the component of interest. FIG. 4b shows the XIC obtained by plotting the data in the TIC of FIG. 4a for the m/z value 911.5 ion. The XIC contains mass to charge information in addition to the time of arrival. FIG. 4c is an XIC for the m/z range 910.5 to 911.5 ions. These XIC charts are examined for the presence or absence of a peak, thereby either identifying the presence of an ion of interest with the expected mass, or demonstrating the absence of the expected ion. This technique works when examining mixtures of up to 20 different known compounds, but is not well suited to the analysis of hundreds of mixed compounds, because there is a high probability that two or three of those hundreds of mixed components or compounds will have similar chromatographic retention times, and thus arrive roughly simultaneously at the mass spectrometer. In a highly complex mixture, there may be multiple materials producing ions at any given m/z value, some or none of which correspond to the compounds of interest.
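The relationship between a TIC and an XIC can be sketched in a few lines. This is a minimal illustration and not the method of any particular instrument or software package: the data layout (a list of scans, each holding a time and a list of (m/z, intensity) pairs), the function names, and the sample values are all assumptions made for the example.

```python
def total_ion_chromatogram(scans):
    """Sum all ion intensities in each scan: one (time, total) pair per scan."""
    return [(time, sum(intensity for _, intensity in peaks))
            for time, peaks in scans]


def extracted_ion_chromatogram(scans, mz_low, mz_high):
    """Sum only the intensities whose m/z lies within [mz_low, mz_high]."""
    return [(time, sum(intensity for mz, intensity in peaks
                       if mz_low <= mz <= mz_high))
            for time, peaks in scans]


# Hypothetical scans: (time, [(m/z, intensity), ...]); values are made up.
scans = [
    (0.0, [(300.2, 50.0), (911.0, 5.0)]),
    (1.0, [(300.2, 40.0), (911.2, 120.0), (912.2, 60.0)]),
    (2.0, [(450.1, 80.0)]),
]

tic = total_ion_chromatogram(scans)
xic = extracted_ion_chromatogram(scans, 910.5, 911.5)
print(tic)
print(xic)
```

In this made-up data the XIC for the window 910.5 to 911.5 rises at time 1.0, but the XIC alone cannot show whether that signal comes from the compound of interest or from an isotope peak, a multiply charged ion, or a contaminant that happens to fall in the same window.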
Since both the TIC and XIC are difficult to interpret when examining mixtures of compounds containing hundreds to thousands of molecular species, it is possible to make a three dimensional graph such as FIG. 5, which presents both time and m/z data. FIG. 5 again shows that GC/MS or LC/MS may be useful when examining mixtures having 5 to 10 different compounds, as shown here, but the number of peaks is too high for simple analysis if the number of different compounds exceeds 20 or so.
There exist problems with automated mass spectrometer analysis in the art. One such problem is that the software is limited to the specific set of problems for which it is designed. There are no software packages capable of general automated analysis of mass spectrographic data for mixtures of compounds. Problems in automated analysis of complex mixtures include the likelihood that some ions will be observed at almost every m/z ratio (i.e., mass to charge ratio) everywhere within the experimental sample. For example, refer again to FIG. 3, an LC/MS chromatogram TIC showing the number of ions detected versus time from a complex mixture containing roughly five thousand different components. It is clear from FIG. 3 that there is an ion peak at every time point in the range. FIG. 4b is an XIC that shows that there are positive signals at m/z ratio 911.5 at many places in the course of the MS run. The large number of peaks is due in part to each compound having multiple peaks because of isotopes, as discussed above. There may also be peaks that result from multiply charged components with twice the weight and twice the charge. There may be peaks from various chemical contaminants. There may be peaks due to electronic noise or system resolution limits. Thus, automated analysis methods cannot reliably find the preprogrammed peaks, because it is not clear from the XIC alone whether the signal at the expected m/z ratio of the compound of interest is a real indication of the presence of the expected compound, or whether it is a false signal due to an isotope of a different compound, etc. All of the above noted problems exist in the art of mass spectrographic analysis, whether automated or manual.
To summarize the problems in the art, the isotope pattern problem discussed above typically appears as two or more peaks with slightly different masses, typically one mass unit apart. This is because most compounds encountered in organic synthesis contain carbon, and they contain the isotopes of carbon in the normal proportion in which carbon isotopes occur in nature. The relative abundance of carbon-12 versus carbon-13 on the earth is 98.9% C-12 and 1.1% C-13 in any naturally occurring sample of carbon. These carbon isotopes have essentially identical chemical properties and have weights that differ by one Dalton. For a molecule containing 100 carbon atoms, the probability of there being a C-13 at any one site is 1.1%, and the probability of any other site being C-12 or C-13 is unaffected by the selection at any other site. Therefore the height of the peak for molecules having exactly one C-13 among the 100 carbon atoms, relative to the peak for molecules having none, is approximately (100*1.1%)=110%, or about 111% when the 98.9% abundance of C-12 at the substituted site is taken into account. This means that there will be two peaks, the lighter peak having all 100 carbon-12 atoms, and a second peak that is roughly 11% taller than the first peak and located one m/z unit higher. See for example FIG. 15. Thus, a compound having a hundred carbon atoms would be likely to have one of the one hundred C-12 atoms replaced by a C-13 atom. As a result of the substitution of one of the one hundred C-12 atoms by a C-13 atom, the MS spectrum of the molecule is likely to have two peaks of roughly equal height separated by one mass unit. The roughly equal height of the two isotope peaks indicates that roughly as many of the individual molecules of this compound have had a random one of the C-12 atoms replaced by a C-13 atom as have had none replaced. One peak represents the molecule containing all C-12 atoms, and the second peak, one Dalton higher, represents the same chemical molecule containing 99 C-12 atoms plus one C-13 atom.
Further, there will be yet another peak having about 61% of the height of the first peak, in which two random C-12 atoms are replaced by C-13 atoms, thus resulting in a mass two Daltons higher than the base isotope molecule. There are further carbon isotope mass spectrum peaks representing three carbon-13 substitutions and having about 22% of the height of the first C-12 peak, and so on. Thus, any compound containing carbon will always produce multiple mass spectrum peaks. Large organic molecules containing 80 to 100 carbons will appear as two relatively large peaks separated by one m/z unit, and present automated MS analysis tools may misidentify an isotope peak as a compound of interest. Thus, standard MS analysis has a problem with large organic molecules, because it is difficult to identify or separate the multiple molecular peaks due to various carbon atomic isotopes.
Another problem with analyzing MS data is that the XIC peak found at the expected mass to charge ratio may be a false signal due to background noise. Such noise may be caused by electrical noise in the MS equipment or the GC/LC equipment, by contaminants in the GC/MS system, or by contaminants in the solvent systems used to carry the molecular mixture. There may also be false positive identifications related to the resolution level of the equipment.
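The peak heights quoted above for a molecule with 100 carbon atoms follow from a binomial model in which each carbon site is independently C-13 with probability 1.1%. The following sketch is offered only as a check on that arithmetic: the function name is hypothetical, and the model ignores isotopes of hydrogen, oxygen, nitrogen and other elements that a real molecule would contain.

```python
from math import comb


def c13_peak_ratios(n_carbons, max_c13=3, p13=0.011):
    """Heights of the peaks with 0..max_c13 carbon-13 atoms, relative to
    the all-carbon-12 peak, assuming independent substitution at each site."""
    p12 = 1.0 - p13
    base = p12 ** n_carbons  # probability that every carbon is C-12
    return [comb(n_carbons, k) * p13 ** k * p12 ** (n_carbons - k) / base
            for k in range(max_c13 + 1)]


for k, ratio in enumerate(c13_peak_ratios(100)):
    print(f"{k} C-13 atoms: {ratio:.2f} x height of the all-C-12 peak")
```

Under these assumptions the ratios come out at roughly 1.11, 0.61 and 0.22 for one, two and three C-13 substitutions, in line with the figures given in the text.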
Thus, there exists a need in the art for an automated method for analyzing mass spectrometer data which can analyze complex mixtures containing many thousands of components and can correct for background noise, multiply charged peaks and atomic isotope peaks.