This invention relates generally to Mass Spectrographic analysis, and more specifically to the identification of organic compounds in complex mixtures of organic compounds.
Mass spectrometry (MS) is a widely used technique for the identification of molecules, both in organic and inorganic chemistry. MS may be thought of as a weighing machine for molecules. The weight of a molecule is a crucial piece of information in the identification of unknown molecules, or in the identification of a known molecule in an unknown mixture of molecules. Examples of situations in which MS analysis may be used include drug development and manufacture, pollution control analysis, and chemical quality control.
MS is frequently used in conjunction with other analysis tools such as gas chromatography (GC) and liquid chromatography (LC), which help to simplify the analysis of MS spectra by essentially spreading out the timing of the arrival of the individual components of a chemical mixture to the MS system. Thus, the number of different molecular species in the mass spectrometer at any one time is reduced, and separation of mass spectrum peaks is simplified. This procedure works well for chemical samples that contain on the order of 10 to 20 different molecular species, but is inadequate for analyzing samples that contain thousands of different species.
Mass spectrometry operates by first ionizing the chemical material of interest in an ionization source. Many ionization sources are well known in the art, such as electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI). These ionization methods generally produce what is known in the art as a protonated molecule, meaning a molecule to which a proton (a hydrogen nucleus) has been added, written [M+H]+, where M signifies the molecule of interest, and H signifies the hydrogen ion, which is the same as a proton.
Some ionization methods will also produce analogous ions. Analogous ions may arise by the addition of an alkali metal cation, rather than the proton discussed above. A typical species might be [M+Na]+ or [M+K]+. The analysis of the ionized molecules is similar irrespective of whether one is concerned with a protonated ion as discussed above or with an added alkali metal cation. The major difference is the mass added: one mass unit (typically called one Dalton) in the case of the hydrogen ion (i.e., proton), 23 Daltons in the case of sodium, or 39 Daltons in the case of potassium. These additional weights or masses are simply added to the molecular weight of the molecule of interest, and the MS peak occurs at the point for the molecular weight of the molecule of interest plus the weight of the ion that has been added.
These ionization methods can also produce negative ions. The most common molecular signal is the deprotonated molecule [M−H]−; in this case the mass is one Dalton lower than the molecular weight of the molecule of interest. In addition, some ionization methods will produce multiply charged ions. These are of the general type [M+nH]n+, where n identifies the number of additional protons that have been added.
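The arithmetic described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `expected_mz` and the table `ADDUCT_MASS` are hypothetical, the masses are the nominal integer values quoted in the text (1, 23 and 39 Daltons), and the small mass of the electron is ignored.

```python
# Nominal (integer) adduct masses in Daltons, as quoted in the text.
# Assumption: nominal masses are adequate for illustration; real work
# would use exact monoisotopic masses.
ADDUCT_MASS = {"H": 1, "Na": 23, "K": 39}


def expected_mz(molecular_weight, adduct="H", charge=1):
    """Return the expected m/z for an ion formed from a molecule M.

    charge > 0 models [M + n*adduct]n+ (e.g. [M+H]+, [M+Na]+, [M+nH]n+).
    charge < 0 models the deprotonated molecule [M - nH]n-.
    """
    if charge == 0:
        raise ValueError("charge must be non-zero")
    n = abs(charge)
    if charge > 0:
        ion_mass = molecular_weight + n * ADDUCT_MASS[adduct]
    else:
        ion_mass = molecular_weight - n * ADDUCT_MASS["H"]
    # The instrument reports mass divided by the number of charges.
    return ion_mass / n
```

For a hypothetical molecule of weight 500, this gives 501 for [M+H]+, 523 for [M+Na]+, 539 for [M+K]+, 499 for [M−H]−, and 251 for the doubly protonated [M+2H]2+.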
The ions produced in any of the ionization methods discussed above are passed through a mass separator, typically a magnetic field, a quadrupole electromagnet, or a time-of-flight mass separator, so that the mass of the ions may be distinguished, as well as the number of ions at each mass level. These mass separated ions go into a detector and the number of ions is recorded. The mass spectrum is usually shown as a chart such as FIG. 1, which illustrates the case of ionized carbon. Note that in this case there are two significant peaks, each representing a different atomic isotope of carbon. In the figure the normalized intensity, or number of ions detected, is displayed on the vertical axis, and the mass to charge ratio (m/z, sometimes also known as Da/e) of the ion is recorded on the horizontal axis. In cases where the charge on the ion of interest is equal to one, as in the case of the singly protonated molecular ions, this mass to charge ratio (m/z) is equal to the mass of the molecule of interest plus the mass of the proton.
The situation is not always as simple as that shown in FIG. 1. FIGS. 17a-c show spectra for a single moderate sized organic molecular species containing 1-3 bromine atoms. Even though there is only a single molecular species represented in the spectrum, there are many significant large ion peaks. For example, the peaks at mass 553 indicate the base molecule of interest with all of the carbon atoms being C-12, and all of the bromine atoms being Br-79. The peak at 555 has one Br-79 replaced with the isotope Br-81, and the smaller peak between 553 and 555 is due to one C-12 being replaced by a C-13. The peaks at m/z 556 represent one Br-81 substitution and one C-13 substitution, and so on. In general there will also be lower m/z peaks that represent fragments of the original molecule and various isotope substitutions. Thus any molecule that contains carbon, bromine or a number of other well known elements having isotopes, will always have multiple peaks, making spectrum analysis difficult.
It is often possible to identify the specific molecular species generating an MS signal by discerning its molecular weight, since different chemicals typically have different molecular weights. MS is a powerful tool in the analysis of unknown pure organic compounds because it can identify the molecular weight or mass of the compound, thus helping to identify the specific compound by limiting the number of possible compounds. MS is a useful tool, but as just demonstrated there are many ways to incorrectly identify a peak, and the analysis can be time consuming and expensive. Furthermore, if the sample of interest contains more than one compound (i.e., it is a mixture of different materials), then the mass spectrum may become even more difficult to interpret. It may not be easy to identify which particular peak in the spectrum corresponds to a specific compound in the sample introduced. Therefore, as was previously noted, to help analyze complex mixtures it is known in the prior art to do some preliminary separation of the mixture prior to introduction into the mass spectrometer by the use of gas chromatography (GC) or liquid chromatography (LC). For example, LC/MS (meaning liquid chromatography/mass spectrometry) is frequently employed in the analysis of drug metabolites in drug discovery laboratories, where it is used to identify which compound has a specific action in living creatures. It is also known to use GC/MS in environmental pollution analysis. This is typically done in cases involving volatile materials, for example dioxins or polychlorinated biphenyls. It is possible to identify a specific material of interest, such as dioxin, by looking for the known mass spectrographic characteristics of a dioxin, i.e., its weight, its isotope distribution, and its chromatographic retention time. In the above noted examples, the LC and GC methods are used to allow the sample of the unknown mixture of chemicals to enter the mass spectrometer in a known sequence.
Preferably only one compound will enter the MS system at a time. By knowing how long it takes the material of interest to move through a gas chromatograph, it is then possible to know at what time the material will enter the mass spectrometer. Looking at the mass spectrometer output during the expected time for dioxin gives a fairly good chance of identifying the dioxin signature without having the signal cluttered by other materials whose mass spectrum may overlap that of dioxin. Thus, it is known in the art to use MS for analyzing sets of chemical compounds with the addition of gas chromatographic or liquid chromatographic separation ahead of the mass spectrometer. Such systems produce what are known as total ion chromatograms (TICs), which show the number of ions as a function of time. A typical TIC is shown in FIG. 3 for an LC/MS analysis of a mixture containing 5,000 different compounds. There is a signal peak at almost every possible time point, and thus analyzing TIC data is difficult because of the large number of data points.
To help solve the data problem, it is known in the prior art to analyze GC/MS or LC/MS spectra by generating what are known as extracted ion chromatograms (XICs), in which the data set underlying the TIC is examined over the total sample time for an ion signal that corresponds to the mass of the component of interest. FIG. 4b shows the XIC obtained by plotting the data in the TIC of FIG. 4a for the m/z value 911.5 ion. The XIC contains mass to charge information in addition to the time of arrival. FIG. 4c is an XIC for ions in the m/z range 910.5 to 911.5.
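The relationship between a TIC and an XIC can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular instrument software: the scan layout (a list of `(time, [(mz, intensity), ...])` tuples), the function names, and the `tolerance` window are all assumptions made for the example.

```python
def total_ion_chromatogram(scans):
    """Sum every ion intensity in each scan: number of ions versus time."""
    return [(time, sum(i for _, i in peaks)) for time, peaks in scans]


def extract_ion_chromatogram(scans, target_mz, tolerance=0.5):
    """Sum only the intensities whose m/z lies within +/- tolerance of
    target_mz, giving the signal for one m/z window versus time."""
    xic = []
    for time, peaks in scans:
        total = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        xic.append((time, total))
    return xic


# Hypothetical three-scan data set, for illustration only.
example_scans = [
    (0.0, [(300.2, 10.0), (911.5, 2.0)]),
    (1.0, [(911.4, 50.0), (450.1, 5.0)]),
    (2.0, [(120.0, 7.0)]),
]
example_tic = total_ion_chromatogram(example_scans)
example_xic = extract_ion_chromatogram(example_scans, 911.5)
```

In this made-up data the TIC shows signal in every scan, while the XIC for m/z 911.5 shows a clear maximum only in the second scan.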
These XIC charts are examined for the presence or absence of a peak, thereby either identifying the presence of an ion of interest with the expected mass, or demonstrating the absence of the expected ion. This technique works when examining mixtures of up to 20 different known compounds, but is not well suited to the analysis of hundreds of mixed compounds, because there is a high probability that two or three of those hundreds of mixed components or compounds will have similar chromatographic retention times, and thus arrive roughly simultaneously at the mass spectrometer. In a highly complex mixture, there may be multiple materials producing ions at any given m/z value, some or none of which correspond to the compounds of interest. Since both the TIC and XIC are difficult to interpret when examining mixtures of compounds containing hundreds to thousands of molecular species, it is possible to make a three dimensional graph such as FIG. 5, which presents both time and m/z data. FIG. 5 again shows that GC/MS or LC/MS may be useful when examining mixtures having 5 to 10 different compounds, as shown here, but the number of peaks is too high for simple analysis if the number of different compounds exceeds 20 or so.
There exist problems with automated mass spectrometer analysis in the art. One such problem is that the software is limited to the specific set of problems for which it is designed. There are no software packages capable of general automated analysis of mass spectrographic data from mixtures of compounds. Problems in automated analysis of complex mixtures include the likelihood that some ions will be observed at almost every m/z ratio (i.e., mass to charge ratio) everywhere within the experimental sample. For example, refer again to FIG. 3, an LC/MS chromatogram TIC showing the number of ions detected versus time from a complex mixture containing roughly five thousand different components. It is clear from FIG. 3 that there is an ion peak at every time point in the range. FIG. 4b is an XIC that shows that there is positive signal at m/z ratio 911.5 at many places in the course of the MS run. The large number of peaks is due in part to each compound having multiple peaks because of isotopes, as discussed above. There may also be peaks that result from multiply charged components with twice the weight and twice the charge. There may be peaks from various chemical contamination or noise. There may be peaks due to electronic noise or system resolution limits. Thus, automated analysis methods cannot reliably find the preprogrammed peaks, because it is not clear from the XIC alone whether the signal at the expected m/z ratio of the compound of interest is a real indication of the presence of the expected compound, or whether it is a false signal due to an isotope of a different compound, etc. All of the above noted problems exist in the art of mass spectrographic analysis, whether automated or manual.
To summarize the problems in the art, the isotope pattern problem discussed above typically appears as two or more peaks with slightly different masses, typically one mass unit apart. This is because most compounds in organic synthesis contain carbon, and they contain the isotopes of carbon in the normal proportion in which carbon isotopes exist in the world as a whole. The relative abundance on the earth is 98.9% C-12 and 1.1% C-13 in any naturally occurring sample of carbon. These carbon isotopes have identical chemical properties and have weights that differ by one Dalton. For a molecule containing 100 carbon atoms, the probability of there being a C-13 at any one site is 1.1%, and the probability of any other site being C-12 or C-13 is unaffected by the selection at any other site. Therefore the height of the peak having one single C-13 among the 100 carbon atoms, relative to the peak having all C-12 atoms, is given approximately by (100*1.1%)=110% (the exact binomial ratio, 100*1.1%/98.9%, is about 111%), meaning that there will be two peaks, the lighter peak having all 100 C-12 atoms, and a second peak that is about 11% taller than the first peak and located one m/z unit higher. See for example FIG. 15. Thus, a compound having a hundred carbon atoms would be likely to have one of the one hundred C-12 atoms replaced by a C-13 atom. As a result of the substitution of one of the one hundred C-12 atoms by a C-13 atom, the MS spectrum of the molecule is likely to have two peaks of roughly equal height separated by one mass unit. The roughly equal height of the two isotope peaks indicates that molecules in which a random one of the C-12 atoms has been replaced by a C-13 atom are about as numerous as molecules containing only C-12 atoms. One peak represents the molecule containing all C-12 atoms, and the second peak, one Dalton higher, represents the same chemical molecule containing C-12 atoms plus one C-13 atom.
Further, there will be yet another peak having about 61% of the height of the first peak, in which two random C-12 atoms are replaced by C-13 atoms, thus resulting in a mass two Daltons higher than the base isotope molecule. There are further carbon isotope mass spectra peaks representing three C-13 substitutions and having about 22% of the height of the first C-12 peak, and so on. Thus, any compound containing carbon will always produce multiple mass spectra peaks. Large organic molecules containing 80 to 100 carbons will appear as two relatively large peaks separated by one m/z unit, and present automated MS analysis tools may misidentify an isotope peak as a compound of interest. Thus, standard MS analysis has a problem with large organic molecules, because it is difficult to identify or separate the multiple molecular peaks due to various carbon atomic isotopes.
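The relative peak heights quoted above follow from the binomial distribution, since each carbon site is independently C-13 with probability 1.1%. The sketch below reproduces them; the function name is hypothetical, and the calculation deliberately considers carbon only, ignoring isotopes of hydrogen, nitrogen, oxygen and other elements.

```python
from math import comb

C13_ABUNDANCE = 0.011  # natural abundance of C-13 quoted in the text


def carbon_isotope_ratios(n_carbons, max_substitutions=3, p=C13_ABUNDANCE):
    """Heights of the peaks with 1..max_substitutions C-13 atoms,
    each expressed relative to the all-C-12 peak."""
    base = (1 - p) ** n_carbons  # probability that every carbon is C-12
    ratios = []
    for k in range(1, max_substitutions + 1):
        # Binomial probability of exactly k C-13 atoms among n_carbons sites.
        prob = comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
        ratios.append(prob / base)
    return ratios


ratios_100 = carbon_isotope_ratios(100)
```

For 100 carbon atoms the three ratios come out at roughly 1.11, 0.61 and 0.22, in line with the figures of about 11% taller, 61% and 22% given in the text.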
Another problem with analyzing MS data is that the XIC peak found at the expected mass ratio may be a false signal due to background noise. Noise may be caused by electrical noise in the MS equipment or the GC/LC equipment, by contaminants in the GC/MS system, or by contaminants in the solvent systems used to carry the molecular mixture. There may also be false positive identifications related to the resolution level of the equipment.
Thus, there exists a need in the art for an automated method for analyzing mass spectrometer data which can analyze complex mixtures containing many thousands of components and can correct for background noise, multiply charged peaks and atomic isotope peaks.
The invention resides in a method for analyzing mass spectrometer data in which a control sample measurement is performed, providing a background noise check. The peak height and width values at each m/z ratio as a function of time are stored in a memory. A mass spectrometer operation on a material to be analyzed is performed and the peak height and width values at each m/z ratio versus time are stored in a second memory location. The mass spectrometer operation on the material to be analyzed is repeated a fixed number of times, and the stored control sample value at each m/z ratio at each time increment is subtracted from the corresponding value from each of the operational runs, thus producing a difference value at each m/z ratio for each of the multiple runs at each time increment. If the MS value minus the background noise does not exceed a preset value, the m/z ratio data point is not recorded, thus eliminating background noise, chemical noise and false positive peaks from the mass spectrometer data. The stored data for each of the multiple runs is then compared to a predetermined value at each m/z ratio, and the resultant series of peaks, which are now determined to be above the background, is stored at the m/z points at which the peaks are of significance.
A technique for automatically analyzing mass spectrographic data from mixtures of chemical compounds has a series of screens designed to eliminate or reduce incorrect peak identifications due to background noise, system resolution, system contamination, multiply charged ions and isotope substitutions. The technique performs a mass spectrum operation on a control sample, producing a first group of output values. Next, a mass spectrographic operation is performed on a sample to be analyzed, producing a second group of output values. A first m/z ratio for a material expected to be present in the mixture is selected from a predetermined library of calculated mass spectrometer output spectrums, the value of the control sample at the expected m/z ratio is subtracted from the value of the analyzed sample, and the difference is compared to a predetermined value. If the difference is greater than the predetermined value, thus indicating that the signal is above the background noise level, a record is generated at that m/z value for the expected material. The same mass spectrum operation is performed several times to eliminate random noise and background contamination. Next, peak values that do not have the expected peak width or the proper retention time for the separation method are identified. Multiply charged ions are identified by examining peak separation. The m/z location of the expected material is examined, and the intensity at the expected m/z location is compared with the intensity at the next lower recorded m/z peak to identify peaks related to atomic isotope substitution. With such a technique, mass spectrograph data analysis may be greatly simplified by the identification of probable spurious signals, and analysis will become simpler and more accurate.
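A minimal sketch of the background subtraction step might look as follows. The data layout (dictionaries keyed by `(time_index, mz)`), the function names, and the rule that a point must survive in every repeated run are assumptions chosen for illustration; the text itself specifies only that a difference is formed and compared to a preset value for each run.

```python
def subtract_background(sample, control, threshold):
    """Return {(time_index, mz): difference} for every sample point whose
    intensity exceeds the control intensity by more than threshold.
    Points missing from the control are treated as zero background."""
    kept = {}
    for key, intensity in sample.items():
        diff = intensity - control.get(key, 0.0)
        if diff > threshold:
            kept[key] = diff
    return kept


def consistent_across_runs(runs, control, threshold):
    """Assumed consistency rule: keep only the points that are above
    background in every one of the repeated runs."""
    if not runs:
        return {}
    survivors = [subtract_background(run, control, threshold) for run in runs]
    common = set(survivors[0]).intersection(*survivors[1:])
    return {key: [s[key] for s in survivors] for key in common}


# Hypothetical control measurement and two repeated runs.
control = {(0, 500.0): 10.0, (1, 500.0): 12.0}
run_1 = {(0, 500.0): 11.0, (1, 500.0): 40.0, (1, 600.0): 30.0}
run_2 = {(0, 500.0): 30.0, (1, 500.0): 42.0}
hits = consistent_across_runs([run_1, run_2], control, threshold=5.0)
```

In this example only the point at time index 1 and m/z 500.0 exceeds the background in both runs; the other points appear in just one run and are discarded as probable random noise.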
A control sample or reference sample can be a sample against which a series of future MS values or experiments is compared, or the results of an experiment can be treated in effect as a control sample against which one or more subsequent MS values are compared. A peak in one experiment may indicate a desired compound, but such a peak in a series of experiments may suggest a contaminant rather than a desired compound.
In a further embodiment, the MS peaks are then examined by comparison to a library of expected MS output spectrums, by taking an expected m/z ratio from the library of materials thought to exist within the mixture analyzed and comparing to the values found at each m/z ratio. If a signal peak exists in the memory at the m/z ratio corresponding to the value expected for any specific chemical in the library, the data is then examined by checking whether or not the expected m/z ratio has a chromatographic peak temporal position and width that approximates the expected peak of the expected chemical compound. This determines whether or not the peak possibly matches the chemical whose presence is expected in the sample.
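The library comparison described above can be sketched as a simple filter. The record types, field names and tolerance values below are hypothetical; in particular, "approximates the expected peak" is interpreted here as the observed width lying within a factor of two of the expected width, which is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class LibraryEntry:
    name: str
    mz: float              # expected m/z ratio
    retention_time: float  # expected chromatographic retention time
    peak_width: float      # expected chromatographic peak width


@dataclass
class ObservedPeak:
    mz: float
    retention_time: float
    peak_width: float


def match_library(peaks, library, mz_tol=0.5, rt_tol=0.2, width_ratio=2.0):
    """Return (entry name, peak) pairs where the peak sits at the expected
    m/z, arrives near the expected time, and has a plausible width."""
    matches = []
    for entry in library:
        for peak in peaks:
            mz_ok = abs(peak.mz - entry.mz) <= mz_tol
            rt_ok = abs(peak.retention_time - entry.retention_time) <= rt_tol
            width_ok = (entry.peak_width / width_ratio
                        <= peak.peak_width
                        <= entry.peak_width * width_ratio)
            if mz_ok and rt_ok and width_ok:
                matches.append((entry.name, peak))
    return matches


library = [LibraryEntry("ligand_A", 501.0, 5.0, 0.1)]
peaks = [
    ObservedPeak(501.1, 5.05, 0.12),  # right m/z, time and width
    ObservedPeak(501.0, 9.0, 0.1),    # right m/z, wrong arrival time
    ObservedPeak(501.0, 5.0, 1.0),    # right m/z and time, far too broad
]
matches = match_library(peaks, library)
```

Only the first observed peak is reported as a possible match; the other two share the expected m/z but fail the retention time or peak width screen.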
In a further embodiment of the invention, the value at the m/z ratio of the expected compound, after being found to be above background and of the approximate peak width expected for the separation method used, is then compared to the value at the peak in the data sample having the next higher m/z ratio. The charge of the ion is found by taking the two m/z values, measuring the spacing between them, and inverting that spacing. If the peak spacing is one full m/z unit, then the ion charge is one. On the other hand, if the peaks are due to a doubly charged ion, then they will be found to be separated by one half of an m/z unit. Similarly, a spacing of one third of an m/z unit indicates a triply charged ion. Thus it is possible to positively identify doubly charged and triply charged ions.
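The charge determination reduces to inverting the spacing between adjacent isotope peaks. The following sketch shows the idea; the function name, the `max_charge` limit and the `tolerance` on the spacing are assumptions introduced for the example.

```python
def charge_from_spacing(mz_a, mz_b, max_charge=5, tolerance=0.05):
    """Estimate the ion charge from the spacing of two adjacent isotope
    peaks: a spacing of 1 implies charge 1, 1/2 implies charge 2, and so on.
    Returns None when the spacing does not correspond to a whole charge."""
    spacing = abs(mz_b - mz_a)
    if spacing == 0:
        return None
    z = round(1.0 / spacing)  # invert the spacing to get the charge
    if z < 1 or z > max_charge:
        return None
    # Reject spacings that are not close to exactly 1/z.
    if abs(spacing - 1.0 / z) > tolerance:
        return None
    return z
```

For example, peaks at m/z 500.0 and 501.0 indicate a singly charged ion, peaks at 500.0 and 500.5 a doubly charged ion, and peaks at 500.0 and 500.33 a triply charged ion, while a spacing of 0.8 is rejected as not matching any whole charge.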
In a further embodiment, false positive peaks due to atomic isotope substitution are eliminated by taking an expected m/z ratio peak that has been found in the previous tests to have reasonable intensity and chromatographic peak width (i.e., to be above the background level), to have the expected mass-to-charge ratio (i.e., m/z), and to have the correct charge (hence the correct mass), and comparing it against the next lower m/z ratio peak. The comparison subtracts, from the peak intensity value of the target of interest, the intensity value of the peak that is lower in the spectrum by an amount equal to 1 divided by the charge of the ion. Thus if the previous test showed that the charge state was 1, then the next lower peak examined would be one m/z unit lower. If the charge state was found to be 2, then the next lower peak examined would be one half of an m/z unit lower, and so on. A general formula for this relationship is given as peak difference = I_m − I_(m−(1/z)), where I_m is the intensity at the m/z ratio under consideration, m is the m/z value of the signal under consideration, and z is the charge of the ion. The same result may be obtained by simply reversing the order of subtraction and looking for a value that is greater than zero. Isotope peaks for most moderate size organic molecules having fewer than about 80 carbon atoms typically decline at higher m/z values. A negative peak difference indicates that the lighter peak is of higher intensity; thus the peak being examined can be assumed to be an isotope of a lighter molecular species, not a peak of the expected molecular species, and can be eliminated.
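The isotope screen can be sketched directly from the peak difference formula, taking the intensity at the m/z under consideration minus the intensity 1/z lower. The spectrum layout (a list of `(mz, intensity)` pairs), the function name and the matching `tolerance` are assumptions for illustration, and, as the text notes, the rule is only expected to hold for molecules with fewer than about 80 carbon atoms.

```python
def is_probable_isotope_peak(spectrum, mz, charge, tolerance=0.05):
    """Return True when the peak 1/charge below mz is more intense than the
    peak at mz, i.e. when the peak difference I(m) - I(m - 1/z) is negative."""

    def intensity_near(target):
        # Strongest recorded intensity within the tolerance window, else 0.
        hits = [i for m, i in spectrum if abs(m - target) <= tolerance]
        return max(hits) if hits else 0.0

    i_m = intensity_near(mz)
    i_lower = intensity_near(mz - 1.0 / charge)
    return (i_m - i_lower) < 0


# Hypothetical spectra: a singly charged and a doubly charged example.
singly = [(553.0, 100.0), (554.0, 30.0), (600.0, 80.0)]
doubly = [(300.0, 50.0), (300.5, 20.0)]
```

With these made-up values, the peak at m/z 554.0 is flagged because the peak one unit lower is stronger, whereas the peaks at 553.0 and 600.0 are retained because nothing stronger lies one unit below them. For the doubly charged example, the peak at 300.5 is flagged by looking one half unit lower.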
An example of a situation where the invention may be beneficial is found in drug testing. If a chemical is needed to bond to a specific protein, it is possible to fabricate a large number of different small chemicals, known as ligands, which may bond to the protein. The different chemicals may bond to the protein with different strengths. The point of interest is to find the ligand that sticks best. Placing the protein in a bath of perhaps as many as 5,000 possible ligands (i.e., a library), and then washing the ligands off of the protein, will result in a few of the ligands sticking to the protein. Which ligands stick best may be determined by using LC/MS to determine which of the known 5,000 ligands used are found. First the protein is placed in the LC/MS without having been bathed in the ligands and a background value is recorded. This step will be used to eliminate what is known as chemical noise, resulting from protein breakdown products, contaminated solvents and buffers, machine contamination, previous chemicals used in the LC/MS, etc., as well as system electronic noise. Next, the protein that has been bathed in the ligands and washed is placed in the LC/MS and the output is compared to the background at each m/z point where one of the 5,000 ligands is calculated to exist. If the expected ligand signal is above the measured background level, a possible hit is recorded. The time of arrival of the suspected ligand signal at the MS is then compared with the expected time for the specific ligand to traverse the LC system.
If the suspected ligand passes the above two tests, then the fact that any molecule containing carbon will have multiple m/z peaks is used, and the suspected ligand m/z peak is compared to the next lower and next higher m/z peaks. If the peaks are found to be separated by one full m/z unit, then the suspect peak is due to singly charged ions and still may be a possible ligand. If the peak separation is one half of a unit, then the peak is due to doubly charged ions, and so forth. The doubly charged ion may still be useful, but the correct identification of the ligand responsible will require that the expected mass be calculated differently. The multiple isotope situation also allows the system to determine if the suspect peak is the expected ligand or an isotope peak of some other signal. Again the neighboring peaks are examined, those one m/z unit away in the case of singly charged ions and one half of a unit away in the case of doubly charged ions, and the relative sizes of the peaks are compared. For chemicals having fewer than 80 carbon atoms, it is known that the lighter peaks will be larger than the C-13 substituted peaks, and this fact is used to determine if the suspected peak is simply a heavier isotope of some other chemical. In this manner the number of peaks that need to be examined by a user is greatly reduced.
Another example of the use of the present invention is found in drug metabolite studies. A potential drug is given to a test animal such as a rat. The user generates a list of possible breakdown products (i.e., metabolites) that may be found in the rat's blood. A sample of the rat's blood is taken and examined before the drug is given, thus providing a background level. The blood of rats given the drug is examined for the presence of the suspected metabolites using the method described above of subtracting the background and wrong time of arrival signals, and flagging doubly charged ions and ions whose peak heights indicate that isotopes of a different compound may be responsible. In this manner the presence of possibly dangerous metabolic byproducts of a drug may be determined.
With such an arrangement, it is possible to automatically reduce the number of MS peaks which need to be examined, by flagging peaks that are due to background noise, isotope substitution, and multiply charged ions. Since it is beneficial to eliminate false peaks from mass spectrographs of complex mixtures in order to enable rapid and accurate analysis of MS spectrums, the present invention solves a known problem in the art of mass spectrometry.