Mass spectrometry (MS) is an analytical technique which was developed in the last century and which measures the mass-to-charge ratio of charged particles. It is used for determining masses of particles, for determining the elemental composition of a sample or molecule, and for elucidating the chemical structures of molecules, such as peptides and other chemical compounds. The MS principle consists of ionizing chemical compounds to generate charged molecules or molecule fragments and measuring their mass-to-charge ratios. In a typical MS procedure (i) a sample is loaded onto the MS instrument, and undergoes vaporization, (ii) the components of the sample are ionized by one of a variety of methods (e.g., by impacting them with an electron beam), which results in the formation of charged particles (ions), (iii) the ions are separated according to their mass-to-charge ratio in an analyzer by electromagnetic fields, (iv) the ions are detected, usually by a quantitative method, and (v) the ion signal is processed into mass spectra. MS instruments typically comprise three modules: (a) an ion source, which can convert gas phase sample molecules into ions (or, in the case of electrospray ionization, move ions that exist in solution into the gas phase), (b) a mass analyzer, which sorts the ions by their masses by applying electromagnetic fields, and (c) a detector, which measures the value of an indicator quantity and thus provides data for calculating the abundances of each ion present. The MS technique has both qualitative and quantitative uses. These include identifying unknown compounds, determining the isotopic composition of elements in a molecule, and determining the structure of a compound by observing its fragmentation. Other uses include quantifying the amount of a compound in a sample or studying the fundamentals of gas phase ion chemistry (the chemistry of ions and neutrals in a vacuum). 
MS is now in very common use in analytical laboratories that study physical, chemical, or biological properties of a great variety of compounds.
The sample which is introduced into a mass spectrometer may consist of a multitude of atoms or molecules. During the ionization step, the atoms and molecules may acquire a charge and can thus be easily handled by electric and magnetic fields. The charged particle is accelerated under the influence of these fields, with an acceleration that is inversely proportional to its mass-to-charge ratio m/e. A sample may consist of different molecules with different masses. A mass spectrometer can therefore separate molecules whose masses differ by at least the mass resolution width of the mass spectrometer. A typical result of an MS analysis is a mass spectrum which shows peaks at or around certain values for the mass, the peak height at a given mass value being proportional to the amount of molecules with that mass which were present in the sample. The sample may also comprise molecules with the same molecular formula whose atoms are different isotopes. The electrochemical properties of these molecules are the same, but their masses differ by an amount which is close to an integer number of atomic mass units (amu) or Daltons (Da). The different isotopic variants of a molecule can also be separated in a mass spectrometer since they have different mass-to-charge ratios. Since a molecule may comprise a multitude of atoms whose isotopes occur with their elemental abundances, each molecule will have an isotopic distribution which in principle consists of a set of peak bunches whose heights depend on the atom content and atomic isotope abundances, these bunches being essentially 1 Da apart. It should be stressed that the isotopic distribution of a molecule is an identifying feature of that molecule: each molecule with a specified atomic content, i.e. with a specified molecular formula, has its characteristic isotopic distribution. Different isomers of a molecule have for all practical purposes the same isotopic distribution. When in a mass spectrum of a sample, e.g. as obtained from an MS analysis, a set of peaks is observed whose relative heights correspond to the relative heights of a molecule's isotopic distribution, one can deduce the presence of this molecule in the original sample. The absolute heights of the peaks are a quantitative measure for the amount of molecules present in the sample.
The isotopic distribution of a molecule can be computed from the molecular formula and the well-known elemental abundances and masses of the constituent atoms. The heights of the peaks in an isotopic distribution represent the probability with which the different molecular isotopes occur in nature. As an example of an isotopic distribution of a molecule, or the isotopic footprint of a molecule, one can look at carbon monoxide or CO. Carbon (C) has two naturally occurring, stable isotopes 12C and 13C with abundances 0.9893 and 0.0107 and with masses 12 Da and 13.0033548378 Da respectively. Oxygen (O) has three naturally occurring, stable isotopes 16O, 17O and 18O with abundances 0.99757, 3.8×10^−4 and 2.05×10^−3 and with masses 15.99491461956 Da, 16.99913170 Da and 17.9991610 Da respectively. The isotopic distribution comprises 6 peaks distributed in 4 bunches. The first peak is a mono-isotopic peak near mass 27.995 Da with a height of 0.986896001. This height represents the probability with which the molecular isotope of CO with one 12C and one 16O atom occurs in a large sample of CO molecules. The second and third peaks of the CO isotopic distribution are grouped around 29 Da, near masses 28.998 Da for 13C16O and 28.999 Da for 12C17O, with heights 0.010673999 and 0.000375934 respectively. The fourth and fifth peaks are grouped around 30 Da, near masses 29.999 Da for 12C18O and 30.002 Da for 13C17O, with heights 0.002028065 and 4.06600×10^−6 respectively. The sixth and last peak, for 13C18O, is again a mono-isotopic peak near mass 31.003 Da with height 2.1935×10^−5. The numbers are summarized in the following table:
Total mass number    Mass (Da)    Abundance
28                   27.995       0.986896001
29                   28.998       0.010673999
29                   28.999       0.000375934
30                   29.999       0.002028065
30                   30.002       0.000004066
31                   31.003       0.000021935
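The numbers in the table can be reproduced with a few lines of Python. The following is a minimal sketch, not part of the described method: it assumes that the two atoms of CO carry independently drawn isotopes, so that masses add and probabilities multiply. The names `CARBON`, `OXYGEN` and `co_isotopic_distribution` are chosen for illustration only.

```python
from itertools import product

# Isotope data as quoted in the text: (mass number, mass in Da, abundance).
CARBON = [(12, 12.0, 0.9893), (13, 13.0033548378, 0.0107)]
OXYGEN = [
    (16, 15.99491461956, 0.99757),
    (17, 16.99913170, 3.8e-4),
    (18, 17.9991610, 2.05e-3),
]


def co_isotopic_distribution():
    """Return one (mass number, mass, probability) tuple per isotopic variant of CO."""
    variants = []
    for (a_c, m_c, p_c), (a_o, m_o, p_o) in product(CARBON, OXYGEN):
        # Independent atoms: mass numbers and masses add, probabilities multiply.
        variants.append((a_c + a_o, m_c + m_o, p_c * p_o))
    # Sort by mass so the output follows the order of the table.
    return sorted(variants, key=lambda v: v[1])


if __name__ == "__main__":
    for nucleons, mass, prob in co_isotopic_distribution():
        print(f"{nucleons:3d}  {mass:8.3f}  {prob:.9f}")
```

The six probabilities sum to one, which is a quick consistency check on any tabulated isotopic distribution.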
From this table, it is clear that the peaks group in bunches corresponding to the total mass number, i.e. the number of nucleons present in the molecule. The exact mass within a bunch depends on the specific isotopes of the atoms which make up the molecule. It is also clear from this example that mass spectrometers with a mass resolution of the order of 0.002 amu or larger will not be able to distinguish all the different isotopic variants within one bunch. Instead, such mass spectrometers will show a mass spectrum with a broadened peak for each bunch. The area under this peak will be proportional to the summed probabilities of the isotopic variants within that bunch, and the peak will be centered around a center mass which is the weighted average of the masses of the isotopic variants constituting the bunch, the weighting factors being proportional to the relative abundances or probabilities. This kind of bunched isotopic distribution is called the aggregated isotopic distribution. The aggregated isotopic distribution of a molecule consists of a number of peaks, each peak corresponding to a bunch of the isotopic variants of that molecule with the same total mass number. Each peak is located at a center mass which is the average mass of the isotopic variants contributing to the corresponding bunch, weighted by their abundances, and the height of each peak corresponds to the sum of the abundances of the isotopic variants which contribute to that bunch.
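The aggregation described above can be illustrated on the CO example. The sketch below is a plain grouping by total mass number under the definitions just given; the function name `aggregate` and the list `CO_VARIANTS` are assumptions made for this illustration, and the input values are the rounded figures from the table.

```python
from collections import defaultdict

# Isotopic variants of CO from the table: (mass number, mass in Da, probability).
CO_VARIANTS = [
    (28, 27.995, 0.986896001),
    (29, 28.998, 0.010673999),
    (29, 28.999, 0.000375934),
    (30, 29.999, 0.002028065),
    (30, 30.002, 0.000004066),
    (31, 31.003, 0.000021935),
]


def aggregate(variants):
    """Collapse isotopic variants into one peak per total mass number.

    Returns a list of (mass number, center mass, summed probability).
    """
    prob_sum = defaultdict(float)
    weighted_mass = defaultdict(float)
    for nucleons, mass, prob in variants:
        prob_sum[nucleons] += prob
        weighted_mass[nucleons] += prob * mass
    # Center mass = probability-weighted average of the masses in the bunch.
    return [
        (a, weighted_mass[a] / prob_sum[a], prob_sum[a])
        for a in sorted(prob_sum)
    ]


if __name__ == "__main__":
    for nucleons, center, prob in aggregate(CO_VARIANTS):
        print(f"{nucleons:3d}  {center:10.6f}  {prob:.9f}")
```

For CO this yields four aggregated peaks. The peak at mass number 29, for instance, has a height equal to the sum of the 13C16O and 12C17O probabilities and a center mass that lies between 28.998 Da and 28.999 Da, much closer to the former because 13C16O dominates the bunch.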
Although access to high-resolution mass spectrometry (MS), especially in the field of biomolecular MS, is becoming readily available due to recent advances in MS technology, the accompanying information on isotopic distribution in high-resolution spectra is not used to its full potential, mainly because of a lack of knowledge and/or awareness. One of the main difficulties in biomolecular MS is that the isotopic distribution of polypeptides, which may consist of hundreds of amino acids and thousands of atoms, is very hard to compute, let alone to recognize in an MS spectrum. Such a computation may require computing times which are beyond present-day capacities when a straightforward combinatorial approach is used. Furthermore, prior art techniques for identifying a molecule from an isotopic distribution typically include a trial-and-error or fitting technique which requires a multitude of such computations.
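To give a feeling for why a straightforward combinatorial approach scales badly, one can count the distinct isotopic variants of a molecule. The sketch below is illustrative only. It assumes the usual numbers of stable isotopes for C, H, N, O and S, and it uses the fact that n atoms of an element with k isotopes can be arranged in C(n + k − 1, k − 1) distinct isotope combinations. The elemental composition used in the usage example is an assumed, roughly insulin-sized formula chosen for illustration and is not taken from the text.

```python
from math import comb, prod

# Number of stable isotopes per element (standard values, assumed here).
STABLE_ISOTOPES = {"C": 2, "H": 2, "N": 2, "O": 3, "S": 4}


def count_isotopic_variants(formula):
    """Count the distinct isotopic variants of a molecule.

    `formula` maps an element symbol to its number of atoms.
    """
    # n atoms spread over k isotopes: C(n + k - 1, k - 1) multisets per element.
    return prod(
        comb(n + STABLE_ISOTOPES[el] - 1, STABLE_ISOTOPES[el] - 1)
        for el, n in formula.items()
    )


if __name__ == "__main__":
    print(count_isotopic_variants({"C": 1, "O": 1}))  # the 6 peaks of CO
    # Assumed, roughly insulin-sized composition (illustrative only).
    print(count_isotopic_variants({"C": 257, "H": 383, "N": 65, "O": 77, "S": 6}))
```

For CO the count is 6, in agreement with the six peaks listed earlier. For the assumed polypeptide-sized composition the count exceeds 10^12, which is why enumerating every isotopic variant quickly becomes impractical.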
Document U.S. Pat. No. 7,904,253B2 discloses a method for determining elemental composition of ions from mass spectral data, comprising the steps of:
  obtaining at least one accurate mass measurement from mass spectral data;
  obtaining a search list of candidate elemental compositions whose exact masses fall within a given mass tolerance range from said accurate mass;
  reporting a probability measure based on a mass error;
  calculating an isotope pattern for each candidate elemental composition from said search list;
  constructing a peak component matrix including at least one of said isotope pattern and mass spectral data;
  performing a regression against at least one of isotope pattern, mass spectral data, and the peak component matrix;
  reporting a second probability measure for at least one candidate elemental composition based on said isotope pattern regression; and
  combining the two said probability measures into an overall probability measure through the use of probability multiplications.
A limiting step in this method is the calculation of an isotope pattern for each candidate elemental composition in the search list. For very large molecules, e.g. of the order of 10000 Da, this can be a tedious and time-consuming step, even with today's computing technology. Alternatively, the isotope pattern may be obtained from a large database containing the relevant isotope patterns. However, such a database may be larger than can be stored in, e.g., the memory of a computer, especially if the molecules are large, or it may be incomplete. Furthermore, such a database has to be computed at least once. The present invention provides a method for analyzing at least part of an isotopic distribution of a sample by using, inter alia, a correct and efficient method for calculating an isotopic pattern of an ion, e.g. for use in MS analysis or for storage in a database. More specifically, the present invention offers a method for analyzing at least part of an isotopic distribution by using a fast and efficient method for computing the aggregated isotopic distribution of a molecule. It can do this in a recursive way. The method starts by computing the center mass and probability of a starting aggregated isotopic variant of a molecule which, e.g., may be expected to be present in the sample. Preferably, this starting aggregated isotopic variant is a mono-isotopic variant, e.g. the lightest isotopic variant, which means that the center mass is simply the mass of the isotopic variant, and the probability is the product of the elemental abundances of the constituent atoms. From the probability of, e.g., the lightest aggregated isotopic variant, the probability of a next aggregated isotopic variant, i.e. one whose total mass number differs from that of the starting aggregated isotopic variant, e.g. the second lightest, can be computed.
From the knowledge of the probabilities of the starting and next aggregated isotopic variant, the probability of a third aggregated isotopic variant can be computed, etc. Furthermore, the present invention also offers the possibility of computing the center mass of each aggregated isotopic variant of a molecule in a similar recursive way.
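The idea of obtaining each aggregated probability from the previously computed ones can be illustrated with a small sketch. The text does not state the recursion at this point, so the code below uses a textbook recurrence for the coefficients of a polynomial power (often attributed to J. C. P. Miller) as a stand-in; it should not be read as the recursion of the invention. The assumptions are: each element is described by a polynomial whose k-th coefficient is the abundance of the isotope with k extra nucleons relative to the lightest isotope, atoms are independent, and the names `power_coefficients`, `convolve`, `aggregated_probabilities` and `ABUNDANCES` are hypothetical.

```python
# Abundances indexed by the number of extra nucleons relative to the lightest
# isotope (values as quoted in the text for carbon and oxygen).
ABUNDANCES = {"C": [0.9893, 0.0107], "O": [0.99757, 3.8e-4, 2.05e-3]}


def power_coefficients(p, n, max_terms):
    """First `max_terms` coefficients of (p[0] + p[1]*x + ...)**n.

    Each coefficient is computed from the previously computed ones,
    starting from the lightest variant q[0] = p[0]**n.
    """
    q = [p[0] ** n]
    for j in range(1, max_terms):
        acc = 0.0
        for k in range(1, min(j, len(p) - 1) + 1):
            acc += (k * (n + 1) - j) * p[k] * q[j - k]
        q.append(acc / (j * p[0]))
    return q


def convolve(a, b, max_terms):
    """Truncated product of two coefficient lists (combines two elements)."""
    out = [0.0] * max_terms
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            if i + j < max_terms:
                out[i + j] += a_i * b_j
    return out


def aggregated_probabilities(formula, abundances, max_terms):
    """Aggregated probabilities, lightest variant first, for {element: count}."""
    dist = [1.0]
    for element, count in formula.items():
        per_element = power_coefficients(abundances[element], count, max_terms)
        dist = convolve(dist, per_element, max_terms)
    return dist


if __name__ == "__main__":
    for shift, prob in enumerate(aggregated_probabilities({"C": 1, "O": 1}, ABUNDANCES, 4)):
        print(f"+{shift}  {prob:.9f}")
```

Applied to CO, the four values agree with the summed abundances per total mass number in the table. Only as many terms as are needed have to be computed, which matters for large molecules where only the first few aggregated peaks are observable.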
Documents Rockwood '95 (Rockwood, Alan L., Rapid Commun. Mass Spectrom. 9:103-105, 1995) and Rockwood '96 (Rockwood, Alan L., Van Orden, Steven L., Anal. Chem. 68:2027-2030, 1996) disclose a method of computing the aggregated isotopic distribution of a molecule by casting the problem in terms of Fourier transforms. Because discrete Fourier transforms can be calculated very efficiently, this way of looking at the problem has significant practical implications. Specifically, the documents disclose an ultrahigh-speed algorithm for calculating isotope distributions from molecular formulas, elemental isotopic masses, and elemental isotopic abundances. For a given set of input data (molecular formula, elemental isotopic masses, and elemental isotopic abundances), and assuming round-off error to be negligible, the algorithm rigorously produces isotope distributions whose mean and standard deviation are “correct” in the sense that an error-free algorithm would produce a distribution having the same mean and standard deviation. The peak heights are also “correct” in the sense that the height of each nominal isotope peak from the ultrahigh-speed calculation equals the integrated peak area of the corresponding nominal isotope peak from an exact calculation. As a consequence of these properties, the algorithm generally places isotope peaks within millidaltons of their true centroids or center masses.
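The core idea of a Fourier-based calculation can be sketched as follows. This is not the published algorithm, which also handles masses and peak positions; it only illustrates, under simplifying assumptions, that the aggregated probabilities are a repeated convolution of elemental abundance vectors and that a convolution becomes a pointwise product after a discrete Fourier transform. A naive pure-Python transform is used to keep the example self-contained; in practice a fast Fourier transform would be used. The names `dft`, `aggregated_by_fourier` and `ABUNDANCES` are hypothetical.

```python
import cmath

# Abundances indexed by the number of extra nucleons relative to the lightest
# isotope (values as quoted in the text for carbon and oxygen).
ABUNDANCES = {"C": [0.9893, 0.0107], "O": [0.99757, 3.8e-4, 2.05e-3]}


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(
            v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t, v in enumerate(values)
        )
        out.append(s / n if inverse else s)
    return out


def aggregated_by_fourier(formula, abundances, size):
    """Aggregated probabilities for {element: count}, lightest variant first.

    `size` must exceed the largest possible nucleon shift of the molecule;
    otherwise the circular convolution wraps around.
    """
    spectrum = [1 + 0j] * size
    for element, count in formula.items():
        padded = list(abundances[element]) + [0.0] * (size - len(abundances[element]))
        transformed = dft(padded)
        # Raising to the power `count` convolves the element with itself.
        spectrum = [s * t ** count for s, t in zip(spectrum, transformed)]
    return [c.real for c in dft(spectrum, inverse=True)]


if __name__ == "__main__":
    for shift, prob in enumerate(aggregated_by_fourier({"C": 1, "O": 1}, ABUNDANCES, 4)):
        print(f"+{shift}  {prob:.9f}")
```

Note that the whole distribution of length `size` is always computed, even when only a few peaks are of interest, and that the transform gives no direct route back from a distribution to a molecular formula.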
Although the method introduced in Rockwood '95 and '96 is fast in computing isotopic and aggregated isotopic distributions, it may still be computationally intensive. Furthermore, this method cannot be inverted directly, i.e. when an isotopic or aggregated isotopic distribution of a molecule is presented, the Fourier transform technique is not able to deduce the molecular formula directly: it can only deduce it through a trial-and-error or fitting technique such as the one of document U.S. Pat. No. 7,904,253B2 described above. It is the aim of this invention to provide a method which is computationally less intensive than the Fourier transform method. Furthermore, it is the aim of this invention to provide a method which is directly invertible, i.e. which is able to deduce the molecular formula directly from an aggregated isotopic distribution without having to resort to a computationally involved and time-consuming trial-and-error method.
The prior art methods for computing the aggregated isotopic distribution of a molecule are not fast or not accurate enough, require too much memory, calculate too many details, or suffer from computational problems such as numerical overflow and numerical inaccuracies. This is especially the case for large molecules such as polypeptides or oligonucleotides (DNA, RNA).
There remains a need in the art for an improved method for analyzing at least part of an isotopic distribution of a sample by computing the aggregated isotope distribution of a molecule in a more efficient, stable, computationally less intensive manner, and for an improved method for analyzing at least part of an isotopic distribution of a sample whereby the center mass of an aggregated isotopic variant is computed in an improved way. The present invention provides such methods, whereby prior art problems are overcome due to a method for computing the aggregated isotopic variant probability from previously computed or known aggregated isotopic variant probabilities. Furthermore, the center masses are computed along the same lines and do not involve a lot of extra computing time.
The present invention also provides a method for identifying the elemental composition of a molecule in a sample by inverting the presented steps for computing an aggregated isotopic distribution of a molecule, without the need for a trial-and-error or fitting technique, as is commonly used in the prior art. The present invention also provides a method for identifying and quantifying the presence of elements in a molecule which do not alter the isotopic distribution of the molecule whilst they contribute to the mass of the molecule. Phosphorus is such an element.
The present invention thus provides a transformation which calculates the aggregated isotopic distribution and exact center masses based on the elemental composition of a molecule, and a method to estimate the elemental composition based on the observed aggregated isotopic distribution in a mass spectrum, by reversing said transformation. In the invention, the above tools are used to screen for, e.g., phosphorylated peptides, or for any mono-isotopic elements in a molecule. This is achieved by estimating the elemental composition from the observed aggregated isotopic distribution of a molecule. The estimated elemental composition is used to calculate the mono-isotopic mass of the observed molecule. The latter step is an addition of the mono-isotopic elemental masses. The mono-isotopic mass calculated from the estimated elemental composition can be compared with the observed mono-isotopic mass in the mass spectrum; e.g. if the mass difference is around 31 Da, one may conclude that the observed isotope pattern originates from a phosphorylated molecule.
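The mass comparison described above can be sketched as follows. This is an illustrative example, not the claimed procedure. It assumes that an estimated elemental composition is already available, that the observed value is the neutral mono-isotopic mass (charge state and adducts are ignored), and that phosphorus is the only mono-isotopic element of interest. The mono-isotopic masses are standard rounded values that are not given in the text, and the names `MONOISOTOPIC_MASS`, `PHOSPHORUS_MASS`, `monoisotopic_mass` and `estimate_phosphorus_count`, the tolerance and the example composition are all hypothetical.

```python
# Mono-isotopic masses in Da (standard rounded values, assumed here).
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.972071,
}
PHOSPHORUS_MASS = 30.973762  # 31P, the only stable isotope of phosphorus


def monoisotopic_mass(composition):
    """Sum of the mono-isotopic elemental masses for {element: atom count}."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in composition.items())


def estimate_phosphorus_count(observed_mass, estimated_composition, tolerance=0.05):
    """Number of P atoms that explains the mass gap, or None if none fits.

    `tolerance` (in Da) is an arbitrary illustrative choice.
    """
    gap = observed_mass - monoisotopic_mass(estimated_composition)
    count = round(gap / PHOSPHORUS_MASS)
    if count >= 0 and abs(gap - count * PHOSPHORUS_MASS) <= tolerance:
        return count
    return None


if __name__ == "__main__":
    # Hypothetical composition estimated from an aggregated isotopic distribution.
    estimated = {"C": 50, "H": 80, "N": 14, "O": 19}
    observed = monoisotopic_mass(estimated) + PHOSPHORUS_MASS
    print(estimate_phosphorus_count(observed, estimated))
```

A mass gap close to a whole multiple of about 31 Da points to that many phosphorus atoms, whereas a gap that does not fit suggests that either the estimated composition or the assumption about the mono-isotopic element is wrong.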
Prior art techniques for identifying the elemental composition of and/or quantifying the presence of mono-isotopic elements in a molecule in a sample may involve expensive and time-consuming extra experimental steps, such as done in tandem-MS. These extra experimental steps lead to longer measuring times, extra cost, bigger equipment, etc. A trial-and-error or fitting technique is time-ineffective, especially for large molecules, for identifying the molecule from an isotope distribution. U.S. Pat. No. 7,904,253, for instance, discloses a typical trial-and-error technique for identifying the chemical formula of a molecule whose isotopic distribution has been measured in a small mass window.
The present invention aims to resolve at least some of the problems mentioned above.
There remains a need in the art for an improved method for estimating compositional information of a molecule from an isotopic distribution. The present invention provides such a method by inverting the above-mentioned method. Inverting the method enables the estimation of the chemical formula of a molecule, i.e. the number of atoms of each species, from an isotopic distribution. Furthermore, the present method does not need to know absolute masses, but can deal with a part of an isotopic distribution which shows only the peaks of the aggregated isotopic variants of a molecule. That is, given the heights of a set of peaks essentially 1 Da apart, the present method is capable of computing the estimated compositional information of the molecule responsible for these peaks, as far as atomic elements are concerned which are not mono-isotopic in nature. For identifying mono-isotopic elements in the chemical formula of a molecule, one can use the value of the lowest, necessarily mono-isotopic mass of the molecule as derived from the estimated compositional information of the molecule and compare this with the measured value for the mass of the first peak. The difference between the observed mass in the mass spectrum and the mass calculated from the estimated compositional information gives the summed mass of all mono-isotopic elements present in the observed molecule.
The identification is especially hard for large molecules. Proteins and polypeptides are such large molecules, e.g. in the range of 50000 Da. A pivotal modification of polypeptides regulating many cellular activities and functions is polypeptide phosphorylation. The present invention provides a method for identifying and detecting the presence and amount of a phosphorylated polypeptide from a mass spectrometric analysis, based on the methods discussed above. Phosphorus P is a mono-isotopic element, i.e. it only has one isotope which is stable or at least stable enough to be found in naturally occurring substances.