The present invention relates to a method of mass spectrometry and a mass spectrometer. The preferred embodiment relates to a method which allows relative quantitation of analyte compounds especially where incomplete and noisy measurements are made. The preferred embodiment is particularly applicable to the measurement and quantitation of peptide digest products or daughter compound abundances. The preferred embodiment relates to relative Bayesian quantitation of analyte/daughter groups.
As will be discussed in more detail below, the preferred embodiment relates to a probabilistic or Bayesian approach to determining the relative quantitation of a component, molecule or analyte present in two or more samples. By way of background, Bayesian probability theory deals with the probabilities of statements, where a probability expresses how certain it is that a statement is true. For example, a probability of 1 means that there is absolute certainty that the statement is true. A probability of 0 also means that there is absolute certainty, but absolute certainty that the statement is false. A probability of 0.5 means that there is maximum uncertainty as to whether the statement is true or false.
Changing probabilities upon receiving new information is an important aspect of Bayesian reasoning. So-called Bayes' rule defines how a rational agent changes its beliefs when it receives new information (evidence).
Bayesian probabilities or certainties are always conditional. This means that probabilities are estimated in the context of some background assumptions. Conditional probabilities are usually written using the notation P(Thing|Assumption). The probabilities are numbers between zero and one that express how certain it is that Thing is true when it is believed that the Assumption is true. Conditional probabilities are often written in the form P(D|M) or P(M|D), where M is a dependency model and D is data. Accordingly, P(D|M) means the probability of obtaining data D if it is believed that model M is the true model. Likewise, P(M|D) means the probability that the model M is the true model given the data D. Sometimes probabilities are presented just as P(M) or P(D), but these are generally considered to be imprecise Bayesian notations, since all the probabilities are actually conditional. However, when all the terms share the same background assumptions it may not be necessary to repeat them. In theory, probabilities should be written in the form P(D|M,U), P(M|D,U), P(M|U) and P(D|U), where U is a set of background assumptions.
Expert systems often calculate the probabilities of inter-dependent events by giving each parent event a weighting. Bayesian Belief Networks are considered to provide a mathematically correct and therefore more accurate method of measuring the effects of events on each other. The mathematics involved enables calculations to be made in both directions. So it is possible, for example, to find out which event was the most likely cause of another.
The following Product Rule of probability for independent events is well known:

p(AB)=p(A)*p(B)

where p(AB) means the probability of A and B happening.
This is a special case of the following Product Rule for dependent events, where p(A|B) means the probability of A given that B has already occurred:

p(AB)=p(A)*p(B|A)

p(AB)=p(B)*p(A|B)

So, because:

p(A)*p(B|A)=p(B)*p(A|B)

then:

p(A|B)=(p(A)*p(B|A))/p(B)
The above equation is a simpler version of Bayes' Theorem. This equation gives the probability of A happening given that B has happened, calculated in terms of other probabilities which are known.
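The relationships above can be checked numerically. The following Python sketch uses purely hypothetical probabilities (the specific numbers are illustrative assumptions, not measurements) to verify that the Product Rule for dependent events and Bayes' rule agree:

```python
# Hypothetical joint and marginal probabilities for two dependent events A and B.
p_A_and_B = 0.10   # p(AB): probability that A and B both happen
p_A = 0.30         # p(A)
p_B = 0.40         # p(B)

# Conditional probabilities via the Product Rule for dependent events:
# p(AB) = p(A)*p(B|A)  and  p(AB) = p(B)*p(A|B)
p_B_given_A = p_A_and_B / p_A   # p(B|A)
p_A_given_B = p_A_and_B / p_B   # p(A|B)

# Bayes' rule: p(A|B) = (p(A)*p(B|A)) / p(B)
bayes_p_A_given_B = p_A * p_B_given_A / p_B

# Both routes to p(A|B) must agree.
assert abs(bayes_p_A_given_B - p_A_given_B) < 1e-12
```

Note that p(AB) here differs from p(A)*p(B)=0.12, so A and B are genuinely dependent and the conditional form of the Product Rule is required.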
Bayes' theorem can be summarised as:
P(H0|E)=P(E|H0)P(H0)/P(E)
H0 can be taken to be a hypothesis which may have been developed ab initio or induced from some preceding set of observations, but before the new observation or evidence E. The term P(H0) is called the prior probability of H0. The term P(E|H0) is the conditional probability of seeing the observation E given that the hypothesis H0 is true; considered as a function of H0 for fixed E, it is called the likelihood function. The term P(E) is called the marginal probability of E; it is a normalizing constant and can be calculated as the sum over all mutually exclusive hypotheses:

P(E)=ΣP(E|Hi)P(Hi)
The term P(H0|E) is called the posterior probability of H0 given E. The scaling factor P(E|H0)/P(E) gives a measure of the impact that the observation has on belief in the hypothesis. If it is unlikely that the observation will be made unless the particular hypothesis being considered is true, then this scaling factor will be large. Multiplying this scaling factor by the prior probability of the hypothesis being correct gives a measure of the posterior probability of the hypothesis being correct given the observation.
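The update described above can be sketched in Python. The priors and likelihoods below are hypothetical numbers chosen purely for illustration; the sketch computes the marginal P(E) as the sum over mutually exclusive hypotheses, the posterior P(Hi|E) for each hypothesis, and the scaling factor P(E|H0)/P(E):

```python
# Hypothetical prior probabilities P(H_i) for mutually exclusive hypotheses
# (they sum to 1) and likelihoods P(E|H_i) of the observation E under each.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.8, 0.1, 0.05]

# Marginal probability of the evidence: P(E) = sum_i P(E|H_i) * P(H_i)
p_E = sum(lik * pri for lik, pri in zip(likelihoods, priors))

# Posterior for each hypothesis: P(H_i|E) = P(E|H_i) * P(H_i) / P(E)
posteriors = [lik * pri / p_E for lik, pri in zip(likelihoods, priors)]

# Scaling factor P(E|H_0)/P(E): when greater than 1, observing E
# increases belief in H_0.
scaling_factor = likelihoods[0] / p_E

# The posteriors form a proper probability distribution.
assert abs(sum(posteriors) - 1.0) < 1e-12
```

With these illustrative numbers the observation is much more likely under H0 than under the alternatives, so the scaling factor exceeds 1 and the posterior probability of H0 is larger than its prior.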
The key to making the inference work is the assignment of prior probabilities to the hypothesis and its possible alternatives, and the calculation of the conditional probabilities of the observation under the different hypotheses.
In the analysis of multiple biological samples or a complex mixture of biological samples it may be desired to compare the relative concentrations of component compounds. For example, it may be desired to see whether or not a protein or peptide is expressed differently in two or more different samples. One sample may, for example, comprise a sample taken from a healthy organism, whilst the other sample may comprise a sample taken from a patient. If a particular protein or peptide is expressed to a significantly greater or lesser extent in the patient sample relative to the sample taken from a healthy organism (i.e. control sample) then this may be indicative of a disease state.
Complex mixtures of biological samples can be analysed using a mass spectrometer preferably in combination with a liquid chromatograph.
It is known to use the ion intensity or ion count rate recorded by a mass spectrometer as a measure of the concentration of each peptide. The data relating to each sample is, however, subject to various systematic errors such as injection volume errors as well as various non-systematic effects such as counting statistics.
Due to the complexity of the samples and the sometimes low concentrations of various components, molecules or analytes in the samples, the data can sometimes or often be incomplete. The data may also include interferences. As a result the assignment of data to components, molecules or analytes or the identification of components, molecules or analytes may be uncertain.
According to conventional approaches these factors can cause results that appear anomalous and are therefore discarded. As a result, it may not always be possible to quantify some components, molecules or analytes present in two or more samples, and/or some data may be rejected out of hand when in fact it is not anomalous.
It is therefore desired to provide an improved way of being able to quantify components, molecules or analytes present in two or more separate samples when noisy and incomplete measurements of the samples are made.