The mass spectrometric analysis of molecules is complicated by the presence of many different molecules having closely similar mass-to-charge ratios. Fragmentation techniques have been developed to help identify the different parent molecules by measuring the mass-to-charge ratios of their characteristic fragments. Ions of a molecule of interest are selected according to their mass-to-charge ratio by a mass selective ion optical device, along with other molecular ions of a closely similar mass-to-charge ratio. These ions are called the parent or precursor ions. The parent ions are then fragmented using one or more processes, and the fragment ions are mass analysed, providing a so-called MS/MS mass spectrum. Molecules of different structure typically fragment to form different fragment ions, and the parent molecules can be identified by studying the mass-to-charge ratios of those fragment ions. Where the fragment mass spectra also contain interferences, or where more information than is present in an MS/MS spectrum is required, further stages of fragmentation may be used, producing MS^n mass spectra. Libraries of protein sequences have been developed and these are searched, using algorithms developed for the purpose, to match the fragment ion spectra to likely parent molecules.
This is a powerful and widely used method in organic mass spectrometry. However, it has certain disadvantages relating to the requirement for more than one mass selective step. This requirement increases both the complexity of the instrumentation needed to perform the method and the time of analysis.
Besides using the technique of ion fragmentation to enable a parent molecular ion to be identified, a high mass resolution mass spectrometer may be used to distinguish between molecular ions of closely similar mass to charge ratios. However, typically such high mass resolution spectrometers are more costly and often very much slower (due to longer measurement times) than their lower resolution counterparts.
If the fragment ion mass spectra are of high resolution and high mass accuracy, the match between the fragment ions and likely parent molecules can be made with a higher degree of confidence. Consequently, in order to identify large molecular ions most effectively, analysts often use a combination of high resolution mass spectrometry and fragmentation methods. However, combining the two methods results in an even longer analysis time.
Methods such as those outlined above are routinely used for samples containing proteins. Typically the proteins are digested to produce peptides and these are ionised and introduced into the mass spectrometer.
The target protein or mixture (for example a cell lysate) is pre-processed. Pre-processing can include filtering or cleaning. The sample is then digested with a suitable cleavage reagent. The most frequently used reagent is the enzyme trypsin, but others, such as chymotrypsin, cyanogen bromide and iodosobenzoate, are also used. After digestion, and possibly cleaning, the mixture is fed to a mass spectrometer, usually following chromatographic separation. Chromatographic separation usually limits the time available for the tandem mass spectrometry process. Chromatographic peak widths range from 30 seconds to less than 1 second, with the trend being towards narrower peaks.
Initially a full mass spectrum is taken, producing a so-called precursor ion spectrum. Fragment ion spectra can be obtained for every ion species in the precursor ion spectrum (data-independent MS/MS). Alternatively, a frequently used approach is “data-dependent” MS/MS. In this method, a full spectrum is acquired and afterwards the one or more most intense peaks are selected, usually automatically, and subjected to MS/MS fragmentation, one by one. The precursor and fragment spectra are stored. Various enhancements to this include: temporary blacklisting of precursors to avoid re-measurement of intense ions; permanent blacklisting of precursors to avoid collection of MS/MS data for well-known peptides or solvent components; and whitelisting of masses of interest to allow fragmentation even when the most-intense criterion is not met. However, there are two problems with this data-dependent MS/MS approach. Firstly, different runs of the same sample may produce very different results because, for example, even small variations in peak heights in the precursor ion spectra may result in different decisions being made automatically, leading to the selection of different precursor ion species for fragmentation. Secondly, in many cases there may not be sufficient time to fragment all ions of interest within the time window imposed by the preceding chromatographic process.
The prior art data-dependent process, in which two precursor ions are selected for MS/MS, is shown as an example in the flow chart of FIG. 1.
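The selection step of the data-dependent approach can be illustrated with a minimal Python sketch. It is not taken from any instrument control software. The function names, the representation of a peak as an (m/z, intensity) pair, the matching tolerance `tol`, and the choice to select whitelisted masses in addition to the `top_n` most intense peaks are all assumptions made for illustration.

```python
def is_listed(mz, mz_list, tol):
    """Return True if mz lies within tol of any entry in mz_list."""
    return any(abs(mz - m) <= tol for m in mz_list)


def select_precursors(peaks, top_n=2, temporary_exclusion=(),
                      permanent_exclusion=(), inclusion=(), tol=0.01):
    """Pick precursor m/z values for MS/MS from one precursor ion spectrum.

    peaks: list of (mz, intensity) pairs (hypothetical representation).
    Blacklisted masses are never selected. Whitelisted masses are selected
    regardless of intensity (assumption: in addition to the top_n peaks).
    """
    excluded = list(temporary_exclusion) + list(permanent_exclusion)
    allowed = [(mz, inten) for mz, inten in peaks
               if not is_listed(mz, excluded, tol)]

    # Whitelisted masses are fragmented even if they are not the most intense.
    selected = [mz for mz, _ in allowed if is_listed(mz, inclusion, tol)]

    # The remaining slots go to the most intense peaks not yet selected.
    remaining = sorted((p for p in allowed if p[0] not in selected),
                       key=lambda p: p[1], reverse=True)
    selected += [mz for mz, _ in remaining[:top_n]]
    return selected


peaks = [(500.25, 1000.0), (600.30, 800.0), (700.35, 50.0), (800.40, 900.0)]
print(select_precursors(peaks))                              # two most intense
print(select_precursors(peaks, temporary_exclusion=[500.25]))
print(select_precursors(peaks, inclusion=[700.35]))
```

Because the choice depends only on the intensity ranking, a small change in relative peak heights between runs is enough to change which precursors are fragmented, which is the reproducibility problem noted above.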
After, or sometimes during, measurement, the acquired data are evaluated. Many methods are known for this, such as (1) “de novo sequencing”, in which the amino acid sequence is inferred directly from the spectra; (2) “sequence tagging”, in which only part of the amino acid sequence is directly inferred from the spectra, and these small sections (“tags”) are used in a database search routine; and (3) direct database searching, in which the search is performed using only the fragment ion spectra.
Database searching is performed to match fragment ions to their likely peptide precursors. Automatic routines have been developed to perform the searches. The result is a list of likely precursors, each with a score denoting the confidence in the match. Optionally, the database to be searched can be pre-selected by the user, who can limit the search to peptide precursors known to be relevant, for example those for yeasts where the sample is known to have originated from a yeast. Optionally, the computer search can also provide protein scores calculated from the peptide scores to give an indication of the likely proteins contained in the pre-digested sample. Typically the search algorithm returns a score-sorted list of the protein or peptide candidates along with their scores. The interpretation is then typically left to the user.
The standard approach is to submit a peak list for each of the MS/MS spectra, together with the respective precursor mass (usually the mass that triggered the MS/MS event in the data-dependent setup), to a “search engine” for comparison with a database. Normally no check is made for the presence of more than one precursor in the mass selection window. Many databases of proteins are publicly available. Some of them directly contain proteins from previous analyses; others, such as SwissProt (http://expasy.org/sprot/), are computer translations of genomic sequences.
As the final goal of search engine use is to come up with one or more proteins determined to be in the analyte mixture, the proteins in a database are “electronically digested” to peptides with properties matching the cleavage reagent selected by the user. This “in silico digestion” can happen on the fly or as an “indexing” step before the actual search is performed. All peptides matching the precursor mass within a tolerance window defined by the user or inferred from the data are considered “candidates”. Fragment ions from these candidates are then predicted. Scores are associated with these candidates based on the MS/MS data, a higher score resulting when the MS/MS fragment ion spectrum contains the predicted fragments of the predicted candidates.
The prior art database search process is shown as an example in the flow chart of FIG. 2.
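The database search described above (in silico digestion, candidate selection by precursor mass, fragment prediction and scoring) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of any particular search engine. It assumes the common simplified trypsin rule (cleavage after K or R unless the next residue is P), no missed cleavages or modifications, singly charged b and y fragment ions only, absolute mass tolerances in Da, and a score equal to the number of predicted fragments found in the spectrum. All function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def digest_trypsin(protein):
    """Cleave after K or R unless followed by P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Predicted m/z values of singly charged b and y ions."""
    ions = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def score(peptide, observed, fragment_tol):
    """Count predicted fragments that match an observed peak."""
    return sum(1 for f in fragment_ions(peptide)
               if any(abs(f - o) <= fragment_tol for o in observed))


def search(precursor_mass, observed, proteins,
           precursor_tol=0.01, fragment_tol=0.01):
    """Return (peptide, score) pairs sorted by descending score."""
    candidates = {pep for prot in proteins for pep in digest_trypsin(prot)
                  if abs(peptide_mass(pep) - precursor_mass) <= precursor_tol}
    scored = [(pep, score(pep, observed, fragment_tol)) for pep in candidates]
    return sorted(scored, key=lambda r: r[1], reverse=True)
```

A call such as `search(peptide_mass("VTCLR"), fragment_ions("VTCLR"), ["GASPKVTCLR"])` returns a score-sorted list of candidates. Real search engines use more elaborate, often probabilistic, scoring, and the simple count used here is only a stand-in for it.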
If deliberately or inadvertently more than one precursor ion species is selected at the same time for fragmentation, the fragment ion spectrum will be more complex and the results from the database search engine will be less accurate.
The prior art processes described in FIGS. 1 and 2 suffer from the disadvantage that the time taken to obtain the score-sorted list of likely peptides is long, even where these data-dependent methods are used, because each precursor ion of interest must be selected alone and individually fragmented, and the resultant ions mass analysed sequentially, before they can be processed using standard search engine techniques. This is costly as instrument time is expensive, and it is wasteful as relatively large proportions of the sample (which may only exist in very small quantities) are consumed during the process.
One particular method of improving the throughput is described by Masselon and Smith in Analytical Chemistry, Volume 72, No. 8, pp 1918-1924, 2000. In this method a form of multiplexing is performed. Fragment ions from more than one precursor are intentionally measured in a single mass spectrum taken with very high mass accuracy. The fragment ion spectrum therefore contains fragments from more than one precursor ion species. This spectrum is sent to the database search engine as normal, and the method relies on the high mass accuracy of the fragment spectrum, which enables most of the fragment ions to be attributed to a specific parent polypeptide, though possibly not every fragment ion species can be assigned to a parent.
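The dependence of such multiplexing on mass accuracy can be illustrated with a short Python sketch. It is not the procedure of Masselon and Smith. It simply assumes that a list of predicted fragment m/z values is already available for each candidate parent, and attributes an observed fragment to a parent only when exactly one parent has a predicted fragment within a relative tolerance given in parts per million. The function name, the data layout and the default tolerance are hypothetical.

```python
def attribute_fragments(observed, predicted_by_parent, ppm=5.0):
    """Assign observed fragment m/z values to candidate parents.

    observed: list of measured fragment m/z values.
    predicted_by_parent: dict mapping a parent label to its predicted
        fragment m/z values (assumed to be supplied by the caller).
    A fragment is assigned only if exactly one parent explains it;
    ambiguous or unexplained fragments are returned as unassigned.
    """
    assigned = {parent: [] for parent in predicted_by_parent}
    unassigned = []
    for mz in observed:
        tol = mz * ppm * 1e-6  # absolute tolerance at this m/z
        owners = [parent for parent, frags in predicted_by_parent.items()
                  if any(abs(mz - f) <= tol for f in frags)]
        if len(owners) == 1:
            assigned[owners[0]].append(mz)
        else:
            unassigned.append(mz)
    return assigned, unassigned


predicted = {"P1": [200.1000, 300.2000], "P2": [200.1030, 400.3000]}
observed = [200.1001, 300.2001, 400.3001, 500.0]
print(attribute_fragments(observed, predicted, ppm=5.0))
print(attribute_fragments(observed, predicted, ppm=50.0))
```

In this example the fragment near m/z 200.1 is attributed to a single parent at 5 ppm, but becomes ambiguous at 50 ppm because the predicted fragments of both parents then fall within the tolerance.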
There are several disadvantages to the method of Masselon and Smith. As noted above, when fragment ion spectra from more than one precursor ion species are processed by standard search engine methods, the results from the database search engine are less accurate because the fragment ion spectra are more complex, even though high mass accuracy has been used. Furthermore, not only are the scores less accurate, but a far greater number of false-positive identifications will also result. In addition, the complexity of the fragment ion spectra greatly reduces the speed of the search engine.
The present invention seeks to address these and other problems with prior art MS/MS data processing.