Mass spectrometry has advanced over the last few decades to the point where it is one of the most broadly applicable analytical tools for detection and characterization of a wide class of molecules. Mass spectrometric analysis is applicable to almost any species capable of forming an ion in the gas phase, and, therefore, provides perhaps the most universally applicable method of quantitative analysis. In addition, mass spectrometry is a highly selective technique especially well suited for the analysis of complex mixtures of different compounds in varying concentrations. Mass spectrometric methods provide very high detection sensitivities, approaching tenths of parts per trillion for some species. As a result of these beneficial attributes, a great deal of attention has been directed over the last several decades at developing mass spectrometric methods for analyzing complex mixtures of biomolecules, such as peptides, proteins, carbohydrates and oligonucleotides and complexes of these molecules.
One common type of application of mass spectrometry to analysis of natural samples involves the characterization and/or quantification of components of complex mixtures of biomolecules. Many such biological molecules of interest are biopolymers, such as polynucleotides (RNA and DNA), polypeptides and polysaccharides. Generally, the chemical composition (related to the specific collection of monomers of which the polymer is composed) and the sequence of monomers are the distinguishing analytical characteristics of biopolymer molecules of a given class. However, since biopolymer molecules of a given class generally have high molecular weights and can generate ions having a wide range of charge states, distinguishing various molecules within a mixture of such molecules by mass spectrometry can be challenging.
One important application of mass spectrometry analysis of biopolymers occurs in the field of protein studies (proteomics). In such studies, two types of protein sequencing methods have become popular: (1) the so-called “bottom-up” approach and (2) the so-called “top-down” approach. In the top-down method intact proteins are ionized and directly sampled by the mass spectrometer and then fragmented during MS/MS analysis. Performing mass spectrometric analyses using such an approach can be challenging for the reasons stated above. In the alternative bottom-up approach, a protein-containing sample is digested with a proteolytic enzyme resulting in a complex mixture of peptides, which may be considered to be oligomers. Next, the digested sample is chromatographically separated (in one or multiple dimensions) such that the digest components elute at various times according to their column retention times (RTs). The various eluting components are then introduced to an ion source, usually an electrospray ionization (ESI) source, on a mass spectrometer. The ESI source converts condensed phase ions, eluting from the HPLC column, to multiply-protonated molecules (cations) in the gas-phase. The mass spectrometer then detects the ions and identifies the various peptides using, generally, the technique of tandem mass spectrometry, which is sometimes referred to as “MS/MS” spectrometry or “selected-reaction monitoring” (SRM) and is discussed in greater detail below. In a typical “shotgun proteomics” experiment a cell lysate or other sample, containing as many as several thousand proteins, is analyzed using the bottom-up approach.
During tandem mass spectrometry operation, various precursor ion types that have been chosen to represent respective peptides are isolated. The isolated precursor ions are then subjected to fragmentation (e.g., in a collision cell), and the resulting fragment (product) ions are transported for analysis in a second stage of mass analysis or a second mass analyzer. The method can be extended to provide fragmentation of a selected fragment, and so on, with analysis of the resulting fragments for each generation. This is typically referred to as MSn spectrometry, with n indicating the number of steps of mass analysis and the number of generations of ions. Accordingly, MS2 mass analysis (also known as an MS/MS mass analysis) corresponds to two stages of mass analysis with two generations of ions analyzed (precursor and products). A resulting product spectrum exhibits a set of fragmentation peaks (a fragment set) which, in many instances, may be used as a fingerprint to identify the peptide from which the particular precursor and product ions were derived.
Although a single SRM transition can be used to successfully identify a particular peptide, in order to identify each of the various proteins from which the peptides were formed (during the digestion step), generally more than one diagnostic peptide is required. In particular, a certain number, Q, of peptide identifications is considered to be necessary in order to confidently infer the presence of a particular protein in the original sample, because the possibility exists that any given peptide may be generated in the digest from more than one protein. Using more than one peptide of the digest as a marker for a given protein provides redundancy in case the same peptide should, by chance, be formed in the trypsin digestion of more than one protein. Conventionally, three peptides are considered adequate to infer the presence of a particular protein (that is, Q=3).
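The Q-peptide criterion described above may be illustrated by the following minimal sketch. The function name infer_proteins, the peptide labels, the protein labels and the peptide-to-protein map are hypothetical and are introduced for illustration only; the sketch assumes that peptide identifications are already available and simply counts the distinct identified peptides that support each protein.

```python
from collections import defaultdict

Q = 3  # conventional minimum number of peptide identifications per protein


def infer_proteins(identified_peptides, peptide_to_proteins, q=Q):
    """Return the proteins supported by at least q distinct identified peptides."""
    support = defaultdict(set)
    for peptide in set(identified_peptides):
        # A peptide may map to more than one protein in the digest.
        for protein in peptide_to_proteins.get(peptide, ()):
            support[protein].add(peptide)
    return {protein for protein, peptides in support.items() if len(peptides) >= q}


# Hypothetical digest map: peptide "P4" is shared by two proteins.
peptide_to_proteins = {
    "P1": ["protA"],
    "P2": ["protA"],
    "P3": ["protA"],
    "P4": ["protA", "protB"],
    "P5": ["protB"],
}

inferred = infer_proteins(["P1", "P2", "P3", "P4", "P5"], peptide_to_proteins)
print(sorted(inferred))
```

In this hypothetical example, the shared peptide counts toward both proteins, but only the protein supported by at least three distinct peptides is reported, which reflects the redundancy rationale given above.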
Because the various peptides generated in a tryptic digest will elute at various times during bottom-up proteomics experiments, the mass spectrometer system should be programmed so as to search for the various diagnostic ions at appropriate times during the course of the chromatographic elutions. Unfortunately, however, one often encounters a problem in scheduling SRMs or targeted MS/MS acquisitions based upon the expected retention times. Similar scheduling problems occur, in general, in various situations in which the demand for a resource is not equally distributed over time. Surges in demand create a problem when there is a ceiling on the maximum amount of resource that can be delivered per unit time. A common example of this is power usage in the afternoon on a hot, summer day. In the case of power distribution or in many other cases where consumers pay for a utility or a good, the free market can provide a solution by assigning a higher price to consumption during periods of high demand. This has the effect of encouraging some fraction of price-sensitive users to reschedule their usage to periods of lower demand, thus leveling out the overall demand for the good.
As a general rule, the distribution of retention times is approximately Gaussian, with a peak density in the center and much lower density in the tails. The shape of this distribution is fundamental because the retention time of a peptide can be accurately approximated as the sum of the retention times of its constituent amino acid residues. As a result, the distribution of retention times of randomly generated peptides obeys the Central Limit Theorem of statistics. The Central Limit Theorem states that the sum of independent, identically distributed random variables tends to a Gaussian distribution as the number of terms in the sum increases. Peptides with more than 5 or 6 residues, as are commonly encountered in proteomics experiments, produce retention time distributions that follow the expected Gaussian distribution.
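The Central Limit Theorem argument above may be illustrated numerically by the following minimal sketch. It assumes, purely for illustration, that each residue contributes an independent amount drawn uniformly from the interval [0, 1) in arbitrary units; actual per-residue retention contributions are not specified here. The function names are hypothetical. For a Gaussian distribution, approximately 68% of values lie within one standard deviation of the mean, so the printed fraction would be expected to approach that value as the number of residues increases.

```python
import random
import statistics


def simulate_retention_times(n_peptides, n_residues, seed=0):
    """Model each peptide's retention time as a sum of per-residue contributions."""
    rng = random.Random(seed)
    # Assumption: per-residue contributions are uniform on [0, 1), arbitrary units.
    return [
        sum(rng.random() for _ in range(n_residues))
        for _ in range(n_peptides)
    ]


def fraction_within_one_sigma(values):
    """Fraction of values lying within one standard deviation of the mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(v - mean) <= sigma for v in values) / len(values)


for n_residues in (1, 2, 6, 12):
    times = simulate_retention_times(20000, n_residues)
    print(n_residues, round(fraction_within_one_sigma(times), 3))
```

The one-residue case is included only as a contrast, since a single uniform contribution is visibly non-Gaussian by this measure.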
The phenomena which give rise to the SRM scheduling problem discussed above are schematically illustrated in FIGS. 1-2. Curve 10 in the lower portion of FIG. 1 represents a hypothetical chromatogram (detected ion intensity plotted against retention time) showing the elution of numerous peptides—each corresponding to a peak in the chromatogram—during the course of a single experimental run. For illustration purposes only, it is assumed that the chromatogram 10 includes a total of 170 separate elution peaks. For convenience, each peptide may be referred to by a numerical index, k, where 1≦k≦170 and where, in this example, the index k is assigned in order of elution. The elution periods are indicated for a subset of the various peptides by horizontal lines in the upper portion of FIG. 1. For example, the horizontal bar k5 indicates the elution of the fifth peptide (i.e., the peptide for which k=5). Likewise, the horizontal bars k10, k20, k30, k50, k60, k100 and k120 indicate the elution of the peptides for which k=10, k=20, k=30, k=50, k=60, k=100 and k=120, respectively. Note that the index k is plotted along the vertical axis of the upper portion of FIG. 1. The small vertical bars at the end of each horizontal bar indicate the respective elution start and elution end times for the respective peptide. For clarity, the elution periods corresponding to other peptides are not specifically indicated but may be assumed to follow the general trend shown in the upper portion of FIG. 1.
As a general rule, not all of the chromatographic peaks of the chromatogram 10 may be fully resolved because of overlap of some closely spaced peaks. The lower portion of FIG. 1 illustrates that the density of peaks is generally greater in the center of the run because of the adherence to the Central Limit Theorem as noted above. The central region of greater peak density gives rise to greater peak overlap in this region, relative to the beginning and ending portions of the experimental run, as is schematically illustrated in the upper portion of FIG. 1.
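The degree of peak overlap discussed above may be quantified by counting how many elution periods are open at the same instant. The following minimal sketch assumes that each elution period is represented by a hypothetical (start, end) pair of retention times in minutes; the function name max_concurrent and the example values are illustrative only.

```python
def max_concurrent(windows):
    """Return the largest number of elution windows that overlap at any instant."""
    events = []
    for start, end in windows:
        events.append((start, 1))   # window opens
        events.append((end, -1))    # window closes
    # Sort by time; at equal times, closings (-1) are processed before
    # openings (+1) so that windows that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    active = 0
    peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Hypothetical elution windows (start, end) in minutes: sparse at the
# beginning and end of the run, crowded in the middle.
windows = [(1.0, 2.0), (5.0, 6.0), (5.5, 6.5), (5.8, 6.8), (6.0, 7.0), (10.0, 11.0)]
print(max_concurrent(windows))  # prints 3
```

Comparing this count against the number of SRMs or MS/MS events that the instrument can service concurrently gives a simple indication of whether the central portion of a run exceeds the available capacity.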
FIG. 2A schematically illustrates the expected general form of a histogram of the number of peptide peaks in each segment of the total chromatographic run of a protein tryptic digest, if one were to partition the total run time into equal time segments and count the number of eluting peptide peaks in each segment. For example, the vertical bars centered at retention times t1-t10 in FIG. 2A represent the hypothetical distribution of peak counts per partition if one were to partition a chromatogram such as the one illustrated in the bottom portion of FIG. 1 into ten equal-width time segments. According to the Central Limit Theorem analysis, the form of such a histogram should approach the form of a Gaussian probability density distribution, shown as curve 80, as the total number of peptides increases and the partition width decreases. The peptide selection probability density at any time point may be defined as the probability per unit time of selecting a peptide within a time partition that includes the time point. If one were to select peptides to be detected uniformly at random from such a Gaussian distribution, i.e., without taking into account the predicted retention times, the distribution of retention times of the selected peptides would be essentially the same as the underlying distribution, i.e., Gaussian. In this case, one would encounter the disadvantage of having significantly fewer SRMs or MS/MS events per unit time at the beginning and the end of the experimental run and many more in the middle of the run, resulting in a suboptimal, inefficient utilization of the mass analyzer, and possibly in undesirable results. In some cases, this is unavoidable. The inventors have realized, however, that in many other cases, there is freedom in experimental design that allows one to distribute the demand for SRMs or MS/MS events evenly over a chromatographic run.
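The partitioning of a run into equal-width time segments, as described above, may be sketched as follows. The run length of 60 minutes and the Gaussian parameters (mean of 30 minutes, standard deviation of 10 minutes) are assumptions chosen only for illustration, as is the function name peak_counts_per_segment; the count of 170 peaks follows the hypothetical chromatogram discussed earlier.

```python
import random


def peak_counts_per_segment(retention_times, run_start, run_end, n_segments=10):
    """Count peaks in each of n_segments equal-width partitions of the run."""
    width = (run_end - run_start) / n_segments
    counts = [0] * n_segments
    for t in retention_times:
        if run_start <= t < run_end:
            # Clamp the index to guard against floating-point rounding
            # for times just below run_end.
            index = min(int((t - run_start) / width), n_segments - 1)
            counts[index] += 1
    return counts


rng = random.Random(1)
# Assumed values: 60-minute run, 170 peaks, Gaussian retention times
# with mean 30 min and standard deviation 10 min.
times = [rng.gauss(30.0, 10.0) for _ in range(170)]
counts = peak_counts_per_segment(times, 0.0, 60.0)
print(counts)
```

Under these assumptions, the central segments would be expected to hold far more peaks than the first and last segments, which corresponds to the uneven demand for SRMs or MS/MS events noted above.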