Structural elucidation of ionized molecules of complex structure, such as proteins, is often carried out using a tandem mass spectrometer that is coupled to a liquid chromatograph. The general technique of conducting mass spectrometry (MS) analysis of ions generated from compounds separated by liquid chromatography (LC) may be referred to as “LC-MS”. If the mass spectrometry analysis is conducted as tandem mass spectrometry (MS/MS), then the above-described procedure may be referred to as “LC-MS/MS”. In conventional LC-MS/MS experiments a sample is initially analyzed by mass spectrometry to determine mass-to-charge ratios (m/z) corresponding to the peaks of interest. The sample is then analyzed further by performing product ion MS/MS scans on the selected peak(s). Specifically, in a first stage of analysis, frequently referred to as “MS1”, a full-scan mass spectrum, comprising an initial survey scan, is obtained. This full-scan spectrum is then followed by the selection (from the results obtained) of one or more precursor ion species. The precursor ions of the selected species are subjected to ion activation (generally, a deposition of energy) followed by one or more reactions, such as fragmentation, which may be accomplished in a collision cell or in another form of fragmentation cell, such as one employing surface-induced dissociation, electron-transfer dissociation or photon dissociation. In a second stage, the resulting fragment (product) ions are detected for further analysis (frequently referred to as either “MS/MS” or “MS2”) using either the same or a second mass analyzer. A resulting product spectrum exhibits a set of fragmentation peaks (a fragment set) which, in many instances, may be used as a means to derive structural information relating to the precursor peptide or protein or other biochemical oligomer.
It should be noted that, using the fragment ions as a starting population, the process of ion selection and subsequent fragmentation may be repeated yet again, thereby yielding an “MS3” spectrum. In the general case, a mass spectrum obtained after (n−1) iterated stages of selection and fragmentation may be referred to as an “MSn” spectrum. This is a time-consuming process because the sample needs to be mass analyzed at least twice and the MS/MS data is only recorded for a limited number of components.
Most presently available mass spectrometers capable of tandem analysis are equipped with an automatic data-dependent function whereby, when selecting the precursor ion for MS2 analysis from the ion peaks in MS1, the precursor ions are selected in order of decreasing intensity. In the simple data-dependent experiment shown in FIG. 1A, a detector continuously measures the total current attributable to ions entering the mass spectrometer. A threshold intensity level 8 of the total ion current is set, below which only MS1 data is acquired. As a first component, detected as peak 10, elutes, the total ion current intensity crosses the threshold 8 at time t1. When this occurs, an on-board processor or other controller of the mass spectrometer determines the most intense ion in the MS1 spectra and immediately initiates an MS/MS scan with regard to the most intense ion. Subsequently, the leading edge of another elution peak 12 is detected. When the total ion current once again breaches the threshold intensity 8 at time t3, an MS/MS scan is initiated with regard to the most intense ion detected after time t3. Generally, the peak 12 will correspond to the elution of a different chemical component and, thus, the most abundant ion detected after time t3 will be different from the ion for which MS/MS analysis was conducted during the elution peak 10. In this way, both MS and MS/MS spectra are acquired on each component as it elutes.
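The threshold-triggered logic described with reference to FIG. 1A may be sketched as follows. This is a minimal illustration only, not the control code of any actual instrument: the representation of a spectrum as a mapping from m/z to intensity, the function names, and the choice to trigger once per upward crossing of the threshold are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one MS1 spectrum maps m/z -> intensity.
Spectrum = Dict[float, float]


def total_ion_current(spectrum: Spectrum) -> float:
    """Sum of all ion intensities in one MS1 spectrum."""
    return sum(spectrum.values())


def simple_data_dependent(
    scans: List[Tuple[float, Spectrum]], threshold: float
) -> List[Tuple[float, float]]:
    """Return (time, precursor m/z) for each upward threshold crossing.

    Assumption: a single MS/MS scan is triggered each time the total ion
    current rises from below the threshold to at or above it.
    """
    triggers: List[Tuple[float, float]] = []
    was_above = False
    for time, spectrum in scans:
        is_above = total_ion_current(spectrum) >= threshold
        if is_above and not was_above and spectrum:
            # Select the most intense ion of the current MS1 spectrum.
            precursor = max(spectrum, key=spectrum.get)
            triggers.append((time, precursor))
        was_above = is_above
    return triggers


# Invented data: two resolved elution peaks dominated by different ions.
example_scans = [
    (0.0, {500.0: 1.0}),
    (1.0, {500.0: 50.0, 600.0: 10.0}),
    (2.0, {500.0: 2.0}),
    (3.0, {700.0: 40.0, 500.0: 5.0}),
]
print(simple_data_dependent(example_scans, threshold=20.0))
```

With chromatographically resolved components, as in this invented data set, each crossing selects a different precursor ion.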
The simple data dependent experiment described above works well with chromatographically resolved or partially resolved components, as are illustrated in FIG. 1A. However, in a very complex mixture there may be components whose elution peaks completely overlap, as illustrated in the graph of ion current intensity versus retention time in FIG. 1B. In this example elution peak 11 represents the ion current attributable to ion m11, and elution peak 13 represents the ion current attributable to ion m13, the masses of these ions being schematically illustrated in the mass spectrum representation in inset box 16. In the hypothetical situation shown in FIG. 1B, there is almost perfect overlap of the elution of the compounds that give rise to ions m11 and m13, with the mass spectral intensity of ion m11 always being greater than that of ion m13 during the course of the elution. Under these conditions, the simple data-dependent technique discussed above with reference to FIG. 1A will fail to ever initiate MS/MS analysis of ion m13 (and possibly other important ions), since only the most intense component (m11) will be selected for MS/MS.
The hypothetical two-ion situation illustrated in FIG. 1B is a simplified example. Most modern mass spectrometer instruments are capable of performing a series of MS/MS analyses with regard to each respective one of several abundant ions detected in an MS1 analysis. Typically, instead of choosing just a single most-abundant precursor, modern instruments will select the “top P number of the most abundant precursors” for tandem mass analysis based on the information of a preceding MS1 data acquisition, where the number P is either a constant or perhaps a variable input by a user. Nonetheless, the basic issue demonstrated by FIG. 1B remains, especially for multicomponent samples of biopolymer analytes which may give rise to tens to hundreds of mass spectral peaks in a single mass spectrum. Regardless of how such a sample is introduced into a mass spectrometer (for example, by chromatographic separation, flow injection, or capillary electrophoresis; as a chemical separate delivered from a lab-on-a-chip device; by infusion or other method), more than one analyte may be represented in a single mass spectrum from a single time point, and each such analyte may give rise to many ions, as shown in the hypothetical mass spectrum illustrated in FIG. 1C. In FIG. 1C, solid vertical lines outlined by envelope 208 represent centroids of a first set of mass spectral peaks generated from a first analyte compound and dotted vertical lines outlined by envelope 206 represent centroids of a second set of mass spectral peaks generated from a second co-eluting analyte compound. It is evident that, even if the number, P, of most-abundant peaks to be analyzed is equal to 10, for example, then the ions of only one of the analyte compounds will be selected for MS/MS analysis using the traditional data dependent methods described above. Information relating to the second analyte will be lost. Further, the data so obtained will comprise redundant information on the same component.
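The limitation of “top P” selection may be illustrated with a short sketch. The peak lists below are invented numbers that merely mimic the situation of FIG. 1C (a more abundant first analyte and a less abundant, co-eluting second analyte); the names Peak, select_top_p and analytes_covered are hypothetical, and the analyte label attached to each peak is known only because this is a constructed example.

```python
from typing import List, NamedTuple, Set


class Peak(NamedTuple):
    mz: float
    intensity: float
    analyte: str  # label available only in this toy example


def select_top_p(peaks: List[Peak], p: int) -> List[Peak]:
    """Return the P most intense peaks, in decreasing order of intensity."""
    return sorted(peaks, key=lambda pk: pk.intensity, reverse=True)[:p]


def analytes_covered(selected: List[Peak]) -> Set[str]:
    """Set of analytes represented among the selected precursors."""
    return {pk.analyte for pk in selected}


# Invented intensities: twelve peaks per analyte, first analyte dominant.
first = [Peak(800.0 + 10.0 * i, 100.0 - i, "first") for i in range(12)]
second = [Peak(805.0 + 10.0 * i, 20.0 - i, "second") for i in range(12)]

selected = select_top_p(first + second, p=10)
print(analytes_covered(selected))
```

In this constructed case all ten selected precursors belong to the first analyte, so ten MS/MS scans yield information on only one component.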
To more successfully address the complexities of mass spectral analysis of co-eluting compounds, many mass spectral instruments also employ the so-called “Dynamic Exclusion” principle by which a mass-to-charge ratio is temporarily put into an exclusion list after its MSn spectrum is acquired. The excluded mass-to-charge ratio is not analyzed by MSn again until a certain time duration has elapsed after the prior MSn spectrum acquisition. This technique minimizes the chance of fragmenting the same precursor ion in several subsequent scans, and allows a mass spectrometer to collect MSn spectra on other components having less intense peaks which would otherwise not be examined. After a selected period of time the excluded ion is removed from the list so that any other compounds with the same mass-to-charge ratio can be analyzed. The time duration during which the ion species is on the exclusion list is generally estimated based on an average or estimated chromatographic peak width. Thus, use of the Dynamic Exclusion principle allows more data to be obtained on more components in complex mixtures.
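A dynamic exclusion list of the kind described above may be sketched as follows. This is an illustrative sketch under stated assumptions, not any vendor's implementation: the class and function names are hypothetical, a single fixed exclusion duration is used, and the m/z matching tolerance is a parameter introduced here for the example.

```python
from typing import Dict, List, Optional, Tuple


class DynamicExclusionList:
    """Toy exclusion list with one fixed exclusion duration (assumption)."""

    def __init__(self, duration: float, mz_tolerance: float) -> None:
        self.duration = duration
        self.mz_tolerance = mz_tolerance  # hypothetical matching window
        self._entries: List[Tuple[float, float]] = []  # (m/z, expiry time)

    def _purge(self, now: float) -> None:
        # Remove entries whose exclusion time has elapsed.
        self._entries = [(mz, exp) for mz, exp in self._entries if exp > now]

    def is_excluded(self, mz: float, now: float) -> bool:
        self._purge(now)
        return any(abs(mz - e) <= self.mz_tolerance for e, _ in self._entries)

    def add(self, mz: float, now: float) -> None:
        self._entries.append((mz, now + self.duration))


def pick_precursor(
    spectrum: Dict[float, float], exclusion: DynamicExclusionList, now: float
) -> Optional[float]:
    """Pick the most intense m/z not currently excluded, then exclude it."""
    for mz in sorted(spectrum, key=spectrum.get, reverse=True):
        if not exclusion.is_excluded(mz, now):
            exclusion.add(mz, now)
            return mz
    return None


# Invented two-ion spectrum, analogous to ions m11 and m13 of FIG. 1B.
spectrum = {500.0: 100.0, 600.0: 50.0}
exclusion = DynamicExclusionList(duration=30.0, mz_tolerance=0.01)
print(pick_precursor(spectrum, exclusion, now=0.0))
print(pick_precursor(spectrum, exclusion, now=1.0))
```

In this sketch the less intense ion is selected on the second pass because the more intense ion is still on the exclusion list; once the exclusion duration elapses, the more intense ion becomes eligible again.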
Unfortunately, existing dynamic exclusion techniques may perform poorly for analyzing mass spectra of mixtures of complex biomolecules. For example, consider once again the hypothetical situation illustrated in FIG. 1C. If the ions depicted in FIG. 1C are analyzed using the dynamic exclusion principle, then at least 10 ion species derived from a single analyte (outlined by envelope 208) will be analyzed, in decreasing order of their intensities in the illustrated MS1 spectrum, by MSn analysis prior to any peaks from the less abundant analyte (outlined by envelope 206) being considered. This sequence will occur regardless of the fact that each precursor ion species is placed onto an exclusion list after its respective analysis. The amount of time consumed performing ten unnecessarily redundant MSn analyses may then lead to expiration of the exclusion time of the most abundant ion (or may lead to exhaustion of the time available to fully analyze a small number of most abundant ions), after which the entire sequence of MSn analyses may be repeated.
A further complicating factor in the application of the dynamic exclusion principle to mass analysis of mixtures of complex biomolecules derives from the fact that the elution profiles of the various compounds are highly variable and difficult to predict. Different biopolymer compounds may exhibit different elution profiles as a result of complex interactions between a chromatographic stationary phase and a biopolymer with multiple molecular interaction sites. Moreover, the time profiles of various ions generated from even a single such compound may fail to correlate with the elution profile of the un-ionized compound or with the profiles of one another as a result of ionization suppression within an ionization source of a mass spectrometer.
As an example of the elution profile variability that may be encountered, FIG. 2 illustrates a set of chromatograms collected from a single liquid chromatography-mass spectrometry experimental run of an E. coli extract. Total ion current is shown in the topmost chromatogram (curve 40) and various extracted ion chromatograms, illustrating the ion current that is contributed by respective m/z-ratio ranges, are shown in the lowermost five plots (curves 50, 60, 70, 80 and 90). Curve 50 represents the m/z range 660.0-660.5 Da. Similarly, curves 60, 70, 80 and 90 represent m/z ranges 700.5-701.5 Da, 1114.5-1114.5 Da, 942.5-943.5 Da and 540.5-540.5 Da, respectively. Peaks 1, 2 and 3 are examples of peaks with broad chromatographic profiles. Peaks 4 and 5 are examples of narrow profiles. Peak 6 shows an extremely broad peak. The peak widths span over an order of magnitude, thereby severely limiting the applicability of an exclusion list having a pre-defined exclusion time duration.
The existing data dependent and dynamic exclusion workflow techniques and corresponding algorithms were developed for small molecules, small peptides and other analytes which acquire a limited number of charges (for example, 1-3 charges) in the electrospray ionization process. When applied to higher-molecular-weight biopolymer analytes (most commonly, intact proteins during the course of so-called “top-down” proteomics studies) these conventional methodologies significantly under-perform due to a combination of different electrospray behavior and computational limitations. More specifically: (1) intact high mass analytes in general, and proteins in particular, develop many more charge states (up to 50 charges or more per molecule, e.g., FIG. 1C) than do small molecules during the electrospray ionization process because of a greater number of charge-acquiring sites, which results in much more complex MS spectra; (2) in complex mixtures such as cell lysates or their fractions, there is a wide distribution of molecular weights and copy numbers, which results in a very complex overlap of charge state distribution patterns of varying intensities; (3) variability in physicochemical properties of the high-mass analytes of the same or different chemical nature results in significant variability of chromatographic peak shapes and analyte retention on the column; (4) if the mass spectra are acquired on a mass spectrometer with high resolving power such as an Orbitrap™ mass analyzer (a type of electrostatic trap mass analyzer) or a time-of-flight (TOF) mass analyzer, corresponding peaks further resolve into a number of isotopes in a series of clusters whose quality is often far from a theoretical binomial distribution; (5) matrix ionization effects of a variety of different proteins can greatly influence the observed intensity of multiply overlapping species so as to distort the true ratios of protein intensities found in any given standard or sample.
These factors make it difficult to estimate a time for placing analyte-specific m/z values on a dynamic exclusion list. Additional levels of complexity are introduced by oxidized species of the same analyte or adducts, overlaps of isotope clusters and the inability of existing software tools to correctly calculate charge state for high mass species.
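The way in which a single high-mass analyte spreads over many m/z values may be illustrated with the standard relation for positively charged ions formed by protonation, m/z = (M + z·m_p)/z, where M is the neutral mass, z is the number of charges and m_p is the proton mass. The sketch below assumes that every charge is carried by a proton (no adducts or oxidized species); the function name and the neutral mass used in the example are hypothetical values chosen only for illustration.

```python
from typing import Dict, Iterable

PROTON_MASS = 1.007276  # Da, approximate mass of a proton


def charge_state_mz(neutral_mass: float, charges: Iterable[int]) -> Dict[int, float]:
    """Map each charge state z to m/z = (M + z * m_p) / z.

    Assumption: positive-ion mode with protons as the only charge carriers.
    """
    return {z: (neutral_mass + z * PROTON_MASS) / z for z in charges}


# Hypothetical neutral mass of 16950 Da carrying 10 to 20 charges.
for z, mz in charge_state_mz(16950.0, range(10, 21)).items():
    print(f"z = {z:2d}  m/z = {mz:9.3f}")
```

Each of the eleven charge states in this example appears at a distinct m/z value, and on a high-resolution instrument each would further resolve into its own isotopic cluster.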
It is not uncommon for a single protein to generate hundreds of resolved peaks or more (including both charge states and isotopes) per MS mass spectrum on high resolution/mass accuracy instruments. In practical terms, the above considerations imply that, in the case of intact proteins and other biopolymers, existing data dependent algorithms are being confounded and MS/MS is being performed in a redundant fashion on a number of different charge states from the same biopolymer. Also, when isotopic clusters do not match the traditional binomial distribution patterns defined by the number of carbon, hydrogen, nitrogen, oxygen and sulfur atoms present in a given biopolymer, or do not meet intensity threshold or signal-to-noise requirements, redundancy occurs from fragmenting multiple isotopes which belong to the same isotopic cluster. This duplication of work leads to redundancy in identification of the most abundant/ionizable proteins, while information about other species is lost because those species have very little opportunity to trigger an MSn analysis.
There is thus a need in the art of mass spectrometry of biomolecules for improved methods of analysis that can efficiently differentiate signal from noise, correctly allocate related m/z values into proper isotopic clusters, correctly determine charge states and properly organize the various charge states into distribution envelopes. Such improvements are required for success in both data acquisition and post acquisition processing workflows.
Preferably, the improved methods and algorithms should be able to work in a “real-time” environment such that automated data-dependent decisions may be made while mass spectra are being acquired. Such methods and algorithms should be able to not only extract as much information from each mass spectrum as possible, but also to direct subsequent MSn analysis in a desired way based on the information gathered in a preceding mass spectrum. The present disclosure addresses these needs.