The study of proteins in living cells and in tissues (proteomics) is an active area of clinical and basic scientific research because metabolic control in cells and tissues is exercised at the protein level. For example, comparison of the levels of protein expression between healthy and diseased tissues, or between pathogenic and nonpathogenic microbial strains, can speed the discovery and development of new drug compounds or agricultural products. Further, analysis of the protein expression pattern in diseased tissues or in tissues excised from organisms undergoing treatment can also serve as diagnostics of disease states or the efficacy of treatment strategies, as well as provide prognostic information regarding suitable treatment modalities and therapeutic options for individual patients. Still further, identification of sets of proteins in samples derived from microorganisms (e.g., bacteria) can provide a means to identify the species and/or strain of microorganism as well as, with regard to bacteria, identify possible drug resistance properties of such species or strains.
One important aspect of proteomics is the identification of proteins with altered expression levels. Differences in protein and metabolite levels over time or among populations can be associated with diseased states, drug treatments, or changes in metabolism. Identified molecular species may serve as biological markers for the disease or condition in question, allowing for new methods of diagnosis and treatment to be developed. Conventionally, because of the large number of proteins that are generally present in any sample extracted from natural tissue or cells, the proteins must first be separated into individual components by gel or capillary electrophoresis or affinity techniques, before the individual protein levels can be assessed and compared to a database or between samples.
Because it can provide detailed structural information, mass spectrometry (MS) is currently considered to be a valuable analytical tool for biochemical mixture analysis and protein identification. Conventional methods of protein analysis therefore often combine two-dimensional (2D) gel electrophoresis, for separation and quantification, with mass spectrometric identification of proteins. Also, capillary liquid chromatography as well as various other “front-end” separation techniques have been combined with electrospray ionization tandem mass spectrometry for large-scale protein identification without gel electrophoresis. Using mass spectrometry, qualitative differences between mass spectra can be identified, and proteins corresponding to peaks occurring in only some of the spectra serve as candidate biological markers.
In recent years, mass spectrometry has also gained popularity as a tool for identifying microorganisms due to its increased accuracy and shortened time-to-result when compared to traditional methods for identifying microorganisms. To date, the most common mass spectrometry method used for microbial identification is matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry. In MALDI-TOF, cells of an unknown microorganism are mixed with a suitable ultraviolet light absorbing matrix solution and are allowed to dry on a sample plate. Alternatively, an extract of microbial cells is used instead of the intact cells. After transfer to the ion source of a mass spectrometer, a laser beam is directed to the sample for desorption and ionization of the proteins and time-dependent mass spectral data is collected.
The mass spectrum of a microorganism produced by MALDI-TOF methods reveals a number of peaks from intact peptides, proteins, protein fragments, and other molecules that constitute the microorganism's “fingerprint”. This method relies on the pattern matching of the peak profiles in the mass spectrum of an unknown microorganism to a reference database comprising a collection of mass spectra for known microorganisms obtained using essentially the same experimental conditions. The better the match between the spectrum of the isolated microorganism and a spectrum in the reference database, the higher the confidence level in identification of the organism at the genus, species, or in some cases, subspecies level. Because the method relies upon matching the patterns of peaks in MALDI-TOF mass spectra, there is no requirement to identify or otherwise characterize the proteins represented in the spectrum of the unknown microorganism in order to identify it.
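The pattern-matching idea described above can be illustrated with a minimal sketch. It assumes peaks are binned at a fixed m/z width and compared by cosine similarity; this scoring scheme, the function names, the bin width and the peak lists are all hypothetical choices made for illustration and are not taken from any particular commercial identification system.

```python
import math

def bin_spectrum(peaks, bin_width=5.0):
    """Sum peak intensities into fixed-width m/z bins (bin index -> intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(unknown_peaks, reference_db, bin_width=5.0):
    """Return the best-scoring reference name and the score for every reference."""
    query = bin_spectrum(unknown_peaks, bin_width)
    scores = {
        name: cosine_similarity(query, bin_spectrum(ref_peaks, bin_width))
        for name, ref_peaks in reference_db.items()
    }
    return max(scores, key=scores.get), scores

# Made-up (m/z, intensity) peak lists, for illustration only.
reference_db = {
    "reference_A": [(4365.0, 100.0), (5096.0, 80.0), (6255.0, 60.0)],
    "reference_B": [(4450.0, 100.0), (5380.0, 70.0), (7274.0, 50.0)],
}
unknown = [(4366.0, 90.0), (5097.0, 85.0), (6256.0, 40.0)]

name, scores = best_match(unknown, reference_db)
print(name, scores)
```

Note that the score depends only on where peaks fall and how intense they are; nothing in the comparison requires knowing which proteins produced the peaks, which is the property the paragraph above emphasizes.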
Although MALDI-TOF methods are rapid and cost effective, they have limitations that restrict their range of application in pathogen characterization and identification, including but not limited to virulence detection and quantitation, resistance marker determination, strain matching, and antibiotic susceptibility testing. The information content within a MALDI mass spectrum reflects the most abundant and ionizable proteins, which are generally limited to ribosomal proteins at the experimental conditions used. Because ribosomal proteins are highly conserved among prokaryotes, differentiation of closely related microorganisms by MALDI-TOF is limited. In this case, many of the ribosomal proteins across closely related species contain either the same or slightly different amino acid sequences (i.e., single amino acid substitutions) that cannot be effectively differentiated with low resolution mass spectrometers. Moreover, determination of strain and/or serovar type, antibiotic resistance, antibiotic susceptibility, virulence or other important characteristics relies upon the detection of protein markers other than ribosomal proteins, which further limits the application of MALDI-TOF for microbial analysis. Laboratories using MALDI-TOF for identification of microorganisms must use other methods to further characterize the identified microbes. In addition, the MALDI-TOF method's reliance upon matching spectral patterns requires a pure culture for high quality results and thus is not generally suitable for direct testing, mixed cultures, blood culture, or other complex samples containing different microorganisms.
Several other mass spectrometry methods for detection of microorganisms have been used. For example, mass spectrometry-based protein sequencing methods have been described wherein liquid chromatography is coupled to tandem mass spectrometry (LC-MS/MS) and sequence information is obtained from enzymatic digests of proteins derived from the microbial sample. This approach, termed “bottom-up” proteomics, is a widely practiced method for protein identification. The method can provide identification to the subspecies or strain level as chromatographic separation allows the detection of additional proteins other than just ribosomal proteins, including those useful for characterization of antibiotic resistance markers and virulence factors.
In contrast to “bottom-up” proteomics, “top-down” proteomics refers to methods of analysis in which protein samples are introduced intact into a mass spectrometer, without enzymatic, chemical or other means of digestion. Top-down analysis enables the study of the intact protein, allowing identification, primary structure determination and localization of post-translational modifications (PTMs) directly at the protein level. Top-down proteomic analysis typically consists of introducing an intact protein into the ionization source of a mass spectrometer, fragmenting the protein ions and measuring the mass-to-charge ratios and abundances of the various fragments so-generated. The resulting fragmentation is many times more complex than a peptide fragmentation, which may, in the absence of the methods taught herein, necessitate the use of a mass spectrometer with very high mass accuracy and resolution capability in order to interpret the fragmentation pattern with acceptable certainty. The interpretation generally includes comparing the observed fragmentation pattern to either a protein sequence database that includes compiled experimental fragmentation results generated from known samples or, alternatively, to theoretically predicted fragmentation patterns. For example, Liu et al. (“Top-Down Protein Identification/Characterization of a Priori Unknown Proteins via Ion Trap Collision-Induced Dissociation and Ion/Ion Reactions in a Quadrupole/Time-of-Flight Tandem Mass Spectrometer”, Anal. Chem. 2009, 81, 1433-1441) have described top-down protein identification and characterization of both modified and unmodified unknown proteins with masses up to ≈28 kDa.
An advantage of a top-down analysis over a bottom-up analysis is that a protein may be identified directly, rather than inferred as is the case with peptides in a bottom-up analysis. Another advantage is that alternative forms of a protein, e.g. post-translational modifications and splice variants, may be identified. However, top-down analysis has a disadvantage when compared to a bottom-up analysis in that many proteins can be difficult to isolate and purify. Thus, each protein in an incompletely separated mixture can yield, upon mass spectrometric analysis, multiple ion species, each species corresponding to a different respective degree of protonation and a different respective charge state, and each such ion species can give rise to multiple isotopic variants. A single MS spectrum measured in a top-down analysis can easily contain hundreds to even thousands of peaks which belong to different analytes, all interwoven over a given m/z range in which ion signals of very different intensities overlap and suppress one another.
Because mass spectra of biological samples, as obtained in top-down analyses, are generally very complex, improved methods are required for interpreting the mass spectra. The resulting computational challenge that such methods must overcome is to trace each peak back to a certain analyte(s) and, once this is done for one or several analytes, to determine the molecular weights of the analyte(s) in a process which is best described as mathematical decomposition (also referred to, in the art, as mathematical deconvolution). A still further challenge associated with the use of mass spectral analyses of proteins and polypeptides in a clinical setting is to derive such information in the shortest time period possible, often termed “real time” analysis. The computations are considerably more challenging when performed in real time during an automated top-down data-dependent analysis, since they must be completed very quickly, especially when chromatographic separation is involved. To succeed, one needs to provide both: (i) an optimized real time computational strategy as well as (ii) a mass spectral data acquisition strategy that anticipates multiple mass spectral lines for each ion species and that anticipates efficient isolation of analyte compounds of interest from a potential multitude of contaminant compounds.
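The simplest instance of the decomposition described above is recovering a charge state and a molecular weight from two adjacent peaks of one analyte. The sketch below assumes positive-mode electrospray in which every charge is carried by an added proton, and assumes the two peaks are already known to belong to the same analyte; the function names and the 16950 Da example mass are hypothetical. Real spectra with overlapping envelopes require considerably more elaborate methods.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da

def mz_for_charge(neutral_mass, z):
    """m/z of an ion formed by adding z protons to a neutral molecule."""
    return (neutral_mass + z * PROTON_MASS) / z

def charge_from_adjacent_peaks(mz_high, mz_low):
    """Charge z of the peak at mz_high, given that mz_low carries z + 1.

    Follows from mz_high - mz_low = M / (z * (z + 1)) and
    mz_low - PROTON_MASS = M / (z + 1).
    """
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))

def neutral_mass(mz, z):
    """Neutral molecular weight recovered from one peak and its charge."""
    return z * (mz - PROTON_MASS)

# Hypothetical analyte of 16950 Da observed at charge states 15 and 16.
peak_15 = mz_for_charge(16950.0, 15)
peak_16 = mz_for_charge(16950.0, 16)

z = charge_from_adjacent_peaks(peak_15, peak_16)
print(z, neutral_mass(peak_15, z))
```

Each additional charge state of the same analyte gives another independent estimate of the molecular weight, which is why correct grouping of peaks into charge state envelopes matters so much for the computation.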
The existing data dependent and dynamic exclusion workflow techniques and corresponding algorithms were developed for small molecules, small peptides and other analytes which acquire a limited number of charges (for example, 1-3 charges) in the electrospray ionization process. When applied to higher-molecular-weight biopolymer analytes (most commonly, intact proteins during the course of top-down proteomics studies) these conventional methodologies significantly under-perform due to a combination of different electrospray behavior and computational limitations. More specifically: (1) intact high mass analytes in general, and proteins in particular, develop many more charge states (up to 50 charges or more per molecule, e.g., FIG. 12C) than do small molecules during the electrospray ionization process because of a greater number of charge acquiring sites, which results in much more complex MS spectra; (2) in complex mixtures such as cell lysates or their fractions, there is a wide distribution of molecular weights and copy numbers, which results in a very complex overlap of charge state distribution patterns of varying intensities; (3) the variability in physicochemical properties of the high-mass analytes of the same or different chemical nature produces significant variability of chromatographic peak shapes and analyte retention on the column; (4) if the mass spectra are acquired on a mass spectrometer with high resolving power such as an Orbitrap™ mass analyzer (a type of electrostatic trap mass analyzer) or a time-of-flight (TOF) mass analyzer, corresponding peaks further resolve into a number of isotopes in a series of clusters whose quality is often far from a theoretical binomial distribution; (5) matrix ionization effects of a variety of different proteins can greatly influence the observed intensity of multiply overlapping species so as to distort the true ratios of protein intensities found in any given standard or sample.
Additional levels of complexity are introduced by oxidized species or adducts of the same analyte, by overlaps of isotope clusters, and by the inability of existing software tools to correctly calculate the charge state of high mass species.
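To make the notion of a theoretical binomial isotope distribution concrete, the following sketch computes the probability of finding k carbon-13 atoms among the carbons of a molecule. It is deliberately simplified: it considers carbon only (ignoring hydrogen, nitrogen, oxygen and sulfur isotopes), takes the natural abundance of carbon-13 as approximately 1.07%, and uses a hypothetical carbon count of 750, roughly the size of a small protein.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # approximate natural abundance of carbon-13

def carbon_isotope_pattern(n_carbons, max_heavy=30):
    """Binomial probability of k carbon-13 atoms, for k = 0 .. max_heavy."""
    p = C13_ABUNDANCE
    return [
        comb(n_carbons, k) * p**k * (1.0 - p) ** (n_carbons - k)
        for k in range(max_heavy + 1)
    ]

# Hypothetical molecule containing 750 carbon atoms.
pattern = carbon_isotope_pattern(750)
most_abundant = pattern.index(max(pattern))
print(most_abundant, round(pattern[0], 6))
```

For a molecule of this size, the all-carbon-12 (monoisotopic) peak is a very small fraction of the total signal and the most abundant peak lies several mass units higher. This is one reason isotopic clusters of intact proteins are harder to assign than those of small molecules.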
In practical terms, the above considerations imply that, in the case of intact proteins and other biopolymers, existing data dependent algorithms are being confounded and MS/MS is being performed in a redundant fashion on a number of different charge states from the same biopolymer. Also, when isotopic clusters do not match the traditional binomial distribution patterns defined by the number of carbon, hydrogen, nitrogen, oxygen, and sulfur atoms present in a given biopolymer, or do not meet intensity threshold or signal-to-noise requirements, redundancy occurs from fragmenting multiple isotopes which belong to the same isotopic cluster. This duplication of work leads to redundancy in identification of the most abundant/ionizable proteins, while the information about other species is lost and provides very little opportunity for triggering an MSn analysis.
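The redundancy described above can be illustrated by a sketch that collapses peaks with already-assigned charge states into one fragmentation candidate per neutral mass. This is an illustrative grouping scheme only, not the method of the present disclosure; it assumes proton charge carriers and correct charge assignments, and the function name, the 20 ppm tolerance and the example masses are hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da

def select_precursors(peaks, tol_ppm=20.0):
    """Keep the most intense charge state for each distinct neutral mass.

    peaks: iterable of (mz, z, intensity) with z already assigned.
    Returns a list of dicts sorted by increasing neutral mass.
    """
    candidates = sorted(
        (z * (mz - PROTON_MASS), mz, z, intensity) for mz, z, intensity in peaks
    )
    groups = []
    for mass, mz, z, intensity in candidates:
        last = groups[-1] if groups else None
        if last and abs(mass - last["mass"]) <= last["mass"] * tol_ppm * 1e-6:
            # Same analyte as the previous group: keep the stronger peak.
            if intensity > last["intensity"]:
                last.update(mz=mz, z=z, intensity=intensity)
        else:
            groups.append({"mass": mass, "mz": mz, "z": z, "intensity": intensity})
    return groups

def mz_of(mass, z):
    """Helper used only to build the example data."""
    return (mass + z * PROTON_MASS) / z

# Five peaks that arise from only two hypothetical analytes.
peaks = [
    (mz_of(16950.0, 14), 14, 40.0),
    (mz_of(16950.0, 15), 15, 100.0),
    (mz_of(16950.0, 16), 16, 70.0),
    (mz_of(8565.0, 8), 8, 30.0),
    (mz_of(8565.0, 9), 9, 55.0),
]
for group in select_precursors(peaks):
    print(round(group["mass"], 3), group["z"], group["intensity"])
```

Without such grouping, an intensity-ranked selection would choose the three peaks of the 16950 Da species before either peak of the less abundant species, which is the redundant behavior noted above.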
With regard to efficient instrument-associated data acquisition strategies, it may be noted that ion-ion reactions have found great utility in the field of biological mass spectrometry over the last decade, primarily with the use of electron transfer dissociation (ETD) to dissociate peptides/proteins and determine primary sequence information and characterize post-translational modifications. Proton transfer, another type of ion-ion reaction, has also been used extensively in biological applications. Experimentally, in one form of proton transfer, multiply-positively-charged protein ions (i.e., protein cations) from a sample are allowed to react with singly-charged reagent anions so as to reduce the charge state of an individual protein cation and the number of such charge states of the protein cations. These reactions proceed with pseudo-first order reaction kinetics when the reagent anions are present in large excess over the protein cation population. The rate of reaction is directly proportional to the square of the charge of the protein cation (or other multiply-charged cation) multiplied by the charge on the reagent anion. The same relationship also holds for reactions of the opposite polarity, defined here as reaction between singly-charged reagent cations and a population of multiply-charged anions derived from a protein sample. This produces a series of pseudo-first order consecutive reaction curves as defined by the starting multiply-charged protein cation population. Although the reactions are highly exothermic (in excess of 100 kcal/mol), proton transfer is an even-electron process performed in the presence of 1 mtorr of background gas (i.e., helium) and thus does not fragment the starting multiply-charged protein cation population. The collision gas serves to remove the excess energy on the microsecond time scale (10⁸ collisions per second), thus preventing fragmentation of the resulting product ion population.
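The consecutive pseudo-first order kinetics described above can be sketched numerically. The example assumes a singly-charged reagent present in large excess, so that the rate constant for losing one charge from charge state z is proportional to z squared. The unit rate constant, the starting charge state and the reaction time are arbitrary illustrative values, and simple forward-Euler integration is used for brevity.

```python
def simulate_ptr(z_start, k_unit, t_end, dt=1e-4):
    """Fraction of ions in each charge state after reacting for t_end.

    pop[z] is the fraction of ions carrying z charges; every reaction
    step z -> z - 1 has rate constant k_unit * z**2.
    """
    pop = [0.0] * (z_start + 1)
    pop[z_start] = 1.0
    for _ in range(int(round(t_end / dt))):
        # Amount leaving each charge state during this time step.
        flux = [k_unit * z * z * pop[z] * dt for z in range(z_start + 1)]
        for z in range(1, z_start + 1):
            pop[z] -= flux[z]
            pop[z - 1] += flux[z]
    return pop

# All ions start at 10+; arbitrary rate units.
population = simulate_ptr(z_start=10, k_unit=1.0, t_end=0.01)
for z in range(10, 5, -1):
    print(z, round(population[z], 4))
```

Because the rate constant scales with the square of the charge, highly charged ions are depleted quickly while each successive product reacts more slowly than the one before it, so the population shifts toward lower charge states (higher m/z) as the reaction period lengthens.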
Proton transfer reactions (PTR) have been used successfully to identify proteins in mixtures of proteins. Particularly, application of proton transfer reaction methods may be envisioned as a mixture simplification process that is carried out in real-time (a few milliseconds) in a mass spectrometer that separates mass spectral signatures of proteins and polypeptides from one another as well as from generally low-charge contaminant ions. This procedure enables isolation of the analyte proteins and polypeptide ions either as a group or as individual ion species and has thus been employed to determine charge state and molecular weights of high mass proteins. PTR has also been utilized for simplifying product ion spectra derived from the collisional-activation of multiply-charged precursor protein ions. Although PTR reduces the overall signal derived from multiply-charged protein ions, this is more than offset by the significant gain in signal-to-noise ratio of the resulting PTR product ions. The PTR process is 100% efficient, leading to only a single series of reaction products and no side reaction products that require special interpretation and data analysis.
Various aspects of the application of PTR to the analysis of peptides, polypeptides and proteins have been described in the following documents: U.S. Pat. No. 7,749,769 B2 in the names of inventors Hunt et al., U.S. Patent Pre-Grant Publication No. 2012/0156707 A1 in the names of inventors Hartmer et al., U.S. Pre-Grant Publication No. 2012/0205531 A1 in the name of inventor Zabrouskov; McLuckey et al., “Ion/Ion Proton-Transfer Kinetics: Implications for Analysis of Ions Derived from Electrospray of Protein Mixtures”, Anal. Chem. 1998, 70, 1198-1202; Stephenson et al., “Ion-ion Proton Transfer Reactions of Bio-ions Involving Noncovalent Interactions: Holomyoglobin”, J. Am. Soc. Mass Spectrom. 1998, 8, 637-644; Stephenson et al., “Ion/Ion Reactions in the Gas Phase: Proton Transfer Reactions Involving Multiply-Charged Proteins”, J. Am. Chem. Soc. 1996, 118, 7390-7397; McLuckey et al., “Ion/Molecule Reactions for Improved Effective Mass Resolution in Electrospray Mass Spectrometry”, Anal. Chem. 1995, 67, 2493-2497; Stephenson et al., “Ion/Ion Proton Transfer Reactions for Protein Mixture Analysis”, Anal. Chem. 1996, 68, 4026-4032; Stephenson et al., “Ion/Ion Reactions for Oligopeptide Mixture Analysis: Application to Mixtures Comprised of 0.5-100 kDa Components”, J. Am. Soc. Mass Spectrom. 1998, 9, 585-596; Stephenson et al., “Charge Manipulation for Improved Mass Determination of High-mass Species and Mixture Components by Electrospray Mass Spectrometry”, J. Mass Spectrom. 1998, 33, 664-672; Stephenson et al., “Simplification of Product Ion Spectra Derived from Multiply Charged Parent Ions via Ion/Ion Chemistry”, Anal. Chem., 1998, 70, 3533-3544 and Scalf et al., “Charge Reduction Electrospray Mass Spectrometry”, Anal. Chem. 2000, 72, 52-60. Various aspects of general ion/ion chemistry have been described in McLuckey et al., “Ion/Ion Chemistry of High-Mass Multiply Charged Ions”, Mass Spectrom. Rev. 1998, 17, 369-407 and U.S. Pat. No. 7,550,718 B2 in the names of inventors McLuckey et al. Apparatus for performing PTR and for reducing ion charge states in mass spectrometers have been described in U.S. Pre-Grant Publication No. 2011/0114835 A1 in the names of inventors Chen et al., U.S. Pre-Grant Publication No. 2011/0189788 A1 in the names of inventors Brown et al., U.S. Pat. No. 8,283,626 B2 in the names of inventors Brown et al. and U.S. Pat. No. 7,518,108 B2 in the names of inventors Frey et al. Adaptation of PTR charge reduction techniques to detection and identification of organisms has been described by McLuckey et al. (“Electrospray/Ion Trap Mass Spectrometry for the Detection and Identification of Organisms”, Proc. First Joint Services Workshop on Biological Mass Spectrometry, Baltimore, Md., 28-30 Jul. 1997, 127-132).
The product ions produced by the PTR process can be accumulated into one or into several charge states by the use of a technique known as “ion parking”. Ion parking uses supplementary AC voltages to consolidate the PTR product ions formed from the original variously protonated ions of any given protein molecule into a particular charge state or states at particular mass-to-charge (m/z) values during the reaction period. This technique can be used to concentrate the product ion signal into a single or limited number of charge states (and, consequently, into a single or a few respective m/z values) for higher sensitivity detection or further manipulation using collisional-activation, ETD, or other ion manipulation techniques. Various aspects of ion parking have been described in U.S. Pat. No. 7,064,317 B2 in the name of inventor McLuckey; U.S. Pat. No. 7,355,169 B2 in the name of inventor McLuckey; U.S. Pat. No. 8,334,503 B2 in the name of inventor McLuckey; U.S. Pat. No. 8,440,962 B2 in the name of inventor Le Blanc; and in the following documents: McLuckey et al., “Ion Parking during Ion/Ion Reactions in Electrodynamic Ion Traps”, Anal. Chem. 2002, 74, 336-346; Reid et al., “Gas-Phase Concentration, Purification, and Identification of Whole Proteins from Complex Mixtures”, J. Am. Chem. Soc. 2002, 124, 7353-7362; He et al., “Dissociation of Multiple Protein Ion Charge States Following a Single Gas-Phase Purification and Concentration Procedure”, Anal. Chem. 2002, 74, 4653-4661; Xia et al., “Mutual Storage Mode Ion/Ion Reactions in a Hybrid Linear Ion Trap”, J. Am. Soc. Mass. Spectrom. 2005, 16, 71-81; Chrisman et al., “Parallel Ion Parking: Improving Conversion of Parents to First-Generation Products in Electron Transfer Dissociation”, Anal. Chem. 2005, 77(10), 3411-3414 and Chrisman et al., “Parallel Ion Parking of Protein Mixtures”, Anal. Chem. 2006, 78, 310-316.
As a result of the ongoing requirement in the art of mass spectral proteome analysis for analysis of complex natural samples in real-time or near-real-time, there is thus a need for improved methods of mass analysis, both instrumental and computational, that can efficiently separate analytes from contaminants, differentiate signal from noise, correctly allocate related m/z values into proper isotopic clusters, correctly determine charge states and properly organize the various charge states into distribution envelopes. Such improvements are required for success in both data acquisition and, optionally, post-acquisition processing workflows. Preferably, the improved instrumental methods, workflows and algorithms should be able to work in a “real-time” environment such that automated data-dependent decisions may be made while mass spectra are being acquired and such that clinical interpretations may be made shortly thereafter. The present disclosure addresses these needs.