Mass spectrometry (MS) is commonly used to provide information related to protein composition and peptide sequence. As efforts shift from sequencing the genome to understanding and identifying expressed genes and protein function, it is increasingly important that analytical tools be developed for providing reliable and rapid protein sequencing. Such protein sequence information can be used in proteomic databases and for identifying, understanding and using sequence information in a wide range of applications from fundamental research to medical treatment. The systems and methods disclosed herein provide improved tools for protein sequencing by increasing reliability and decreasing the experimental time required to analyze protein samples.
Proteins are involved in nearly every aspect of cellular function. In fact, the characterization of proteins has become such a significant part of modern biology that it has inspired a new discipline: proteomics, the classification of the protein complement expressed by the genome of an organism. Technology development has driven, and continues to drive, rapid evolution in this field. Over the past several years many mass spectrometry (MS)-based protein identification strategies have emerged. Technical developments in chromatography and MS instrumentation have made two types of protein sequencing methods popular: (1) bottom-up and (2) top-down. For the bottom-up approach, a protein-containing sample is digested with a proteolytic enzyme, resulting in a complex mixture of peptides. Next, the digested sample is chromatographically separated (in one or multiple dimensions) and introduced to the mass spectrometer by means of a nanoflow high-performance liquid chromatography (nHPLC, ~50 nL/min) column integrated directly with an electrospray ionization (ESI) source on the mass spectrometer. The ESI source converts condensed-phase ions, eluting from the HPLC column, to multiply protonated molecules (cations) in the gas phase, a requirement for MS analysis. The mass spectrometer first records the mass/charge (m/z) of each peptide ion and then selects the peptide ions individually to obtain sequence information via tandem mass spectrometry (MS/MS). In a typical shotgun proteomics experiment a cell lysate, containing as many as several thousand proteins, is analyzed. In the top-down method, intact proteins are ionized, directly sampled by the mass spectrometer, and then fragmented during MS/MS analysis.
Tandem mass spectrometry is a method whereby peptides undergo ion fragmentation with subsequent m/z measurement. Ion fragmentation for peptide and protein sequence analysis, with RF 3D quadrupole ion trap (QIT), quadrupole time-of-flight (Qq-TOF), and RF linear multipole ion trap (QLT) instruments, is generally performed via collision-activated dissociation (CAD). In this process, peptides that are protonated more or less randomly on backbone amide nitrogen atoms are kinetically excited and undergo collisions with an inert gas such as helium or argon. During each collision, imparted translational energy is converted to vibrational energy that is then rapidly distributed throughout all covalent bonds (on approximately a picosecond timescale). Fragment ions are formed when the internal energy of the ion exceeds the activation barrier required for a particular bond cleavage. Fragmentation of protonated amide bonds affords a homologous series of complementary product ions of type b and y. Subtracting the m/z values of two fragments within a given ion series that differ by a single amino acid affords the mass, and thus the identity, of the extra residue in the larger of the two fragments. The complete amino acid sequence of a peptide can be directly deduced (de novo interpretation) by extending this process to all homologous pairs of fragments within a particular ion series.
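The subtraction step described above can be illustrated with a minimal Python sketch. It assumes singly charged b ions, standard monoisotopic residue masses, and a complete, noise-free ion series; the function names `b_ion_series` and `read_ladder` and the 0.01 Da tolerance are hypothetical choices made for illustration only.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)


def b_ion_series(peptide):
    """Return singly charged b-ion m/z values b1..b(n-1) for a peptide."""
    mz, series = PROTON, []
    for residue in peptide[:-1]:
        mz += RESIDUE_MASS[residue]
        series.append(mz)
    return series


def read_ladder(mz_values, tol=0.01):
    """Assign a residue to each gap between adjacent ions of one series.

    Isobaric residues (I and L) cannot be told apart by mass and are
    reported together; gaps matching no residue are reported as '?'.
    """
    ordered = sorted(mz_values)
    calls = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls


if __name__ == "__main__":
    ladder = b_ion_series("PEPTIDE")
    print(read_ladder(ladder))  # ['E', 'P', 'T', 'I/L', 'D']
```

The first residue is not recovered because b1 has no smaller partner in the series, and the last residue is missing because the series stops at b(n-1). Real spectra are also incomplete and noisy, which is why complete de novo interpretation is often not possible in practice.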
Electron transfer dissociation (ETD) is a more recent technology for peptide fragmentation. Rather than using collisions, ETD reacts the selected peptide cations with anions of fluoranthene (or other negatively charged small molecules). This reaction proceeds by transfer of an electron from the fluoranthene anion to the peptide (an ion/ion reaction). The added electron causes the peptide to break more or less randomly between amino acids. Once the peptide is fragmented, the mass of each fragment is recorded. Unlike CAD, ETD cleaves a different backbone bond and produces c- and z-type fragment ions rather than the b- and y-type fragments generated by CAD. ETD can be considered a derivative of electron capture dissociation (ECD), which uses free electrons rather than anions to induce the same fragmentation pathways.
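To make the difference between the ion types concrete, the following Python sketch computes singly charged c- and z-type fragment m/z values for a peptide. It assumes the commonly used mass relationships c = b + NH3 and z (radical form) = y - NH3 + H, together with monoisotopic masses; the function name `c_z_ions` is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276    # proton mass (Da)
HYDROGEN = 1.007825  # hydrogen atom mass (Da)
WATER = 18.010565    # H2O (Da)
AMMONIA = 17.026549  # NH3 (Da)


def c_z_ions(peptide):
    """Return (c_ions, z_ions) as lists of singly charged m/z values.

    c_ions[i-1] is c_i (the first i residues); z_ions[i-1] is the
    radical z ion containing the last i residues. Assumed relations:
    c = b + NH3 and z = y - NH3 + H.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    c_ions, z_ions = [], []
    for i in range(1, len(peptide)):
        b = sum(masses[:i]) + PROTON
        y = sum(masses[-i:]) + WATER + PROTON
        c_ions.append(b + AMMONIA)
        z_ions.append(y - AMMONIA + HYDROGEN)
    return c_ions, z_ions


if __name__ == "__main__":
    c, z = c_z_ions("PEPTIDE")
    for i, (c_mz, z_mz) in enumerate(zip(c, z), start=1):
        print(f"c{i} = {c_mz:.4f}   z{i} = {z_mz:.4f}")
```

Under these assumptions, each c ion lies about 17.027 Da above the corresponding b ion and each z ion about 16.019 Da below the corresponding y ion, so the residue-to-residue spacing within a series is the same as for CAD fragments.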
To date, ETD has been implemented on mass spectrometers with low resolution and low mass accuracy; however, we have recently modified a hybrid linear ion trap-orbitrap mass spectrometer to perform ETD (McAlister et al. Anal. Chem. 79(10) 3525-3534, 2007). This system routinely achieves 60,000 resolving power and measures the mass of ETD fragments to the third or fourth decimal place (low ppm to ppb mass accuracies). The current state-of-the-art bioinformatics approach to assigning peptide sequences to raw tandem mass spectra relies on spectral correlation methods. Yates and co-workers described a protein database/tandem mass spectral correlation algorithm (SEQUEST) in 1994. Since then numerous related programs have been reported. Each of these algorithms follows similar logic, with candidate peptide scoring and correlation being the major difference.
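The relationship between decimal-place accuracy, ppm error, and resolving power can be made explicit with a short Python sketch. The definitions used are the conventional ones (ppm error as the relative deviation times 10^6, resolving power as m divided by the peak width); the function names and the example m/z values are illustrative assumptions.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million relative to the theoretical m/z."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_width(mz, resolving_power):
    """Peak width (delta m) implied by a resolving power R = m / delta m."""
    return mz / resolving_power


if __name__ == "__main__":
    # An error in the third decimal place at m/z 500 is about 2 ppm.
    print(f"{ppm_error(500.001, 500.000):.2f} ppm")
    # At a resolving power of 60,000, a peak at m/z 600 is about 0.01 wide.
    print(f"{peak_width(600.0, 60000):.4f}")
```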
The common methodology involves: (1) pre-processing the tandem mass spectrum, (2) comparing the precursor peptide mass to those obtained from an in silico protein digest, (3) scoring each sequence candidate's fit to the experimental spectrum, and (4) generating a scored output of sequences. Variations on this strategy are found mostly in the scoring step: Yates and colleagues have described the use of cross-correlation algorithms, while several newer methods use probability-based matching to calculate a match's statistical significance. Geer et al. have described one such method, the Open Mass Spectrometry Search Algorithm (OMSSA). In 1994 Mann and Wilm also introduced a notable variant of this approach called peptide sequence tag database searching. Here an algorithm attempts to extract partial sequence information directly from the tandem mass spectrum. The idea is that most CAD tandem mass spectra do not contain complete backbone fragmentation, but short runs of 2-5 consecutive fragments (e.g., b2, b3, b4) are more likely to exist. This tag is then rastered along the predicted protein sequences of a database. Once a matching sequence tag is identified in the database, neighboring residues are surveyed to determine a match. An advantage of this approach is an increased probability of identifying a priori unknown post-translational modifications (PTMs) or potential single amino acid substitutions. Nevertheless, the sequence tag approach is not widely used as a routine tool in high-throughput proteomics data analysis.
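The four-step database search outlined above can be sketched in Python as follows. This is a deliberately simplified illustration, not the SEQUEST or OMSSA algorithm: the scoring step is a plain shared-peak count standing in for cross-correlation or probability-based scoring, the digest assumes trypsin-like cleavage with no missed cleavages, only singly charged b and y ions are considered, and all function names and tolerances are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_digest(protein):
    """In silico digest: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_mzs(peptide):
    """Theoretical singly charged b- and y-ion m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    out = []
    for i in range(1, len(peptide)):
        out.append(sum(masses[:i]) + PROTON)          # b_i
        out.append(sum(masses[i:]) + WATER + PROTON)  # y_(n-i)
    return out


def search(spectrum, precursor_mass, proteins,
           precursor_tol=0.02, fragment_tol=0.02):
    """Return (score, peptide) candidates, best first."""
    peaks = sorted(spectrum)  # (1) trivial stand-in for pre-processing
    hits = []
    for protein in proteins:
        for pep in tryptic_digest(protein):
            # (2) keep candidates whose mass matches the precursor
            if abs(peptide_mass(pep) - precursor_mass) > precursor_tol:
                continue
            # (3) score: number of theoretical fragments found in the spectrum
            score = sum(
                any(abs(frag - peak) <= fragment_tol for peak in peaks)
                for frag in fragment_mzs(pep)
            )
            hits.append((score, pep))
    return sorted(hits, reverse=True)  # (4) scored output


if __name__ == "__main__":
    database = ["GASPEPTIDEKLLAR", "TTTTKVVVVR"]
    observed = fragment_mzs("LLAR")  # idealized spectrum of one peptide
    print(search(observed, peptide_mass("LLAR"), database))
```

Note that a candidate can only be returned if its sequence is present in the database, which is the limitation of this class of methods that the following paragraph discusses.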
The current state-of-the-art techniques suffer from being relatively slow and computationally intensive, which in turn hinders real-time proteomic generation, identification and sequencing. They also require that the candidate peptide sequence be predicted and present in the database. Techniques are needed that increase experimental speed and reduce computation to allow for real-time identification of unknown peptides during mass spectral analysis, while maintaining reliability. Real-time analysis provides additional capabilities such as intelligent data acquisition, wherein an instrument makes automated decisions related to subsequent analysis. Reliable sequence assignment by mass spectrometry provides the capability of sequence determination, including real-time sequence determination, without having to rely on additional database searching.
Information generated from the techniques, systems and methods disclosed herein has a range of uses, including, but not limited to, commercial protein databases and algorithms for searching protein databases, which are of interest to mass spectrometer manufacturers and service providers, as well as to researchers using mass spectrometry for peptide sequencing and protein identification.