Mass spectrometry in conjunction with database searching has become a method of choice for fast and efficient identification of proteins in biological samples. In particular, tandem mass spectrometry of peptides in a complex digest can provide information on the identity and quantity of the proteins present in the sample mixture. Tandem mass spectrometry achieves this by isolating peptide ions of specific mass-to-charge ratio (precursor ions), subjecting them to fragmentation, and producing product ions that are used to sequence and identify the peptides. The information carried by the product ions can be used to search protein and nucleotide sequence databases to identify the amino acid sequence represented by the spectrum and thus the protein from which the peptide was derived.
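As a rough illustration of what a precursor mass-to-charge value is, the following minimal sketch computes the m/z of a protonated peptide at a given charge state. The function names and the short residue table are illustrative assumptions and are not taken from any particular instrument or search program; the masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons for a few amino acids (standard values).
# The table is deliberately incomplete; it only supports the example below.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "V": 99.06841, "L": 113.08406, "K": 128.09496,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of each charge-carrying proton


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide (hypothetical helper)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mz(sequence, charge):
    """m/z of the peptide carrying `charge` protons (hypothetical helper)."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    # The same peptide appears at different m/z values for different charges.
    for z in (1, 2, 3):
        print(z, round(precursor_mz("GAK", z), 4))
```

The sketch ignores modifications and isotope distributions, both of which matter when precursor ions are selected in practice.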
The identification procedure is performed in high-throughput mode by comparing experimental data, such as mass spectra, with characteristic data, such as theoretical sequences for peptides of previously identified (“known”) proteins. Searchable databases are available, e.g., at the National Center for Biotechnology Information (NCBI). They include databases of nucleotide sequence information and of amino acid sequence information for peptides. To identify peptides, database searching programs typically compare each MS/MS spectrum against the sequences contained in the database and assign a probability score to rank the most likely peptide matches. The algorithms typically use the mass-to-charge (m/z) values of the various product ions for identification. Matching peptide MS/MS fragmentation spectra to data for peptides extracted from databases does not necessarily identify the peptides unambiguously or with 100% confidence. Some spectra may match very closely while others match less closely, and a close match may or may not indicate the identity of the unknown peptide. Ranking of matches can be used to identify unreliable matches. For example, a second-best match in one analysis may be a true match indicating identity, whereas the best match in another analysis may be a false match obtained by chance.
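The compare-and-rank step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the scoring used by any actual search program: it scores each candidate by the fraction of its theoretical singly charged b- and y-ion m/z values found in the observed peak list, whereas real programs assign probability-based scores. All function names, the tolerance, and the candidate list are hypothetical.

```python
from bisect import bisect_left

# Standard monoisotopic residue masses (Da); table limited to the example.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}
WATER = 18.010565
PROTON = 1.007276


def by_ion_mz(sequence):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion (N-terminal part)
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion (C-terminal part)
    return ions


def has_peak_near(sorted_peaks, target, tol):
    """True if some observed peak lies within +/- tol of target."""
    i = bisect_left(sorted_peaks, target - tol)
    return i < len(sorted_peaks) and sorted_peaks[i] <= target + tol


def rank_candidates(observed_mz, candidates, tol=0.02):
    """Return (score, peptide) pairs, best first; score = matched fraction."""
    observed = sorted(observed_mz)
    scored = []
    for peptide in candidates:
        theoretical = by_ion_mz(peptide)
        if not theoretical:  # peptides of length < 2 have no fragments
            continue
        hits = sum(1 for t in theoretical if has_peak_near(observed, t, tol))
        scored.append((hits / len(theoretical), peptide))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Synthetic spectrum: fragments of GASK plus one unrelated noise peak.
    spectrum = by_ion_mz("GASK") + [100.0]
    for score, peptide in rank_candidates(spectrum, ["AGSK", "GASK", "GSAK"]):
        print(peptide, round(score, 2))
```

In this synthetic case the sequence permutations share several fragment masses with the true peptide, so they still receive a nonzero score. This is one way in which a close match can fail to indicate identity.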
Precursor ions can be fragmented by various methodologies and mechanisms. Ion activation techniques that involve excitation of protonated or multiply protonated peptides include collision-induced dissociation (CID) and infrared multiphoton dissociation (IRMPD), and data generated using such techniques have traditionally been used to identify sequences. The advent of new non-ergodic fragmentation methodologies, such as electron transfer dissociation (ETD) and electron capture dissociation (ECD), has created new capabilities for mass spectrometry. Due to its non-ergodic character, ETD is thought to provide more complete information on the primary structure of peptides. At the same time, spectra created via ETD fragmentation are more complicated. In addition to the fragment ions, the spectra contain products of proton abstraction, rearrangement, and neutral losses due mainly, but not exclusively, to amino bond related groups. In many cases, fragment ions in ETD are less abundant than the charge-reduced forms of the precursor ion. Algorithms and software written specifically to evaluate spectra produced via CID have been found to produce erroneous results when applied to spectra produced via ETD, so that confidence in the correctness of a match is low. All of these problems call for a new algorithmic approach optimized for peptide identification from ETD spectra, spectra produced by other non-ergodic ion-ion reactions, and spectra produced by multiple or sequential ion-ion reactions.
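One commonly cited reason a CID-oriented scorer can fail on ETD data is that the two techniques favor different fragment ion series: CID is generally described as producing mainly b- and y-type ions, while ETD and ECD are generally described as producing mainly c- and z-type ions. That distinction is background knowledge assumed here and is not stated in the text above. The minimal sketch below uses hypothetical function names and standard monoisotopic masses to show that a b/y model finds no matches in an idealized c/z peak list for the same peptide. It ignores charge-reduced precursors, neutral losses, and the other complications noted above.

```python
# Standard monoisotopic masses (Da); residue table limited to the example.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}
WATER = 18.010565
AMMONIA = 17.026549
HYDROGEN = 1.007825
PROTON = 1.007276


def fragment_mz(sequence, series):
    """Singly charged fragment m/z values for one ion series (b, y, c or z)."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    out = []
    for i in range(1, len(masses)):
        prefix = sum(masses[:i])  # N-terminal residues
        suffix = sum(masses[i:])  # C-terminal residues
        if series == "b":
            out.append(prefix + PROTON)
        elif series == "c":
            out.append(prefix + AMMONIA + PROTON)  # b ion plus NH3
        elif series == "y":
            out.append(suffix + WATER + PROTON)
        elif series == "z":
            # Radical z ion: y ion minus NH3 plus one hydrogen atom.
            out.append(suffix + WATER - AMMONIA + HYDROGEN + PROTON)
        else:
            raise ValueError("unknown ion series: " + series)
    return out


def count_matches(observed, theoretical, tol=0.02):
    """Number of theoretical m/z values with an observed peak within tol."""
    return sum(1 for t in theoretical
               if any(abs(t - o) <= tol for o in observed))


if __name__ == "__main__":
    peptide = "GASK"
    # Idealized ETD-like peak list (assumption: only c and z ions present).
    etd_like = fragment_mz(peptide, "c") + fragment_mz(peptide, "z")
    cid_model = fragment_mz(peptide, "b") + fragment_mz(peptide, "y")
    print("b/y model matches:", count_matches(etd_like, cid_model))
    print("c/z model matches:", count_matches(etd_like, etd_like))
```

Under these assumptions the correct peptide scores zero against the b/y model and scores fully against the c/z model, which suggests why the fragment model has to be chosen to suit the fragmentation technique.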