Proteomics generally refers to studies involving complex mixtures of proteins derived from biological systems. Proteomic studies often focus on identification of proteins or determination of changes in the state of a biological system. Identification and quantification of proteins in complex biological samples is a fundamental problem in proteomics.
Liquid chromatography coupled with mass spectrometry (LC/MS) has become a fundamental tool in proteomic studies. Separation of intact proteins or of their proteolyzed peptide products by liquid chromatography (LC) and subsequent analysis by mass spectrometry (MS) forms the basis of many common proteomic methodologies. Methods that measure changes in the expression level of proteins are of great interest as they can form the basis of biomarker discovery and clinical diagnostics.
Rather than being analyzed intact, proteins are typically digested to produce a specific set of proteolytic peptides. The resulting peptides are then often characterized via LC/MS analysis. A common enzyme used for digestion is trypsin. In tryptic digestion, the proteins present in a complex mixture are cleaved to produce peptides as determined by the cleavage specificity of the proteolytic enzyme. From the identity and concentration of the observed peptides, available algorithms serve to identify and quantify the proteins in the sample.
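The cleavage specificity mentioned above can be illustrated with an in-silico digest. The following is a minimal sketch, assuming the commonly cited trypsin rule (cleavage on the C-terminal side of lysine, K, or arginine, R, except when the next residue is proline, P) and assuming complete digestion with no missed cleavages. The function name and the example sequence are hypothetical.

```python
import re

def tryptic_digest(protein: str) -> list[str]:
    """Split a protein sequence into peptides under a simplified trypsin rule.

    Assumed rule: cleave after K or R unless the following residue is P.
    Missed cleavages are not modeled.
    """
    # Zero-width split point: preceded by K or R, not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # A cleavage site at the very end of the sequence yields an empty piece.
    return [p for p in pieces if p]

# Hypothetical sequence: the "RP" site is not cleaved.
peptides = tryptic_digest("AAKRPGGRCCK")
```

A real digest is rarely complete, so practical tools usually also enumerate peptides containing one or two missed cleavage sites.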
In LC/MS analysis, the peptide digest is first separated by LC and then analyzed by MS. Ideally, the mass of a single peptide, measured with sufficient accuracy, provides a unique identification of the peptide. In practice, however, achieved mass accuracies typically are on the order of 10 ppm or larger. In general, such mass accuracy is not sufficient to uniquely identify a peptide using the mass measurement alone.
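To make the ppm figure concrete, the sketch below converts a relative accuracy in parts per million into an absolute mass window and computes the ppm error of a measurement. The function names are illustrative assumptions, not part of any particular software package.

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error of a measurement, in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def mass_window(mass: float, ppm: float) -> tuple[float, float]:
    """Lower and upper mass bounds for a given ppm tolerance."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta

# At 10 ppm, a peptide of mass 1000 Da has a window of about +/- 0.01 Da.
low, high = mass_window(1000.0, 10.0)
```

Because the window scales with mass, the same ppm tolerance corresponds to a wider absolute window for heavier peptides.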
For example, at a mass accuracy of 10 ppm, on the order of 10 peptide sequences are identified in a search of a typical database of peptide sequences. This number would increase significantly if the search constraints on mass accuracy were relaxed, or if the search allowed for chemical or post-translational modifications, losses of H2O or NH3, or point mutations. Thus, if a peptide's sequence is modified by either a deletion or a substitution, use of only the precursor's mass for identification of the peptide will lead to a false identification. A further complication arises from the possibility that two peptides can have the same amino acid composition but different sequences.
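The last point can be checked numerically: two peptides with the same amino acid composition have the same precursor mass regardless of residue order. The sketch below assumes standard monoisotopic residue masses for a small subset of amino acids; the two sequences are hypothetical examples.

```python
from collections import Counter
import math

# Monoisotopic residue masses in Da (subset, for illustration only).
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}
WATER = 18.01056  # added once per peptide for the terminal H and OH

def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER

first, second = "GASK", "SAGK"  # same composition, different order
same_composition = Counter(first) == Counter(second)
same_mass = math.isclose(peptide_mass(first), peptide_mass(second), abs_tol=1e-9)
```

No improvement in mass accuracy can separate such a pair on precursor mass alone, which is why fragment information is needed.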
In the case of peptide precursors, product fragments can be obtained by fragmentation at a single peptide bond in the precursor. Such a single fragmentation produces two sub-sequences. The fragment containing the peptide's C-terminus, if ionized, is termed a Y-ion, and the fragment containing the peptide's N-terminus, if ionized, is termed a B-ion.
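The fragment series described above can be enumerated by breaking each peptide bond in turn. The following is a minimal sketch, assuming singly protonated fragments, unmodified residues, and standard monoisotopic residue masses; the function name is hypothetical.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276

def b_y_ions(seq: str) -> list[tuple[str, float, str, float]]:
    """Singly charged b- and y-ion m/z values for each peptide bond."""
    masses = [MONO[aa] for aa in seq]
    ions = []
    for i in range(1, len(seq)):
        # b-ion: N-terminal part (first i residues) plus one proton.
        b = sum(masses[:i]) + PROTON
        # y-ion: C-terminal part (remaining residues) plus water and one proton.
        y = sum(masses[i:]) + WATER + PROTON
        ions.append((f"b{i}", b, f"y{len(seq) - i}", y))
    return ions

fragments = b_y_ions("GASK")  # hypothetical peptide with three peptide bonds
```

For every bond, the b-ion and its complementary y-ion sum to the neutral peptide mass plus two protons, which gives a simple consistency check on observed fragment pairs.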
Proteins are often identified by comparing analysis data to a database that associates protein identities with information about fragments of the proteins, such as masses of the fragments. For example, if a theoretical peptide mass from a database lies within a mass search window of the mass of a precursor measured in the data, it is deemed a hit.
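The hit criterion described above can be sketched as a simple window test. The example assumes a window expressed in ppm and centered on the measured precursor mass; the database entries, their masses, and the function name are made-up illustrations rather than real peptides.

```python
def find_hits(measured_mass: float, database: dict[str, float], ppm: float) -> list[str]:
    """Return database peptides whose theoretical mass lies within the window."""
    tolerance = measured_mass * ppm / 1e6
    return [
        name
        for name, theoretical in database.items()
        if abs(theoretical - measured_mass) <= tolerance
    ]

# Hypothetical database: peptide label -> theoretical mass in Da.
database = {"PEP_A": 1000.5000, "PEP_B": 1000.5060, "PEP_C": 1000.6000}

hits_10ppm = find_hits(1000.5050, database, ppm=10.0)
hits_2ppm = find_hits(1000.5050, database, ppm=2.0)
```

With these illustrative values, the 10 ppm window admits two candidates while the 2 ppm window admits one, showing how a tighter window reduces ambiguous matches.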
The search can provide a list of possible matching peptides found in the database. These possible matching database peptides may or may not be weighted by statistical factors. The possible outcomes of such a search are that no possible matching database peptide is identified, that one possible matching database peptide is identified, or that more than one possible matching database peptide is identified. The higher the resolution of the MS, assuming proper instrument calibration, the smaller the ppm threshold, and consequently, the fewer the false identifications. If one or more peptides in the database match, peptide-fragment ion data may be used to validate a match.
During a search, multiple charge states and multiple isotopes can be searched. Further, empirically produced confidence rules can be applied to help identify valid matches.
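Searching multiple charge states and isotopes amounts to generating, for each candidate peptide mass, the set of m/z values at which it could appear. The following is a minimal sketch, assuming protonated ions and an isotope spacing equal to the carbon-13 to carbon-12 mass difference; the function name and default limits are hypothetical.

```python
PROTON = 1.007276       # mass of a proton in Da
ISOTOPE_STEP = 1.003355  # approximate 13C - 12C mass difference in Da

def candidate_mz(neutral_mass: float, max_charge: int = 3, n_isotopes: int = 3):
    """Map (charge, isotope index) to the expected m/z of a protonated peptide."""
    table = {}
    for z in range(1, max_charge + 1):
        for k in range(n_isotopes):
            # Each added neutron-equivalent shifts the peak by ISOTOPE_STEP / z.
            table[(z, k)] = (neutral_mass + k * ISOTOPE_STEP + z * PROTON) / z
    return table

mz_table = candidate_mz(1000.0)
```

Each entry in the resulting table can then be compared against the measured data using the same mass window as for the monoisotopic, singly charged case.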