Proteomic techniques that permit the identification, quantification, and localization of proteins in cells will advance the understanding of cell function and development far beyond what has been achieved by genomic techniques. For example, the ability of advanced mass spectrometry techniques to analyze complex protein mixtures, e.g., multi-protein complexes, cell fractions, and whole-cell extracts, promises to provide powerful high-throughput diagnostic and screening methods.
Mass spectrometry can be used to identify single proteins or large numbers of proteins in mixtures. In addition, mass spectrometry can be used to sequence a peptide de novo. For example, tandem mass spectrometry of peptides generated by proteolytic digestion of a complex protein mixture (e.g., a cell extract) can be used to identify and quantify the proteins present in the original mixture. This result can be achieved because tandem mass spectrometers capable of selecting ions of a single m/z value and subjecting them to collision-induced dissociation (CID) can be used to sequence and identify peptides. The information created by CID of a peptide can be used to search protein and nucleotide sequence databases to identify the amino acid sequence represented by the spectrum and thus identify the protein from which the peptide was derived.
Tandem mass spectrometry used to identify a peptide in a complex mixture of peptides derived from digested proteins utilizes three types of information. First, the mass of the peptide is obtained. This information alone can greatly reduce the number of possible peptide sequences, particularly if the protein was digested with a sequence-specific protease. The second type of information is the pattern of fragment ions produced by CID of the peptide ion. Analytical methods that compare the observed fragment ion pattern to theoretical fragment ion patterns generated computationally from sequence databases can be used to identify the peptide sequence. Such methods can identify the best-matching peptides and statistically determine which peptide sequence is most likely to be correct. Third, the accuracy of the prediction can be increased further by using multiple dimensions of MS analysis to obtain de novo the sequence of a portion of the peptide; this direct sequence information supplements the prediction based on the fragment ion patterns. Once the peptide is identified, the protein from which it was generated can be readily determined by searching sequence databases.
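The first two types of information can be illustrated with a minimal sketch. It assumes unmodified peptides, singly charged b- and y-type fragment ions, and a simple count of matched peaks as the score; the function names, the tolerances, and the scoring rule are illustrative assumptions, not a method described in this document. Practical search tools use statistical scoring rather than a raw count.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the sum of residue masses
PROTON = 1.007276   # charge-carrying proton


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def fragment_mz(seq):
    """Singly charged b- and y-ion m/z values (assumed fragment model)."""
    ions = []
    for i in range(1, len(seq)):
        b = sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def count_matches(observed, theoretical, tol=0.02):
    """Number of theoretical ions that have an observed peak within tol."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def identify(precursor_mass, observed, candidates, mass_tol=0.05):
    """Filter candidates by peptide mass, then rank by matched fragments."""
    scored = []
    for seq in candidates:
        if abs(peptide_mass(seq) - precursor_mass) <= mass_tol:
            scored.append((count_matches(observed, fragment_mz(seq)), seq))
    return sorted(scored, reverse=True)


# Hypothetical example: the "observed" spectrum is simulated from PEPTIDEK.
candidates = ["PEPTIDEK", "PEPTDIEK", "AAAAK"]
observed = fragment_mz("PEPTIDEK")
ranked = identify(peptide_mass("PEPTIDEK"), observed, candidates)
```

In this example the mass filter removes AAAAK, while PEPTDIEK has the same composition and therefore the same mass as PEPTIDEK; only the fragment ion pattern distinguishes the two.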
Proteins in complex mixtures, e.g., cell extracts, can be identified by a combination of enzymatic proteolysis, liquid chromatographic separation, tandem mass spectrometry, computer algorithms that correlate peptide mass spectra with those theoretically predicted from sequence databases, and de novo sequencing.
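The enzymatic proteolysis step can also be simulated computationally to produce the candidate peptides against which spectra are correlated. The sketch below assumes trypsin as the protease and the commonly used rule that it cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The text does not name a specific enzyme, so both the enzyme choice and the function name are illustrative assumptions.

```python
import re


def tryptic_digest(protein, min_length=1):
    """Split a protein sequence after K or R, except when followed by P."""
    # Zero-width split point: preceded by K or R and not followed by P.
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    # Drop the empty string left when the sequence ends in K or R,
    # along with any peptide shorter than min_length.
    return [p for p in peptides if len(p) >= min_length]


# Hypothetical sequence: the K before P is not cleaved.
peptides = tryptic_digest("AAKPGGRCCKDD")
```

Each resulting peptide can then be passed to a mass calculation and fragment ion prediction to build the theoretical spectra used in a database search.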
Electrospray ionization permits liquid chromatography to be directly coupled to a tandem mass spectrometer so that complex mixtures can be temporally separated prior to introduction into the mass spectrometer. The increase in the number of organisms for which a complete genome sequence is available will greatly increase the value of this approach to the analysis of complex mixtures.