Mass spectroscopy (MS) has emerged as a powerful analytical tool for studying biopolymers, e.g., polypeptides, polynucleotides, and polysaccharides, due to its high sensitivity, speed, and capability for analysis of highly complex mixtures. For example, a variety of techniques have been developed for identifying proteins in biological samples (e.g., cell extracts). Typically, the proteins in a sample of interest are first separated by two-dimensional gel electrophoresis (2D Gel). Selected gel spots are then excised and digested with one or more digestive enzymes (e.g., trypsin) to break the proteins into collections of shorter polypeptide chains. These digests are then analyzed via mass spectroscopy, and the resulting spectra are compared to spectra predicted from amino acid sequence information contained in databases (e.g., SwissProt/TrEMBL, NCBI Protein Database). Identifications are made based on the improbability of more than one protein matching the observed spectra (e.g., see Strupat et al., Anal. Chem. 66:464, 1994). A basic limitation of polypeptide mass fingerprinting, as this approach is commonly called, is that it can only identify proteins whose sequences are already known; it is incapable of identifying previously unknown proteins.
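The fingerprinting workflow described above can be illustrated with a minimal Python sketch. It assumes a simplified trypsin rule (cleavage after K or R unless the next residue is P), standard monoisotopic residue masses, neutral peptide masses, and a simple count of matched masses as the score. The function names, the toy sequences, and the 0.01 Da tolerance are hypothetical and are not taken from the cited methods or databases.

```python
# Monoisotopic amino acid residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide for the free termini


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def predicted_masses(sequence):
    """Predicted peptide masses for a complete tryptic digest."""
    return [peptide_mass(p) for p in tryptic_digest(sequence)]


def rank_candidates(observed, database, tol=0.01):
    """Rank database entries by how many observed masses they explain."""
    scores = {}
    for name, sequence in database.items():
        predicted = predicted_masses(sequence)
        scores[name] = sum(
            any(abs(m - p) <= tol for p in predicted) for m in observed
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy database; real searches use full sequence databases.
TOY_DATABASE = {
    "protein_A": "MKWVTFISLLR",
    "protein_B": "GASPVKTCLNR",
}

if __name__ == "__main__":
    observed = predicted_masses(TOY_DATABASE["protein_A"])
    print(rank_candidates(observed, TOY_DATABASE))
```

In this sketch, a digest of a protein that is absent from the database matches no entry, which mirrors the limitation noted above: only proteins with known sequences can be identified.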
In general, 2D Gel separations have proven to be slow; thus, higher-throughput methods using multi-dimensional liquid chromatography (MDLC) have also been developed (e.g., see Yates et al., Anal. Chem. 69:767, 1997). Several variations of this process are in use, but they all typically begin with an enzymatic digestion of the proteins present in the sample, resulting in a complex mixture containing polypeptide chains from many different proteins. This complex mixture is then separated via MDLC, typically using Strong Cation Exchange (SCX) followed by Reverse Phase (RP). The resulting separations typically contain polypeptides from multiple proteins. These separations are analyzed via mass spectroscopy, and the results are compared to predicted spectra as before. In most cases, tandem mass spectroscopy (MS/MS) is used to perform the analysis (e.g., see Ducret et al., Protein Sci. 7:706, 1998). In this process, polypeptides eluting from the separation stage are analyzed in the first stage of a tandem mass spectrometer, which selects certain polypeptide ions for fragmentation and analysis in the second stage of the tandem mass spectrometer. The resulting spectra give more detailed information about the structure of the selected polypeptide ions, improving the identification.
One of the problems with the use of MDLC and MS/MS for protein identification is that it is difficult to get broad coverage of the proteins present in a sample. This can usually be attributed to the process used to select ions in the first stage for fragmentation in the second stage. Good identification can be made if a sufficient number of polypeptides from a given protein are selected for fragmentation. However, since the process of selecting polypeptides and collecting spectra in the second stage is slow relative to flow from the separation stage, it is not always possible to select all polypeptides present in an elution peak for analysis in the second stage. Algorithms used for selection make real-time decisions based on a variety of factors, including the relative abundance of an ion in the first-stage spectra and the time since a given mass was last selected. They may also have provisions for giving preference to specific masses or for excluding given masses, but these lists are generally manually constructed. The consequence of these selection approaches is that polypeptides with relatively high levels of abundance (i.e., polypeptides from relatively abundant proteins or common polypeptides that result from the digestion of several different proteins) are preferentially selected. Conversely, polypeptides resulting from proteins with relatively low levels of abundance or less-than-ideal ionization characteristics are frequently missed.
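The selection behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the algorithm of any particular instrument: the function name select_precursors, the parameters (top_n, exclusion_s, mz_tol), and the survey values are all hypothetical. It ranks first-stage ions by intensity, skips masses selected too recently, and honors manually supplied preference and exclusion lists.

```python
def select_precursors(survey, now, last_selected, top_n=2,
                      exclusion_s=30.0, mz_tol=0.01,
                      include=(), exclude=()):
    """Pick up to top_n ions from one first-stage (survey) spectrum.

    survey        -- list of (mz, intensity) pairs
    now           -- acquisition time of the survey spectrum, in seconds
    last_selected -- dict mapping mz to the time it was last selected;
                     updated in place (hypothetical bookkeeping)
    include       -- manually listed masses that are given preference
    exclude       -- manually listed masses that are never selected
    """
    def near(mz, targets):
        return any(abs(mz - t) <= mz_tol for t in targets)

    candidates = []
    for mz, intensity in survey:
        if near(mz, exclude):
            continue  # on the manual exclusion list
        recently_selected = any(
            abs(mz - prev_mz) <= mz_tol and now - prev_time < exclusion_s
            for prev_mz, prev_time in last_selected.items()
        )
        if recently_selected:
            continue  # selected too recently to be chosen again
        candidates.append((near(mz, include), intensity, mz))

    # Preferred masses first, then the most intense ions.
    candidates.sort(reverse=True)
    chosen = [mz for _, _, mz in candidates[:top_n]]
    for mz in chosen:
        last_selected[mz] = now
    return chosen


if __name__ == "__main__":
    # Hypothetical survey spectrum: three abundant ions and one weak ion.
    survey = [(500.0, 1000.0), (600.0, 800.0), (700.0, 50.0), (800.0, 900.0)]
    history = {}
    print(select_precursors(survey, 0.0, history))   # most intense ions
    print(select_precursors(survey, 10.0, history))  # next ions in line
```

With these toy values, the weak ion at 700.0 is reached only after the more abundant ions have been selected and temporarily excluded. If its elution peak ends before that point, it is never fragmented, which is the coverage problem described above.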
Tandem mass spectroscopy has an additional problem in that it is difficult to accurately measure the relative amounts of different polypeptides present in a given sample due to the high levels of ion loss associated with the ion selection process. In a complex spectrum where almost all ions are of interest, it is not possible to continuously monitor all the ions and perform MS/MS on each important ion. Since the intensity of each ion changes with the chromatographic elution profile of the polypeptide from which the ion is derived, the time spent selecting ions and performing MS/MS greatly reduces the number of data points collected for each ion, thereby reducing the accuracy of the estimated amount of each polypeptide present in the sample.
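The effect of sparse sampling on quantitation can be illustrated numerically. The following Python sketch assumes a Gaussian elution profile and trapezoidal integration of the sampled intensities; the peak parameters (apex at 60 s, width parameter 5 s, height 1000) and the sampling intervals are hypothetical values chosen only for illustration.

```python
import math

APEX, SIGMA, HEIGHT = 60.0, 5.0, 1000.0  # hypothetical peak parameters
TRUE_AREA = HEIGHT * SIGMA * math.sqrt(2.0 * math.pi)


def elution_profile(t):
    """Ion intensity at time t for an assumed Gaussian elution peak."""
    return HEIGHT * math.exp(-0.5 * ((t - APEX) / SIGMA) ** 2)


def estimated_area(sample_interval, offset=0.0, start=20.0, stop=100.0):
    """Trapezoidal peak area from intensities sampled at a fixed interval."""
    times = []
    t = start + offset
    while t <= stop:
        times.append(t)
        t += sample_interval
    intensities = [elution_profile(t) for t in times]
    area = 0.0
    for i in range(1, len(times)):
        mean_intensity = 0.5 * (intensities[i - 1] + intensities[i])
        area += mean_intensity * (times[i] - times[i - 1])
    return area


def relative_error(area):
    """Signed error of an area estimate relative to the true peak area."""
    return (area - TRUE_AREA) / TRUE_AREA


if __name__ == "__main__":
    for interval, offset in [(1.0, 0.0), (20.0, 0.0), (20.0, 10.0)]:
        error = relative_error(estimated_area(interval, offset))
        print(f"interval={interval:5.1f} s offset={offset:4.1f} s "
              f"relative error={error:+.3f}")
```

With frequent sampling, the estimated area closely tracks the true area. With only a few points across the peak, the estimate depends strongly on where the samples happen to fall relative to the apex, so it can be substantially too high or too low.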
In general, the limitations that are described above with reference to polypeptides also apply when mass spectroscopy is used to identify other biopolymers including polynucleotides and polysaccharides.