Biochemists frequently depend on reliable and fast determinations of the sequences of biological polymers. For example, sequence information is crucial in the research and development of peptide screens, genetic probes, gene mapping, and drug modeling, as well as for quality control of biological polymers when manufactured for diagnostic and/or therapeutic applications.
Various methods are known for sequencing polymers composed of essential biological building-blocks, such as amino acids and nucleotides. For example, existing methods for peptide sequence determination include the N-terminal chemistry of the Edman degradation, N- and C-terminal enzymatic methods, and C-terminal chemical methods. Existing methods for sequencing oligonucleotides include the Maxam-Gilbert base-specific chemical cleavage method and the enzymatic ladder synthesis with dideoxy base-specific termination method. Each method possesses inherent limitations that preclude its exclusive use for complete primary structure identification. To date, Edman sequencing and adaptations thereof are the most widely used tools for sequencing certain proteins and peptides residue by residue, while the enzymatic synthesis method is preferred for sequencing oligonucleotides.
In the case of protein and peptide sequencing, C-terminal sequencing via chemical methods has proven particularly difficult while being only marginally effective, at best. (See, e.g., Spiess, J. (1986) Methods of Protein Characterization: A Practical Handbook (Shively, J. E. ed., Humana Press, N.J.) pp. 363-377; Tsugita et al. (1994) J. Protein Chemistry 13:476-479). Consequently, the C-terminus remains a region often not analyzed because of lack of a dependable method.
In the case of both peptides and oligonucleotides, an alternate approach to chemical sequencing is enzymatic cleavage sequencing. In the case of oligonucleotides, over 150 different enzymes have been isolated and found suitable for preparing oligonucleotide fragments. In the case of peptides, serine carboxypeptidases have proven popular over the last two decades because they offer a simple approach by which amino acids can be sequentially cleaved residue by residue from the C-terminus of a protein or a peptide. Carboxypeptidase Y (CPY), in particular, is an attractive enzyme because it non-specifically cleaves all residues from the C-terminus, including proline. (See, e.g., Breddam et al. (1987) Carlsberg Res. Commun. 52:55-63.)
Sequencing of peptides by carboxypeptidase digestion has traditionally been performed by a laborious, direct analysis of the released amino acids, residue by residue. Not only is this approach labor-intensive, but it is complicated by amino acid contaminants in the enzyme and protein/peptide solutions, as well as by enzyme autolysis. A further hindrance to any sequencing effort of this type is the absolute requirement for good kinetic information concerning the hydrolysis and liberation of each individual residue by the particular enzyme used.
With the advancement of mass spectrometric techniques capable of high mass analysis such as field desorption (Hong et al. (1983) Biomed. Mass Spectrom. 10:450-457), electrospray (Smith et al. (1993) 4 Techniques Protein Chem. 463-470), and thermospray (Stachowiak et al. (1988) J. Am. Chem. Soc. 110:1758-1765), it is possible to perform direct mass analysis on large biopolymers such as the peptide fragments resulting from CPY digestion in which the sequence order is preserved, circumventing the need for residue by residue amino acid analysis of the liberated amino acids. In this "ladder" sequencing approach, a sequence can be deduced, in the correct order, by simply calculating the mass differences between adjacent peptide peaks, the measured differences representing the loss of a particular amino acid residue.
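The mass-difference reading described above can be illustrated with a minimal sketch. It is not taken from any cited work. It assumes singly charged peaks already reduced to neutral monoisotopic masses, uses the standard monoisotopic residue masses, and applies a hypothetical matching tolerance of 0.02 Da. The function name read_ladder and the example peak list are invented for illustration. Leucine and isoleucine have identical masses, so they are reported together.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.02):
    """Call the residue lost between each pair of adjacent ladder peaks.

    peaks: neutral peptide masses in Da, in any order.
    tol:   assumed matching tolerance in Da (hypothetical value).
    Returns one call per step, starting from the heaviest peak. For a
    C-terminal ladder this means the C-terminal residue comes first.
    """
    ordered = sorted(peaks, reverse=True)
    calls = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        matches = [r for r, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        # Isobaric residues are joined with "/"; no match is reported as "?".
        calls.append("/".join(matches) if matches else "?")
    return calls


# Synthetic ladder: an arbitrary 1000.0 Da core extended by G, A, then F.
example_peaks = [1275.12698, 1128.05857, 1057.02146, 1000.0]
print(read_ladder(example_peaks))  # ['F', 'A', 'G']
```

In this synthetic example the calls run from the C-terminus inward, so the C-terminal sequence in conventional N-to-C order is G-A-F. Real spectra would additionally require peak picking, charge-state handling, and a tolerance suited to the instrument, none of which is attempted here.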
More recently, matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry also was shown to be suitable for ladder sequence analysis due to its high sensitivity, resolution, and mass accuracy. Chait et al. ((1993) 262 Science 89-92) exploited these assets of MALDI-TOF in the sequencing of N-terminal ladders formed from partial blockage at each step of chemical digestion by the Edman degradation method. This approach, however, still suffers from the same limitations as traditional Edman chemistry, including the complexity of the process, which is time-consuming and labor-intensive, and the lack of C-terminal information. Nevertheless, it confirms the utility of MALDI-TOF for sequencing peptides using the peptide ladder scenario. Other researchers have also illustrated that carboxypeptidase digestion of peptides can be combined with MALDI-TOF to analyze the resulting mixture of truncated peptides. For example, eight consecutive amino acids have been sequenced from the C-terminus of human parathyroid hormone 1-34 fragment (Schar et al. (1991) Chimia 45:123-126). Additionally, carboxypeptidase digestion of peptides has been combined with other mass spectrometry methods such as plasma desorption (Wang et al. (1992) Techniques Protein Chemistry III (ed., R. H. Angeletti; Academic Press, N.Y.) pp. 503-515).
All of the above-described sequencing approaches, however, require preliminary optimization steps which are both tedious and time-consuming. Additionally, such preliminary optimization steps unnecessarily consume reagents as well as samples of polymer, which are usually available only in limited quantities. Furthermore, the above-described sequencing approaches frequently rely, ultimately, on a single mass spectrum and a single mass-to-charge ratio data point, which can result in a statistically insufficient basis for determining a final polymer sequence.
It is an object of the present invention to provide methods and apparatus for sequencing polymers, particularly biopolymers, using mass spectrometry and time-independent/concentration-dependent hydrolysis of the polymer. It is an object of the present invention to also provide a rapid method for obtaining sequence information by circumventing the time-consuming optimization and method enhancement required by prior art methods. It is a further object of the present invention to provide sequence information using reduced quantities of total polymer by combining the sensitivity of mass spectrometry with elimination of sample loss by closely integrating hydrolysis with mass spectrometry analysis. It is another object of the present invention to provide a method for obtaining sequence information that incorporates a data interpretation strategy based on integrating mass-to-charge ratio data obtained from a plurality of parallel mass spectra.