The development of Next Generation DNA sequencing methods for quickly acquiring genome and gene expression information has transformed biology. The basis of Next Generation DNA sequencing is the acquisition of large numbers (millions) of short reads (typically 35-450 nucleotides) in parallel. While nucleic acid mutations frequently underlie disease, these changes are most readily embodied by proteins expressed in specific bodily compartments (e.g., saliva, blood, urine) that are accessible without invasive procedures such as biopsies. Unfortunately, a similar high-throughput method for the large-scale identification and quantitation of specific proteins in complex mixtures remains unavailable, which represents a critical bottleneck in many biochemical, molecular diagnostic and biomarker discovery assays.
The first method for analysis of the N-terminal amino acid of polypeptides was described by Frederick Sanger, who demonstrated that the free unprotonated α-amino group of peptides reacts with 2,4-dinitrofluorobenzene (DNFB) to form yellow 2,4-dinitrophenyl derivatives (FIG. 1). When such a derivative of a peptide, regardless of its length, is subjected to hydrolysis with 6 N HCl, all the peptide bonds are hydrolyzed, but the bond between the 2,4-dinitrophenyl group and the α-amino of the N-terminal amino acid is relatively stable to acid hydrolysis. Consequently, the hydrolyzate of such a dinitrophenyl peptide contains all the amino acid residues of the peptide chain as free amino acids except the N-terminal one, which appears as the yellow 2,4-dinitrophenyl derivative. This labeled residue can easily be separated from the unsubstituted amino acids and identified by chromatographic comparison with known dinitrophenyl derivatives of the different amino acids.
Sanger's method has been largely supplanted by more sensitive and efficient procedures. One such method employs the labeling reagent 1-dimethylaminonaphthalene-5-sulfonyl chloride (dansyl chloride) (FIG. 2). Since the dansyl group is highly fluorescent, dansyl derivatives of the N-terminal amino acid can be detected and measured in minute amounts by fluorimetric methods. The dansyl procedure is 100 times more sensitive than the Sanger method.
The most widely used reaction for the sequential analysis of the N-terminal residues of peptides is the Edman degradation method (Edman et al. “Method for determination of the amino acid sequence in peptides”, Acta Chem. Scand. 4: 283-293 (1950) [1], herein incorporated by reference). Edman degradation is a method of sequencing amino acids in a peptide wherein the amino-terminal residue is labeled and cleaved from the peptide without disrupting the peptide bonds between other amino acid residues (FIG. 3). In the Edman procedure, phenylisothiocyanate reacts quantitatively with the free amino group of a peptide to yield the corresponding phenylthiocarbamoyl peptide. On treatment with anhydrous acid, the N-terminal residue is split off as a phenylthiocarbamoyl amino acid, leaving the rest of the peptide chain intact. The phenylthiocarbamoyl amino acid is then cyclized to the corresponding phenylthiohydantoin derivative, which can be separated and identified, usually by gas-liquid chromatography. Alternatively, the N-terminal residue removed as the phenylthiocarbamoyl derivative can be identified simply by determining the amino acid composition of the peptide before and after removal of the N-terminal residue, an approach called the subtractive Edman method. The advantage of the Edman method is that the rest of the peptide chain is left intact after removal of the N-terminal amino acid and is available for further cycles of the procedure; thus the Edman method can be used in a sequential fashion to identify several or even many consecutive amino acid residues starting from the N-terminal end. Edman and Begg have further exploited this advantage by utilizing an automated amino acid “sequenator” for carrying out sequential degradation of peptides by the phenylisothiocyanate procedure (Eur. J. Biochem. 1:80-91, (1967) [2], herein incorporated by reference).
In one embodiment, such automated amino acid sequencers permit up to 30 amino acids to be accurately sequenced with over 99% efficiency per amino acid (Niall et al. “Automated Edman degradation: the protein sequenator”. Meth. Enzymol. 27: 942-1010, (1973) [3], herein incorporated by reference).
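The bookkeeping behind the subtractive Edman method can be illustrated with a minimal Python sketch. It is an illustration only, not part of any cited procedure: the function names `identify_removed_residue` and `sequence_by_subtraction` are hypothetical, peptides are written in one-letter amino acid code, and the chemical cleavage step is merely simulated by dropping the first character of a string. The residue is identified solely from the amino acid compositions measured before and after the cycle.

```python
from collections import Counter


def identify_removed_residue(before: Counter, after: Counter) -> str:
    """Return the residue present once more in `before` than in `after`.

    Both arguments are amino acid compositions (residue -> count), which is
    all the subtractive method measures; no sequence information is used here.
    """
    lost = before - after  # Counter subtraction keeps only positive differences
    if sum(lost.values()) != 1:
        raise ValueError("compositions must differ by exactly one residue")
    return next(iter(lost))


def sequence_by_subtraction(peptide: str) -> str:
    """Simulate repeated subtractive Edman cycles on a one-letter-code peptide.

    Assumption: each cycle removes exactly the N-terminal residue (ideal,
    100% efficient chemistry), simulated here by slicing the string.
    """
    calls = []
    remaining = peptide
    while remaining:
        before = Counter(remaining)      # composition before the cycle
        remaining = remaining[1:]        # simulated N-terminal cleavage
        after = Counter(remaining)       # composition after the cycle
        calls.append(identify_removed_residue(before, after))
    return "".join(calls)


if __name__ == "__main__":
    print(sequence_by_subtraction("GAVLK"))
```

Under these idealized assumptions, the order in which residues disappear from the composition reproduces the sequence from the N-terminal end, which is why the remaining chain must stay intact between cycles.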
A drawback to Edman degradation is that the peptides being sequenced cannot have more than 50 to 60 (more practically fewer than 30) amino acid residues. The sequenced peptide length is typically limited because the product peptides become more heterogeneous with each Edman cycle, as cyclical derivatization or cleavage fails to proceed to completion on all peptide copies. Furthermore, since Edman degradation proceeds from the N-terminus of the protein, it will not work if the N-terminal amino acid has been chemically modified or if it is concealed within the body of the protein. In some native proteins the N-terminal residue is buried deep within the tightly folded molecule and is inaccessible. Edman degradation typically is performed only on denatured peptides or proteins. Intact, folded proteins are seldom, if ever, subjected to Edman sequencing.
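The effect of incomplete cycles on read length can be made concrete with a short calculation. The sketch below rests on a simplifying assumption that is not stated in the cited references: every cycle succeeds independently on each peptide copy with the same probability, so the fraction of copies still in phase after n cycles is the per-cycle efficiency raised to the power n. The function name `in_phase_fraction` is hypothetical.

```python
def in_phase_fraction(efficiency: float, cycles: int) -> float:
    """Fraction of peptide copies that completed every one of `cycles` cycles.

    Assumes each cycle succeeds independently with probability `efficiency`
    (a simplified model; real per-cycle yields vary with residue and chemistry).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return efficiency ** cycles


if __name__ == "__main__":
    for n in (10, 30, 60):
        print(f"{n} cycles at 99%: {in_phase_fraction(0.99, n):.3f} in phase")
```

Under this model, 99% per-cycle efficiency leaves roughly 74% of the copies in phase after 30 cycles and roughly 55% after 60 cycles. The remaining copies lag by one or more residues and contribute background signal, which is consistent with the practical length limits described above.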
Importantly, the current automated peptide sequencers that perform Edman degradation cannot sequence and identify individual peptides within the context of a mixture of peptides or proteins. What is thus needed is a massively parallel and rapid method for identifying and quantitating individual peptide and/or protein molecules within a given complex sample.