Determining the amino acid sequence, i.e., primary structure, of a peptide is central to understanding the structure of the peptide, as well as to manipulating the peptide to achieve desired properties in a modified or altered form. In addition, the amino acid sequence of a peptide is useful in a variety of recombinant DNA procedures for identifying the gene coding sequence of the peptide, for producing the peptide recombinantly, and/or for producing site-specific modifications of the peptide.
Early attempts to determine the amino acid sequence of peptides relied on acid hydrolysis of the peptide or enzymatic degradation to separate the peptide into its component amino acids. Both of these methods were slow and produced complicated mixtures of amino acids which then had to be separated for analysis.
The development of reagents to sequence peptides by more systematic means greatly facilitated the determination of amino acid sequences. The most widely used method involves reacting the N-terminus of the peptide with phenyl isothiocyanate (PITC), a process known as Edman degradation (Edman). Reaction of PITC with the free terminal amino group adds a phenylthiourea group, which cyclizes to form a free anilinothiazolinone (ATZ) of the N-terminal amino acid, and a shortened peptide. The ATZ derivative of the N-terminal amino acid is extracted and converted to the corresponding phenylthiohydantoin (PTH), which is then analyzed by HPLC. The amino acid PTH derivatives are racemized in the course of the Edman reaction, and thus the reaction cannot be used to distinguish L- and D-form amino acids.
N-terminal sequencing is carried out by successively converting the next-in N-terminal amino acid to the free amino acid PTH, and identifying each successively released amino acid. The method is generally reliable for N-terminal sequences up to about 20-40 or more amino acid residues.
Despite the relative ease and reliability of N-terminal sequencing methods, it is often desired to obtain C-terminal amino acid sequence information which may be inaccessible or only obtained with difficulty by this method. Information about the carboxy terminal sequence may be useful for certain types of recombinant DNA procedures, particularly since the C-terminal end of the coding region of a protein corresponds to the end closest to a poly A tail, which is likely to be present in cDNA clones.
Three general approaches have been proposed for C-terminal peptide sequencing: enzymatic, physical, and chemical. The enzymatic strategy, which involves analyzing the products resulting from treatment of the peptide with carboxypeptidase over time, is limited by the difficulty of controlling the extent of carboxypeptidase cleavage. Typically, the identification of the next-in amino acid becomes difficult after 3-5 residues have been cleaved.
The most common physical tools used for C-terminal sequencing are fast atom bombardment mass spectrometry (FAB/MS) and nuclear magnetic resonance (NMR) spectroscopy. FAB/MS analysis is applicable to 1-10 nmole amounts of peptide, but requires expensive mass spectrometry equipment. NMR analysis requires relatively large amounts of peptides, typically in the μmolar range, and also involves relatively expensive equipment.
In view of the limitations of enzymatic and physical approaches to C-terminal sequencing, considerable effort has been invested in developing chemical methods for determining C-terminal amino acid residues, and for C-terminal sequencing. An inherent difficulty in C-terminal sequencing is the relatively poor reactivity of the carboxyl group, in contrast to the relative ease of addition at the N-terminal amino group. Of the reaction methods which have been proposed for C-terminal sequencing, three have received special attention.
The first activation method involves generating a carboxyamido derivative at the C-terminal end of the peptide, followed by reaction with bis(I,I-trifluoroacetoxy)iodobenzene to form a derivative which rearranges and hydrolyzes to a shortened carboxyamidopeptide and the aldehyde derivative of the C-terminal amino acid (Parham). The method has been successfully carried out for only 3-6 cycles before the reaction halts. In a second, related approach, the carboxy terminus is reacted with pivaloylhydroxamide to effect a Lossen rearrangement. One limitation of the method is that the chemistry does not degrade aspartic and glutamic acid residues (Miller, 1977).
The most widely studied of the C-terminal chemistries is the thiohydantoin (TH) reaction. In one general method for carrying out the TH method, the carboxyl group is activated with an anhydride, such as acetic anhydride, in the presence of an ITC salt or acid, to form a C-terminal peptidyl-TH via a C-terminal ITC intermediate (Stark, 1972). The peptidyl-TH can be cleaved to produce a shortened peptide and a C-terminal amino acid TH, which can be identified, e.g., by high pressure liquid chromatography (HPLC). The coupling conditions in this method typically require about 90 minutes at 60°-70° C. (Meuth), and often lead to degradation of some of the amino acid side chains in the peptide. Further, the anhydride reagent is relatively unstable, and therefore presents storage problems.
A C-terminal TH sequencing method which can be carried out under milder conditions has been described by the inventor and co-workers (Hawke). Using trimethylsilyl ITC (TMS-ITC) as the reagent, TH formation was achieved by activation of the peptide with acetic anhydride for 15 min at 50° C., followed by reaction with TMS-ITC for an additional 30 min at 50° C. The method suffers from the disadvantage, noted above, of peptide exposure to a highly reactive anhydride activating agent. In addition, and like the related TH-generating methods described above, the TH-amino acid reaction products are racemized, and thus the method cannot be used to distinguish D- and L-form amino acids.
The C-terminal sequencing methods involving TH formation just described have commonly led to racemized products. A modification of the C-terminal reaction employing a phosphoryl isothiocyanatidate reagent has been proposed (Kenner). Although TH was produced, the reaction was too slow to be very useful. Miller et al. have proposed a related method, but using a mercaptobenzothiazole derivative. The rationale for using this compound is that cyclization could occur with concomitant opening of the thiazole ring.