Proteins are among the most abundant organic molecules, often accounting for 50 percent or more of a living organism's dry weight. Proteins perform many different functions within a living organism. For example, structural proteins are often woven together into long polymers of peptide chains to form fibrils, which are a major constituent of skin, tendons, ligaments, and cartilage. Proteins also serve other biological functions. Examples include regulatory proteins such as insulin and growth hormones, protective proteins such as antibodies and complement, and transport proteins such as hemoglobin and myoglobin. Many proteins are present only in very minute quantities within living organisms, yet are nevertheless critical to the organism's survival. For example, loss of Factor VIII in humans leads to hemophilia, the inability to properly clot blood.
Scientists have learned how to synthesize or express specific proteins in order to therapeutically replace them in individuals who are deficient in, or lack, production of a particular protein. However, before these proteins can be expressed from cells or synthesized artificially, it is often first necessary to determine their amino acid sequence.
Due in part to the great diversity of amino acids (at least 20 different types are found in naturally occurring proteins), it has been very difficult to develop techniques suitable for sequencing proteins. The difficulty is compounded by the fact that some proteins can be obtained only in very small amounts. Thus, there has been a continuing need for improved sensitivity in determining the amino acid sequence of proteins.
Various methods have been suggested for sequencing proteins. The first useful method for determining the amino-terminal (N-terminal) residue of proteins was developed by Sanger, who found that the free, unprotonated alpha-amino group of peptides reacts with 2,4-dinitrofluorobenzene (DNFB) to form yellow 2,4-dinitrophenyl derivatives (see Sanger and Tuppy, Biochem. J. 49:463-490, 1951; see also Sanger and Thompson, Biochem. J. 53:353-374, 1953). Later methods used 1-dimethylaminonaphthalene-5-sulfonyl chloride (dansyl chloride) (see Gray and Hartley, Biochem. J. 89:379-380, 1963), resulting in a 100-fold increase in sensitivity over Sanger's method. One difficulty with these methods, however, is that they can be performed only once on a given sample of protein: the acid hydrolysis step used to release the labeled N-terminal residue destroys the rest of the polypeptide, preventing analysis beyond the amino-terminal amino acid.
To determine the identity of amino acids beyond the N-terminal residue, a widely used method for labeling N-terminal amino acids (see Edman, Acta Chem. Scand. 4:283, 1950) was applied to protein sequencing. This method uses phenylisothiocyanate, which reacts with the free amino group of a protein to yield the corresponding phenylthiocarbamoyl protein. Upon treatment with an anhydrous acid, the N-terminal amino acid is split off as an anilinothiazolinone amino acid, which is then converted to the corresponding phenylthiohydantoin (PTH) derivative. This PTH derivative may then be separated and analyzed by, for example, liquid chromatography. Because the shortened peptide remains intact after each cleavage, repetitive cycles of this method (Edman degradation) can be performed on a given peptide, allowing as many as 70 residues to be determined in an automated instrument called a sequenator (see Edman and Begg, Eur. J. Biochem. 1:80-91, 1967).
Currently, protein sequences are almost universally determined by Edman degradation using the reagent phenylisothiocyanate. The efficiency and sensitivity of this process are, however, limited by the ability of UV absorption to detect PTH derivatives. Presently, the most sensitive way to perform Edman degradation is gas-liquid phase sequence analysis, in which the polypeptides are non-covalently adsorbed to a support in a sequenator cartridge. This sequencing method allows proteins and peptides to be analyzed at the 10-20 picomole level. To reach that sensitivity, however, the degradation chemistry must be optimized to a degree that does not allow recovery of PTH derivatives of post-translationally modified amino acids, such as phosphate esters of serine, threonine, or tyrosine residues. Even where the site of a post-translational modification can be determined, the nature of the modification, with very few exceptions, cannot. Current methods for determining the sites and nature of post-translational modifications lag approximately a thousandfold in sensitivity behind methods for determining partial sequences. In addition, because of the complicated procedures required to efficiently extract contaminants and reaction by-products, the gas-liquid phase sequencing mode is prohibitively slow, requiring a cycle time of 45 to 60 minutes.
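To put the stated 45 to 60 minute cycle time in perspective, the short Python calculation below multiplies it by the 70-cycle read length cited for automated sequenators. It is a simple illustrative sketch: `run_time_hours` is a hypothetical helper, and it assumes every cycle takes the same time.

```python
def run_time_hours(cycles: int, minutes_per_cycle: float) -> float:
    """Total instrument time in hours, assuming a constant cycle time."""
    return cycles * minutes_per_cycle / 60.0


if __name__ == "__main__":
    # 45 and 60 min/cycle are the cycle times stated in the text;
    # 70 cycles is the upper read length cited for sequenators.
    for minutes in (45, 60):
        print(f"{minutes} min/cycle: {run_time_hours(70, minutes):.1f} h for 70 cycles")
    # 45 min/cycle: 52.5 h for 70 cycles
    # 60 min/cycle: 70.0 h for 70 cycles
```

On these assumptions a full 70-residue run would occupy the instrument for roughly two to three days.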
There is, therefore, a need in the art for improved methods of sequencing proteins or peptides that are present only in small quantities. The present invention provides such a method, in part through the repetitive sequencing of extremely small quantities of proteins or peptides (i.e., in the femtomole (10^-15 mole) range), and further provides other related advantages.
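The gap between the picomole and femtomole ranges can be made concrete with a unit conversion. The Python sketch below uses hypothetical helper names (`pmol_to_fmol`, `molecules`) and the exact SI value of the Avogadro constant to express the 10-20 picomole level of gas-liquid phase sequencing in femtomoles and in numbers of molecules.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def pmol_to_fmol(pmol: float) -> float:
    """Convert picomoles to femtomoles (1 pmol = 1000 fmol)."""
    return pmol * 1e3


def molecules(moles: float) -> float:
    """Number of molecules in a given amount of substance, in moles."""
    return moles * AVOGADRO


if __name__ == "__main__":
    # The 10-20 pmol range is the sensitivity cited for gas-liquid phase sequencing.
    for pmol in (10, 20):
        print(f"{pmol} pmol = {pmol_to_fmol(pmol):,.0f} fmol "
              f"= {molecules(pmol * 1e-12):.2e} molecules")
    print(f"1 fmol = {molecules(1e-15):.2e} molecules")
```

Expressed this way, sequencing in the femtomole range means working with roughly one-ten-thousandth of the sample amount needed at the 10 picomole level.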