1. Field of the Invention
The present invention relates to systems and methods for determining the amino acid sequence of proteins or polypeptides.
2. Description of Related Art
Proteins are large organic molecules consisting of one or more polypeptide chains of amino acids. The backbone of a polypeptide is held together by peptide bonds, each formed between two adjacent amino acids by a dehydration (condensation) reaction between the carboxyl group of one amino acid and the amine group of the other. Polypeptides differ from one another primarily in their amino acid sequence. A peptide formed from two amino acids is called a “dipeptide,” a peptide formed from three amino acids is called a “tripeptide,” and so on.
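Because each peptide bond releases one water molecule, the molecular weight of a peptide can be estimated from the weights of its free amino acids. The following worked equation is an illustrative sketch only; the symbols M_i (molecular weight of the i-th free amino acid) and n (number of residues) are introduced here for illustration, and the water weight of 18.02 is a rounded standard value.

```latex
M_{\text{peptide}} = \sum_{i=1}^{n} M_i - (n-1)\,M_{\mathrm{H_2O}}

% Example: the dipeptide glycyl-alanine (n = 2)
M_{\text{Gly-Ala}} \approx 75.07 + 89.09 - 18.02 = 146.14
```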
Because the amino acid sequence determines the properties and biological functions of a protein, it is important to determine that sequence correctly [1]. In 1955, the English biochemist Sanger determined the amino acid sequence of insulin and proved that the sequence was correct [2]. In addition, beginning in 1958, Perutz and Kendrew determined the three-dimensional structures of proteins by X-ray crystallography [3-4].
Amino acids are the basic units of proteins and are produced by fermentation, artificial synthesis, or the hydrolysis of proteins. All amino acids obtained by hydrolyzing natural proteins are α-amino acids, and in biochemistry the term “amino acids” typically refers to α-amino acids, whereas β-amino acids and γ-amino acids are used in organic synthesis, the petrochemical industry, and medical science. Table 1 lists the 20 common amino acids found in natural proteins.
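TABLE 1

| Name | Abbreviation | Side chain | Molecular weight | Isoelectric point | Dissociation constant (carboxyl group) | Dissociation constant (amino group) | -log(side chain dissociation constant) (pKR) |
|---|---|---|---|---|---|---|---|
| Glycine | G, Gly | Hydrophilic | 75.07 | 6.06 | 2.35 | 9.78 | |
| Alanine | A, Ala | Hydrophobic | 89.09 | 6.11 | 2.35 | 9.87 | |
| Valine | V, Val | Hydrophobic | 117.15 | 6 | 2.39 | 9.74 | |
| Leucine | L, Leu | Hydrophobic | 131.17 | 6.01 | 2.33 | 9.74 | |
| Isoleucine | I, Ile | Hydrophobic | 131.17 | 6.05 | 2.32 | 9.76 | |
| Phenylalanine | F, Phe | Hydrophobic | 165.19 | 5.49 | 2.2 | 9.31 | |
| Tryptophan | W, Trp | Hydrophobic | 204.23 | 5.89 | 2.46 | 9.41 | |
| Tyrosine | Y, Tyr | Hydrophilic | 181.19 | 5.64 | 2.2 | 9.21 | 10.46 |
| Aspartic acid | D, Asp | Acid | 133.1 | 2.85 | 1.99 | 9.9 | 3.9 |
| Histidine | H, His | Alkaline | 155.16 | 7.6 | 1.8 | 9.33 | 6.04 |
| Asparagine | N, Asn | Hydrophilic | 132.12 | 5.41 | 2.14 | 8.72 | |
| Glutamic acid | E, Glu | Acid | 147.13 | 3.15 | 2.1 | 9.47 | 4.07 |
| Lysine | K, Lys | Alkaline | 146.19 | 9.6 | 2.16 | 9.06 | 10.54 |
| Glutamine | Q, Gln | Hydrophilic | 146.15 | 5.65 | 2.17 | 9.13 | |
| Methionine | M, Met | Hydrophobic | 149.21 | 5.74 | 2.13 | 9.28 | |
| Arginine | R, Arg | Alkaline | 174.2 | 10.76 | 1.82 | 8.99 | 12.48 |
| Serine | S, Ser | Hydrophilic | 105.09 | 5.68 | 2.19 | 9.21 | |
| Threonine | T, Thr | Hydrophilic | 119.12 | 5.6 | 2.09 | 9.1 | |
| Cysteine | C, Cys | Hydrophilic | 121.16 | 5.05 | 1.92 | 10.7 | 8.37 |
| Proline | P, Pro | Hydrophobic | 115.13 | 6.3 | 1.95 | 10.64 | |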
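For an amino acid with no ionizable side chain, the isoelectric point is commonly approximated as the mean of the carboxyl-group and amino-group values listed in Table 1. The sketch below applies this textbook approximation to glycine and alanine; the symbols pK_1 (carboxyl group) and pK_2 (amino group) are introduced here for illustration. Tabulated isoelectric points for other entries may differ slightly from this estimate.

```latex
\mathrm{pI} \approx \frac{\mathrm{p}K_1 + \mathrm{p}K_2}{2}

% Glycine: pK_1 = 2.35, pK_2 = 9.78
\mathrm{pI}_{\text{Gly}} \approx \frac{2.35 + 9.78}{2} = 6.065 \approx 6.06

% Alanine: pK_1 = 2.35, pK_2 = 9.87
\mathrm{pI}_{\text{Ala}} \approx \frac{2.35 + 9.87}{2} = 6.11
```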
Except for glycine, all α-amino acids have an asymmetric carbon atom, and thus each has two enantiomers with opposite optical rotations, i.e., dextrorotatory (D) and levorotatory (L). The proteins and polypeptides of organisms are typically built from levorotatory amino acids. Exceptions exist, however; for instance, tyrocidine and gramicidin also contain dextrorotatory amino acids.
The hydrolysis of polypeptides may generate the individual constituent amino acid residues, their enantiomers, and peptides of various lengths. Conventional high-performance liquid chromatography (HPLC) can partially separate a few of these hydrolysis products [5-7], but it fails to separate them all.
To determine the amino acid sequence, Biemann et al. [8-9] in 1984 used mass spectrometry data to confirm the relationship between the amino acid sequence and the nucleic acid sequence. In that work, proteins were hydrolyzed into peptide fragments by trypsin, HPLC was used to separate the peptide fragments, and fast atom bombardment mass spectrometry (FAB-MS) was used to measure their masses. The FAB-MS data were compared with all of the possible nucleic acid sequences to confirm the relationship between the amino acid sequence and the nucleic acid sequence. Edman also developed a sequencer [10-11] that determines the amino acid sequence of a protein by cleaving residues from the polypeptide chain one at a time, in order from the N-terminus to the C-terminus. Edman's method suffers from long analysis times, poor sensitivity, and an inability to separate amino acid enantiomers.
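The digest-and-compare idea behind the mass spectrometry approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of the cited work: the cleavage rule (trypsin cuts after K or R unless the next residue is P) is a common simplification not given in the text, the masses are the average molecular weights from Table 1 rather than the ion masses a FAB-MS instrument reports, translation from a nucleic acid sequence is omitted, and the function names and the 0.5 tolerance are hypothetical.

```python
WATER = 18.015  # g/mol, one water is lost per peptide bond

# Molecular weights of the free amino acids, as listed in Table 1.
FREE_MW = {
    "G": 75.07, "A": 89.09, "V": 117.15, "L": 131.17, "I": 131.17,
    "F": 165.19, "W": 204.23, "Y": 181.19, "D": 133.1, "H": 155.16,
    "N": 132.12, "E": 147.13, "K": 146.19, "Q": 146.15, "M": 149.21,
    "R": 174.2, "S": 105.09, "T": 119.12, "C": 121.16, "P": 115.13,
}


def tryptic_digest(sequence):
    """Split a one-letter sequence after K or R, except before P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal fragment
    return peptides


def peptide_mw(peptide):
    """Average molecular weight: sum of free amino acids minus lost waters."""
    return sum(FREE_MW[r] for r in peptide) - (len(peptide) - 1) * WATER


def matches(observed_masses, candidate_sequence, tolerance=0.5):
    """True if every observed mass is explained by some predicted fragment."""
    predicted = [peptide_mw(p) for p in tryptic_digest(candidate_sequence)]
    return all(
        any(abs(obs - pred) <= tolerance for pred in predicted)
        for obs in observed_masses
    )
```

For example, `tryptic_digest("GAKVRPLK")` yields the fragments `GAK` and `VRPLK`, and a candidate sequence is retained only if `matches` finds a predicted fragment for every measured mass.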