Under physiological conditions, proteins (polymer chains of peptide-linked amino acids) normally do not exist as extended linear polymer chains. A combination of molecular forces, including hydrogen bonding and hydrophilic and hydrophobic interactions, promotes thermodynamically more stable secondary structures that can be highly organized (helices, beta pleated sheets, etc.). These structures can combine to form higher order structures with critical biological functions. Natural proteins are peptide-linked polymers containing 20 different amino acids, each with a different side chain. The details of the folding into higher order structures depend on the type, frequency and primary sequence of the amino acids in the protein. Since each position in the polymer chain can be occupied by any of 20 different amino acids, the thermodynamic rules that describe the details of protein folding can be complex. For example, it is not yet possible to design a synthetic protein with a substrate-specific enzymatic site that is predicted by the primary amino acid sequence. More complete discussions of the structure and function of proteins are found in Dickerson et al., “The Structure and Action of Proteins,” Harper and Row, New York, 1970, and Lehninger, “Biochemistry,” Worth, New York, 1970, pp. 109-146.
Some basic rules of protein folding have been discovered. In general, the side chains of the 20 L-amino acids commonly found in natural proteins can be placed in two categories, hydrophobic/non-polar and hydrophilic/polar, each playing a separate role in protein conformation. In the standard “oil drop” model for protein folding, the amino acids with more hydrophobic side chains (Val, Leu, Phe, Met, Ile) are sequestered to the inside of the protein structure, away from the aqueous environment. Frequently, these hydrophobic side chains form “pockets” that bind molecules of biological significance. On the other hand, hydrophilic amino acids (e.g. Lys, Arg, Asp, Glu) are most frequently distributed on the outer surface of natural proteins, providing overall protein solubility and establishing a superstructure for the internalized hydrophobic domains. Internal polar and ionizable groups are essential for enzymatic catalysis, proton transport, redox reactions, and many other functional properties of proteins. To engineer novel enzymes or to modify the function of existing ones, and to build switches that can be used to modify the stability of proteins in response to changes in pH, it is necessary to introduce polar or ionizable groups, or to modify the properties of existing ones, in the protein's interior region. Internal polar and ionizable amino acid groups, however, usually destabilize proteins.
In computational biology, protein pKa calculations are used to estimate the pKa values of amino acids as they exist within proteins. These calculations complement the pKa values reported for amino acids in their free state, and are used frequently within the fields of molecular modeling, structural bioinformatics, and computational biology. The pKa values of amino acid side chains play an important role in defining the pH-dependent characteristics of a protein. The pH-dependence of the activity displayed by enzymes and the pH-dependence of protein stability, for example, are properties that are determined by the pKa values of amino acid side chains. The pKa value of an amino acid side chain in solution is typically inferred from the pKa values of model compounds (i.e. compounds that are similar to the side chains of amino acids).
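As a minimal illustration of how a side-chain pKa value sets pH-dependent behavior, the following Python sketch computes the protonated fraction of an isolated titratable group, assuming ideal Henderson-Hasselbalch behavior. The function name and the model pKa values in the table are illustrative assumptions (approximate figures; published model-compound values vary by source), not values taken from this text.

```python
def protonated_fraction(pH: float, pKa: float) -> float:
    """Fraction of an isolated titratable group that is protonated at a given pH,
    assuming ideal Henderson-Hasselbalch behavior."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


# Approximate model-compound pKa values, for illustration only (assumed;
# reported values differ slightly between sources).
MODEL_PKA = {"Asp": 4.0, "Glu": 4.4, "His": 6.3, "Lys": 10.4}

for residue, pka in MODEL_PKA.items():
    fraction = protonated_fraction(7.0, pka)
    print(f"{residue}: {fraction:.4f} protonated at pH 7.0")
```

Under these assumptions, an acidic group such as Asp is almost fully deprotonated at neutral pH, while Lys remains almost fully protonated; shifting either pKa value by a few units would change that picture.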
When a protein folds, the titratable amino acids in the protein are transferred from a solution-like environment to an environment determined by the three-dimensional structure of the protein. For example, in an unfolded protein an aspartic acid is typically in an environment that exposes the titratable side chain to water. When the protein folds, the aspartic acid may be buried deep in the protein interior with no exposure to solvent. In the folded protein, the aspartic acid will be closer to other titratable groups in the protein and will also interact with permanent charges (e.g. ions) and dipoles in the protein. All of these effects alter the pKa value of the amino acid side chain, and pKa calculation methods generally calculate the effect of the protein environment on the model pKa value of an amino acid side chain. Typically, the effects of the protein environment on the amino acid pKa value are divided into pH-independent effects and pH-dependent effects. The pH-independent effects (desolvation, interactions with permanent charges and dipoles) are added to the model pKa value to give the intrinsic pKa value. The pH-dependent effects, which arise from interactions with other titratable groups whose charges change with pH, cannot be added in the same straightforward way and have to be accounted for using Boltzmann summation, Tanford-Roxby iterations or other methods.
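The Boltzmann summation mentioned above can be sketched for a small system as follows. This is a simplified illustration under stated assumptions, not the implementation of any particular program: all groups are treated as acids that are neutral when protonated and charged when deprotonated, energies are expressed in pK units (multiples of RT ln 10), and the function name, the `pk_int` list and the pairwise interaction matrix `w` are hypothetical.

```python
from itertools import product


def boltzmann_protonation(pk_int, w, pH):
    """Protonated fraction of each acidic group at a given pH.

    pk_int: intrinsic pKa value of each group.
    w[i][j]: interaction energy (in pK units) between groups i and j
             when both are deprotonated, i.e. both charged.
    """
    n = len(pk_int)
    states = list(product((0, 1), repeat=n))  # 1 = protonated, 0 = deprotonated
    weights = []
    for state in states:
        # Energy in units of RT ln 10, relative to all groups deprotonated
        # and non-interacting.
        energy = sum(x * (pH - pk) for x, pk in zip(state, pk_int))
        energy += sum(
            w[i][j]
            for i in range(n)
            for j in range(i + 1, n)
            if state[i] == 0 and state[j] == 0
        )
        weights.append(10.0 ** (-energy))
    partition = sum(weights)
    return [
        sum(wt for st, wt in zip(states, weights) if st[i] == 1) / partition
        for i in range(n)
    ]


# Two acids with identical intrinsic pKa values, with and without coupling.
uncoupled = boltzmann_protonation([4.0, 4.0], [[0.0, 0.0], [0.0, 0.0]], pH=4.0)
coupled = boltzmann_protonation([4.0, 4.0], [[0.0, 1.0], [1.0, 0.0]], pH=4.0)
print(uncoupled, coupled)
```

Because the sum runs over all 2^n protonation states, exhaustive enumeration of this kind is only practical for a handful of groups, which is one reason approximate schemes such as Tanford-Roxby iteration are used for whole proteins.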
The interplay of the intrinsic pKa values of a system with the electrostatic interaction energies between titratable groups can produce quite spectacular effects, such as non-Henderson-Hasselbalch titration curves and even back-titration effects. pKaTool provides an interactive and instructive way to explore these effects. Several software packages and web servers are available for the calculation of protein pKa values. Some methods are based on solutions to the Poisson-Boltzmann equation (PBE), and are often referred to as FDPB-based methods (FDPB stands for “finite difference Poisson-Boltzmann”). The PBE is a modification of Poisson's equation that incorporates a description of the effect of solvent ions on the electrostatic field around a molecule. The H++ web server, the pKD web server, MCCE and Karlsberg+ use the FDPB method to compute pKa values of amino acid side chains. FDPB-based methods calculate the change in the pKa value of an amino acid side chain when that side chain is moved from a hypothetical fully solvated state to its position in the protein. To perform such a calculation, one needs theoretical methods that can calculate the effect of the protein interior on a pKa value, and knowledge of the pKa values of amino acid side chains in their fully solvated states. A set of empirical rules relating the protein structure to the pKa values of ionizable residues has been developed by Li, Robertson, and Jensen. These rules form the basis for the web-accessible program called PROPKA for rapid predictions of pKa values.
Molecular dynamics methods of calculating pKa values involve computing the free energy difference between the protonated and deprotonated forms of the molecule. This free energy difference is estimated using methods such as free-energy perturbation, thermodynamic integration and the Bennett acceptance ratio. Molecular dynamics is typically a much more computationally expensive way to predict pKa values than using the Poisson-Boltzmann equation. Most currently used molecular force fields do not take electronic polarizability into account, which could be an important property for protonation energies.
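Whichever method supplies the free energy difference, it is commonly converted into a pKa value through the relation pKa = ΔG/(RT ln 10) for the deprotonation reaction. The Python sketch below applies this relative to a model compound; the function name, the choice of kcal/mol as the unit and the example numbers are assumptions made for illustration only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pka_from_free_energy(model_pka, ddg_kcal_per_mol, temperature_k=298.15):
    """Estimate a pKa value in the protein from a relative free energy.

    ddg_kcal_per_mol: deprotonation free energy in the protein minus the
    deprotonation free energy of the model compound in water (assumed input).
    """
    rt_ln10 = R_KCAL * temperature_k * math.log(10.0)
    return model_pka + ddg_kcal_per_mol / rt_ln10


# Hypothetical example: deprotonation is 2.0 kcal/mol less favorable in the
# protein than in water, for a group with a model pKa of 4.0.
shifted = pka_from_free_energy(4.0, 2.0)
print(f"{shifted:.2f}")
```

At room temperature RT ln 10 is roughly 1.36 kcal/mol, so each pK unit of shift corresponds to a fairly small free energy difference; this is why errors of a kilocalorie per mole in the computed free energy matter.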
The pH value at which the titratable group is half-protonated is equal to the pKa if the titration curve follows the Henderson-Hasselbalch equation. Most pKa calculation methods silently assume that all titration curves are Henderson-Hasselbalch shaped, and pKa values in pKa calculation programs are therefore often reported as the pH of half-protonation. Software developed for protein pKa calculations includes: AccelrysPKA (Accelrys CHARMm-based pKa calculation); H++ (Poisson-Boltzmann-based pKa calculations); MCCE (Multi-Conformation Continuum Electrostatics); Karlsberg+ (pKa computation with multiple pH-adapted conformations); the pKD server (pKa calculations and pKa value re-design); and PROPKA (empirical calculation of pKa values).
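The half-protonation convention described above can be sketched as a search for the pH at which a sampled titration curve crosses 0.5. The following Python example is an illustrative assumption about how such a value might be extracted, not the procedure of any of the listed programs; the function name and the synthetic curve are hypothetical.

```python
def half_protonation_pH(pH_values, fractions):
    """Return the first pH at which the protonated fraction falls through 0.5,
    using linear interpolation between samples; None if there is no crossing."""
    points = list(zip(pH_values, fractions))
    for (p0, f0), (p1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 >= f1 and f0 != f1:
            return p0 + (f0 - 0.5) * (p1 - p0) / (f0 - f1)
    return None


# Synthetic Henderson-Hasselbalch curve for a group with pKa 6.5,
# sampled every 0.1 pH unit from pH 0 to pH 14.
grid = [i * 0.1 for i in range(141)]
curve = [1.0 / (1.0 + 10.0 ** (pH - 6.5)) for pH in grid]
pk_half = half_protonation_pH(grid, curve)
print(pk_half)
```

For an ideal curve the recovered value coincides with the pKa. For a non-Henderson-Hasselbalch curve, or one with back-titration, the curve may cross 0.5 more than once, and this sketch would report only the first crossing, so a single number of this kind should be interpreted with care.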