Current understanding of biology makes great use of atomic-level protein structures, but the generation of these structures, e.g., by X-ray crystallography, is both expensive and uncertain. A significant bottleneck in the process is the generation of high-quality crystals for X-ray diffraction. Much effort has gone into developing crystallization screens and into creating high-throughput methods for cloning and expressing proteins (see, e.g., Acton T. B. et al., Methods Enzymol. 2005, 394, 210-243). However, the mechanisms of crystallization, and the protein characteristics that affect it, remain poorly understood, with different methods of study yielding substantially different results.
The Surface Entropy Reduction (SER) methods identify mutations that can potentially improve crystallization by using secondary structure prediction and sequence conservation to locate residues with high-entropy side chains in variable loop regions of the protein. Replacing one or more of these residues with a low-entropy amino acid, such as alanine, has been predicted to improve crystallization by reducing the entropic penalty of inter-protein interface formation. This approach therefore confines its mutations to predicted loop regions of the protein's secondary structure.
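The SER selection logic summarized above can be illustrated with a minimal Python sketch. It is not the published SER implementation. It assumes that a per-residue secondary structure prediction (encoded here as H for helix, E for strand, and C for loop) and a per-residue conservation score between 0 and 1 are already available from other tools. The function name `ser_candidates`, the conservation cutoff of 0.5, and the choice of lysine, glutamate, and glutamine as the high-entropy residues are illustrative assumptions, not details taken from the text.

```python
from typing import List, NamedTuple, Sequence

# Assumption: residues treated as having high-entropy side chains.
# The text does not list them; K, E and Q are used here for illustration.
HIGH_ENTROPY_RESIDUES = frozenset("KEQ")


class Candidate(NamedTuple):
    position: int   # 1-based position in the sequence
    wild_type: str  # original residue (one-letter code)
    mutant: str     # proposed low-entropy replacement


def ser_candidates(
    sequence: str,
    secondary_structure: str,
    conservation: Sequence[float],
    max_conservation: float = 0.5,  # hypothetical cutoff for "variable"
    replacement: str = "A",         # alanine, as in the text
) -> List[Candidate]:
    """List high-entropy residues in variable, predicted-loop positions."""
    if not (len(sequence) == len(secondary_structure) == len(conservation)):
        raise ValueError("all per-residue inputs must have the same length")

    candidates = []
    for index, (residue, state, score) in enumerate(
        zip(sequence, secondary_structure, conservation)
    ):
        in_loop = state == "C"               # predicted loop only
        is_variable = score <= max_conservation
        if in_loop and is_variable and residue in HIGH_ENTROPY_RESIDUES:
            candidates.append(Candidate(index + 1, residue, replacement))
    return candidates


if __name__ == "__main__":
    # Toy input: an 8-residue sequence with a short predicted helix.
    hits = ser_candidates(
        "MKEALKQG",
        "CCHHHCCC",
        [0.9, 0.2, 0.1, 0.8, 0.7, 0.3, 0.9, 0.1],
    )
    for hit in hits:
        print(f"{hit.wild_type}{hit.position}{hit.mutant}")
```

In this toy input, the glutamate at position 3 is skipped because it lies in a predicted helix, and the glutamine at position 7 is skipped because it is highly conserved, which reflects the loop-only and variable-position restrictions described above.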
The methods described herein differ from the SER methods by mining the Protein Data Bank (PDB) for information to improve predictions. Using a topological analysis of crystal structures in the PDB to identify possible mutations that improve crystallization is a novel approach. The methods described herein are superior because the information used to improve interface formation is drawn from interfaces that have already been observed experimentally. Moreover, unlike the SER methods, the methods and systems described herein use whole-epitope modifications rather than single amino acid changes, thus increasing the likelihood that an inter-protein interface will form, since interfaces are usually composed of a surface rather than a single-residue interaction.
The epitope modifications involve chemical changes of very diverse types, including hydrophobic-to-hydrophilic substitutions in equal measure to hydrophilic-to-hydrophobic substitutions, whereas the single-residue mutations suggested by SER involve primarily hydrophilic-to-hydrophobic substitutions and are almost always polarity-reducing. Such polarity-reducing mutations tend to impair solubility, which prevents effective protein purification and crystallization. The greater diversity in the kinds of chemical changes involved in epitope modification fundamentally frees crystallization engineering from the crippling correlation between crystallization-improving and solubility-impairing mutations. Epitope modifications frequently increase side-chain entropy, so they do not require entropy reduction at the level of individual amino acids, which is the foundation of the SER method.
Finally, SER methods avoid mutations in non-loop regions of the protein and therefore miss many potential epitopes in α-helices, helix capping motifs, or β-hairpins. The epitope engineering method described herein includes all secondary structure elements, thus generating a larger computationally derived list of candidate epitopes.