Understanding and controlling protein stability has long been a goal of biologists, chemists, and engineers. The first link between amino acid substitution and disease (Ingram. Nature. 1957, 180(4581):326-8) offered a new and essential perspective on protein stability in health and disease. The recent tremendous increase in protein-based pharmaceuticals has created a new challenge. Therapeutic proteins are stored in liquid for several months at very high concentrations, and the percentage of non-monomeric species increases with time. As aggregates form, not only does the efficacy of the product decrease, but side effects such as an immunological response upon administration may occur. Assuring the stability of protein pharmaceuticals over the shelf life of the product is imperative.
Because of their potential in the cure of various diseases, antibodies currently constitute the most rapidly growing class of human therapeutics (Carter. Nature Reviews Immunology. 2006, 6(5), 343). Since 2001, their market has been growing at an average yearly growth rate of 35%, the highest rate among all categories of biotech drugs (S. Aggarwal, Nature. BioTech. 2007, 25 (10) 1097).
Therapeutic antibodies are prepared and stored in aqueous solutions at high concentrations, as required for the disease treatment. However, these antibodies are thermodynamically unstable under these conditions and degrade due to aggregation. The aggregation in turn leads to a decrease in antibody activity, making the drug ineffective, and can even generate an immunological response. As such, there is an urgent need to develop a mechanistic understanding of how these antibodies, and indeed proteins in general, aggregate, to discover what regions of the protein are involved in the aggregation, and to develop strategies to hinder aggregation.
These effects are particularly important to antibody therapeutics. One approach to antibody stabilization is to graft the CDR loops that confer antigen binding specificity onto a more stable framework (Ewert, Honegger, and Pluckthun, Biochemistry. 2003, 42(6): 1517-28.). This approach will only work if the amino acid sequence in the CDR loops is not the driving aggregation force, and if grafting the CDR loops onto a more stable framework does not change the antigen binding specificity.
The technology related to predicting aggregation prone regions of proteins can be divided into two categories: 1) phenomenological models and 2) molecular simulation techniques. The phenomenological models are mainly based on predicting the aggregation ‘hot spots’ from protein primary sequences using properties such as hydrophobicity, β-sheet propensity, etc., whereas the molecular simulation techniques use the three dimensional structure and dynamics of proteins to locate the regions prone to aggregation. Most of the techniques have been directed toward understanding amyloid fibril formation and aggregation of other small proteins where β-sheet formation is predominant.
Phenomenological models have been developed based on physicochemical properties such as hydrophobicity, β-sheet propensity, etc., to predict the aggregation prone regions from protein primary sequence (Caflisch, Current Opinion in Chemical Biology. 2006, 10, 437-444; Chiti and Dobson. Annu. Rev. Biochem. 2006, 75: 333-366). One of the initial phenomenological models was based on mutational studies of the kinetics of aggregation of a small globular protein, human muscle acylphosphatase (AcP), along with other unstructured peptides and natively unfolded proteins (Chiti, et al. Nature. 2003, 424 p. 805-808; U.S. Pat. No. 7,379,824). This study revealed simple correlations between aggregation and physicochemical properties such as β-sheet propensity, hydrophobicity and charge. These studies were done under conditions at which the proteins are mainly unstructured. Thus a three-parameter empirical model was developed that links sequence to the aggregation propensity (Chiti, et al. Nature. 2003, 424, 805-808). This model was also used to suggest variants of the 32-residue peptide hormone calcitonin to reduce its aggregation propensity (Fowler, et al. Proc Natl Acad Sci USA. 2005, 102, 10105-10110.). DuBay and coworkers have extended the three-parameter equation (Chiti, et al. Nature. 2003, 424, 805-808) into a seven-parameter formula that includes intrinsic properties of the polypeptide chain and extrinsic factors related to the environment (such as peptide concentration, pH value and ionic strength of the solution) (Dubay, et al. J Mol Biol. 2004, 341, 1317-1326). Using this model they were able to reproduce the in vitro aggregation rates of a wide range of unstructured peptides and proteins. However, the main limitation of the seven-parameter model is that all residues in the sequence were given the same relative importance.
This is inconsistent with experimental and simulation observations, which show that certain regions are more important than others, depending on their secondary structure propensities. Recently, this analysis was further extended to include protection factors to describe the aggregation of structured polypeptide chains (Tartaglia, G. G., Pawar, A. P., Campioni, S, Dobson, C. M., Chiti, F., and Vendruscolo, M. J Mol Biol (2008) in press). Some of the predicted sites were in agreement with the known aggregation prone sites for proteins such as Lysozyme, Myoglobin, etc. A phenomenological model without free parameters was developed (Tartaglia, et al. Protein Sci. 2004, 13, 1939-1941; Tartaglia et al. Protein Sci. 2005, 14, 2723-2734) to predict changes in the elongation rate of the aggregate fibril upon mutation and to identify aggregation prone segments. The physicochemical properties used are the change in β-propensity upon mutation, the change in the number of aromatic residues, and the change in total charge. Furthermore, the ratio of accessible surface area is taken into account if the wild-type and mutant side chains are both polar or both apolar, whereas the dipole moment of the polar side chain is used in the case of an apolar to polar (or polar to apolar) mutation. This model reproduced the relative aggregation propensity of a set of 26 heptapeptide sequences, which were predicted to favor an in-register parallel β-sheet arrangement.
The model of DuBay and coworkers (Dubay et al. J Mol Biol. 2004, 341, 1317-1326) has been modified by including α-helical propensity and hydrophobic patterning, and by comparing the aggregation propensity score of a given amino acid sequence with an average propensity calculated for a set of sequences of similar length (Pawar, et al., J Mol Biol. 2005, 350, 379-392). This model has been validated on the aggregation-prone segments of three natively unfolded polypeptide chains: Aβ42, α-synuclein and the tau protein.
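The three-parameter idea discussed above can be illustrated with a minimal sketch. It assumes a simple linear combination of the three property changes named in the text (hydrophobicity, β-sheet propensity and charge); the linear form, the class and function names, and the coefficients a, b and c are illustrative assumptions supplied by the caller, not the published equation or its fitted values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationDeltas:
    """Property changes caused by a mutation (mutant minus wild type).

    All three fields are hypothetical inputs; the property scales used to
    compute them are not specified here and must come from the caller.
    """
    d_hydrophobicity: float
    d_beta_propensity: float
    d_abs_charge: float


def log_rate_change(deltas: MutationDeltas, a: float, b: float, c: float) -> float:
    """Assumed linear model: ln(rate_mut / rate_wt) = a*dH + b*dBeta + c*d|q|.

    The coefficients a, b, c are placeholders chosen by the caller.
    """
    return (a * deltas.d_hydrophobicity
            + b * deltas.d_beta_propensity
            + c * deltas.d_abs_charge)


def rate_fold_change(deltas: MutationDeltas, a: float, b: float, c: float) -> float:
    """Convert the log rate change into a multiplicative fold change."""
    return math.exp(log_rate_change(deltas, a, b, c))
```

Under these assumptions, a mutation that raises hydrophobicity and β-sheet propensity while lowering the absolute charge gives a positive log rate change whenever a and b are positive and c is negative, which corresponds to faster predicted aggregation.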
Another algorithm called TANGO (Fernandez-Escamilla, et al., Nat Biotechnol. 2004, 22, 1302-1306) was developed, which balances the same physicochemical parameters, supplemented by the assumption that an amino acid is fully buried in the aggregated state. TANGO is based on secondary structure propensity and estimation of the desolvation penalty to predict β-aggregating regions of a protein sequence as well as mutational effects. In contrast to the models discussed earlier, TANGO takes into account the native state stability by using the FOLD-X force field. Although it is not possible to calculate absolute rates of aggregation with TANGO, it provides a qualitative comparison between peptides or proteins differing significantly in sequence. Serrano and coworkers (Linding, et al., J Mol Biol. 2004, 342, 345-353) have used TANGO to analyze the β-aggregation propensity of a set of non-redundant globular proteins with an upper limit of 40% sequence identity.
A further algorithm, Prediction of Amyloid StrucTure Aggregation (PASTA), was recently introduced by editing a pair-wise energy function for residues facing one another within a β-sheet (Trovato, et al., Protein Engineering, Design & Selection. 2007, 20(10), 521-523; Trovato, et al., PLoS Comput. Biol. 2006, 2, 1608-1618; Trovato et al., J. Phys.: Condens. Matter. 2007 19, 285221). Yoon and Welsh (Yoon and Welsh, Protein Sci. 2004, 13: 2149-2160) have developed a structure-based approach for detecting β-aggregation propensity of a protein segment conditioned on the number of tertiary contacts. Using a sliding seven-residue window, segments with a strong β-sheet tendency in a tightly packed environment (i.e. with a high number of tertiary contacts) were suggested to be the local mediator of fibril formation.
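The sliding seven-residue window described above can be sketched as follows. This is only an illustration of the windowing idea: the per-residue β-propensity values, the per-residue tertiary contact counts, both thresholds and the function name are hypothetical inputs chosen for the example, and the sketch does not reproduce the published scoring scheme.

```python
from typing import List, Sequence, Tuple


def flag_windows(
    propensity: Sequence[float],
    contacts: Sequence[float],
    window: int = 7,
    min_propensity: float = 0.5,   # hypothetical threshold
    min_contacts: float = 4.0,     # hypothetical threshold
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of windows that score high on both
    mean beta-propensity and mean number of tertiary contacts.

    `end` is exclusive, so sequence[start:end] is the flagged segment.
    """
    if len(propensity) != len(contacts):
        raise ValueError("propensity and contacts must have the same length")

    hits = []
    # Slide the window one residue at a time along the sequence.
    for start in range(len(propensity) - window + 1):
        end = start + window
        mean_propensity = sum(propensity[start:end]) / window
        mean_contacts = sum(contacts[start:end]) / window
        if mean_propensity >= min_propensity and mean_contacts >= min_contacts:
            hits.append((start, end))
    return hits
```

A sequence shorter than the window yields no windows and therefore an empty result. Overlapping hits are reported separately here; merging them into contiguous segments is a design choice left to the caller.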
While the phenomenological models described above were shown to perform well for small peptides and denatured proteins, aggregation propensities might differ for globular proteins such as antibodies where the tertiary structure and the stability of the native state are very important.
Molecular simulation techniques for predicting aggregation prone regions and studying the mechanism of aggregation have mostly employed simpler simulation models (Ma and Nussinov. Curr. Opin. Chem. Biol. 2006, 10, 445-452; Cellmer, et al., TRENDS in Biotechnology 2007, 25(6), 254). The least detailed of the simulation models employed was the lattice model, wherein each residue is represented as a bead occupying a single site on a three dimensional lattice. More detailed models, such as the intermediate resolution model, followed but suffered from the same inability to accurately represent protein secondary and tertiary structures.
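The lattice representation described above, with one bead per residue on a three dimensional lattice, can be sketched in a few lines. The example assumes a simple cubic lattice and a two-letter hydrophobic/polar (H/P) alphabet in which only non-bonded H-H neighbor contacts are counted; that choice of lattice and scoring, and all names below, are assumptions made for illustration rather than details taken from the cited studies.

```python
from typing import List, Tuple

Site = Tuple[int, int, int]

# The six nearest-neighbor directions on a simple cubic lattice.
NEIGHBOR_STEPS: List[Site] = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def is_valid_chain(coords: List[Site]) -> bool:
    """Check that each bead sits on its own site and that consecutive
    beads occupy adjacent lattice sites."""
    if len(set(coords)) != len(coords):
        return False  # two beads share one site
    return all(
        sum(abs(a - b) for a, b in zip(p, q)) == 1
        for p, q in zip(coords, coords[1:])
    )


def count_hh_contacts(sequence: str, coords: List[Site]) -> int:
    """Count pairs of 'H' beads that are lattice neighbors but are not
    bonded neighbors along the chain. Each pair is counted once."""
    site_to_index = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in NEIGHBOR_STEPS:
            j = site_to_index.get((x + dx, y + dy, z + dz))
            # j > i + 1 skips bonded neighbors and avoids double counting.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts
```

The coarseness of this picture is visible in the code itself: a residue is reduced to a single lattice site and a single letter, which is why such models cannot represent secondary and tertiary structure in detail.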
Unlike simpler models, atomistic models include all the atomistic details such as hydrogen bonding and are thus more accurate than the lattice or the intermediate resolution models. Such atomistic models have been used either with an explicit solvent, or with an implicit solvent where the solvent is treated as a continuum. The explicit model is more accurate but also more computationally demanding. Later, a molecular dynamics simulation protocol was developed to obtain structural information on ordered β-aggregation of amyloidogenic polypeptides (Cecchini et al., J Mol Biol. 2006, 357, 1306-1321.). However, because such a procedure is very computationally demanding, especially for large proteins such as antibodies, there does not appear to be a full-antibody atomistic simulation in the literature. Nevertheless, there have been atomistic simulations of small parts of the antibody, mostly for the Fab fragment (Noon, et al., PNAS. 2002, 99, 6466; Sinha and Smith-Gill, Cell Biochemistry and Biophysics. 2005, 43, 253).
Numerous existing approaches for preventing antibody aggregation employ the use of additives in protein formulations. This is different from the direct approach described herein, where the antibody itself is modified based on the aggregation prone regions predicted from molecular simulations. Additives commonly used in antibody stabilization are salts of nitrogen-containing bases, such as arginine, guanidine, or imidazole (EP0025275). Other suitable additives for stabilization are polyethers (EPA0018609), glycerin, albumin and dextran sulfate (U.S. Pat. No. 4,808,705), detergents and surfactants such as polysorbate-based surfactants (Publication DA2652636, and Publication GB2175906 (UK Pat. Appl. No. GB8514349)), chaperones such as GroEL (Mendoza, Biotechnol. Tech. 1991, (10) 535-540), citrate buffer (WO9322335) or chelating agents (WO9115509). Although these additives enable proteins to be stabilized to some degree in solution, they suffer from certain disadvantages such as the necessity of additional processing steps for additive removal. Thus, new methods are required to understand the mechanisms involved in protein aggregation and to identify the protein regions which mediate this phenomenon. Such methods would be useful in a variety of diagnostic and therapeutic areas, and would allow protein compositions, such as antibody therapeutics, to be directly stabilized without the use of additives.