The present invention, in some embodiments thereof, relates to computational chemistry and computational protein design and, more particularly, but not exclusively, to a method of computationally constructing a library of amino-acid sequences having a common structural fold; and a method of designing and selecting an amino-acid sequence having a desired affinity to a molecular surface of interest of a molecular entity. These methods can be used, for example, for designing binding proteins having structural stability and high binding affinity towards predetermined molecular targets. The present invention, in some embodiments thereof, further relates to computational chemistry and, more particularly, but not exclusively, to a method of producing an amino-acid sequence having a desired affinity to a molecular surface of interest and to an amino acid sequence having a desired affinity to a molecular surface of interest.
Molecular recognition underlies many central biological processes; hence, the ability to design novel protein interactions holds great promise for creating highly specific and potent molecules for use in the chemical industry, as well as in therapeutics, diagnostics, and research probes. Recent strategies in protein binder design have used naturally occurring proteins as scaffolds onto which binding surfaces were designed, relying either on a single protein scaffold or on several hundred different scaffolds to achieve the structural characteristics required for binding. In all cases, the designed scaffolds were treated as rigid structural elements with minimal perturbation of their backbone degrees of freedom. Some of these strategies resulted in the experimentally validated design of homooligomers, inhibitors, and protein affinity purification reagents.
A computational exercise in designing a de novo protein having a sequence of, for example, some 220 naturally occurring amino acids (roughly equivalent to an antibody Fv domain) would face 20²²⁰ ≈ 10²⁸⁶ possible amino acid sequences, a search space that current computers are unable to explore exhaustively. Current computational design methodologies use naturally occurring rigid scaffolds to design de novo molecular function; however, these approaches are fundamentally limited by the number of suitable scaffolds of known three-dimensional structure. In addition, several general observations have been made about binding surfaces successfully designed according to the abovementioned strategies:
1. They comprise surfaces rich in secondary-structure content (α-helices and β-sheets);
2. Interactions with the target are largely mediated by hydrophobic amino acid side-chains; and
3. The buried surface area upon binding is at or below the average for naturally occurring protein-protein interactions, estimated at 1600 Å².
The design of large and polar surfaces, which is essential for making computational binder design general, remains an unmet challenge.
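The sequence-space estimate given above can be checked with exact integer arithmetic. The short Python sketch below assumes a 220-residue chain with any of the 20 naturally occurring amino acids allowed at every position, and counts the decimal digits of 20²²⁰.

```python
import math

def sequence_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of `length` residues over `alphabet_size` amino acids."""
    return alphabet_size ** length

n = sequence_space(220)
# Python integers have arbitrary precision, so this count is exact.
digits = len(str(n))              # 287 digits, i.e. 10**286 <= n < 10**287
log10_n = 220 * math.log10(20)    # the same magnitude expressed as a base-10 exponent
print(digits, round(log10_n, 2))
```

Counting digits of the exact integer avoids any floating-point doubt about the order of magnitude; the logarithm gives the same answer more cheaply for larger lengths.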
Some protein folds, also known as conserved domain folds, recur in many proteins, including proteins that are seemingly unrelated to each other in terms of genealogy, organism and function. Such folds offer a unique opportunity to study the fundamentals of sequence-structure-function relationships, although the observation that proteins sharing a common fold can serve unrelated functions still challenges modern science. Nonetheless, some studies have attempted to harness common structural folds to assist in computational protein design.
One of the most fascinating conserved protein folds is known as the TIM-barrel, or α/β protein fold. Observations of this fold, which is shared by many proteins and many organisms, have assisted in the development of the convergent evolution theory pertaining to similar features in species of different lineages. Likewise, TIM-barrels have been contemplated as suitable scaffolds for de novo protein design.
Offredi, F. et al. [J Mol Biol., 2003, 325(1), p. 163-74] used structural data from crystal structures of TIM-barrel proteins to define the geometrical rules of an “ideal” fold having 4-fold symmetry and, after defining the backbone geometry, attempted a sequence search to find a sequence that would stabilize this conformation.
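One consequence of 4-fold symmetry can be illustrated numerically. If the sequence is assumed to repeat with the same symmetry as the backbone (an assumption made here for illustration, not a detail taken from Offredi et al.), the sequence search only needs to cover one quarter of the chain. The Python sketch below, with a hypothetical 200-residue barrel, compares the two search-space sizes.

```python
import math

def symmetric_sequence(repeat_unit: str, copies: int = 4) -> str:
    """Build a full-length sequence by tandemly repeating one symmetric unit (assumed sequence symmetry)."""
    return repeat_unit * copies

def log10_search_space(length: int, copies: int = 1, alphabet_size: int = 20) -> float:
    """log10 of the number of sequences to search when `copies` identical repeats are enforced."""
    if length % copies != 0:
        raise ValueError("length must be divisible by the number of symmetric copies")
    return (length // copies) * math.log10(alphabet_size)

# Hypothetical 200-residue barrel: unconstrained design vs. 4-fold symmetric design.
print(round(log10_search_space(200), 1))            # 260.2
print(round(log10_search_space(200, copies=4), 1))  # 65.1
```

Under this assumption the exponent shrinks by a factor of four, which is why symmetric designs are often far more tractable to search.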
Figueroa, M. et al. [PLoS One, 2013, 8(8), p. e71858] used the Rosetta suite to design a TIM-barrel protein, using a model known as “Octarellin V” as the starting backbone model, and constructed the loop regions from six-residue fragments of PDB proteins displaying a selected secondary-structure pattern, using the Rosetta loop-building protocol. Final structures were evaluated based on hydrogen bonding between β-strands, packing at the interface between β-strands and α-helices, and the Rosetta all-atom energy function.
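Selecting loop fragments by secondary-structure pattern, as in the loop-building step described above, amounts to scanning known structures for short windows whose secondary-structure string matches a target. The Python sketch below is hypothetical: it assumes each database entry is a sequence paired with a per-residue secondary-structure string in DSSP-style one-letter codes (H helix, E strand, L loop), and it is not the Rosetta loop-building protocol itself.

```python
from typing import Dict, List, NamedTuple, Tuple

class Fragment(NamedTuple):
    entry_id: str   # identifier of the source structure (hypothetical)
    start: int      # 0-based start index within the source chain
    sequence: str   # amino acids of the window
    ss: str         # secondary-structure string of the window

def pick_fragments(entries: Dict[str, Tuple[str, str]], pattern: str, size: int = 6) -> List[Fragment]:
    """Return every window of `size` residues whose secondary structure equals `pattern`.

    `entries` maps an entry id to a (sequence, ss_string) pair of equal length.
    """
    if len(pattern) != size:
        raise ValueError("pattern length must equal the fragment size")
    hits = []
    for entry_id, (seq, ss) in entries.items():
        for i in range(len(seq) - size + 1):
            if ss[i:i + size] == pattern:
                hits.append(Fragment(entry_id, i, seq[i:i + size], ss[i:i + size]))
    return hits

# Toy data, not real PDB entries.
toy = {"toy1": ("MKVLAAGSTEEL", "EEEELLLLHHHH")}
print(pick_fragments(toy, "EELLLL"))
```

An exact string match keeps the sketch simple; a practical search would likely also rank candidate fragments by sequence or geometric similarity.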
One of the most studied families of proteins in the context of protein binding interactions, structure prediction and molecular design is the family of antibodies. Antibodies comprise two types of polypeptide, referred to as the light chain and the heavy chain. The light chain and heavy chain are composed of distinct domains with similar structures, the light chain comprising two such domains, and the heavy chain comprising four such domains. Each domain comprises a “sandwich” characterized by two β-sheets composed of anti-parallel β-strands, with a disulfide bond linking the two β-sheets. The domain at the N-terminal end of each of the heavy chain and the light chain is variable in amino acid sequence. These “variable domains” provide the wide diversity of different antibodies. The other domains compose the “constant region” of the heavy and light chains.
An antigen-binding region of an antibody is formed from one light chain variable domain in combination with one heavy chain variable domain. In a variable domain, variability in amino acid sequence is restricted primarily to three “complementarity-determining regions (CDRs)” (also known as “hypervariable regions”, and individually termed CDR1, CDR2 and CDR3), separated by relatively conserved “framework regions”. Thus, an antigen-binding region contains three light chain CDRs (termed L1, L2 and L3) and three heavy chain CDRs (termed H1, H2 and H3). The three CDRs in each domain are clustered at the target-binding surface of the antibody, each CDR being associated with a loop linking two β-strands. The conserved framework regions form a rigid structure characterized by structural homology, which provides the antibody with stability and affects the conformational rigidity of the CDRs.
Much of the variability in CDRs is a result of V(D)J (Variable, Diverse, and Joining gene segments) recombination, wherein an immune cell genome undergoes recombination such that one of about 44 V gene segments is randomly combined with one of 6 J gene segments. In addition, in the heavy chain gene, one of 27 D gene segments is located between the selected V and J gene segments. The V gene segment is the largest, coding for CDR1 and CDR2 as well as for a portion of CDR3, whereas the D and J gene segments code for portions of CDR3 (L3 or H3 in the case of the J segment, H3 in the case of the D segment). V(D)J recombination allows for a wide variety of light chain and heavy chain sequences. Additional variability results from combinations of different heavy and light chains, and from processes that result in the addition and/or deletion of nucleotides or in other mutations in the light chain and heavy chain genes.
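Using the approximate gene-segment counts given above, the number of distinct heavy-chain V(D)J combinations follows from a simple product. The Python sketch below ignores junctional additions and deletions and other mutations, which the text notes add further diversity.

```python
from math import prod

# Approximate heavy-chain gene-segment counts taken from the text above.
SEGMENTS = {"V": 44, "D": 27, "J": 6}

def recombination_combinations(segments: dict) -> int:
    """Count V(D)J combinations: one segment of each type is chosen independently."""
    return prod(segments.values())

print(recombination_combinations(SEGMENTS))  # 7128
```

Passing only V and J counts gives the corresponding VJ product for a chain without a D segment.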
Despite their tremendous diversity, the CDRs (with the exception of the H3 CDR) fall into a handful of discrete conformations termed “canonical conformations”. For example, in hundreds of antibody molecular structures, only seven conformational variants are observed for the L2 CDR. The canonical conformations are characterized by key conserved residue identities that maintain the backbone conformation.
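Because canonical conformations are characterized by key conserved residues, a loop can in principle be assigned to a canonical class by checking its residues at those key positions. The Python sketch below illustrates the lookup only; the class names, positions and residues are hypothetical placeholders, not published canonical-structure rules.

```python
from typing import Dict, Optional, Set

# Hypothetical rules: class name -> {loop position (0-based): allowed residues}.
RULES: Dict[str, Dict[int, Set[str]]] = {
    "class_A": {0: {"G"}, 3: {"S", "T"}},
    "class_B": {0: {"A", "V"}, 2: {"P"}},
}

def assign_canonical_class(loop: str, rules: Dict[str, Dict[int, Set[str]]]) -> Optional[str]:
    """Return the first class whose key positions all hold an allowed residue, else None."""
    for name, key_positions in rules.items():
        if all(pos < len(loop) and loop[pos] in allowed
               for pos, allowed in key_positions.items()):
            return name
    return None  # no canonical class matched (e.g. an H3-like, non-canonical loop)

print(assign_canonical_class("GKLSY", RULES))  # class_A
print(assign_canonical_class("VRPQ", RULES))   # class_B
```

Real assignment schemes typically also take loop length into account; this sketch checks residue identities only.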
The key challenge in designing backbone fragments for function is that the designed surface must both function (bind its target) and be conformationally stable. As mentioned above, antibodies are constructed of sequence blocks in which highly conserved segments alternate with highly variable ones. The molecular structures of antibodies show that the conserved segments belong to a structurally homologous and rigid structure known as the framework, which provides the antibody with the necessary stability, whereas the variable segments cluster at the target-binding surface and were therefore termed the complementarity-determining regions (CDRs).
A key attraction for antibody engineering lies in the modular architecture of antibodies, which suggests that a large combinatorial complexity of well-folded backbones could be tapped. As early as the 1980s, observations on the structural modularity of antibodies led to the proposal that synthetic antibodies could be constructed by combining fragments of naturally occurring antibodies. Building on this insight, investigators devised a method for antibody humanization, in which CDRs from a mouse antibody were grafted onto a human antibody framework to generate a functional humanized antibody, opening the way to safe therapeutic antibody engineering. These early advances raised hopes that the complete design of antibodies from first principles was achievable, but until recently, computational tools for protein design had not matured sufficiently to realize this objective.
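At the sequence level, CDR grafting can be pictured as splicing donor CDR segments between acceptor framework regions, in the alternating FR1-CDR1-FR2-CDR2-FR3-CDR3-FR4 order of a variable domain. The Python sketch below shows only this splicing step on toy strings; the function name and sequences are hypothetical, and practical humanization also involves structural considerations that are not modeled here.

```python
from typing import Sequence

def graft_cdrs(frameworks: Sequence[str], cdrs: Sequence[str]) -> str:
    """Interleave four framework regions with three CDRs: FR1 CDR1 FR2 CDR2 FR3 CDR3 FR4."""
    if len(frameworks) != 4 or len(cdrs) != 3:
        raise ValueError("expected 4 framework regions and 3 CDRs")
    parts = []
    for fr, cdr in zip(frameworks, cdrs):  # pairs FR1-CDR1, FR2-CDR2, FR3-CDR3
        parts.extend([fr, cdr])
    parts.append(frameworks[-1])           # FR4 closes the domain
    return "".join(parts)

# Toy sequences, not real antibody regions.
acceptor_frameworks = ["EVQL", "WVRQ", "RFTI", "WGQG"]  # e.g. from an acceptor (human) domain
donor_cdrs = ["GFTF", "ISGS", "ARDY"]                    # e.g. from a donor (mouse) domain
print(graft_cdrs(acceptor_frameworks, donor_cdrs))
```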
Recent work on computational antibody design has aimed at increasing binding affinity [Clark, L A. et al., Protein Sci., 2006, 15(5), p. 949-60; Lippow, S M. et al., Nat. Biotechnol., 2007, 25(10), p. 1171-6; Clark, L A. et al., Protein Eng Des Sel., 2009, 22(2), p. 93-101], identifying favorable positions for experimental random mutagenesis [Barderas, R. et al., Proc. Natl. Acad. Sci. USA, 2008, 105(26), p. 9029-34], modifying binding specificity [Farady, C. J. et al., Bioorg. Med. Chem. Lett., 2009, 19(14), p. 3744-7] and increasing thermal resistance [Miklos, A. E. et al., Chem. Biol., 2012, 19(4), p. 449-55].
Pantazes et al. suggested a de novo antibody design strategy that capitalizes on the observation that antibody CDRs adopt canonical conformations.
Pantazes and Maranas [Protein Eng. Des. Sel., 2010, 23, 849-858] describe a general computational method (“OptCDR”) for designing the binding portions of antibodies by first determining which combinations of canonical structures are most likely to bind a selected antigen favorably, and then simultaneously refining the CDR backbones and selecting the optimal amino acid for each position.
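The first stage described above, ranking combinations of canonical structures, can be sketched as an exhaustive enumeration over one canonical structure per CDR. In the Python sketch below, the canonical-structure labels and the additive toy energies are hypothetical stand-ins; OptCDR's actual scoring, backbone refinement and sequence optimization are not reproduced.

```python
import heapq
from itertools import product
from typing import Callable, Dict, List, Tuple

def top_combinations(
    options: Dict[str, List[str]],
    score: Callable[[Tuple[str, ...]], float],
    k: int = 3,
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score every combination (one canonical structure per CDR) and keep the k lowest scores.

    Lower scores are treated as more favorable, as with an energy.
    """
    cdr_names = sorted(options)
    combos = product(*(options[name] for name in cdr_names))
    return heapq.nsmallest(k, ((score(c), c) for c in combos))

# Hypothetical canonical-structure labels for two CDRs and toy per-label energies.
options = {"H1": ["H1-a", "H1-b"], "H2": ["H2-a", "H2-b", "H2-c"]}
TOY_ENERGY = {"H1-a": -1.0, "H1-b": -2.5, "H2-a": -0.5, "H2-b": -3.0, "H2-c": -1.0}

best = top_combinations(options, lambda combo: sum(TOY_ENERGY[label] for label in combo), k=3)
print(best[0])  # (-5.5, ('H1-b', 'H2-b'))
```

Exhaustive enumeration is feasible only because each CDR has a handful of canonical options; the number of combinations is the product of those counts.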
Pantazes and Maranas [BMC Bioinformatics 2013, 14:168] also describe a method of predicting antibody structure by using experimentally determined antibody structures to compile a database of 929 modular antibody parts (MAPs), which can be combined to create 2.3·10¹⁰ unique antibodies. The MAPs are described as being analogous to V, D and J gene segments.
Weitzner B. D. et al. [Proteins, epub. Feb. 12, 2014] teach that an initial model is constructed by grafting the individual antibody CDRs onto a chain-specific framework, whereas the H3 loop is modeled de novo while sampling the rigid-body orientation using the Rosetta docking algorithm.
Shirai H. et al. [Proteins, epub. Apr. 22, 2014] teach identifying an antibody Fv domain framework template based on the H3 subtype, constructing a database of conformations of all canonical loops, including H3, associated with position-specific scoring matrices (PSSMs), choosing the most appropriate cluster for a given sequence based on its PSSM score, and subsequently constructing models minimized with harmonic backbone constraints to the template model.
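Choosing a loop cluster by PSSM score can be illustrated by summing, over the positions of a query loop, the matrix entry for the residue observed at each position, and picking the highest-scoring cluster. The Python sketch below uses tiny hypothetical matrices for 3-residue loops and an assumed penalty for residues absent from a column; real PSSMs would cover all 20 amino acids and be derived from aligned loop sequences, and the procedure of Shirai et al. is not reproduced in detail.

```python
from typing import Dict, List, Tuple

# A PSSM here is a list (one entry per loop position) of residue -> score maps.
Pssm = List[Dict[str, float]]

def pssm_score(sequence: str, pssm: Pssm, missing: float = -4.0) -> float:
    """Sum position-specific scores; residues absent from a column get the `missing` penalty."""
    if len(sequence) != len(pssm):
        return float("-inf")  # length mismatch: the loop cannot belong to this cluster
    return sum(column.get(res, missing) for res, column in zip(sequence, pssm))

def best_cluster(sequence: str, clusters: Dict[str, Pssm]) -> Tuple[str, float]:
    """Return the name and score of the cluster with the highest PSSM score for `sequence`."""
    scored = {name: pssm_score(sequence, pssm) for name, pssm in clusters.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]

# Hypothetical 3-position PSSMs for two loop clusters (toy values).
clusters = {
    "cluster_1": [{"G": 3.0, "A": 1.0}, {"S": 2.0}, {"Y": 2.5}],
    "cluster_2": [{"A": 2.0}, {"N": 3.0, "S": 0.5}, {"Y": 1.0, "F": 2.0}],
}
print(best_cluster("GSY", clusters))  # ('cluster_1', 7.5)
```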
However, even a method that could, in theory, effectively and systematically sample the entire conformational combinatorial space generated by permutations at the gene level would not be able to account for the myriad random mutations observed in naturally occurring antibodies.
Additional background art includes U.S. Patent Application Nos. 20030059827, 20110224100, 20130244940, 20130296221 and 20140005125; Smadbeck, J., Peterson, M. B., Khoury, G. A., Taylor, M. S., Floudas, C. A., “Protein WISDOM: A Workbench for In silico De novo Design of BioMolecules”, J. Vis. Exp., (77), e50476; and the review “Protein folding and de novo protein design for biotechnological applications” by Khoury, G. A., Smadbeck, J., Kieslich, C. A., and Floudas, C. A., Trends in Biotechnology, 2014, 32(2), p. 99-109, which is incorporated by reference in its entirety as if fully set forth herein.