Protein-protein interactions play a key role in biology and related fields, from medicine to diagnostics to research. Specific molecules able to recognize a target protein can be obtained by established selection techniques (e.g. phage display, ribosome display, mRNA display, DNA display) applied to libraries of antibodies or alternative binding molecules. These strategies achieve high specificity and selectivity, but the binding mode of the resulting molecules is often difficult to predict. A single epitope can be targeted in different ways, so the residues involved in binding and the interactions exploited usually cannot be predicted, which relegates the rational design and selection of these binders to a “case by case” methodology. Each new binding molecule has to be created individually by a selection procedure, will bind its target protein in a different conformation each time, and has to be tested individually for cross-reactivity. A much simpler case arises when the target presents a continuous stretch of amino acids along the primary sequence (a linear epitope). The recognition of linear epitopes in natural proteins can be considered a model for the general recognition of peptides. Because a peptide has almost no secondary structure and can be bound in an extended conformation, all of its side chains are potentially available for recognition at the amino acid level by a binding molecule in a precise and reproducible manner.
Among natural peptide binding proteins, including small adapter domains (e.g. SH2, SH3, PDZ, WW), MHCI and MHCII proteins, and several repeat protein families (e.g. TPR, armadillo, WD40), armadillo repeat proteins possess characteristics that make them suitable as the basis of a scaffold for generating modular peptide binding proteins. The generation of such a scaffold based on armadillo repeat proteins would also take advantage of the methods developed for the consensus design of repeat proteins and the generation of repeat protein libraries described in WO 02/20565. Armadillo repeat proteins are abundant eukaryotic proteins involved in a broad range of functions (Coates, J. C., Trends Cell Biol 13:463-71, 2003), from transcription regulation (β-catenin) to cell adhesion (plakophilin), tumor suppressor activity (Adenomatous Polyposis Coli, APC), and nucleo-cytoplasmic transport (importin-α). These proteins are characterized by tandem repeat units of approximately 42 amino acids that were first discovered in the product of the Drosophila melanogaster segment polarity gene Armadillo. Armadillo repeat proteins participate in protein-protein interactions, and the domain formed by the armadillo repeat units is usually involved in the recognition process. The armadillo repeat domain, originally defined by limited proteolysis, forms a right-handed superhelical structure, as shown initially by the crystal structures of β-catenin (Huber A. H. and Nelson W. J., Cell 90:871-82, 1997) and importin-α (Conti E. et al., Cell 94:193-204, 1998) (FIG. 1). Every repeat unit is composed of three α helices, named H1, H2 and H3 (FIG. 2), and several repeat units stack to form a compact domain. Specialized repeat units are present at the N- and C-termini of these armadillo repeat domains, probably protecting the otherwise exposed hydrophobic core (FIG. 1).
Crystal structures of complexes of armadillo repeat proteins with their target proteins reveal that most of the targets are bound in an extended conformation inside a groove along the surface formed by the H3 helices (FIGS. 1 and 2). An asparagine residue, conserved in almost every repeat unit at the C-terminal part of H3, contacts the main chain of the target, while additional interactions with target side chains are provided by neighboring residues. A single repeat unit is, in general, responsible for the interaction with two target amino acid residues. As in the case of other repeat protein families which have been structurally characterized (e.g. TPR, D'Andrea L. D. and Regan L., Trends Biochem Sci 28:655-62, 2003; Ankyrin repeats, Mosavi L. K. et al., Protein Sci 13:1435-48, 2004; Leucine Rich Repeats, Kobe B. and Kajava A. V., Curr Opin Struct Biol 11:725-32, 2001), the interactions are generally provided by residues on the surface of secondary structure elements (FIG. 2) rather than by residues present in flexible loops, as in the case of antibodies. The general principle of using repeat proteins as scaffolds to generate binding molecules has been described (Forrer et al., FEBS Lett. 539(1-3):2-6, 2003) and, in the case of designed ankyrin repeat proteins, high affinity binders were successfully selected from a designed ankyrin repeat protein library (Binz et al., Nat Biotechnol. 22(5):575-82, 2004).
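The modular relationship described above, in which one repeat unit contacts about two target residues, can be illustrated with a short arithmetic sketch in Python. It is a back-of-the-envelope estimate and not a design rule from the cited literature: the function names are hypothetical, and the choice of one N-terminal and one C-terminal specialized repeat that do not contribute to binding is an assumption made only for this example.

```python
import math

# From the text: a single repeat unit generally interacts with two target residues.
RESIDUES_PER_REPEAT = 2


def internal_repeats_needed(peptide_length: int,
                            residues_per_repeat: int = RESIDUES_PER_REPEAT) -> int:
    """Estimate how many binding repeat units are needed to contact every residue."""
    if peptide_length < 0:
        raise ValueError("peptide_length must be non-negative")
    if residues_per_repeat <= 0:
        raise ValueError("residues_per_repeat must be positive")
    # Round up: a leftover single residue still needs a repeat unit of its own.
    return math.ceil(peptide_length / residues_per_repeat)


def total_repeats(peptide_length: int, capping_repeats: int = 2) -> int:
    """Add the terminal specialized repeats (assumed: one N-terminal, one C-terminal)."""
    return internal_repeats_needed(peptide_length) + capping_repeats


if __name__ == "__main__":
    for length in (4, 6, 7, 10):
        print(length, internal_repeats_needed(length), total_repeats(length))
```

Under these assumptions, the estimate grows linearly with peptide length, which is the practical meaning of modularity here: a longer target peptide calls for more repeat units of the same kind, not a different scaffold.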
Armadillo repeat proteins, such as β-catenin and importin-α, are able to bind different types of peptides, relying on a constant mode of binding to the peptide backbone without requiring specific conserved side chains or interactions with the free N- or C-terminus of a peptide (FIG. 3). The possibility of recognizing a peptide residue by residue, combined with the intrinsic modularity of a repeat protein, makes armadillo repeat proteins promising candidates for the design of a generic scaffold for peptide binding. In addition, a KD as low as 10-20 nM has been reported for the binding of a target peptide by importin-α (Catimel B. et al., J Biol Chem 276:34189-98, 2001), which suggests that high peptide-binding affinities can be achieved with designed armadillo repeat proteins.
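To put the reported KD of 10-20 nM on an energy scale, the standard relation between a dissociation constant and the binding free energy, ΔG° = RT ln(KD / 1 M), can be evaluated as in the following Python sketch. The temperature of 298.15 K, the 1 M standard state, and the function name are assumptions chosen for illustration; none of them is taken from the cited measurement.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def binding_free_energy_kj_per_mol(kd_molar: float,
                                   temperature_kelvin: float = 298.15) -> float:
    """Standard binding free energy from a dissociation constant.

    Uses dG = R * T * ln(KD / 1 M); the result is negative for KD below 1 M.
    The default temperature is an assumption, not a value from the source.
    """
    if kd_molar <= 0:
        raise ValueError("kd_molar must be positive")
    if temperature_kelvin <= 0:
        raise ValueError("temperature_kelvin must be positive")
    return GAS_CONSTANT * temperature_kelvin * math.log(kd_molar) / 1000.0


if __name__ == "__main__":
    for kd in (10e-9, 20e-9):
        dg = binding_free_energy_kj_per_mol(kd)
        print(f"KD = {kd:.0e} M -> dG = {dg:.1f} kJ/mol")
```

With these assumed conditions, a KD of 10-20 nM corresponds to a binding free energy of roughly -44 to -46 kJ/mol. Because the dependence is logarithmic, a twofold change in KD shifts the free energy by less than 2 kJ/mol.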
Thus, the technical problem underlying the present invention is to identify novel approaches for the efficient generation of target-specific peptide-binding proteins based on armadillo repeat proteins. The solution to this technical problem is achieved by providing the embodiments characterized in the claims.