An antibody is composed of four polypeptides: two heavy chains and two light chains. The antigen binding portion of an antibody is formed by the light chain variable domain (VL) and the heavy chain variable domain (VH). At one extremity of these domains, six loops form the antigen binding site; these loops are also referred to as the complementarity determining regions (CDRs). Three CDRs are located on the VH domain (H1, H2 and H3) and the three others are on the VL domain (L1, L2 and L3). During B cell development, a unique immunoglobulin region is formed by a somatic recombination process known as V(D)J recombination. The variable region of the immunoglobulin heavy or light chain is encoded by different gene segments. The heavy chain variable region is encoded by three segments called the variable (V), diversity (D) and joining (J) segments, whereas the light chain variable region is formed by the recombination of only two segments, V and J. A large number of antibody paratopes can be generated by recombination between one of the multiple copies of each of the V, D and J segments that are present in the genome. The V segment encodes CDR1 and CDR2, whereas CDR3 is generated by the recombination events. During the course of the immune response, further diversity is introduced into the antigen binding site by a process called somatic hypermutation (SHM). During this process, point mutations are introduced into the variable genes of the heavy and light chains, in particular into the regions encoding the CDRs. This additional variability allows for the selection and expansion of B cells expressing antibody variants with improved affinity for their cognate antigen.
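The combinatorial part of this diversity can be illustrated with a short calculation. The following is a minimal sketch only: the segment counts are small hypothetical placeholders chosen for readability, not the actual numbers of germline segments in any genome, and the function name is an assumption of this example. It counts only the segment combinations and does not model the junctional diversity created at CDR3 or the point mutations added by somatic hypermutation.

```python
from math import prod


def combinatorial_diversity(segment_counts):
    """Number of distinct segment combinations for one chain.

    segment_counts maps a segment type ("V", "D", "J") to the number of
    germline copies available for recombination.
    """
    if any(count < 1 for count in segment_counts.values()):
        raise ValueError("each segment type needs at least one copy")
    return prod(segment_counts.values())


# Hypothetical placeholder counts, NOT real germline repertoire sizes.
heavy_segments = {"V": 4, "D": 3, "J": 2}   # heavy chain: V, D and J
light_segments = {"V": 4, "J": 2}           # light chain: V and J only

heavy_combinations = combinatorial_diversity(heavy_segments)
light_combinations = combinatorial_diversity(light_segments)

# Any rearranged heavy chain can in principle pair with any light chain.
paired_combinations = heavy_combinations * light_combinations

print(heavy_combinations, light_combinations, paired_combinations)
```

Even with these deliberately small placeholder counts the number of heavy and light chain pairings grows multiplicatively, which is the point made in the paragraph above.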
In recent years several display technologies have emerged and allow for the screening of large collections of proteins or peptides. These include phage display, bacterial display, yeast display and ribosome display (Smith G P. Science. 1985 Jun. 14; 228(4705):1315-7; Hanes J and Plückthun A. Proc Natl Acad Sci USA. 1997 May 13; 94(10):4937-42; Daugherty P S et al., Protein Eng. 1998 September; 11(9):825-32; Boder E T and Wittrup K D. Nat. Biotechnol. 1997 June; 15(6):553-7). In particular these methods have been applied extensively to antibodies and fragments thereof. A number of methods have been described to generate libraries of polypeptides and to screen for members with desired binding properties.
A first approach is to capture, by gene amplification, rearranged immunoglobulin genes from natural repertoires using either tissues or cells from humans or other mammals as a source of genetic diversity. These collections of rearranged heavy and light chains (VH and VL) are then combined to generate libraries of binding pairs that can be displayed on bacteriophage or on other display packages such as bacteria, yeast or mammalian cells. In this case a large fraction of the immunoglobulin repertoire found in the donor is captured. Thus all of the frameworks encoded by the donor germline genes can be found in such repertoires, as well as diversity generated both by V(D)J recombination and by somatic hypermutation (Marks J D et al., J Mol. Biol. 1991 Dec. 5; 222(3):581-97; McCafferty U.S. Pat. No. 5,969,108).
A limitation of natural repertoires is that naturally occurring antibodies can be based on frameworks with low intrinsic stability, which limits their expression levels, their shelf life and their usefulness as reagents or therapeutic molecules. In order to overcome these limitations, a number of methods have been developed to generate synthetic antibody libraries. In these approaches, a single antibody framework or a limited number of antibody frameworks, encoded by their corresponding germline genes, are selected. The selection of these frameworks is commonly based on their biochemical stability and/or their frequency of expression in natural antibody repertoires. In order to generate a collection of binding proteins, synthetic diversity is then introduced into all or a subset of the CDRs. Typically, either the whole or a part of each CDR is diversified using different strategies. In some cases diversity was introduced at selected positions within the CDRs (Knappik A et al., J Mol. Biol. 2000 Feb. 11; 296(1):57-86). Targeted residues can be those frequently involved in antigen contact, those displaying maximal diversity in natural antibody repertoires or even residues that would be preferentially targeted by the cellular machinery involved in generating somatic hypermutations during the natural affinity maturation process (Balint R F, Larrick J W. Gene. 1993 Dec. 27; 137(1):109-18).
Several methods have been used to diversify the antibody CDRs. Overlapping PCR using degenerate oligonucleotides has been extensively used to assemble framework and CDR elements to reconstitute antibody genes. In another approach, unique restriction enzyme sites have been engineered into the framework regions at the boundary of each CDR, allowing for the introduction of diversified CDRs by restriction enzyme mediated cloning. In any case, as all the members of the library are based on frameworks with selected and preferred characteristics, it is anticipated that the antibodies derived from these repertoires are more stable and provide a better source of useful reagents (Knappik, U.S. Pat. No. 6,696,248; Sidhu S S, et al., Methods Enzymol. 2000; 328:333-63; Lee C V et al., J Mol. Biol. 2004 Jul. 23; 340(5):1073-93).
However, an important limitation of these synthetic libraries is that a significant proportion of the library members are not expressed because the randomly diversified sequences do not allow for proper expression and/or folding of the protein. This problem is particularly significant for the CDR3 of the heavy chain. Indeed, this CDR often contributes most of the binding energy to the antigen and is highly diverse in length and sequence. While the other CDRs (H1, H2, L1, L2 and L3) can only adopt a limited number of three dimensional conformations, known as canonical folds, the number of conformations that can be adopted by the heavy chain CDR3 remains too diverse to be predicted (Al-Lazikani B et al., J Mol. Biol. 1997 Nov. 7; 273(4):927-48). In addition, the long degenerate oligonucleotides used to cover long CDR H3 sequences often introduce single base-pair deletions. These factors significantly reduce the functional size of synthetic repertoires.
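The effect of a single base-pair deletion on the encoded protein can be shown with a small translation example. This is a minimal sketch under stated assumptions: the DNA fragment below is a hypothetical sequence invented for illustration, not a sequence from any library, and the helper names are assumptions of this example. It uses the standard genetic code and simply shows that removing one base changes every downstream codon and leaves an incomplete codon at the end.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def delete_base(dna, index):
    """Return the sequence with the base at the given position removed."""
    return dna[:index] + dna[index + 1:]


def keeps_reading_frame(dna):
    """True if the length is a multiple of three and no stop codon occurs."""
    return len(dna) % 3 == 0 and "*" not in translate(dna)


# Hypothetical fragment for illustration only.
intact = "GCTAGAGATTACTGGGGTCAAGGT"
deleted = delete_base(intact, 4)

print(translate(intact), keeps_reading_frame(intact))
print(translate(deleted), keeps_reading_frame(deleted))
```

In a full variable domain, the shifted frame would also scramble every residue downstream of the deletion, which is why such clones do not encode a functional antibody.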
Both natural and synthetic repertoires have advantages and limitations. On one hand, strategies relying on the capture of naturally rearranged antibody variable genes are not optimal as they include potentially less favorable frameworks within the library. A positive aspect is that these rearranged variable genes include CDRs which are compatible with proper domain folding, as they have been expressed in the context of a natural antibody. On the other hand, strategies based on selecting frameworks and inserting synthetic diversity benefit from the improved stability of the frameworks but are limited by the large number of CDR sequences that are not compatible with folding and/or expression and can destabilize the overall domain (FIG. 1A). There is therefore a need for novel approaches that combine the benefits of using selected frameworks with desirable characteristics with properly folded CDRs, for instance CDRs derived from natural repertoires.
All described approaches to generate antibody libraries, either by capturing naturally rearranged antibody sequences or by generating diversity by synthetic means, are limited by the occurrence of frame shift mutations leading to non-functional antibody sequences. These mutations can appear at multiple steps of the molecular handling of the DNA encoding the antibodies, such as PCR amplification and DNA fragment assembly as well as molecular cloning. The frequency of non-functional members in antibody libraries typically ranges from 15% to 45% depending on the strategies employed to capture or generate the antibody diversity (Persson M A et al., Proc Natl Acad Sci USA. 1991 Mar. 15; 88(6):2432-6; Schoonbroodt S, et al., Nucleic Acids Res. 2005 May 19; 33(9):e81; Söderling E et al., Nat Biotechnol. 2000 August; 18(8):852-6; Rothe et al., J Mol Biol. 2008 Feb. 29; 376(4):1182-200). The frequency of sequences encoding non-functional antibodies has a major impact on the antibody identification process. First, the functional size of the library is reduced. Second, because non-functional clones often have a growth advantage during the propagation of the libraries, they expand faster and can compromise the identification of antibody candidates (De Bruin R et al., Nat Biotechnol 1999 Apr. 17: 397-399). These issues are recognized as serious limitations for fully exploiting the potential of antibody libraries. The generation of highly functional libraries remains a challenge in the field and has prompted many efforts to improve the process. For instance, multiple diversification strategies aiming at mimicking the amino acid usage found in natural CDR sequences have been used in order to more effectively sample the huge diversity of possible sequence combinations encoded by synthetic CDRs (de Kruif J et al., J Mol Biol. 1995 Apr. 21; 248(1):97-105; Sidhu S S et al., J Mol Biol. 2004 Apr. 23; 338(2):299-310).
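Both effects described above can be put into numbers with a short calculation. The following is a minimal sketch, not a validated model: the library size of 1e9 clones, the per-round growth advantage of 1.2 and the function names are hypothetical assumptions introduced for illustration, and only the 15% to 45% range comes from the text. The second function assumes a simple constant relative growth advantage for non-functional clones in each propagation round.

```python
def functional_library_size(total_clones, nonfunctional_fraction):
    """Number of clones expected to encode a functional antibody."""
    if not 0.0 <= nonfunctional_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return total_clones * (1.0 - nonfunctional_fraction)


def nonfunctional_fraction_after(initial_fraction, growth_advantage, rounds):
    """Non-functional fraction after repeated propagation.

    growth_advantage is the assumed relative expansion of non-functional
    clones versus functional clones in each round (1.0 means no advantage).
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    nonfunctional = initial_fraction * growth_advantage ** rounds
    functional = 1.0 - initial_fraction
    return nonfunctional / (nonfunctional + functional)


# Hypothetical library size; 15% and 45% are the bounds quoted in the text.
TOTAL_CLONES = 1e9
for fraction in (0.15, 0.45):
    print(fraction, functional_library_size(TOTAL_CLONES, fraction))

# Hypothetical growth advantage of 1.2 per round over three rounds.
print(nonfunctional_fraction_after(0.15, 1.2, 3))
```

Under these assumptions the share of non-functional clones rises with each round of propagation, so the loss of functional diversity compounds beyond the initial 15% to 45%.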
Another approach is to clean up the initial library in order to remove non-functional clones, at the potential expense of diversity loss. This has been applied to the pre-selection of synthetic repertoires by binding the antibody library to a generic ligand. This step allowed for the enrichment of library members that are able to express and to fold properly and can be used to recreate a more functional library (Winter and Tomlinson, U.S. Pat. No. 6,696,245 B2). Regardless of the approach, the quality of any library depends on the efficiency of the molecular biology methods applied to generate it, and these methods generally yield libraries in which 15% to 45% of the members are non-functional. There is therefore a need for novel and highly efficient approaches that minimize the frequency of non-functional genes due to frame shifts introduced during the molecular cloning steps and that maximize the functionality of libraries by capturing CDR regions having a high propensity of being correctly folded into antibody frameworks with desirable properties. Furthermore, there is a need for approaches that allow the capture of the CDR sequences from an animal immune repertoire into a therapeutically useful context, such as human antibody frameworks, in order to improve the generation process of high affinity antibodies.