Despite the recent development of in vivo discovery platforms providing fully human antibodies (Green, 2014), recombinant antibody libraries continue to represent an important complementary approach, in particular for difficult targets where in vivo attempts have failed or are impossible to conduct due to the nature of the antigen. Recombinant antibody libraries have been described in a variety of layouts and formats (Mondon et al., 2008). Libraries are usually constructed in a combinatorial fashion, randomly combining successively more complex variation within up to six loop regions, the complementarity-determining regions (CDRs). The largest variation is generally introduced in the CDR3 region of the heavy chain (HC) variable domain (HC-CDR3), the most variable and important CDR segment present in natural antibodies (Tonegawa, 1983; Chothia et al., 1989).
For the HC-CDR3 regions, the loop length distribution (percentage of each loop length present in the library) implemented in recombinant antibody libraries normally mirrors that observed in natural antibodies, that is, an approximately bell-shaped distribution with a maximum around HC-CDR3 loops of length 12 (Zemlin et al., 2003). With few exceptions (for example Fellouse et al., 2007; Mahon et al., 2013), recombinant antibody libraries have been designed to follow (approximately) this bell-shaped distribution. This has important consequences when a library of high complexity (10^9 to 10^10 total complexity or higher) is generated in a combinatorial fashion. Variants from shorter HC-CDR3 loops will be over-represented (practically all variants are present, some even several times) relative to variants from long HC-CDR3 loops, because for the latter only a tiny fraction of all possible variants is present. Using a constant length distribution for the HC-CDR3 loop length (all HC-CDR3 lengths are present in equal proportion in the library) further increases the redundancy for the shorter HC-CDR3 loops (their percent fraction is higher than in the bell-shaped distribution observed for natural antibodies), marginally increases the total coverage of possible variants for long HC-CDR3 loops and reduces the coverage for mid-range length HC-CDR3 loops.
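The coverage argument above can be illustrated with a small calculation. The following is a minimal sketch under stated assumptions, not a description of any actual library design: every loop position is assumed to be randomized with an equimolar mixture of 19 amino acids, clones are assumed to be sampled uniformly and independently (Poisson approximation), and the bell-shaped distribution is modelled as a Gaussian with hypothetical parameters (mean 12, standard deviation 3). The function names are illustrative only.

```python
import math

def coverage(n_clones, n_possible):
    """Expected fraction of possible variants present at least once,
    assuming uniform independent sampling (Poisson approximation)."""
    return -math.expm1(-n_clones / n_possible)

def length_fractions_bell(lengths, mean=12.0, sd=3.0):
    """Hypothetical bell-shaped length distribution (normalized Gaussian weights)."""
    weights = [math.exp(-0.5 * ((length - mean) / sd) ** 2) for length in lengths]
    total = sum(weights)
    return [w / total for w in weights]

def length_fractions_flat(lengths):
    """Constant length distribution: every length has the same share."""
    return [1.0 / len(lengths)] * len(lengths)

def report(library_size, lengths, fractions, aa_per_position=19):
    """Return (length, fraction, redundancy, coverage) for each loop length.

    redundancy = clones of that length / theoretically possible variants.
    Assumes all positions of the loop are randomized (a simplification).
    """
    rows = []
    for length, fraction in zip(lengths, fractions):
        possible = aa_per_position ** length
        clones = library_size * fraction
        rows.append((length, fraction, clones / possible, coverage(clones, possible)))
    return rows

if __name__ == "__main__":
    lengths = list(range(5, 21))
    for name, fractions in (("bell", length_fractions_bell(lengths)),
                            ("flat", length_fractions_flat(lengths))):
        print(name)
        for length, fraction, redundancy, cov in report(1e10, lengths, fractions):
            print(f"  L={length:2d} share={fraction:6.2%} "
                  f"redundancy={redundancy:9.3g} coverage={cov:9.3g}")
```

Under these assumptions, short loops are sampled many times over, while the coverage of long loops remains vanishingly small; switching from the bell-shaped to the flat distribution raises the redundancy of short loops and lowers the coverage of mid-range lengths, in line with the reasoning above.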
The total number of antibodies already approved as therapeutic agents or in clinical development is steadily increasing. A survey of the ChEMBL database (www.ebi.ac.uk/chembl/) shows that their HC-CDR3 length distribution has a pronounced maximum at HC-CDR3 length 10, which differs from the smooth bell-shaped length distributions observed for both natural human antibodies and mouse antibodies. HC-CDR3 loops of length 10 should therefore be represented particularly well in a library aimed at the isolation of candidates for therapeutic antibodies. It is likely that antibodies with shorter HC-CDR3 loops express well and show a lower tendency for aggregation, both important characteristics for successful product development.
Although HC-CDR3-only libraries have been generated in various contexts (Barbas et al., 1992; Braunagel et al., 1997; Pini et al., 1998; Hoet et al., 2005; Silacci et al., 2005; Mahon et al., 2013; US 2006/0257937A1), many recombinant antibody libraries introduce diversity not only in the HC-CDR3 region but also in one or more of the five other CDR regions (for example Knappik et al., 2000; Prassler et al., 2013). The diversity present in the various CDR regions is then combined, in a completely random fashion, during library cloning, usually starting with the CDR region with the lowest overall diversity. With the exception of short HC-CDR3 loops, where some redundancy can exist and duplicates might be present, each HC-CDR3 region variant has to be considered unique, being present only once in the library. As a consequence, each HC-CDR3 loop variant becomes “associated” with a completely random combination of variants from the other CDR regions, without any structural or functional selection for compatibility. Compared to a situation where the other CDR regions are represented by germline sequences or by single consensus sequences (for example for the light chain CDR3 region), there is no advantage in having a particular HC-CDR3 variant combined with a random selection of variants from the other (one to five) CDR regions. A HC-CDR3-only library should therefore perform as well as, or even better than, a library with additional diversity. The only exception is short HC-CDR3 loops where, due to the redundancy (presence of variants in more than one copy in the library), a very limited number of combinations of variation in the other CDRs can be explored, i.e. the same HC-CDR3 variant would be present multiple times, each time with a different combination of variants in the other CDR regions. However, in order for the HC-CDR3 variant to be combined with only 10 of these combinations, the duplication level of the HC-CDR3 region must also be around 10.
Even for short HC-CDR3 loops this would require increasing the fraction of that loop length in the library by a factor of 10, which is impractical for most HC-CDR3 loop lengths. For example, a HC-CDR3 loop of a particular length that represents a few percent of the total library would need to be present at a relatively high double-digit percent fraction in order to effectively explore additional diversity present in the library, for example in LC-CDR3. While this is already difficult to achieve for a single HC-CDR3 loop length, it is impossible to generate a library where variants from all HC-CDR3 loop lengths effectively combine with even a limited number of variants in another CDR region. In one case (Mahon et al., 2013) the performance of a HC-CDR3-only library was compared to that of a corresponding HC-CDR3-and-LC-CDR3 library. The HC-CDR3-only library showed superior properties; however, the authors did not fully appreciate the “combinatorial effect” that favors a HC-CDR3-only library but attributed the better performance of the HC-CDR3 library to possible structural incompatibilities between the LC-CDR3 and HC-CDR3 diversity in the HC-CDR3-and-LC-CDR3 library.
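The arithmetic behind the factor of 10 can be sketched as follows. This is an illustration only: the loop lengths, their library fractions and the starting duplication level of 1 are hypothetical values chosen for the example, and the function names are not taken from any cited work.

```python
def required_fraction(current_fraction, current_duplication, target_duplication):
    """Library fraction a loop length would need so that each of its
    HC-CDR3 variants is present target_duplication times on average.
    At fixed library size, duplication scales linearly with the fraction."""
    return current_fraction * target_duplication / current_duplication

def is_feasible(required_fractions):
    """The fractions of all loop lengths together cannot exceed 100 %."""
    return sum(required_fractions) <= 1.0

if __name__ == "__main__":
    # Hypothetical starting point: three loop lengths, each variant present
    # about once (duplication level 1) at the given library fraction.
    current = {8: 0.03, 9: 0.05, 10: 0.08}
    needed = {length: required_fraction(fraction, 1.0, 10.0)
              for length, fraction in current.items()}
    for length, fraction in needed.items():
        print(f"L={length}: needs {fraction:.0%} of the library")
    print("all lengths achievable together:", is_feasible(needed.values()))
```

With these assumed numbers, a single length could in principle be raised to a double-digit fraction, but the fractions required for all lengths together exceed the whole library, which is the point made in the text.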
Recombinant antibody libraries where the design of the HC-CDR3 diversity is based on the position-wise amino acid frequencies observed in natural antibodies have been generated using either standard degenerate oligonucleotides (e.g. Philibert et al., 2007), which allow only an approximate representation of the desired amino acid distribution and generate undesired Cys and stop codons, or oligonucleotides where diversity has been introduced through mixtures of trimer blocks encoding amino acids (Braunagel et al., 1997; Knappik et al., 2000; Prassler et al., 2013; Mahon et al., 2013; patent applications US 2006/0257937A1, EP1979378B1).
However, none of these examples appreciates the combinatorial effect, that is, the relation between the number of different variants actually present in the library for a particular HC-CDR3 length representing a certain fraction of the total library and the theoretically possible number of variants as defined by the library design. In the presence of a bell-shaped “natural-like” HC-CDR3 loop length distribution, the combinatorial effect leads to an over-representation of variants for short HC-CDR3 loops and a very small coverage for longer HC-CDR3 loops. US 2006/0257937A1 only describes library designs that cover a restricted range of HC-CDR3 loop lengths (8, 10, 13, 14, 15, 17, 18, 19), and the amino acid composition at the HC-CDR3 loop positions either corresponds to a fixed equimolar mixture of 19 different amino acids or is restricted to a fixed mixture of a few amino acids for a particular position, indiscriminately for all HC-CDR3 loop lengths. EP1979378B1 describes a library design where the HC-CDR3 loop lengths are divided into three varying length ranges, each range having a defined amino acid composition (called diversity factor). The diversity factor, which represents the amino acid composition of all HC-CDR3 loops within a certain length range at the various HC-CDR3 loop positions, covers Kabat positions 95 to 102. For each position or range of positions within HC-CDR3 the diversity factor assigns particular frequencies for a subset of amino acids, while all of the remaining amino acids (except Cys) are included at a fixed frequency, with the exception of positions 101 and 102 where only a subset of amino acids is present. The design therefore generates an enormous number of theoretically possible variants since all amino acids (except Cys) are present, with varying frequencies, at nearly all HC-CDR3 loop positions and for all HC-CDR3 loop lengths.
Even for mid-range length HC-CDR3 loops (for example lengths 9, 10, 11) the actual number of variants present in a library of total complexity 10^10 represents only a fraction of all possible variants according to the design.
Recombinant human antibody libraries incorporating synthetic CDR3 diversity up to a total overall complexity of about 10^12 have been generated (Knappik et al., 2000; Prassler et al., 2011) and have proven successful (selection of antibodies against a particular target) in practical applications, possibly also because of their sheer size. However, the generation of libraries of such a size requires a very significant effort and also carries a high economic cost.
There is, therefore, a need to design human antibody libraries with optimized properties, i.e. a high probability of selecting good candidate clones for further development into a therapeutic antibody, that can be generated with an acceptable experimental effort and at an acceptable economic cost.