It is now common practice in the art to prepare libraries of genetic packages that individually display, display and express, or comprise a member of a diverse family of peptides, polypeptides or proteins and collectively display, display and express, or comprise at least a portion of the amino acid diversity of the family. In many common libraries, the peptides, polypeptides or proteins are related to antibodies (e.g., single chain Fv (scFv), Fv, Fab, whole antibodies or minibodies (i.e., dimers that consist of VH linked to VL)). Often, they comprise one or more of the CDRs and framework regions of the heavy and light chains of human antibodies.
Peptide, polypeptide or protein libraries have been produced in several ways in the prior art. See e.g., Knappik et al., J. Mol. Biol., 296, pp. 57-86 (2000), which is incorporated herein by reference. One method is to capture the diversity of native donors, either naive or immunized. Another is to generate libraries having synthetic diversity. A third method is a combination of the first two. Typically, the diversity produced by these methods is limited to sequence diversity, i.e., each member of the library differs from the other members of the family by having different amino acids, or variegation, at a given position in the peptide, polypeptide or protein chain. Naturally diverse peptides, polypeptides or proteins, however, are not limited to diversity in their amino acid sequences. For example, human antibodies are not only diverse in their amino acid sequences; they are also diverse in the lengths of their amino acid chains.
For antibodies, diversity in length occurs, for example, during variable region rearrangements. See e.g., Corbett et al., J. Mol. Biol., 270, pp. 587-97 (1997). The joining of V genes to J genes, for example, results in the inclusion of a recognizable D segment in CDR3 in about half of the heavy chain antibody sequences, thus creating regions that encode varying numbers of amino acids. The following also may occur during joining of antibody gene segments: (i) the end of the V gene may have zero to several bases deleted or changed; (ii) the end of the D segment may have zero to many bases removed or changed; (iii) a number of random bases may be inserted between V and D or between D and J; and (iv) the 5′ end of J may be edited to remove or to change several bases. These rearrangements result in antibodies that are diverse both in amino acid sequence and in length.
Libraries that contain only amino acid sequence diversity are thus disadvantaged in that they do not reflect the natural diversity of the peptide, polypeptide or protein that the library is intended to mimic. Further, diversity in length may be important to the ultimate functioning of the peptide, polypeptide or protein. For example, with regard to a library comprising antibody regions, many of the peptides, polypeptides or proteins displayed, displayed and expressed, or comprised by the genetic packages of the library may not fold properly, or their binding to an antigen may be impaired, if diversity in both sequence and length is not represented in the library.
An additional disadvantage of prior art libraries of genetic packages that display, display and express, or comprise peptides, polypeptides and proteins is that they are not focused on those members that are based on naturally occurring diversity, and thus on the members that are most likely to be functional. Rather, the prior art libraries typically attempt to include as much diversity or variegation at every amino acid residue as possible. This makes library construction time-consuming and less efficient than necessary. The large number of members produced by trying to capture complete diversity also makes screening more cumbersome than it needs to be. This is particularly true given that many members of such a library will not be functional.