There is a technological field generally called evolutionary molecular engineering (or in vitro evolution), which comprises methods for enhancing the functions of biopolymers such as naturally occurring proteins and nucleic acids, and methods for creating molecules having novel functions (Non Patent Literature 1). In recent years, use of this technique has expanded as a basic technology for the development of biopharmaceutical products and diagnostic reagents.
For evolutionary molecular engineering targeting proteins and polypeptides, the design of the initial library is a key technology that determines whether the creation of novel molecules succeeds or fails (Non Patent Literatures 2 and 3). The molecules constituting this initial library fall broadly into two types. The first type is short-chain peptides of approximately 10 residues in length, in which almost the whole sequence is randomized. The second type is relatively long-chain polypeptides that use the backbone structure (fold) of a particular protein as a scaffold and have a partially randomized sequence. The short-chain peptide library and the protein backbone-type library each have advantages and disadvantages, as described below.
The short-chain peptide library is relatively easy to construct and screen. When peptides of 7 to 10 residues are randomized, the molecular diversity is theoretically 20^7 to 20^10, i.e., on the order of 10^9 to 10^13. A library of this size can be prepared and screened using existing techniques. The various display techniques widely used in screening can also be applied easily to molecules of small molecular weight. On the other hand, short-chain peptides are generally highly flexible molecules and fail to stably form a particular three-dimensional structure in solution. Short-chain peptides are therefore disadvantageous in that the specific binding between the peptide and a target receptor, a target enzyme, etc., has low thermodynamic stability, so that high-affinity or high-specificity molecules are difficult to obtain.
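The theoretical diversity quoted above (20 raised to the number of randomized residues) can be checked with a short calculation. The following Python sketch is illustrative only: the function name `library_diversity` is hypothetical, and it assumes that every randomized position may be any of the 20 standard amino acids with equal weight, which real libraries built from restricted codon sets may not satisfy.

```python
from math import log10

# Assumption: each randomized position can be any of the 20 standard amino acids.
AMINO_ACIDS = 20


def library_diversity(randomized_residues: int, alphabet: int = AMINO_ACIDS) -> int:
    """Return the number of distinct sequences for a fully randomized stretch."""
    if randomized_residues < 0:
        raise ValueError("number of randomized residues must be non-negative")
    return alphabet ** randomized_residues


if __name__ == "__main__":
    # Peptide lengths discussed in the text: 7 to 10 residues.
    for n in range(7, 11):
        d = library_diversity(n)
        print(f"{n} residues: {d:.2e} sequences (about 10^{log10(d):.1f})")
```

Under these assumptions, 7 residues give about 1.3 x 10^9 sequences and 10 residues give about 1.0 x 10^13, consistent with the range stated in the text.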
The protein backbone-type library employs the backbone structure of a particular natural protein (or artificial protein) as a scaffold. In many cases, the protein is selected from those having a known three-dimensional conformation. In the protein backbone-type library, only a partial region, not the whole molecule, is randomized. The other moieties maintain their particular sequences, which are often natural sequences. This is because a molecule randomized over its whole region cannot be expected to form the inherent three-dimensional structure. Accordingly, amino acid residues that contribute to the structural stability of the original protein are preserved with reference to conformation data or the like, while a loop region or the like positioned on the surface of the molecule is often randomized. A plurality of loop regions may be randomized. In recent years, the backbone structure of an artificial protein consisting of an artificially designed sequence, rather than a natural protein, has sometimes been used as a scaffold.
The concept of the protein backbone-type library mimics the molecular structural patterns of antibodies (immunoglobulins). Specifically, the randomized moieties correspond to antibody variable regions, and the other moieties that maintain their natural sequences correspond to constant regions. As with antibodies, which recognize antigens via their variable regions, the protein backbone-type library is aimed at acquiring new functions via the randomized moieties.
Unlike in the short-chain peptide library, each randomized sequence introduced into the protein backbone can adopt only a limited range of conformations, because both of its ends are fixed to the robust backbone structure. The resulting library can therefore be expected to avoid the disadvantages attributed to molecular flexibility. On the other hand, the molecules in this library are necessarily relatively large. Associated disadvantages that have been pointed out include greater difficulty of research and development, higher production cost for practical use, and reduced storage stability. In addition, the restriction on conformations carries the risk that an active structure cannot be formed.
Meanwhile, a molecular library based on a cyclic oligopeptide backbone is also known as a library of polypeptides having a small molecular weight and a stable structure. The cyclization of an oligopeptide, however, requires the introduction of a functional group and complicated chemical reaction operations, which complicates the synthesis steps. Also, an oligopeptide cyclized through the oxidation of cysteine is disadvantageous in that it is generally difficult to use in a reducing environment such as the inside of cells.
As mentioned above, the protein backbone-type library requires selecting a natural protein (or artificial protein) for use as a scaffold. More than 40 different proteins have been utilized so far (Non Patent Literatures 2 and 3). Table 1 shows the main libraries, and some of them are described below as examples.
Table 1. Features of main protein backbone-type libraries (partial modification of excerpts from Non Patent Literature 2)

| Name of protein backbone | Size of whole molecule (number of residues) | Size of randomized region (number of residues) | Remarks |
|---|---|---|---|
| Immunoglobulin G | 1200 | 50-60 | Widely used as antibody drug |
| Antibody Fab fragment | 450 | 50-60 | |
| β-lactamase | 265 | 12 | |
| T-cell receptor | 250 | 5 (changeable) | |
| Green fluorescent protein | 238 | 18 | |
| Antibody Fv fragment | 200-250 | 50-60 | |
| Ankyrin repeat | 67 + 33n | 7n | |
| Carbohydrate-binding module (CBM4-2) | 168 | 12 | |
| Lipocalin | 160-180 | 16 | |
| Staphylococcal nuclease | 149 | 16 | |
| Ecotin | 142 | 20 | |
| Cytotoxic T-lymphocyte antigen 4 (CTLA-4) | 136 | 6 | |
| Thioredoxin | 108 | 20 | |
| Cytochrome b562 | 106 | 9 | |
| Src homology domain 2 (SH2) | 100 | 5 | |
| Fibronectin type 3 | 94 | 10 (changeable) | |
| Tendamistat | 74 | 6-8 | Having cyclic backbone |
| Minibody | 61 | 12 | |
| Src homology domain 3 (SH3) | 60 | 12 | |
| Affibody | 58 | 13 | Under development as pharmaceutical or diagnostic drug |
| Bovine pancreatic trypsin inhibitor | 58 | 5 | Having cyclic backbone |
| Lipoprotein-associated coagulation inhibitor | 58 | 9 | Having cyclic backbone |
| Human pancreatic secretory trypsin inhibitor | 56 | 8 | Having cyclic backbone |
| WW domain | 52 | 8 | |
| Phage envelope protein pVIII | 50 | 6 | |
| Human-derived trypsin inhibitor | 46 | 5 | Having cyclic backbone |
| A-domain | 35n-40n | 30n | Having cyclic backbone |
| Cellulose-binding domain | 36 | 11 | Having cyclic backbone |
| Insect-derived defensin A peptide | 29 | 7 | Having cyclic backbone |
| Gourd trypsin inhibitor II | 28 | 6 | Having cyclic backbone |
| Zinc finger | 26 | 5 | Having cyclic backbone |
| Scorpion toxin | 25-40 | 4 | Having cyclic backbone |
| Cyclized peptide backbone | 12 | 4 | Having cyclic backbone |
Affibody (Non Patent Literatures 4, 5, and 6 and Patent Literatures 1 and 2) uses as its protein backbone protein Z, a modified form of the antibody-binding domain of staphylococcal protein A (SPA). It is a protein of 58 residues (6.5 kDa) that maintains high stability and solubility without relying on intramolecular disulfide cross-links and can be produced on a large scale in a microbial expression system. It is also produced chemically by solid-phase synthesis (Non Patent Literature 7). A molecular library is prepared by making 13 residues on the helix variable. In this way, binding molecules against dozens of types of target proteins have been obtained so far. The Affibody most advanced in research as a diagnostic reagent is a high-affinity Affibody against the cell surface receptor HER-2, which is applied as an imaging molecule for tumor diagnosis (Non Patent Literature 8).
Fibronectin type 3 domain is a small protein domain composed of β-sheet. Binding molecules against a plurality of targets such as ubiquitin have been obtained from a library in which the amino acid residues of two or three loop regions are randomized (Non Patent Literature 9 and Patent Literature 3).
Minibody is an artificial protein designed by the removal of three β-strands from the heavy chain variable domains of a monoclonal antibody (Non Patent Literature 10). This protein is 61 residues long and has two loops. These two loop regions are randomized. Although its low solubility (10 μM) has been perceived as a problem for practical use, variant-type Minibody that has attained high solubility (350 μM) as a result of mutagenesis has been reported (Non Patent Literature 11).
Tendamistat composed of 74 residues has six strands in β-sheet sandwich connected by two disulfide bonds (Non Patent Literature 12). This backbone contains three loops. Randomization has been attempted so far only for two of these loops.
Cytochrome b562 is a protein domain having a 4-helix bundle structure composed of 106 residues. A molecule binding with an equilibrium dissociation constant of 290 nM to low-molecular hapten has been obtained by the randomization of 9 amino acid residues in two loops (Non Patent Literature 13).
The oligonucleotide/oligosaccharide-binding fold (OB-fold) is a backbone structure constituted by a five-stranded β-barrel capped by an amphipathic α-helix (Patent Literature 4). The OB-fold is the 28th most common fold in an analysis of 20 or more genomic sequences (Non Patent Literature 14).
Cyclized β-turn peptide backbone is a low-molecular protein backbone having a stabilized conformation in a solution as a result of promoting the secondary structure formation of the peptide by disulfide-constrained cyclization (Patent Literature 5).
A protein backbone based on a coiled coil structure containing disulfide cross-link has been designed in order to stabilize the α-helix of a short-chain peptide. A coiled coil protein backbone containing an arginine-glycine-aspartic acid (RGD) sequence exhibits competitive inhibitory activity against fibrinogen (Non Patent Literature 15).
An artificial protein based on ankyrin repeat protein (designed AR protein, DARPin) is a giant protein having a repeat structure (Non Patent Literature 16). The repeat unit is a small domain of 33 residues and is composed of β-turn and antiparallel helix and loop without disulfide bonds.
A-domain (Non Patent Literature 17) is a backbone structure that is observed as a repeat unit. This structure is confirmed in cell surface receptors of various species and constituted by a linkage of domains each composed of 35 to 40 amino acid residues.
Cytotoxic T-lymphocyte antigen 4 (CTLA-4) is a helper T-cell surface receptor belonging to the immunoglobulin superfamily, and it acquires affinity for integrin when a recognition sequence is introduced into a hypervariable loop (Non Patent Literature 18).
Antibodies (immunoglobulins) are the proteins most widely used as binding molecules having high specificity. Immunoglobulin G is a macromolecule having a molecular weight of approximately 150,000 and consisting of four polypeptide chains (two heavy chains and two light chains) that together comprise 12 domains. An antigen-binding fragment (Fab), which contains the antigen-binding site and is obtained by enzyme treatment; a variable region fragment (Fv), which consists of a heavy chain variable region (VH) and a light chain variable region (VL) and is prepared by a genetic engineering approach; a single-chain antibody (scFv), which comprises VH and VL linked through a peptide linker; and the like are also frequently used as units of binding molecules (Non Patent Literature 19). A molecular library, HuCAL, has been reported in which the frameworks of an antibody variable region are used as the protein backbone of an artificial antibody independent of the natural immune repertoire, and the complementarity-determining regions are randomized (Non Patent Literature 20).