1. Field of the Invention
The present invention relates generally to computer-aided design of a protein with binding affinity to a target molecule and, more particularly, to methods for screening and identifying antibodies (or immunoglobulins) with diverse sequences and high affinity to a target antigen by combining computational prediction with experimental screening of a biased library of antibodies.
2. Description of Related Art
Antibodies are made by vertebrates in response to various internal and external stimuli (antigens). Synthesized exclusively by the B cells, antibodies are produced in millions of forms, each with a different amino acid sequence and a different binding site for an antigen. Collectively called immunoglobulins (abbreviated as Ig), they are among the most abundant protein components in the blood, constituting about 20% of the total plasma protein by weight.
A naturally occurring antibody molecule consists of two identical “light” (L) protein chains and two identical “heavy” (H) protein chains, all held together by both hydrogen bonding and precisely located disulfide linkages. Chothia et al. (1985) J. Mol. Biol. 186:651–663; and Novotny and Haber (1985) Proc. Natl. Acad. Sci. USA 82:4592–4596. The N-terminal domains of the L and H chains together form the antigen recognition site of each antibody.
The mammalian immune system has evolved unique genetic mechanisms that enable it to generate an almost unlimited number of different light and heavy chains in a remarkably economical way by joining separate gene segments together before they are transcribed. For each type of Ig chain (κ light chains, λ light chains, and heavy chains) there is a separate pool of gene segments from which a single peptide chain is eventually synthesized. Each pool is on a different chromosome and usually contains a large number of gene segments encoding the V region of an Ig chain and a smaller number of gene segments encoding the C region. During B cell development a complete coding sequence for each of the two Ig chains to be synthesized is assembled by site-specific genetic recombination, bringing together the entire coding sequence for a V region and the coding sequence for a C region. In addition, the V region of a light chain is encoded by a DNA sequence assembled from two gene segments: a V gene segment and a short joining or J gene segment. The V region of a heavy chain is encoded by a DNA sequence assembled from three gene segments: a V gene segment, a J gene segment and a diversity or D gene segment.
The large number of inherited V, J and D gene segments available for encoding Ig chains makes a substantial contribution on its own to antibody diversity, but the combinatorial joining of these segments greatly increases this contribution. Further, imprecise joining of gene segments and somatic mutations introduced during the V-D-J segment joining at the pre-B cell stage greatly increase the diversity of the V regions.
After immunization against an antigen, a mammal goes through a process known as affinity maturation to produce antibodies with higher affinity toward the antigen. Such antigen-driven somatic hypermutation fine-tunes antibody responses to a given antigen, presumably due to the accumulation of point mutations specifically in both heavy- and light-chain V region coding sequences and the selective expansion of high-affinity antibody-bearing B cell clones.
Structurally, various functions of an antibody are confined to discrete protein domains (regions). The sites that recognize and bind antigen consist of three hypervariable or complementarity-determining regions (CDRs) that lie within the variable (VH and VL) regions at the N-terminal ends of the two H and two L chains. The constant domains are not involved directly in binding the antibody to an antigen, but are involved in various effector functions, such as participation of the antibody in antibody-dependent cellular cytotoxicity.
The domains of natural light and heavy chains have the same general structures, and each domain comprises four framework regions, whose sequences are somewhat conserved, connected by three CDRs. The four framework regions largely adopt a β-sheet conformation and the CDRs form loops connecting, and in some cases forming part of, the β-sheet structure. The CDRs in each chain are held in close proximity by the framework regions and, with the CDRs from the other chain, contribute to the formation of the antigen binding site.
Generally all antibodies adopt a characteristic “immunoglobulin fold”. Specifically, both the variable and constant domains of an antigen binding fragment (Fab, consisting of VL and CL of the light chain and VH and CH1 of the heavy chain) consist of two twisted antiparallel β-sheets which form a β-sandwich structure. The constant regions have three- and four-stranded β-sheets arranged in a Greek key-like motif, while variable regions have a further two short β strands producing a five-stranded β-sheet.
The VL and VH domains interact via the five-stranded β sheets to form a nine-stranded β barrel of about 8.4 Å radius, with the strands at the domain interface inclined at approximately 50° to one another. The domain pairing brings the CDR loops into close proximity. The CDRs themselves form some 25% of the VL/VH domain interface.
The six CDRs (CDR-L1, -L2 and -L3 for the light chain, and CDR-H1, -H2 and -H3 for the heavy chain) are supported on the β barrel framework, forming the antigen binding site. While their sequences are hypervariable in comparison with the rest of the immunoglobulin structure, some of the loops show a relatively high degree of both sequence and structural conservation. In particular, CDR-L2 and CDR-H1 are highly conserved in conformation.
Chothia and co-workers have shown that five of the six CDR loops (all except CDR-H3) adopt a discrete, limited number of main-chain conformations (termed canonical structures of the CDRs) by analysis of conserved key residues. Chothia and Lesk (1987) J. Mol. Biol. 196:901–917; Chothia et al. (1989) Nature (London) 342:877; and Chothia et al. (1998) J. Mol. Biol. 278:457–479. Chothia and Lesk (1987) ibid. described in their report that “from an analysis of the immunoglobulins of known atomic structure we determine the limits of the β-sheet framework common to the known structure (see section 4 below)” (page 902, column 1, 3rd paragraph). In section 4 of Chothia and Lesk (1987) ibid. it is described that “the conservation of the framework structure extends to the residues immediately adjacent to the hypervariable regions”; and “if the conserved frameworks of a pair of molecules are superimposed, the differences in the positions of these residues is in most cases less than 1 Å and in all but one case less than 1.8 Å (Table 5)” (page 904, column 1, 2nd paragraph). Table 5 in Chothia and Lesk (1987) ibid. demarcates the hypervariable regions (i.e., CDRs) and framework regions. The adopted structure depends on both the CDR length and the identity of certain key amino acid residues, both in the CDR and in the contacting framework, involved in its packing. The canonical conformations were determined by specific packing, hydrogen bonding interactions, and stereochemical constraints of only these key residues, which serve as structural determinants.
Various methods have been developed for modeling the three dimensional structures of the antigen binding site of an antibody. Other than x-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy has been used in combination with computer model building to study the atomic details of antibody-ligand interactions. Dwek et al. (1975) Eur. J. Biochem. 53:25–39. Dwek and coworkers used spin-labeled hapten to deduce the combining site of the MoPC 315 myeloma protein for dinitrophenyl. Similar analysis has also been done using anti-spin labeled monoclonal antibodies (Anglister et al. (1987) Biochem. 26: 6958–6064) and on the anti-2-phenyloxazolone Fv fragments (McManus and Riechmann (1991) Biochem. 30:5851–5857).
Computer-implemented analysis and modeling of the antibody combining site (or antigen binding site) are based on homology analysis comparing the target antibody sequence with those of antibodies with known structures or structural motifs in existing databases (e.g., the Brookhaven Protein Data Bank). By using such homology-based modeling methods, an approximate three-dimensional structure of the target antibody is constructed. Early antibody modeling was based on the conjecture that CDR loops with identical length and different sequence may adopt similar conformations. Kabat and Wu (1972) Proc. Natl. Acad. Sci. USA 69: 960–964. A typical segment match algorithm is as follows: given a loop sequence, the Protein Data Bank can be searched for short, homologous backbone fragments (e.g., tripeptides), which are then assembled and computationally refined into a new combining site model.
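The segment match search described above can be illustrated with a minimal sketch. It assumes a toy fragment table in place of the Protein Data Bank, scores homology by simple residue identity, and stops before the assembly and refinement steps. The names FRAGMENT_DB, identity, best_fragment and match_loop, and all torsion angle values, are hypothetical and are not taken from any cited work.

```python
from typing import Dict, List, Tuple

# Toy fragment "database": tripeptide -> (phi, psi) of its central residue.
# All entries are invented for illustration; they are NOT Protein Data Bank data.
FRAGMENT_DB: Dict[str, Tuple[float, float]] = {
    "GYT": (-120.0, 130.0),
    "YTF": (-65.0, -40.0),
    "TFT": (-140.0, 150.0),
    "SYA": (-60.0, -45.0),
}


def identity(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def best_fragment(query: str, db: Dict[str, Tuple[float, float]]) -> str:
    """Return the database tripeptide with the highest identity to the query.

    Ties are resolved in favor of the entry listed first in the database.
    """
    return max(db, key=lambda frag: identity(query, frag))


def match_loop(
    loop: str, db: Dict[str, Tuple[float, float]] = FRAGMENT_DB
) -> List[Tuple[str, str, Tuple[float, float]]]:
    """Slide a three-residue window along the loop and look up each window.

    Returns (query tripeptide, matched fragment, backbone angles) per window.
    Assembly and refinement of the matched fragments are not attempted here.
    """
    hits = []
    for i in range(len(loop) - 2):
        query = loop[i:i + 3]
        frag = best_fragment(query, db)
        hits.append((query, frag, db[frag]))
    return hits


if __name__ == "__main__":
    for query, frag, angles in match_loop("GYTFT"):
        print(query, "->", frag, angles)
```

A practical implementation would score fragments with a substitution matrix and geometric fit rather than raw identity, and would pass the assembled backbone to an energy-based refinement step.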
More recently, the canonical loop concept has been incorporated into the computer-implemented structural modeling of an antibody combining site. In its most general form, the canonical structure concept assumes that (1) sequence variation at other than canonical positions is irrelevant for loop conformation, (2) canonical loop conformations are essentially independent of loop-loop interactions, and (3) only a limited number of canonical motifs exist and these are well represented in the database of currently known antibody crystal structures. Based on this concept, Chothia predicted all six CDR loop conformations in the lysozyme-binding antibody D1.3 and five canonical loop conformations in four other antibodies. Chothia et al. (1989), supra. It is also possible to improve the modeling of CDRs of antibody structures by combining the homology-based modeling with conformational search procedures. Martin et al. (1989) Proc. Natl. Acad. Sci. USA 86:9268–9272.
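As a rough illustration of assumption (1), a canonical class can be assigned from loop length and key residues alone. The sketch below assumes a hypothetical rule table; the class names, loop lengths and key positions are invented and do not reproduce the published canonical structure definitions. For brevity it checks key residues inside the loop only, whereas the published determinants also include contacting framework residues.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CanonicalRule:
    name: str                     # label of the (hypothetical) canonical class
    loop_length: int              # required number of residues in the loop
    key_residues: Dict[int, str]  # 0-based loop position -> allowed residues


# Invented rules for illustration only; NOT the published canonical classes.
RULES: List[CanonicalRule] = [
    CanonicalRule("toy-class-1", 5, {0: "G", 3: "FY"}),
    CanonicalRule("toy-class-2", 5, {0: "S"}),
    CanonicalRule("toy-class-3", 7, {2: "P"}),
]


def assign_class(loop: str, rules: Optional[List[CanonicalRule]] = None) -> Optional[str]:
    """Return the first rule whose length and key residues match the loop.

    Residues at non-key positions are ignored, mirroring assumption (1).
    Returns None when no rule applies.
    """
    for rule in (RULES if rules is None else rules):
        if len(loop) != rule.loop_length:
            continue
        if all(loop[pos] in allowed for pos, allowed in rule.key_residues.items()):
            return rule.name
    return None


if __name__ == "__main__":
    for loop in ("GATFS", "GWWYW", "SATFS", "AAPAAAA", "AAAAA"):
        print(loop, "->", assign_class(loop))
```

In this sketch, two loops that differ only at non-key positions, such as GATFS and GWWYW, fall into the same class, which is the behavior the canonical structure concept predicts.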
Besides modeling a specific antibody structure, efforts have been made in generating artificial (or synthetic) libraries of antibodies which are screened against a specific target antigen. A fully synthetic combinatorial antibody library has been designed based on modular consensus frameworks and CDRs randomized with trinucleotides. Knappik et al. (2000) J. Mol. Biol. 296:57–86. In this study, the human antibody repertoire was analyzed in terms of structure, amino acid sequence diversity and germline usage. Modular consensus framework sequences with seven VH and seven VL were derived to cover 95% of variable germline families and optimized for expression in E. coli. After cloning the genes in all 49 combinations into a phagemid vector, a set of antibody phage display libraries was created, totaling 2×10⁹ members in the libraries.
Phage display technology has been used extensively to generate large libraries of antibody fragments by exploiting the capability of bacteriophage to express and display biologically functional protein molecules on their surface. Combinatorial libraries of antibodies have been generated in bacteriophage lambda expression systems which may be screened as bacteriophage plaques or as colonies of lysogens (Huse et al. (1989) Science 246: 1275; Caton and Koprowski (1990) Proc. Natl. Acad. Sci. (U.S.A.) 87: 6450; Mullinax et al. (1990) Proc. Natl. Acad. Sci. (U.S.A.) 87: 8095; Persson et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88: 2432). Various embodiments of bacteriophage antibody display libraries and lambda phage expression libraries have been described (Kang et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88: 4363; Clackson et al. (1991) Nature 352: 624; McCafferty et al. (1990) Nature 348: 552; Burton et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88: 10134; Hoogenboom et al. (1991) Nucleic Acids Res. 19: 4133; Chang et al. (1991) J. Immunol. 147: 3610; Breitling et al. (1991) Gene 104: 147; Marks et al. (1991) J. Mol. Biol. 222: 581; Barbas et al. (1992) Proc. Natl. Acad. Sci. (U.S.A.) 89: 4457; Hawkins and Winter (1992) Eur. J. Immunol. 22: 867; Marks et al. (1992) Biotechnology 10: 779; Marks et al. (1992) J. Biol. Chem. 267: 16007; Lowman et al. (1991) Biochemistry 30: 10832; Lerner et al. (1992) Science 258: 1313). Also see review by Rader, C. and Barbas, C. F. (1997) “Phage display of combinatorial antibody libraries” Curr. Opin. Biotechnol. 8:503–508.
Generally, a phage library is created by inserting a library of random oligonucleotides or a cDNA library encoding antibody fragments such as VL and VH into gene 3 of M13 or fd phage. Each inserted gene is expressed at the N-terminus of the gene 3 product, a minor coat protein of the phage. As a result, peptide libraries that contain diverse peptides can be constructed. The phage library is then affinity screened against an immobilized target molecule of interest, such as an antigen, and specifically bound phage particles are recovered and amplified by infection of Escherichia coli host cells. Typically, the target molecule of interest, such as a receptor (e.g., polypeptide, carbohydrate, glycoprotein, nucleic acid), is immobilized by a covalent linkage to a chromatography resin to enrich for reactive phage particles by affinity chromatography and/or labeled for screening plaques or colony lifts. This procedure is called biopanning. Finally, high affinity phage clones can be amplified and sequenced to deduce the specific peptide sequences.
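The enrichment achieved by repeated rounds of biopanning can be illustrated with a simple deterministic model. The sketch assumes that each clone is retained on the immobilized target with a fixed probability per round and that recovered phage are amplified without bias back to a constant population size. The function names and the retention values are hypothetical and are not measured data.

```python
from typing import Dict, List


def pan_round(fractions: Dict[str, float], retention: Dict[str, float]) -> Dict[str, float]:
    """One round of binding, washing and amplification.

    Each clone's share of the population is weighted by its retention
    probability, then the pool is renormalized (unbiased amplification).
    """
    recovered = {clone: fractions[clone] * retention[clone] for clone in fractions}
    total = sum(recovered.values())
    return {clone: amount / total for clone, amount in recovered.items()}


def biopan(fractions: Dict[str, float], retention: Dict[str, float], rounds: int) -> List[Dict[str, float]]:
    """Return the population composition before panning and after each round."""
    history = [dict(fractions)]
    for _ in range(rounds):
        fractions = pan_round(fractions, retention)
        history.append(fractions)
    return history


if __name__ == "__main__":
    # Hypothetical starting library: one binder per million background phage.
    start = {"binder": 1e-6, "background": 1.0 - 1e-6}
    # Hypothetical per-round retention on the immobilized target.
    keep = {"binder": 0.5, "background": 1e-4}
    for n, pool in enumerate(biopan(start, keep, rounds=3)):
        print(f"round {n}: binder fraction = {pool['binder']:.6f}")
```

Under these assumptions the ratio of binder to background grows by the ratio of the two retention probabilities in every round, which is why a few rounds are usually sufficient to bring a rare binder to dominance.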
A method for humanizing antibodies by using computer modeling has also been developed by Queen et al., U.S. Pat. No. 5,693,762. The structure of a non-human, donor antibody (e.g., a mouse monoclonal antibody) is predicted based on computer modeling, and key amino acids in the framework are predicted to be necessary to retain the shape, and thus the binding specificity, of the CDRs. These few key murine donor amino acids are selected based on their positions and characters within a few defined categories and substituted into a human acceptor antibody framework along with the donor CDRs. For example, category 1: The amino acid position is in a CDR as defined by Kabat et al. Kabat and Wu (1972) Proc. Natl. Acad. Sci. USA 69: 960–964. Category 2: If an amino acid in the framework of the human acceptor immunoglobulin is unusual, and if the donor amino acid at that position is typical for human sequences, then the donor amino acid rather than the acceptor may be selected. Category 3: In the position immediately adjacent to one or more of the three CDRs in the primary sequence of the humanized immunoglobulin chain, the donor amino acid(s) rather than the acceptor amino acid may be selected. Based on these criteria, a series of elaborate selections of individual amino acids from the donor antibody is conducted. The resulting humanized antibody usually includes about 90% human sequence. The humanized antibody designed by computer modeling is tested for antigen binding. Experimental results such as binding affinity are fed back to the computer modeling process to fine-tune the structure of the humanized antibody. The redesigned antibody can then be tested for improved biological functions. Such an iterative fine-tuning process can be labor intensive and unpredictable.
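The three categories above can be expressed as a per-position selection rule. The following is a minimal sketch and not the procedure of the cited patent: it assumes pre-aligned donor and acceptor sequences of equal length, takes the CDR positions and the "unusual" and "typical" annotations as given inputs, and always selects the donor residue where the text says it "may be selected". The function name humanize and all of its parameters are hypothetical.

```python
from typing import Set


def humanize(
    donor: str,
    acceptor: str,
    cdr_positions: Set[int],
    acceptor_unusual: Set[int],
    donor_typical: Set[int],
) -> str:
    """Choose donor or acceptor residue at each aligned position (0-based).

    Category 1: the position lies in a CDR.
    Category 2: the acceptor residue is unusual and the donor residue is
                typical for human sequences at that position.
    Category 3: the position is immediately adjacent to a CDR.
    All other positions keep the human acceptor residue.
    """
    if len(donor) != len(acceptor):
        raise ValueError("donor and acceptor must be aligned to equal length")
    chosen = []
    for i, (d, a) in enumerate(zip(donor, acceptor)):
        in_cdr = i in cdr_positions                                  # category 1
        rare_fix = i in acceptor_unusual and i in donor_typical      # category 2
        adjacent = (i - 1) in cdr_positions or (i + 1) in cdr_positions  # category 3
        chosen.append(d if (in_cdr or rare_fix or adjacent) else a)
    return "".join(chosen)


if __name__ == "__main__":
    # Placeholder sequences: "D" marks donor residues, "A" acceptor residues.
    result = humanize(
        donor="DDDDDDDDDD",
        acceptor="AAAAAAAAAA",
        cdr_positions={4, 5},
        acceptor_unusual={0},
        donor_typical={0},
    )
    print(result)
```

In practice each candidate substitution would be weighed against the structural model and the binding data rather than applied automatically, which is the source of the iteration described above.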