The animal body can produce antibodies that specifically recognize and bind to various structures (epitopes) on the surfaces of foreign agents invading the body fluids. The size of the antibody repertoire of an individual animal (the total number of antibody types that have distinct amino acid sequences and bind to different antigens) has been estimated at approximately 1 million to 100 million. This enormous repertoire results from DNA rearrangements of the heavy chain VH-D-JH and light chain VL-JL segments in the antibody loci during the differentiation of bone marrow stem cells into antibody-producing B lymphocytes. Because this DNA rearrangement occurs independently in each B cell, a single B cell carrying one pair of VH-D-JH and VL-JL genes produces only a single type of antibody, whereas the entire B cell population of an individual can produce many different types of antibodies.
The techniques of antiserum preparation and of monoclonal antibody preparation by cell fusion, both used in the prior art, are based on this antibody-producing mechanism of the animal body. Specifically, an antigen substance is injected together with an adjuvant into an animal (rabbit, goat, mouse, etc.) several times at certain intervals. When the animal's immune system recognizes the substance as foreign, B cells expressing antibodies that bind to it are stimulated to grow and differentiate, and large quantities of those antibodies begin to be secreted into the body fluid. Because even a purified antigen may present various structures on its surface, the secreted antibodies that bind to it are in practice not of a single type but a mixture of various types. A serum containing such an antibody mixture (antiserum) is called a “polyclonal antibody”. Polyclonal antibodies have been used as useful research reagents. However, polyclonal antibodies reactive to their target antigens are often cross-reactive with molecules that are partly similar in structure to the target. Such cross-reactivity has been problematic when a polyclonal antibody is used as a reagent for detecting an antigen.
The establishment of cell fusion technology changed the situation completely. The spleen of an animal immunized with an antigen contains many B lymphocytes producing antibodies that bind to the antigen, but such cells are difficult to culture and keep alive permanently in vitro. The idea was therefore conceived that permanently proliferating antibody-producing cells could be obtained by fusing antibody-producing cells with cells of a tumor cell line, and such a method was eventually established. Since a fusion cell line (hybridoma) established in this way is derived from a single antibody-producing cell and a single tumor cell, it produces a single type of antibody, which is therefore called a “monoclonal antibody”. This technology was established by Köhler and Milstein in 1975. A monoclonal antibody is a collection of homogeneous antibody molecules, so monoclonal antibodies have been used as highly specific antibodies with little cross-reactivity. However, the following problems with this method have been pointed out:
(1) it is required to prepare a large amount of purified sample of antigen substance;
(2) the substance must be antigenic in the animal to be immunized; and
(3) a great expenditure of time and effort is required to establish a monoclonal antibody.
An enormous number of useful antibodies have been provided by the technologies for producing polyclonal and monoclonal antibodies, proving their usefulness. However, many difficult problems with these methods also remain to be solved. For example, these methods cannot meet the demand for preparing antibodies against various antigens in a short time, or for selectively preparing antibodies that bind specifically to epitopes having particular structures. A method has long been awaited for preparing an antibody library comprising various antibody molecules from which desired antibodies can be reliably isolated in a short time. Theoretically, the number of antibody types in such a library must be comparable to the size of the antibody repertoire in the animal body. In practice, however, it is impossible to prepare such an enormous library from animal cells. Monoclonal antibody preparation is essentially the screening of a library of antibody-producing cells derived from the animal body for antibodies with the desired reactivity, but the repertoire of this library is greatly reduced during cell fusion and other processes.
Methods using an E. coli expression system for antibody genes were then proposed. Better et al. and Skerra and Plückthun were the first to succeed in expressing antibodies with antigen-binding activity in E. coli (Better, M., Chang, C. P., Robinson, R. R., Horwitz, A. H., Science 1988, 240:4855 1041-3; Skerra, A., Plückthun, A., Science 1988, 240:4855 1038-41). They attached a sequence that serves as a secretion signal in E. coli to the N-terminus of the antibody, so that Fab-type and Fv-type antibodies were successfully produced and secreted by E. coli.
Further, immediately after its establishment in 1988, PCR technology was used to amplify genes encoding antibody variable domains. Primer sequences for amplifying all types of VH-D-JH and VL-JL genes expressed in the animal body (particularly in humans) were proposed (Orlandi, R. et al., Proc. Natl. Acad. Sci. USA. 1989, 86:10 3833-7; Sastry, L. et al., Proc. Natl. Acad. Sci. USA. 1989, 86:15 5728-32). Vectors for producing antibodies in E. coli were then constructed using antibody genes amplified with these primers (Huse, W. D. et al., Science 1989, 246:4935 1275-81; Ward, G. E. et al., J. Clin. Microbiol. 1989, 27:12 2717-23). At this stage, the repertoire size of antibody libraries increased greatly. However, it was difficult to screen the trace amounts of antibodies produced in E. coli using their antigen-binding activities as indices. Efficient screening had to await the application of the phage-display method to antibody library preparation.
The phage-display method was devised by Smith in 1985 (Smith, G. P., Science 1985, 228:4075 1315-7); it uses filamentous bacteriophages, such as M13, that contain single-stranded circular DNA. The phage particle comprises the major coat protein cp8, which envelops the DNA, and five molecules of cp3 protein, which functions when the phage infects E. coli. In the phage-display system, a fusion gene is constructed that encodes a polypeptide linked to the cp3 or cp8 protein, and the fusion protein is expressed on the surface of the phage particle. A phage particle displaying a protein with binding activity can then be enriched on the basis of its binding to the corresponding ligand. This method of enriching the DNA of interest is called “panning”. The enriched phage particles contain the DNA encoding the protein with the desired binding activity. The use of such filamentous phages allowed the establishment of a system in which screening based on binding activity and DNA cloning can be carried out with high efficiency (Published Japanese Translation of International Publication No. Hei 5-508076). A method using a filamentous phage library in which antibodies are expressed as Fab-type molecules has also been reported (Published Japanese Translation of International Publication No. Hei 6-506836); in this method, the variable region is fused with cp3 or a similar protein whose N-terminal portion has been deleted.
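Why repeated rounds of panning work can be illustrated with a toy model. The sketch below assumes hypothetical per-round recovery rates for antigen-binding and non-binding phages (the default values are illustrative assumptions, not measurements) and also assumes that re-amplification in E. coli does not change the relative composition of the recovered phages.

```python
from __future__ import annotations


def pan(binder_fraction: float, rounds: int,
        binder_recovery: float = 0.1,
        background_recovery: float = 1e-4) -> list[float]:
    """Toy model of panning: fraction of binding phages after each round.

    binder_recovery and background_recovery are hypothetical fractions of
    binding and non-binding phages retained per round.
    """
    history = []
    f = binder_fraction
    for _ in range(rounds):
        kept_binders = f * binder_recovery                  # specifically bound phages
        kept_background = (1.0 - f) * background_recovery   # non-specific carry-over
        # Assumed unbiased re-amplification: only the composition carries forward.
        f = kept_binders / (kept_binders + kept_background)
        history.append(f)
    return history


if __name__ == "__main__":
    # Hypothetical start: one binding phage per million.
    for i, f in enumerate(pan(1e-6, rounds=3), start=1):
        print(f"round {i}: binder fraction = {f:.4f}")
```

With these assumed values each round multiplies the odds of binders over background by binder_recovery / background_recovery = 1,000, so a one-in-a-million binder reaches about half of the population after two rounds and roughly 99.9% after three.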
The phage-display system was applied to antibody production: antibodies consisting of the VH domain alone, or scFv-, Fv-, or Fab-type antibodies, were expressed as fusions with cp3 or cp8. A phage antibody that binds to an antigen also carries the gene encoding that antibody. However, the antibodies first isolated from antibody libraries using the phage-display system often had only low antigen-binding affinity. A method of artificially introducing mutations into the genes was proposed to enhance the binding activity. Winter et al. provided an antibody library from which high-affinity antibodies can be obtained; it contained antibodies with semi-artificial sequences prepared by inserting random sequences between all isolated pairs of VH or VL genes and JH or JL genes (Nissim, A., Winter, G. et al., EMBO J. 1994, 13:3 692-8). De Kruif et al. also prepared an antibody library based on essentially the same principle (de Kruif, J., Boel, E., Logtenberg, T., J. Mol. Biol. 1995, 248:1 97-105). Vaughan et al. produced a sufficiently large antibody repertoire by expanding the library size (Vaughan, T. J. et al., Nat. Biotechnol. 1996, 14:3 309-14). Such strategies were indeed successful for some limited types of antigens. However, even with such strategies, the probability of isolating desired antibodies remains unsatisfactorily low. For example, even with currently available techniques, it is impossible to construct a human antibody library from which desired antibodies can be isolated with a probability comparable to that of isolating desired monoclonal antibodies from mice. Thus, a library containing a greater variety of antibodies is needed.
An in vitro system that faithfully mimics the human antibody-producing process would be ideal for isolating antibodies that bind specifically to various antigens from an in vitro constructed library containing all types of human antibodies. The antigen-binding moiety of an antibody is located within complementarity determining regions (hereinafter abbreviated as “CDR”) I, II, and III (six regions in total) of the variable (V) domains at the N-terminal ends of both the H and L chains. The total number of amino acid sequence variations of the CDRs (including length variations) can be assumed to reflect the antibody repertoire size.
With respect to the antibody repertoire, it is necessary to consider both the “naïve repertoire” that exists before antigen invasion and the “antibody maturation” that follows it. The active gene encoding an antibody is created by DNA rearrangement. There are two classes of light chains, the λ chain and the κ chain, and the gene encoding the light chain V domain consists of a VL gene and a JL gene. The λ chain has 36 types of Vλ genes and 7 types of Jλ genes. During differentiation into B cells, the light chain VL-JL gene is created by DNA cleavage and rejoining in the vicinity of the VL and JL genes of the κ or λ chain. In most cases (about two-thirds), amino acids 1 to 96 are derived from the VL gene and amino acids 97 to 110 from the JL gene. However, after DNA cleavage, an exonuclease digests short stretches of the DNA ends to be joined before V(D)J recombinase rejoins them. This can change the size of the VL domain encoded by the VL-JL gene by approximately ±3 amino acids. In the light chain, CDR1 corresponds to amino acids 24 to 34, CDR2 to amino acids 50 to 56, and CDR3 to amino acids 89 to 97. Thus, for the λ chain, the total number of variations arising from these combinations can be calculated as (number of Vλ genes) × (number of Jλ genes) × (number of junctional gap variants). However, the actual size of the Vλ-Jλ gene repertoire is smaller than 200 even at the highest estimate, because the Jλ genes have similar sequences to one another, 67% of the junctional gaps are constant, and a further 27% or more fall within ±1. The situation for the κ chain is similar: there are 37 Vκ genes and 4 Jκ genes, so the size of the Vκ-Jκ gene repertoire is also smaller than 200.
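The gap between the naive product and the actual light chain repertoire can be made concrete. The sketch below uses the gene counts stated above; treating the ±3 residue junction variation as 7 discrete gap offsets, and splitting the residual 6% of junctions over ±2 and ±3, are assumptions made only for illustration. The inverse Simpson index (1 / Σp²) is a standard measure of the effective number of equally likely categories.

```python
from __future__ import annotations

N_V_LAMBDA, N_J_LAMBDA = 36, 7   # from the text
N_V_KAPPA, N_J_KAPPA = 37, 4     # from the text
GAP_OFFSETS = range(-3, 4)       # assumption: +/-3 residues -> 7 junction variants


def inverse_simpson(probs: list[float]) -> float:
    """Effective number of categories, 1 / sum(p^2), for a distribution."""
    return 1.0 / sum(p * p for p in probs)


naive_lambda = N_V_LAMBDA * N_J_LAMBDA * len(GAP_OFFSETS)
naive_kappa = N_V_KAPPA * N_J_KAPPA * len(GAP_OFFSETS)

# Hypothetical gap distribution over offsets -3..+3: 67% unchanged and 27% at
# +/-1 follow the text; splitting the remaining 6% over +/-2 and +/-3 is assumed.
gap_probs = [0.01, 0.02, 0.135, 0.67, 0.135, 0.02, 0.01]
effective_gaps = inverse_simpson(gap_probs)

print(naive_lambda, naive_kappa, round(effective_gaps, 2))  # 1764 1036 2.06
```

Under these assumptions the naive λ product is 1,764, yet the skewed junction distribution behaves like only about two equally likely gap variants rather than seven; the similarity among Jλ genes shrinks the effective J count further, which helps explain why the actual repertoire is estimated to be below 200.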
The diversity in the light chain variable region is relatively low, but that in the heavy chain variable region is considerably larger. Heavy chain CDR1 (amino acids 31 to 35) and CDR2 (amino acids 50 to 65) are encoded by one of 36 types of VH genes, so the variety in these regions is limited. Heavy chain CDR3, however, is enormously variable. In the antigen-binding moiety of an antibody, heavy chain CDR3 is positioned between the CDR1 and CDR2 regions of the H and L chains. Heavy chain CDR1, CDR2, and CDR3 account for about 60%, and the light chain for about 40%, of the total surface area of the antigen-binding moiety. For the portion excluding heavy chain CDR3, the repertoire size is estimated as (number of light chain variations, several hundred at the maximum estimate) × (number of heavy chain CDR1 and CDR2 variations, 36) ≈ 10,000. Heavy chain CDR3 is encoded by a separate gene called the D gene, of which there are 26 types. Two DNA rearrangement events, D-JH recombination and VH-D recombination, produce the VH-D-JH gene and thereby complete the CDR3-encoding region.
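The estimate for the portion outside heavy chain CDR3 is a single product. This minimal sketch evaluates it for a few light chain counts within “several hundred”; the specific values 200, 300, and 400 are illustrative choices, with 400 roughly corresponding to the λ (below 200) plus κ (below 200) repertoires combined.

```python
N_VH = 36  # VH genes encoding heavy chain CDR1 and CDR2 (from the text)


def non_cdr3_repertoire(n_light_variants: int, n_vh: int = N_VH) -> int:
    """Repertoire size excluding heavy chain CDR3: light chain variants x VH genes."""
    return n_light_variants * n_vh


# Illustrative light chain counts within "several hundred".
for n_light in (200, 300, 400):
    print(n_light, non_cdr3_repertoire(n_light))
# 200 7200
# 300 10800
# 400 14400
```

All three values are of the order of 10,000, matching the estimate in the text.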
It should be noted that the DNA rearrangement comprises the following processes:
(1) DNA cleavage at positions immediately adjacent to the signal sequences in the vicinity of VH, D, and JH genes;
(2) Digestion of DNA at its terminal portions by exonuclease;
(3) Insertion of a random sequence (referred to as N) by terminal transferase; and
(4) DNA repair and ligation.
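Processes (1) to (4) can be mimicked by a toy simulation that shows how trimming and N insertion generate junctional diversity from fixed germline segments. This is a greatly simplified sketch: the segment strings are arbitrary placeholders rather than real germline sequences, the limits max_trim and max_n are assumed values, and the D-JH and VH-D joining events are collapsed into a single step.

```python
import random

NUCLEOTIDES = "ACGT"

# Hypothetical placeholder segment ends, NOT real germline sequences.
V_END, D_SEG, J_START = "ACGTACGTAC", "GGGCCCGGGCCC", "TTTAAATTTAAA"


def rearrange(v_end: str, d_seg: str, j_start: str, rng: random.Random,
              max_trim: int = 4, max_n: int = 6) -> str:
    """Toy VH-D-JH junction; max_trim and max_n are assumed limits."""

    def trim(seq: str, left: bool, right: bool) -> str:
        # (2) Exonuclease removes 0..max_trim nucleotides from each cut end.
        start = rng.randint(0, max_trim) if left else 0
        end = len(seq) - (rng.randint(0, max_trim) if right else 0)
        return seq[start:max(start, end)]

    def n_region() -> str:
        # (3) Terminal transferase adds a random N sequence of 0..max_n nucleotides.
        return "".join(rng.choice(NUCLEOTIDES) for _ in range(rng.randint(0, max_n)))

    # (1) Cleavage is implicit: the inputs are the segment ends at the cut sites.
    v = trim(v_end, left=False, right=True)
    d = trim(d_seg, left=True, right=True)
    j = trim(j_start, left=True, right=False)
    # (4) Repair and ligation join the pieces in V-N-D-N-J order.
    return v + n_region() + d + n_region() + j


if __name__ == "__main__":
    rng = random.Random(0)
    junctions = [rearrange(V_END, D_SEG, J_START, rng) for _ in range(1000)]
    lengths = [len(s) for s in junctions]
    print(len(set(junctions)), "distinct junctions; lengths",
          min(lengths), "to", max(lengths))
```

Even from a single set of three fixed segments, the random trimming and N insertion yield many distinct junctions of varying length, which is the source of CDR3 diversity described in the text.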
In process (2) above, larger variations are produced for the heavy chain than for the light chain. A more notable difference is process (3), which does not occur in light chain rearrangement. Heavy chain CDR3 (amino acids 95 to 102) is the region located between the cysteine at residue 92 and the tryptophan at residue 103. Its length varies from 5 to 20 or more amino acids, and its sequence is also highly diverse. These features produce an enormous number of CDR3 variants; in effect, the CDR3 sequence differs in every independently differentiated B cell.
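The scale of heavy chain CDR3 diversity can be bounded with a quick calculation. This sketch assumes that any of the 20 standard amino acids can occur at any position and considers only lengths 5 to 20, so it is a rough upper bound on sequence space, not an estimate of what rearrangement actually produces.

```python
N_AMINO_ACIDS = 20  # standard amino acids


def cdr3_sequence_space(min_len: int = 5, max_len: int = 20) -> int:
    """Upper bound on distinct CDR3 sequences of length min_len..max_len.

    Assumes any amino acid at any position, which overstates real diversity.
    """
    return sum(N_AMINO_ACIDS ** n for n in range(min_len, max_len + 1))


space = cdr3_sequence_space()
print(f"{space:.2e} possible sequences vs. ~1e8 antibody types per individual")
```

Even as a loose bound, this sequence space exceeds the estimated antibody repertoire of an individual (up to about 100 million) by many orders of magnitude, consistent with the observation that the CDR3 sequence is effectively unique to each B cell.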
The DNA rearrangements of heavy chain VH-D-JH and light chain VL-JL in the antibody loci during B cell differentiation are independent of the presence of antigen. The entire population of antibodies produced by all B cells, each of which expresses one pair of VH-D-JH and VL-JL genes, is referred to as the “naïve repertoire of antibodies”. After antigen invasion, cells expressing antibodies capable of binding to the antigen are stimulated to grow and differentiate. While secreting antibodies, these B cells frequently undergo mutations in the variable region genes (VH-D-JH, VL-JL) encoding the antibody. B cells producing antibodies whose affinity for the antigen has been increased by these mutations survive as memory cells and secrete the high-performance antibodies. This process is referred to as “antibody maturation”, and the mutational events play the most important role in it. However, an antigen specificity that is not originally present in the naïve repertoire is never newly generated by the introduction of mutations. Accordingly, constructing an in vitro antibody library that mimics the in vivo process of antibody production requires a mechanism for eliminating clones having antigen specificities that are absent from the naïve repertoire.
The problems in preparing antibody libraries in vitro are listed below.
(1) After each antibody gene is expressed, for example on a large scale in E. coli, the heavy chain and light chain variable domains of the gene products must assemble into an antibody molecule through protein folding. Clones whose gene products fail at any of these steps, and thus cannot form the correct immunoglobulin conformation, are of no use.
(2) In vivo, each cell produces one pair of heavy chain and light chain variable domains whose complex exhibits a unique antigen specificity; in vitro, a library must be constructed by combining separately prepared libraries of heavy chain variable domains and light chain variable domains. For example, when a B cell population contains 10,000 types of cells, covering all its variations theoretically requires a library of at least (10,000 types of heavy chain variable domains) × (10,000 types of light chain variable domains) = 100 million clones in total. As the number of combinations increases, the library does become larger, but the proportion of clones encoding inactive antibodies also increases.
(3) When human blood is used as the antibody gene source, the antibody gene expression profile of each person may be significantly biased by his or her immunological history.
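The combinatorial arithmetic in problem (2) can be checked in a few lines. This sketch assumes that each of the 10,000 B cell types contributes exactly one native heavy/light pair; it computes the library size and the share of combinations that reproduce an original in vivo pairing.

```python
def combinatorial_library(n_heavy: int, n_light: int, n_native_pairs: int):
    """Size of a random H x L combinatorial library and the share of native pairs."""
    size = n_heavy * n_light
    return size, n_native_pairs / size


# Assumption: one native H/L pair per B cell type, 10,000 types.
size, native_share = combinatorial_library(10_000, 10_000, 10_000)
print(f"{size:,} combinations, native pairs = {native_share:.2%}")
# 100,000,000 combinations, native pairs = 0.01%
```

The original pairings are all contained in the library but make up only 0.01% of it; the remaining combinations are new pairings whose activity is not guaranteed, which is why enlarging the library also raises the proportion of inactive clones.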
All three of the problems listed above contribute to a biased library repertoire. Namely, they create unfavorable gaps between the theoretical and actual numbers of clones in a library: the number of types of functional phage antibodies in an actually prepared library is markedly smaller than the theoretically estimated number of clones.
More specifically, when the immune response against a specific antigen has been enhanced, the clone distribution in a library may be highly biased. Alternatively, a prepared antibody library may contain many clones encoding antibodies with insufficient antigen-binding activity. For example, when 50% of the light chain variable region genes and 50% of the heavy chain variable region genes encode active domains, the probability that a random combination of the two produces an active antibody is only 25% (0.5 × 0.5).
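The 25% figure follows from multiplying two probabilities, which assumes that heavy and light chain activity are independent. The sketch below computes the product and confirms it with a small Monte Carlo pairing; the pairing model is an illustrative assumption and ignores whether a given heavy/light pair is structurally compatible, so real libraries could fare worse.

```python
import random


def p_active_pair(p_heavy: float, p_light: float) -> float:
    """Probability that a random H/L pairing is active, assuming independence."""
    return p_heavy * p_light


def simulate_active_fraction(p_heavy: float, p_light: float,
                             n_pairs: int, seed: int = 0) -> float:
    """Monte Carlo check: pair randomly drawn domains and count active pairs."""
    rng = random.Random(seed)
    active = 0
    for _ in range(n_pairs):
        heavy_ok = rng.random() < p_heavy
        light_ok = rng.random() < p_light
        if heavy_ok and light_ok:
            active += 1
    return active / n_pairs


print(p_active_pair(0.5, 0.5))                      # 0.25
print(simulate_active_fraction(0.5, 0.5, 100_000))  # close to 0.25
```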
Such libraries have many problems besides the fact that their actual repertoire size is far smaller than the theoretical size. For example, antibody molecules with only weak antigen-binding activity interfere with the immunological reaction during screening. Specifically, antibody-antigen complex formation is an equilibrium reaction; when a minority population of clones coexists with a majority population, the majority may overwhelm the minority despite the minority's higher antigen-binding affinity.
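The equilibrium argument can be illustrated with the standard model of competitive binding to a single class of sites, in which the fraction of sites held by clone i is (c_i/K_d,i) / (1 + Σ c_j/K_d,j). This sketch assumes antibody is in excess over antigen (so free concentration is approximately total concentration); the concentrations and dissociation constants are hypothetical values chosen only to illustrate the effect.

```python
from __future__ import annotations


def antigen_occupancy(clones: list[tuple[float, float]]) -> list[float]:
    """Fraction of antigen sites occupied by each clone at equilibrium.

    clones: (free concentration, Kd) pairs in the same units. Assumes a single
    class of antigen sites and antibody in excess (free ~= total concentration).
    """
    weights = [conc / kd for conc, kd in clones]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]


# Hypothetical numbers: the majority clone is 10^6-fold more abundant,
# while the minority clone binds 100-fold more tightly.
majority = (1e-9, 1e-7)    # (concentration in M, Kd in M)
minority = (1e-15, 1e-9)
occ_major, occ_minor = antigen_occupancy([majority, minority])
share_minor = occ_minor / (occ_major + occ_minor)
print(f"minority share of bound antibody: {share_minor:.1e}")
```

With these assumed values the higher-affinity minority clone holds only about 0.01% of the occupied antigen, because its 100-fold affinity advantage cannot offset a million-fold deficit in abundance.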
In addition, clones encoding inactive antibodies can be an obstacle in cloning. The larger the number of such clones, the greater the probability that the population includes clones that proliferate very rapidly. Such rapidly growing clones are preferentially selected during screening and may thus cause a considerably high background.
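Both parts of this argument can be sketched numerically: the chance that at least one fast-growing clone is present, and how quickly such a clone can take over during amplification. The per-clone probability q, the growth advantage, and the round count are hypothetical parameters, not measured values.

```python
import math


def p_any_fast_grower(n_inactive: int, q: float) -> float:
    """P(at least one fast-growing clone) among n_inactive clones, each
    fast-growing independently with hypothetical probability q."""
    # 1 - (1 - q)**n, computed stably for small q
    return -math.expm1(n_inactive * math.log1p(-q))


def fraction_after_amplification(f0: float, advantage: float, rounds: int) -> float:
    """Share of a clone amplified `advantage`-fold more than the rest per round."""
    grown = f0 * advantage ** rounds
    return grown / (grown + (1.0 - f0))


for n in (10_000, 1_000_000, 10_000_000):
    print(n, round(p_any_fast_grower(n, 1e-6), 3))
print(fraction_after_amplification(1e-6, advantage=10.0, rounds=6))
```

Under these assumptions the probability of harboring a fast grower rises from about 1% to near certainty as the number of inactive clones grows, and a single such clone starting at one in a million can reach half of the population after six rounds with a 10-fold per-round advantage.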
A further problem with previously reported libraries is that it is hard to estimate how many effective clones they actually contain, which makes it impossible to evaluate the efficiency of the library and of the screening.