Biotech evolutionary methods, including combinatorial libraries and phage-display technology (PARMLEY and SMITH 1988; SCOTT and SMITH 1990; SMITH 1993), are used in the search for novel ligands of diagnostic, biomedical and pharmaceutical use (reviews: CORTESE 1996; COLLINS 1997). These methods, which use empirical procedures to select molecules with required characteristics, e.g. binding properties, from large populations of variant gene products, have been compared to the process of natural evolution. Evolution includes the generation of mutations, selection for functionality over a time period and the ability of the systems to self-replicate. In particular, natural systems use recombination to reassort mutations accumulated in the selected population, exponentially increasing the combinations of mutations and thus the number of variants in the population. This latter aspect, namely the introduction of recombination within mutant genes, has only recently been applied to biotech evolutionary methods, although it has been used to increase the size of initial phage-display libraries (e.g. WATERHOUSE 1993; TSURUSHITA 1996; SODOYER 1994; FISCH 1996). STEMMER 1994a, 1994b and 1995 teach that recombination amongst a population of DNA molecules can be achieved in vitro by PCR amplification of a mixture of small overlapping fragments with (1994a, 1994b) or without (STEMMER 1995) primer oligonucleotide sequences being used to drive the PCR reaction. The method is not applicable to recombination within a fully randomized (highly mutated) sequence, since it relies on high homology of the overlapping sequences at the site of recombination. STEMMER 1994b and CRAMERI 1996a do, however, demonstrate the usefulness of in vitro recombination for molecular evolution, and CRAMERI 1996b also demonstrate the use of the method in conjunction with phage-display, even though their method is confined to regions of low mutant density (ca. 0.5-1% of the bases are mutated in their method); as they state, "the advantages of recombination over existing mutagenesis methods are likely to increase with the numbers of cycles of molecular evolution" (STEMMER 1994b). We point out that this is due to the self-evident fact that the number of variants created by mutagenesis introducing base changes in existing mutant structures is an additive, i.e. linearly increasing, function, whereas the use of recombination between mutated variants yields novel variants as an exponential function of the initial number of variants. The classical phage-display libraries are thus at a grave disadvantage for the generation of novel variants; e.g. to encompass all the possible variants of an octapeptide sequence, 20^8 = 2.56 × 10^10 different variants would be required.
MARKS 1992 state the importance of recombination in the generation of higher specificity in combinatorial libraries, e.g. in attaining antibodies of higher specificity and binding constants by reshuffling the light and heavy chains of immunoglobulins displayed in phage-display libraries. These authors do not instruct how the shuffling of all the light and heavy chains in a population heterogeneous in both chains can be achieved, e.g. by a vector allowing recombination. Heavy and light chains were selected one after the other, i.e. an optimal heavy chain was first selected from a heterogeneous heavy chain population in the presence of a constant light chain; then, by preparing a new library, an optimal light chain was selected in combination with the preselected optimal heavy chain. The extensive, time-consuming sequential optimization strategies currently utilized, including consensus-mutational libraries, in vivo mutagenesis, error-prone PCR as well as chain shuffling, are summarized in FIGS. 5 and 6 of COLLINS 1997.
Gene libraries are generated containing extremely large numbers (10^6 to 10^10) of variants. The variant gene segments are fused to a coat protein gene of a filamentous bacteriophage (e.g. M13, fd or f1), and the fusion gene is inserted into the genome of the phage or of a phagemid. A phagemid is defined as a plasmid containing the packaging and replication origin of the filamentous bacteriophage. This latter property allows the packaging of the phagemid genome into a phage coat when it is present in an Escherichia coli host strain infected with a filamentous phage (superinfection). The packaged particles produced, be they phage or phagemid, display the fusion protein on the surface of the particles secreted into the medium. Such packaged particles are able to inject their genomes into a new host bacterium, where they can be propagated as phage or plasmids, respectively. The special property of the system lies in the fact that, since the packaging takes place in individual cells usually infected by a single variant phage/phagemid, the particles produced on propagation contain the gene encoding the particular variant displayed on the particle's surface. Several cycles of affinity selection for clones exhibiting the required properties due to the particular property of the variant protein displayed, e.g. binding to a particular target molecule immobilized on a surface, followed by amplification of the enriched clones, lead to the isolation of a small number of different clones having these properties. The primary structure of these variants can then be rapidly elucidated by sequencing the hypermutated segment of the variant gene.
There are a number of factors which limit the potential of this technology. The first is the number and diversity of the variants which can be generated in the primary library. Most libraries have been generated by transformation of ligated DNA preparations into Escherichia coli by electroporation. This gives an efficiency of ca. 0.1 to 1 × 10^6 recombinants/microgram ligated phage DNA. The highest cloning efficiency reported (10^7 recombinants per microgram insert DNA) is obtained using special lambda vectors into which a single filamentous phage vector is inserted, in a special cloning site, bracketed by a duplication of the filamentous phage replication/packaging origin (AMBERG 1993; HOGREFE 1993a+b). The DNA construct is efficiently introduced into the Escherichia coli host after packaging into a lambda bacteriophage coat in an in vitro lambda packaging mix. Infection of a strain carrying such a hybrid phagemid by an M13-helper phage allows excision and secretion of the insert packed in a filamentous phage coat. Neither AMBERG 1993 nor HOGREFE 1993a+b instruct on how the method may be used to introduce recombination during this procedure. Although they mention that the efficiency may be improved by the use of type IIS restriction endonucleases during the construction of the concatemers used as substrate for the in vitro packaging, no examples are given, and in the ensuing five years no examples have appeared in the literature. The procedure described in our invention also uses the high efficiency of the in vitro lambda packaging, but maximizes the capacity of the cloning vector by using a cosmid vector (8) in which many copies (say 8) of the phagemid are inserted in each construct. One of the surprising innovative aspects of this procedure is the discovery of a number of protocols for the de novo synthesis of large hypervariable libraries. One type is particularly efficient, in that phagemid/cosmid vectors are forced to integrate into the hybrid concatemers in the same orientation. Any variant of the protocol which does not ensure this feature does not work efficiently.
SZYBALSKI 1991 teaches a large number of novel applications for type IIS restriction endonucleases, including precise trimming of DNA, retrieval of cloned DNA, gene assembly, use as a universal restriction enzyme, cleavage of single-stranded DNA, detection of point mutations, tandem amplification, printing amplification reactions and localization of methylated bases. They do not give any instruction as to how such enzymes can be used in the creation of recombination within highly mutated regions, e.g. within a combinatorial library.
Reference List
Amberg, J, Hogrefe, H., Lovejoy, H., Hay, B., Shopes, B, Mullinax, R. and Sorge, J. A. (1993), Strategies, 5, 2-3.
Collins, J. (1997) Phage display. In Moos, W. H. et al. (eds) Annual reports in combinatorial chemistry and molecular diversity. Vol. 1., ESCOM Science publ., Leiden. pp. 210-262.
Cortese, R. (ed.) (1996) Combinatorial libraries: Synthesis, Screening and Application potential. Walter de Gruyter, Berlin.
Crameri, A., Whitehorn, E. A., Tate, E. and Stemmer, W. P. C. (1996a) Nat. Biotechnol. 14, 315-319.
Crameri, A., Cwirla, S. and Stemmer, W. P. C. (1996b) Nat. Med. 2, pg. 100.
Fisch, I., Kontermann, R. E., Finnern, R., Hartley, O., Soler-Gonzalez, A. S., Griffiths, A. D. and Winter, G. (1996) Proc. Natn. Acad. Sci. USA. 93, 7761.
Marks, J. D., Griffiths, A. D., Malmqvist, M., Clackson, T. P., Bye, J. M. and Winter, G. (1992) BioTechnol. 10, 779-783.
Hogrefe, H. H., Amberg, J. R., Hay, B. N., Sorge, J. A. and Shopes, B. (1993) Gene, 137, 85-91.
Hogrefe, H. H., Mullinax, R. L., Lovejoy, A. E., Hay, B. N. and Sorge, J. A. (1993) Gene 128, 119-126.
Parmley, S. F. and Smith, G. P. (1988) Gene 73, 305-318.
Scott, J. K. and Smith, G. P. (1990) Science 249, 386-390.
Smith, G. P. (1993) Gene 128, 1-2.
Sodoyer, R., Aujume, L., Geoffrey, F., Pion, C., Puebez, I., Montegue, B., Jacquemot, P. and Dubayle, J. (1996) In Kay, B. K. et al. (eds.) Phage display of peptides and proteins. A laboratory manual. Academic Press, San Diego. Pp. 215-226.
Stemmer, W. P. C. (1994a) Nature (Lond.) 370, 389-391.
Stemmer, W. P. C. (1994b) Proc. Nat. Acad. Sci. USA, 91, 10747-10751.
Stemmer, W. P. C. (1995) Gene 164, 49-53.
Szybalski, W., Kim, S. C., Hasan, N. and Podhajska, A. J. (1991) Gene, 100, 13-26.
Tsurushita, M., Fu, H. and Warren, C. (1996) Gene, 172, 59.
Waterhouse, P., Griffiths, A. D., Johnson, K. S. and Winter, G. (1993a) Nucleic Acid Res. 2265-2269.
According to a first embodiment the invention concerns a bank of genes, wherein said genes comprise a double stranded DNA sequence which is represented by the following formula of one of their strands:
5′ B1 B2 B3 . . . Bn Xn+1 . . . Xn+a Zn+a+1 Zn+a+2 Xn+a+3 . . . Xn+a+b Qn+a+b+1 . . . Qn+a+b+j 3′
wherein n, a, b and j are integers and
n greater than 3, a greater than 1, b greater than 3 and j greater than 1,
wherein Xn+1 . . . Xn+a+b is a hypervariable sequence and B, X, Z and Q represent adenine (A), cytosine (C), guanine (G) or thymine (T),
(i) Z represents G or T at a G:T ratio of about 1:1, and/or
(ii) Z represents C or T at a C:T ratio of about 1:1, and/or
(iii) Z represents A or G at a A:G ratio of about 1:1, and/or
(iv) Z represents A or C at a A:C ratio of about 1:1, and wherein
subsequences B1 . . . Bn and/or Qn+a+b+1 . . . Qn+a+b+j represent recognition sites for restriction enzymes, and wherein the recognition sites are oriented such that their cleavage site upon cleavage generates a cohesive end including the two bases designated Z.
Restriction of this sequence with a type IIS restriction enzyme as thus described, followed by religation, leads to the recombination of the hypervariable regions located 5′ and 3′ of the cleavage site. This is the essence of the methodology which we designate "cosmix-plexing". It is essential in this procedure that the fragments generated on cleavage by the restriction enzyme are religated in the correct orientation ("head-to-tail"), whereby the Z sequences are chosen for the four libraries ((i) to (iv)) so as to ensure this (see below) yet still allow all possible amino-acids to be encoded at the cleavage site. If this correct orientation is not ensured there will be a drastic reduction in the percentage of correctly reconstituted fusion-protein genes, a reduction in the proportion of molecules which can be packaged in vitro in the lambda-packaging extracts (which requires the correct orientation of the cos-sites), as well as a reduction in the proportion of in vivo excisable phagemid copies from the cosmid concatemer (excision requires the correct orientation of consecutive phage replication origins).
To prevent the problems arising from false orientation (head-to-head) mentioned in the previous paragraph, the four gene libraries mentioned in claim must be kept separated during cosmix-plexing. In fact, with respect to the formation of recombinants the libraries behave as 16 separate sets which cannot recombine with each other: four libraries maintained separately, where each library contains four possible cohesive ends, e.g. library (i) with Z=G or T contains: GG, GT, TG and TT.
It is evident that problems of false orientation will arise on mixing the different libraries, e.g.
The AC library (iv) will contain AA, AC, CA and CC sequences which can pair in the false orientation with, respectively each of the cohesive ends generated in library (i).
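The pairing argument above can be checked mechanically. The following is a minimal sketch, assuming that two overhangs of the same strand sense can anneal in the false (head-to-head) orientation exactly when one is the reverse complement of the other; the function names are illustrative and not part of the described method.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Bases allowed at each Z position in libraries (i) to (iv), as listed in the text.
LIBRARIES = {"i": "GT", "ii": "CT", "iii": "AG", "iv": "AC"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def cohesive_ends(bases):
    """All 2-base overhangs ZZ when each Z is drawn from `bases`."""
    return {"".join(pair) for pair in product(bases, repeat=2)}


def false_pairs(lib_a, lib_b):
    """Overhangs of lib_a whose reverse complement occurs among lib_b's overhangs.

    Under the stated assumption these are the ends that could ligate
    head-to-head if the two libraries were mixed.
    """
    ends_a = cohesive_ends(LIBRARIES[lib_a])
    ends_b = cohesive_ends(LIBRARIES[lib_b])
    return sorted(end for end in ends_a if reverse_complement(end) in ends_b)


if __name__ == "__main__":
    for a in LIBRARIES:
        for b in LIBRARIES:
            print(f"library ({a}) vs ({b}): {false_pairs(a, b)}")
```

Within any single library the list is empty, whereas every overhang of library (iv) finds a partner in library (i), which is the situation described in the text.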
A specific embodiment of the invention concerns a bank of genes wherein subsequences B1 . . . Bn or Qn+a+b+1 . . . Qn+a+b+j represent recognition sites for restriction enzymes and wherein the recognition sites are orientated such that their cleavage site upon cleavage generates a cohesive end including the two bases designated Z.
Further, a specific embodiment concerns a bank of genes, wherein the cohesive end is a 2 bp single strand end formed by the two bases designated Z.
Further, a specific embodiment concerns a bank of genes wherein each gene is provided as display vector, especially as M13 phage or M13-like phage or as phagemid.
Another embodiment of the invention concerns a set of four gene banks according to the invention wherein the gene banks are characterized as follows:
first gene bank: Z represents G or T, preferentially at a G:T ratio of about 1:1;
second gene bank: Z represents C or T, preferentially at a C:T ratio of about 1:1;
third gene bank: Z represents A or G, preferentially at a A:G ratio of about 1:1; and
fourth gene bank: Z represents A or C, preferentially at a A:C ratio of about 1:1.
A specific embodiment of the invention concerns a set of four gene banks wherein each gene is provided as display vector, especially as M13 phage or M13-like phage or as phagemid.
Another embodiment of the invention concerns a bank of genes wherein said genes comprise a double stranded DNA sequence which is represented by the following formula of one of their strands:
5′ B1 B2 B3 . . . Bn Xn+1 . . . Xn+a Zn+a+1 Zn+a+2 Xn+a+3 . . . Xn+a+b Qn+a+b+1 . . . Qn+a+b+j 3′
wherein n, a, b and j are integers and
n greater than 3, a greater than 1, b greater than 3 and j greater than 1,
wherein Xn+1 . . . Xn+a+b is a hypervariable sequence and B, X, Z and Q represent adenine (A), cytosine (C), guanine (G) or thymine (T), and wherein
four sets of oligonucleotide sequences comprising Zn+a+1 and Zn+a+2 are present, preferentially at a ratio of (i):(ii):(iii):(iv) of about 1:1:2:2, wherein the four sets are characterized as follows:
first set: Zn+a+1 represents G and Zn+a+2 also represents G;
second set: Zn+a+1 represents C and Zn+a+2 represents T;
third set: Zn+a+1 represents A and Zn+a+2 represents A or C, preferentially at A:C ratio of about 1:1; and
fourth set: Zn+a+1 represents T and Zn+a+2 represents C or G, preferentially at a C:G ratio of about 1:1, and wherein sequences B1 . . . Bn and/or Qn+a+b+1 . . . Qn+a+b+j represent recognition sites for restriction enzymes, wherein the recognition sites are orientated such that their cleavage site upon cleavage generates a cohesive end including the two bases designated Z.
A specific embodiment of the invention concerns a bank of genes wherein the four sets of oligonucleotide sequences are present at a ratio of (i):(ii):(iii):(iv) of (0 to 1):(0 to 1):(0 to 1):(0 to 1) with the proviso that at least one of said sets is present.
Further, a specific embodiment of the invention concerns a bank of genes wherein subsequences B1 . . . Bn and/or Qn+a+b+1 . . . Qn+a+b+j represent recognition sites for restriction enzymes and wherein the recognition sites are orientated such that their cleavage site upon cleavage generates a cohesive end including the two bases designated Z.
Further, a specific embodiment of the invention concerns a bank of genes wherein the cohesive end is a 2 bp single strand end formed by the two bases designated Z.
Another embodiment of the invention concerns a bank of genes wherein said genes comprise a double stranded DNA sequence which is represented by the following formula of one of their strands:
5′ B1 B2 B3 . . . Bn Xn+1 . . . Xn+a Zn+a+1 Zn+a+2 Xn+a+3 . . . Xn+a+b Qn+a+b+1 . . . Qn+a+b+j 3′
wherein n, a, b and j are integers and
n greater than 3, a greater than 1, b greater than 3 and j greater than 1,
wherein Xn+1 . . . Xn+a+b is a hypervariable sequence and B, X, Z and Q represent adenine (A), cytosine (C), guanine (G) or thymine (T), and wherein
the following six sets of oligonucleotide sequences comprising Xn+a, Zn+a+1 and Zn+a+2 are present, preferably at a ratio of (i):(ii):(iii):(iv):(v):(vi) of about 3:4:3:4:4:1, wherein the six sets are characterized as follows:
first set: Xn+a represents A, G and/or T, preferentially at a ratio of about 1:1:1 or Xn+a represents C, G and/or T, preferentially at a ratio of about 1:1:1, Zn+a+1 represents G and Zn+a+2 represents G;
second set: Xn+a represents A, C, G and/or T, preferentially at a ratio of about 1:1:1:1, Zn+a+1 represents C and Zn+a+2 represents T;
third set: Xn+a represents A, C and/or G, preferentially at a ratio of about 1:1:1, Zn+a+1 represents A and Zn+a+2 represents A;
fourth set: Xn+a represents A, C, G and/or T, preferentially at a ratio of about 1:1:1:1, Zn+a+1 represents A and Zn+a+2 represents C;
fifth set: Xn+a represents A, C, G and/or T, preferentially at a ratio of about 1:1:1:1, Zn+a+1 represents T and Zn+a+2 represents C;
sixth set: Xn+a represents A, Zn+a+1 represents T and Zn+a+2 represents G.
A method should be developed which allows cosmix-plexing without maintaining separate libraries. This would have the advantage of reducing the manipulation involved in screening the four separate libraries, as previously described, and would offer a saving in both time and materials. This has been achieved in two separate versions of the invention.
It is possible to select combinations of nucleotides within the cohesive ends generated by type IIS restriction within the aforementioned sequence, i.e. ZZ, in which all the clones are present in a single library and in which the possibility of false orientation during ligation, and the associated loss of efficiency, is eliminated. At the same time the number of subsets, defined by the number of different cohesive ends which can be generated, which cannot interact (recombine) with each other, is reduced from the 16 sets, as in the previously described version of the method, to 6.
The combinations of 2 bp single-strand cohesive end sequences which can be generated at ZZ are theoretically as follows:
Of these, the sequences with an inverted symmetry axis (palindromes: AT, TA, GC, CG) can pair in both orientations and are thus to be eliminated from cosmix-plexing libraries for the reasons given above. The remaining 12 sequences are actually 6 sets of complementary pairs (e.g. CC+GG, AA+TT, CA+TG). By choosing one partner from each pair (total of 6) a single set of cohesive ends can be generated which can pair only in the correct "head-to-tail" orientation. The actual choice of sequences takes the codon usage into account, assuming that ZZ are chosen as the 2nd and 3rd position of the codon. The determining factors are the amino-acids which are encoded by either a single or only two codons: methionine (TG) and tryptophan (GG) each have a single codon, and after elimination of the palindromic sequences there is also only a single codon available for each of aspartic acid (Asp), asparagine (Asn), cysteine (Cys), histidine (His) and tyrosine (Tyr). To encode Asp, Asn, His and Tyr an AC sequence is required. Selecting AC has the consequence that the complementary sequence GT must be avoided, and GT is the only possibility of encoding Cys. However, the inclusion of Cys within the hypervariable sequence often causes problems of misfolding and the formation of dimeric aggregates, dependent on the redox potential of the environment. It was thus decided to create a set in which Cys codons are eliminated, but which will be of great use in many applications, including cyclic peptide library formation. If the sequence AA is chosen to encode glutamic acid (Glu), glutamine (Gln) and lysine (Lys), also allowing the stop-codon TAA, then TT must be eliminated. The consequence of this is that TC must also be included so that phenylalanine (Phe) and isoleucine (Ile) can be encoded. The elimination of the complementary GA is without consequence, since GG-containing codons encode arginine (Arg) and glycine (Gly).
The elimination of CC is then without consequence, since alanine (Ala), proline (Pro), serine (Ser) and threonine (Thr) can be encoded by CT-containing codons. This is the argumentation for the selection of ZZ sequences designated "combination A" below.
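The counting in this argument (16 possible doublets, 4 palindromes, 6 complementary pairs) can be reproduced with a short enumeration. This is an illustrative sketch only; it assumes that "complementary" means reverse-complementary on the same strand sense, and it takes the six doublets of combination A (GG, CT, AA, AC, TC, TG) from the sets defined in the text.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# All 2-base cohesive ends that can in principle occur at ZZ.
ALL_ENDS = ["".join(pair) for pair in product("ACGT", repeat=2)]

# Ends equal to their own reverse complement can pair in both orientations.
PALINDROMES = sorted(end for end in ALL_ENDS if reverse_complement(end) == end)

# The remaining ends fall into unordered pairs {end, reverse complement}.
PAIRS = sorted(
    {tuple(sorted((end, reverse_complement(end))))
     for end in ALL_ENDS if end not in PALINDROMES}
)

COMBINATION_A = ["GG", "CT", "AA", "AC", "TC", "TG"]


def is_orientation_safe(ends):
    """True if no chosen end is the reverse complement of any chosen end."""
    chosen = set(ends)
    return all(reverse_complement(end) not in chosen for end in chosen)


def picks_one_per_pair(ends):
    """True if exactly one member of every complementary pair is chosen."""
    chosen = set(ends)
    return all(len(chosen & set(pair)) == 1 for pair in PAIRS)


if __name__ == "__main__":
    print("palindromes:", PALINDROMES)
    print("pairs:", PAIRS)
    print("combination A safe:", is_orientation_safe(COMBINATION_A))
```

A set that contains both members of a pair, such as AG together with CT, fails the same check, which is the objection raised against the alternative combinations.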
For the sake of completeness: if the doublet AA were left out and, consequently, TT included, then AG must be included to encode Glu, Gln and Lys. In order to encode Ala and Pro, either CT (combination B) or CA (combination C) must now be included. This leads to the inclusion of either AG and CT (combination B), or CA and TG (combination C), as complementary pairs. Combinations B and C thus do not represent an adequate solution to the problem.
Sequences chosen are shown in bold type. Complementary pairs are adjacent to each other.
Gene libraries can be created according to the requirements of combination A, by creating four sets of nucleotides in which Xn+aZn+a+1Zn+a+2 are:
i) NGG
ii) NCT
iii) NA (A or C)
iv) NT (C or G),
where N is C, G, A or T.
After the synthesis of these oligonucleotides they can be combined to obtain a single-tube cosmix-plexing gene library. To obtain the relative codon frequencies given in Table 2, the gene libraries i) to iv) are present in the final mixture at a ratio of 1:1:2:2, respectively. As explained above, this mixture will always give a correct orientation on religation of type IIS restriction enzyme-cleaved fragments having the 2 bp single-stranded cohesive ends ZZ.
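The 1:1:2:2 ratio follows from the number of distinct triplets in each set. The sketch below expands the four degenerate sets and derives the ratio, assuming (this is an interpretation, not stated explicitly in the text) that the aim is for every individual triplet to be equally represented in the final mixture. The names `FOUR_SETS`, `expand`, `mixing_ratio` and `codon_frequencies` are illustrative.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import gcd

# Allowed bases at X(n+a), Z(n+a+1) and Z(n+a+2) for sets i) to iv).
FOUR_SETS = {
    "i": ("ACGT", "G", "G"),     # NGG
    "ii": ("ACGT", "C", "T"),    # NCT
    "iii": ("ACGT", "A", "AC"),  # NA(A or C)
    "iv": ("ACGT", "T", "CG"),   # NT(C or G)
}


def expand(positions):
    """List every concrete triplet described by a degenerate set."""
    return ["".join(triplet) for triplet in product(*positions)]


def mixing_ratio(sets):
    """Smallest integer ratio proportional to the number of triplets per set."""
    sizes = [len(expand(positions)) for positions in sets.values()]
    divisor = reduce(gcd, sizes)
    return [size // divisor for size in sizes]


def codon_frequencies(sets, ratio):
    """Frequency of each triplet when the sets are mixed at `ratio`."""
    total = sum(ratio)
    freqs = {}
    for positions, share in zip(sets.values(), ratio):
        triplets = expand(positions)
        for triplet in triplets:
            per_triplet = Fraction(share, total) / len(triplets)
            freqs[triplet] = freqs.get(triplet, Fraction(0)) + per_triplet
    return freqs


if __name__ == "__main__":
    ratio = mixing_ratio(FOUR_SETS)
    print("ratio i:ii:iii:iv =", ":".join(map(str, ratio)))
    print("distinct frequencies:", set(codon_frequencies(FOUR_SETS, ratio).values()))
```

Sets i) and ii) each contain four triplets and sets iii) and iv) eight, so a 1:1:2:2 mixture gives each of the 24 triplets the same share.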
Gene libraries can be created according to a modification of combination A, in which both stop and cysteine codons are eliminated, and in which each of the other amino-acids is represented by a single codon, by creating six sets of nucleotides in which Xn+aZn+a+1Zn+a+2 are:
i) (A, G or T) GG or (C, G or T) GG
ii) NCT
iii) (A, G or C) AA
iv) NAC
v) NTC
vi) ATG
After the synthesis of these oligonucleotides they can be combined to obtain a single-tube cosmix-plexing gene library. To obtain equimolar codon frequencies for each amino-acid, the gene libraries i) to vi) are present in the final mixture at a ratio of 3:4:3:4:4:1, respectively. As explained above, this mixture will always give a correct orientation on religation of type IIS restriction enzyme-cleaved fragments having the 2 bp single-stranded cohesive ends ZZ.
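The properties claimed for the six-set design can be verified by translating the triplets with the standard genetic code. The following sketch assumes the standard code and uses the first variant of set i), (A, G or T)GG; the names are illustrative.

```python
from itertools import product

# Standard genetic code, triplets enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): residue
    for triplet, residue in zip(product(BASES, repeat=3), AMINO)
}

# Allowed bases at X(n+a), Z(n+a+1) and Z(n+a+2) for sets i) to vi).
SIX_SETS = {
    "i": ("AGT", "G", "G"),     # (A, G or T)GG
    "ii": ("ACGT", "C", "T"),   # NCT
    "iii": ("AGC", "A", "A"),   # (A, G or C)AA
    "iv": ("ACGT", "A", "C"),   # NAC
    "v": ("ACGT", "T", "C"),    # NTC
    "vi": ("A", "T", "G"),      # ATG
}


def expand(positions):
    """List every concrete triplet described by a degenerate set."""
    return ["".join(triplet) for triplet in product(*positions)]


def set_sizes(sets):
    """Number of triplets in each set, in the order the sets are listed."""
    return [len(expand(positions)) for positions in sets.values()]


def encoded_residues(sets):
    """One-letter residues encoded by all triplets of all sets."""
    return [CODON_TABLE[t] for positions in sets.values() for t in expand(positions)]


if __name__ == "__main__":
    residues = encoded_residues(SIX_SETS)
    print("triplets per set:", set_sizes(SIX_SETS))
    print("residues:", "".join(sorted(residues)))
    print("stop present:", "*" in residues, "| Cys present:", "C" in residues)
```

With these sets the triplet counts are 3, 4, 3, 4, 4 and 1, and each of the 19 encoded amino-acids appears exactly once, with neither a stop codon nor cysteine.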
Again, as with the previous sets, this single-tube library represents six subsets which are unable to recombine with each other during cosmix-plexing.
The amino-acid at the recombination site is determined by the 5′-hypervariable segment. The set of amino-acids which may be represented at this position is defined for each subset as presented in Table 2.
The minimal number of clones required in a library to include all possible amino-acid sequences in a random peptide containing 'n' amino-acids is 20^n, i.e. for n=9, 20^9 = 5.12 × 10^11. In fact, at a confidence limit of say 95%, this figure must be some three-fold higher, to allow for the statistics of sampling, i.e. ca. 1.5 × 10^12. In practice this figure may be higher due to, e.g. non-random synthesis of the oligonucleotides used to generate the library as well as biased codon representation (for a detailed discussion see Collins 1997).
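The three-fold factor can be reproduced with a simple sampling model. The sketch below assumes Poisson sampling with all sequences equally likely, so that the probability of a given sequence being present among N clones drawn from V variants is 1 - exp(-N/V). This model is an assumption used for illustration and is not taken from the text.

```python
import math


def sequence_space(n_residues, alphabet=20):
    """Number of distinct peptides of length n_residues."""
    return alphabet ** n_residues


def clones_for_confidence(n_variants, confidence):
    """Clones needed for a given variant to be present with probability `confidence`.

    Assumes Poisson sampling with equal variant frequencies:
    P(present) = 1 - exp(-N / V), hence N = -ln(1 - P) * V.
    """
    return -math.log(1.0 - confidence) * n_variants


if __name__ == "__main__":
    variants = sequence_space(9)
    print(f"20^9 = {variants:.3g}")
    print(f"oversampling factor at 95%: {-math.log(0.05):.2f}")
    print(f"clones required: {clones_for_confidence(variants, 0.95):.3g}")
```

Under this model the oversampling factor at 95% is -ln(0.05), which is close to 3, consistent with the figure of ca. 1.5 × 10^12 given above.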
The cosmix-plexing strategy is based on the concept that in initial selection experiments clone populations will be enriched for sequences which contain structural elements based on the primary sequence in the varied segment. Even if the optimal sequence is not present due to the limited size of the initial library, cosmix-plexing will increase the likelihood of finding just such a sequence by providing a large number of novel recombinants in which the 5′- and 3′-"halves" of the varied section are reassorted, e.g. for the hypervariable nonapeptide library described in the example, the sequences encoding the amino-proximal five amino-acids are recombined with the sequences encoding the carboxy-proximal four amino-acids. Since the cohesive ends essentially limit the recombination to defined subsets, in which one subset cannot undergo recombination with any of the other subsets, the actual number of recombinants generated is less than could be obtained with completely random recombination.
For the initial four-tube protocol described, four separate libraries each containing four subsets are used:
Random recombination would generate, for a set of N clones, N^2 recombinants, assuming N^2 is less than or equal to the theoretical number of variants (20^n, see above) which can be encoded within the hypervariable segment; otherwise it will tend to 20^n.
For the four-tube protocol 16 subsets are created, each representing a pool within which recombination can take place. If the total library consists of N clones then the number of novel recombinants which can be formed within each of the 16 subsets is (N/16)^2. Summing for all sixteen subsets, the number of recombinants which can be generated is 16 × (N/16)^2 = N^2/16, again assuming N^2/16 is less than or equal to the theoretical number of variants (20^n, see above) which can be encoded within the hypervariable segment; otherwise it will tend to 20^n.
For the single-tube protocol only 6 subsets are created, each representing a pool within which recombination can take place. If the total library consists of N clones then the number of novel recombinants which can be formed within each of the 6 subsets is (N/6)^2. Summing for all six subsets, the number of recombinants which can be generated is 6 × (N/6)^2 = N^2/6, again assuming N^2/6 is less than or equal to the theoretical number of variants (20^n, see above) which can be encoded within the hypervariable segment; otherwise it will tend to 20^n.
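The three estimates can be compared numerically. The sketch below implements the formula k × (N/k)^2 capped at 20^n, where k is the number of non-interacting subsets; it assumes, as the text does implicitly, that clones are spread evenly over the subsets. The library size of 10^5 clones used in the example is an arbitrary illustrative value.

```python
def recombinants(n_clones, n_subsets, n_residues, alphabet=20):
    """Upper estimate of recombinants obtainable from n_clones.

    Recombination is confined to n_subsets equally sized pools, and the
    result cannot exceed the number of encodable variants (alphabet ** n_residues).
    """
    ceiling = alphabet ** n_residues
    within_pools = n_subsets * (n_clones / n_subsets) ** 2
    return min(within_pools, ceiling)


if __name__ == "__main__":
    clones = 10 ** 5  # illustrative library size, not taken from the text
    for label, subsets in (("random", 1), ("four-tube", 16), ("single-tube", 6)):
        print(f"{label:12s} {recombinants(clones, subsets, 9):.3g}")
```

For any N below the ceiling, the single-tube protocol yields 16/6, i.e. roughly 2.7 times, as many recombinants as the four-tube protocol.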
It is thus clear that the single-tube version of the invention is superior not only in terms of time and economy of the procedure but in the potential to generate a greater diversity from a given number of clones during cosmix-plexing guided recombination.
A specific embodiment of the invention concerns a bank of genes, wherein the six sets of oligonucleotide sequences are present at a ratio of (i):(ii):(iii):(iv):(v):(vi) of (0 to 1):(0 to 1):(0 to 1):(0 to 1):(0 to 1):(0 to 1) with the proviso that at least one of said sets is present.
Further, a specific embodiment of the invention concerns a bank of genes wherein each gene is provided as display vector, especially as M13 phage or M13-like phage or as phagemid.
Further, a specific embodiment of the invention concerns a bank of genes wherein the double stranded DNA sequence is comprised by a DNA region (fusB) encoding a peptide or a protein to be displayed.
Further, a specific embodiment of the invention concerns a bank of genes, characterized in that n=j=6, a=14 and b=16.
Further, a specific embodiment of the invention concerns a bank of genes wherein the restriction enzyme is a type IIS restriction enzyme.
Further, a specific embodiment of the invention concerns a bank of genes which is characterized in that
(a) subsequence B1 . . . Bn is the recognition site for the restriction enzyme BpmI (CTGGAG) and subsequence Qn+a+b+1 . . . Qn+a+b+j is an inverted BsgI recognition site (CTGCAC); or
(b) subsequence B1 . . . Bn is the recognition site for the restriction enzyme BsgI (GTGCAG) and subsequence Qn+a+b+1 . . . Qn+a+b+j is an inverted BpmI recognition site (CTCCAG).
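The inverted recognition sites quoted in (a) and (b), and the choice n=j=6, a=14 and b=16, can be checked with a little index arithmetic. The sketch below assumes that both BpmI and BsgI cut 16 bases from the recognition site on one strand and 14 on the other (the 16/14 cut positions are taken from general enzyme data and are not stated in this text), leaving a 2-base overhang; the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

BPMI_SITE = "CTGGAG"
BSGI_SITE = "GTGCAG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def layout(n, a, b, j):
    """1-based (start, end) positions of each block on the written strand."""
    return {
        "B": (1, n),
        "X_5prime": (n + 1, n + a),
        "ZZ": (n + a + 1, n + a + 2),
        "X_3prime": (n + a + 3, n + a + b),
        "Q": (n + a + b + 1, n + a + b + j),
    }


def overhang_from_b_site(n, far=16, near=14):
    """Positions left single-stranded by an enzyme bound at B (assumed 16/14 cut)."""
    return (n + near + 1, n + far)


def overhang_from_q_site(n, a, b, far=16, near=14):
    """Positions left single-stranded by an enzyme bound at the inverted Q site."""
    q_start = n + a + b + 1
    return (q_start - far, q_start - near - 1)


if __name__ == "__main__":
    print("inverted BsgI site:", reverse_complement(BSGI_SITE))
    print("inverted BpmI site:", reverse_complement(BPMI_SITE))
    print("ZZ positions:", layout(6, 14, 16, 6)["ZZ"])
    print("overhang from B:", overhang_from_b_site(6))
    print("overhang from Q:", overhang_from_q_site(6, 14, 16))
```

Under the assumed cut positions, both enzymes expose the same two bases at positions 21 and 22, which are the bases designated Z.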
Further, a specific embodiment of the invention concerns a bank of genes which is characterized in that the hypervariable sequence Xn+1 . . . Xn+a+b contains NNB or NNK wherein N=adenine (A), cytosine (C), guanine (G) or thymine (T);
B=cytosine (C), guanine (G) or thymine (T); and
K=guanine (G) or thymine (T).
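The effect of restricting the third codon position can be tabulated directly. The following sketch assumes the standard genetic code and compares the NNB and NNK schemes defined above with fully random NNN triplets; the names are illustrative.

```python
from itertools import product

# Standard genetic code, triplets enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): residue
    for triplet, residue in zip(product(BASES, repeat=3), AMINO)
}

# Allowed bases per codon position for each randomization scheme.
SCHEMES = {
    "NNN": ("ACGT", "ACGT", "ACGT"),
    "NNB": ("ACGT", "ACGT", "CGT"),
    "NNK": ("ACGT", "ACGT", "GT"),
}


def summarize(positions):
    """Count triplets, stop codons and distinct amino-acids for a scheme."""
    triplets = ["".join(t) for t in product(*positions)]
    residues = [CODON_TABLE[t] for t in triplets]
    return {
        "codons": len(triplets),
        "stops": residues.count("*"),
        "amino_acids": len(set(residues) - {"*"}),
    }


if __name__ == "__main__":
    for name, positions in SCHEMES.items():
        print(name, summarize(positions))
```

Both restricted schemes still encode all 20 amino-acids while retaining only the TAG stop codon, compared with three stop codons for NNN.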
Another embodiment of the invention concerns a phagemid pROCOS4/7 of the sequence shown in FIG. 6.
Still another embodiment of the invention concerns a phagemid pROCOS5/3 of the sequence shown in FIG. 7.
Another embodiment of the invention concerns a method for the production of large
phage-display libraries or
phagemid-display libraries,
containing or consisting of optionally packaged recombined display vectors, wherein recombination takes place at the cleavage site(s) for a restriction enzyme (cut (B) enzyme; arrow in FIG. 3) and wherein
(a) to (b) a double-stranded DNA prepared from Escherichia coli cells containing a display vector population, consisting of M13 phages or M13-like phages or consisting of phagemids according to the invention; a cosmid vector; a restriction enzyme for cut (B); and a restriction enzyme for cut (A) are selected, wherein
(i) the cut (B) enzyme cleaves the display vectors in the region encoding the displayed peptide or displayed protein (arrow in FIG. 3) and generates unique non-symmetrical cohesive ends, wherein each cohesive end is a 2 bp single strand end formed by the two bases designated Z, and
(ii) the cut (A) enzyme cleaves the display vectors and the cosmid vector and generates upon cleavage unique non-symmetrical cohesive ends (fusA) which differ from those resulting from cut (B),
(c) the display vectors are cleaved with the first restriction enzyme,
(d) the display vector and the cosmid vector are cleaved with the second restriction enzyme,
(e) the cleaved display vectors are ligated with the cleaved cosmid vectors forming concatamers,
(f) the ligation product is subjected to a lambda packaging and transduced into an Escherichia coli host,
(g) if wanted, selection is made for a gene present in the ligated display vectors,
(h) the transduced display vectors in the Escherichia coli host are
either in the case of a phage-display vector spontaneously packaged in M13 or M13-like phage coats
or in the case of a phagemid-display vector packaged by infecting the Escherichia coli host with an M13 type helper phage (superinfection),
(i) the packaged display vectors are passaged in a fresh Escherichia coli host and phage-display or phagemid-display libraries are formed and, if wanted,
(j) the passaged display vectors are
either in the case of a phage-display vector spontaneously packaged in M13 or M13-like phage coats
or in the case of a phagemid-display vector packaged by infecting the fresh Escherichia coli host with an M13 type helper phage (superinfection) and
phage-display or phagemid-display libraries are formed.
A specific embodiment of the invention concerns a method which is characterized in that in steps (a) to (b) a type IIS restriction enzyme is selected, preferably BglI, DraIII, BsgI or BpmI.
Further, a specific embodiment of the invention concerns a method which is characterized in that for cuts (B) and (A) the same restriction site and/or restriction enzyme is selected.
Further, a specific embodiment of the invention concerns a method which is characterized in that as cut (B) enzyme and as cut (A) enzyme different enzymes are used (FIG. 3), preferably BsgI or BpmI as cut (B) enzyme and DraIII as cut (A) enzyme (fd or M13 replication origin cut).
Further, a specific embodiment of the invention concerns a method which is characterized in that in step (h) and facultatively in step (j) M13K07 is used as M13 type helper phage.
Further, a specific embodiment of the invention concerns a method which is characterized in that the phagemid and the cosmid are identical and, further, presence of and cleavage with cut (A) enzyme is optional and/or cut (B) enzyme and cut (A) enzyme are identical.
Further, a specific embodiment of the invention concerns a method which is characterized in that in step (i) the multiplicity of infection (MOI) is less than or equal to 1.
Further, a specific embodiment of the invention concerns a method wherein the cosmid comprises an fd or M13 bacteriophage origin (replication/packaging).
Further, a specific embodiment of the invention concerns a method wherein in step (e) a mol ratio of display vectors to the cosmid vector within the range of from 3:1 to 15:1 and preferably 3:1 to 10:1 is used.
Further, a specific embodiment of the invention concerns a method wherein in step (e) a vector concentration (comprising display vectors and cosmid vectors) of more than 100 µg DNA/ml is used.
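By way of illustration only, the molar ratio and concentration conditions of step (e) may be converted into DNA masses as in the following minimal sketch. The function name, the vector lengths (4500 bp and 6000 bp) and the reaction volume are hypothetical values chosen for the example and are not taken from the description; the sketch assumes only that the mass of double-stranded DNA is proportional to the number of molecules multiplied by their length.

```python
def ligation_masses(total_ug_per_ml, volume_ml, display_bp, cosmid_bp, molar_ratio):
    """Split the DNA of one ligation into display-vector and cosmid masses (ug).

    molar_ratio is display vectors : cosmid vector, e.g. 5 for 5:1.
    """
    if not 3 <= molar_ratio <= 15:
        raise ValueError("molar ratio outside the 3:1 to 15:1 range given in the text")
    if total_ug_per_ml <= 100:
        raise ValueError("the text calls for more than 100 ug DNA/ml")
    total_ug = total_ug_per_ml * volume_ml
    # For double-stranded DNA, mass scales with (number of molecules x length).
    display_share = molar_ratio * display_bp
    cosmid_share = cosmid_bp
    display_ug = total_ug * display_share / (display_share + cosmid_share)
    return display_ug, total_ug - display_ug


# Hypothetical example: 4500 bp phagemid, 6000 bp cosmid, 50 ul at 120 ug/ml, 5:1.
display_ug, cosmid_ug = ligation_masses(120, 0.05, 4500, 6000, 5)
print(f"display vector: {display_ug:.2f} ug, cosmid: {cosmid_ug:.2f} ug")
```

The range checks simply mirror the limits stated for step (e); they would be relaxed for conditions outside the preferred embodiment.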
Another embodiment of the invention concerns a method for the production of large
phage-display extension libraries or
phagemid-display extension libraries, wherein
an oligonucleotide cassette of d bases in length is inserted into a restriction site (cut (B)) via the cohesive ends ZZ as defined above to yield a sequence (supra sequence) or a gene comprising a double stranded DNA sequence which is represented by the following formula of one of their strands:
$$5'\text{-}B_1 \ldots B_n\, X_{n+1} \ldots X_{n+a}\, Z_{n+a+1} Z_{n+a+2}\, X_{n+a+3} \ldots X_{n+a+d}\, Z_{n+a+d+1} Z_{n+a+d+2}\, X_{n+a+d+3} \ldots X_{n+a+d+b}\, Q_{n+a+d+b+1} \ldots Q_{n+a+d+b+j}\text{-}3'$$
wherein d is an integer and a multiple of 3, preferably within the range of from 6 to 36; n, a, b and j and B, X, Z and Q have the same meaning as in any of the preceding claims; and wherein
(a) to (b) a double-stranded DNA prepared from Escherichia coli cells containing a display vector population, consisting of M13 phages or M13-like phages or consisting of phagemids according to the invention; a cosmid vector; a restriction enzyme for cut (B); and a restriction enzyme for cut (A) are selected, wherein
(i) the cut (B) enzyme cleaves the display vectors in the region encoding the displayed peptide or displayed protein and generates unique non-symmetrical cohesive ends; wherein each cohesive end is a 2 bp single strand end formed by the two bases designated Z,
(ii) the cut (A) enzyme cleaves the display vectors and the cosmid vector such that unique non-symmetrical cohesive ends are formed which differ from those resulting from cut (B),
(c1) the display vectors are cut with the cut (B) restriction enzyme,
(c2) a DNA cassette is inserted into the cleavage site via its ZZ cohesive ends,
(d) the resulting display vector and the cosmid vector are cleaved with the cut (A) restriction enzyme,
(e) the cleaved display vectors are ligated with the cleaved cosmid vectors forming concatamers,
(f) the ligation product is subjected to a lambda packaging and transduced into an Escherichia coli host such that the DNA cassette lies between two hypervariable sequences (extension sequences),
(g) if wanted, selection is made for a gene present in the ligated display vectors,
(h) the transduced display vectors in the Escherichia coli host are
either in the case of a phage-display vector spontaneously packaged in M13 or M13-like phage coats
or in the case of a phagemid-display vector packaged by infecting the Escherichia coli host with an M13 type helper phage (superinfection),
(i) the packaged display vectors are passaged in a fresh Escherichia coli host and phage-display or phagemid-display libraries are formed, and, if wanted,
(j) the passaged display vectors are
either in the case of a phage-display vector spontaneously packaged in M13 or M13-like phage coats
or in the case of a phagemid-display vector packaged by infecting the fresh Escherichia coli host with M13 type helper phages (superinfection) and
phage-display or phagemid-display extension libraries are formed.
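By way of illustration only, steps (c1) and (c2) may be pictured with the following minimal sketch, which works on the top strand alone. It assumes the commonly documented cut geometry of BsgI (recognition site GTGCAG, cleavage 16 bases downstream on the top strand and 14 on the bottom strand, leaving a 2-base 3' overhang); the function names and the example sequence are hypothetical, and recognition sites on the opposite strand are ignored for simplicity. The inserted cassette is taken to carry one copy of ZZ, so that d bases in total are added and the formula above is reproduced.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def type_iis_cut(top, site="GTGCAG", top_offset=16, bottom_offset=14):
    """Cut downstream of `site`; return (left duplex, ZZ overhang, right duplex)."""
    start = top.find(site)
    if start < 0:
        raise ValueError("recognition site not found")
    end = start + len(site)
    bottom_cut, top_cut = end + bottom_offset, end + top_offset
    if top_cut > len(top):
        raise ValueError("sequence too short for the cut offsets")
    return top[:bottom_cut], top[bottom_cut:top_cut], top[top_cut:]


def insert_cassette(top, cassette_core, site="GTGCAG"):
    """Insert a cassette of d = len(cassette_core) + 2 bases at the ZZ cut."""
    left, zz, right = type_iis_cut(top, site)
    if zz == reverse_complement(zz):
        raise ValueError("ZZ is self-complementary; the ends would not be oriented")
    d = len(cassette_core) + len(zz)
    if d % 3:
        raise ValueError("d must be a multiple of 3 to keep the reading frame")
    # left ... ZZ + cassette core + ZZ ... right, as in the formula above
    return left + zz + cassette_core + zz + right


# Hypothetical sequence: 2 leading bases, BsgI site, 14 X bases, ZZ = GA, 8 X bases.
vector = "AT" + "GTGCAG" + "ACGTACGTACGTAC" + "GA" + "TTCCGGAA"
print(type_iis_cut(vector))
print(insert_cassette(vector, "CCAT"))
```

The check on ZZ reflects the requirement that the cohesive ends be non-symmetrical, which is what fixes the orientation of every ligated fragment.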
Another embodiment of the invention concerns a method for the reassortment of the 5′- and/or 3′-extensions in the production of large recombinant
phage-display extension libraries or
phagemid-display extension libraries,
comprising the sequence as defined before wherein recombination takes place at one or the other, or consecutively at both, of the cleavage site(s) ZZ bracketing the inserted cassette(s), wherein
(a) to (b) a double-stranded DNA prepared from Escherichia coli cells containing a display vector population, consisting of M13 phages or M13-like phages or consisting of phagemids as display vectors as defined before; a cosmid vector; a restriction enzyme for cut (B); and a restriction enzyme for cut (A) are selected, wherein
(i) the cut (B) enzyme cleaves the display vectors in the region encoding the displayed peptide or displayed protein and generates unique non-symmetrical cohesive ends at selectively either
the 5′-junction of extension and cassette (cleavage by the restriction enzyme recognizing the binding site $B_1 \ldots B_n$ as defined before), or
at the 3′-junction of extension and cassette (cleavage by the restriction enzyme recognizing the binding site $Q_{n+a+b+1} \ldots Q_{n+a+b+j}$ as defined before, or $Q_{n+a+d+b+1} \ldots Q_{n+a+d+b+j}$ as defined before), wherein each cohesive end is a 2 bp single strand end formed by the two bases designated Z,
(ii) the cut (A) enzyme cleaves the display vectors and the cosmid vector and generates upon cleavage unique non-symmetrical cohesive ends which differ from those resulting from cut (B),
(c) the display vectors are cleaved with the first restriction enzyme,
(d) the display vector and the cosmid vector are cleaved with the second restriction enzyme,
(e) the cleaved display vectors are ligated with the cleaved cosmid vectors forming concatamers,
(f) the ligation product is subjected to a lambda packaging and transduced into an Escherichia coli host,
(g) if wanted, selection is made for a gene present in the ligated display vectors,
(h) the transduced display vectors in the Escherichia coli host are
either in the case of a phage-display vector spontaneously packaged in M13 or M13-like phage coats
or in the case of phagemid-display vectors packaged by infecting the Escherichia coli host with an M13-type helper bacteriophage (superinfection),
(i) the packaged display vectors are passaged in a fresh Escherichia coli host and phage-display or phagemid-display libraries are formed and, if wanted,
(j) the passaged display vectors are
either in the case of a phage-display vector spontaneously packaged in M13 or M13-like phage coats
or in the case of a phagemid vector packaged by infecting the fresh Escherichia coli host with M13 type helper phages (superinfection) and
phage-display or phagemid-display libraries are formed.
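By way of illustration only, the reassortment of 5′- and 3′-extensions around a cassette may be modelled as in the following minimal sketch. Each clone is reduced to a hypothetical triple (5′-extension, cassette, 3′-extension), and the model is idealized in that every fragment generated at a junction is assumed to be ligated to every compatible partner; real ligations will sample only part of these combinations.

```python
from itertools import product


def reassort_at(clones, junction):
    """Recombine clones at the 5' or the 3' junction of extension and cassette.

    clones: iterable of (ext5, cassette, ext3) tuples.
    junction: "5" cuts between ext5 and cassette, "3" between cassette and ext3.
    """
    clones = set(clones)
    if junction == "5":
        lefts = {(ext5,) for ext5, _, _ in clones}
        rights = {(cassette, ext3) for _, cassette, ext3 in clones}
    elif junction == "3":
        lefts = {(ext5, cassette) for ext5, cassette, _ in clones}
        rights = {(ext3,) for _, _, ext3 in clones}
    else:
        raise ValueError("junction must be '5' or '3'")
    # Oriented cohesive ends: a left fragment can only join a right fragment.
    return {left + right for left, right in product(lefts, rights)}


# Three selected clones sharing one project-specific cassette (labels are made up).
selected = {("A1", "CAS", "B1"), ("A2", "CAS", "B2"), ("A3", "CAS", "B3")}
after_5 = reassort_at(selected, "5")
after_both = reassort_at(after_5, "3")
print(len(selected), len(after_5), len(after_both))
```

With a single shared cassette, recombination at one junction already yields every pairing of the two extensions; recombination at both junctions in succession adds further variants only where the clones also differ in the cassette.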
A specific embodiment of the invention concerns a method which is characterized in that in steps (a) to (b) a type IIS restriction enzyme is selected, preferably BglI, DraIII, BsgI or BpmI.
Further, a specific embodiment of the invention concerns a method which is characterized in that for cuts (i) and (ii) the same restriction site is selected.
Further, a specific embodiment of the invention concerns a method which is characterized in that as cut (B) enzyme and as cut (A) enzyme different enzymes are used, preferably BsgI or BpmI as cut (B) enzyme and DraIII as cut (A) enzyme (fd or M13 replication origin is cut).
Further, a specific embodiment of the invention concerns a method which is characterized in that in step (h) and facultatively in step (j) M13K07 is used as the M13-type helper phage.
Further, a specific embodiment of the invention concerns a method which is characterized in that in step (g) selection is made for the presence of an antibiotic resistance gene.
Further, a specific embodiment of the invention concerns a method which is characterized in that in step (i) the multiplicity of infection (MOI) is less than or equal to 1.
Further, a specific embodiment of the invention concerns a method wherein the cosmid comprises an fd or M13 bacteriophage origin.
Further, a specific embodiment of the invention concerns a method wherein in step (e) a mol ratio of display vectors to the cosmid vector within the range 3:1 to 15:1 and preferably 3:1 to 10:1 is used.
Further, a specific embodiment of the invention concerns a method wherein in step (e) a vector concentration (comprising display vectors and cosmid vectors) of more than 100 µg DNA/ml is used.
Another embodiment of the invention concerns a method for the de novo production of large
phage-display libraries or
phagemid-display libraries,
comprising DNA sequences as defined before, and subjectable to recombination according to a procedure as defined before, wherein recombination takes place within a DNA sequence as defined before, wherein
a) a display vector, consisting of an M13 phage or M13-like phage or consisting of a phagemid-display vector comprising a bacteriophage replication origin, facultatively a gene for a selectable marker, preferably an antibiotic resistance, a lambda bacteriophage cos-site and a “stuffer”-sequence (FIG. 5 upper right), containing two binding sites for a type IIS restriction enzyme different from any of the enzymes as defined before (cut (B) and cut (A)), wherein said two sites are oriented in divergent orientation and where the cohesive ends generated on cleavage are non-symmetrical and differ from one another at the two sites, and
b) a PCR-generated fragment comprising part of one of the sequences as defined before, including a (the) hypervariable sequence(s), preferably $X_{n+1} \ldots X_{n+a} Z_{n+a+1} Z_{n+a+2} X_{n+a+3} \ldots X_{n+a+b}$ according to the invention, bracketed by the same type IIS restriction enzyme binding sites defined in (a), but in this case both oriented inwards towards the hypervariable sequence (FIG. 5 left side) and where on cleavage by this restriction enzyme two non-symmetrical, single strand ends different from one another are generated, where the first end (a′ in FIG. 5) is complementary to one of the ends (a in FIG. 5) generated on the large vector fragment in (a) and the second end (b′ in FIG. 5) is complementary to the other end (b in FIG. 5) generated on the large vector fragment in (a),
c) the two cleavage reaction systems (a) and (b) still containing the active type IIS restriction enzyme are mixed together in approximately equimolar proportions and subjected to ligation in the presence of DNA ligase;
fragments containing the restriction enzyme binding sites are constantly removed (“stuffer” fragment and outer end of the PCR product) whereas
the other two components, namely the large vector fragment and the insert sequence (central fragment from the PCR reaction) are driven to form
A) a concatameric hybrid if the ligation is carried out at >100 µg DNA/ml (FIG. 5), or
B) a circular hybrid if the ligation is carried out at ≤40 µg DNA/ml,
d1) in the case of protocol A) the DNA is packaged into lambda particles and transduced into an Escherichia coli host,
d2) in the case of protocol B) the DNA is transformed in an Escherichia coli host,
e) if wanted, selection is made for a gene present in the ligated display vectors,
f) the transduced display vectors in the Escherichia coli host are
either in the case of a phage-display vector spontaneously packaged in M13 or M13-like phage coats
or in the case of phagemid-display vectors packaged by infecting the Escherichia coli host with an M13-type helper bacteriophage (superinfection),
g) the packaged display vectors are passaged in a fresh Escherichia coli host and phage-display or phagemid-display libraries are formed and, if wanted,
h) the passaged display vectors are
either in the case of a phage-display vector spontaneously packaged in M13 or M13-like phage coats
or in the case of a phagemid vector packaged by infecting the fresh Escherichia coli host with M13-type helper phages (superinfection) and
phage-display or phagemid-display libraries are formed.
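By way of illustration only, the choice between protocols A) and B) in step c) may be written as the following minimal sketch. The function name and the returned labels are hypothetical; the thresholds are those stated above, and concentrations between 40 and 100 µg DNA/ml are left unassigned because the text does not allocate them to either protocol.

```python
def ligation_route(dna_ug_per_ml):
    """Map the ligation DNA concentration to the expected product and delivery route."""
    if dna_ug_per_ml <= 0:
        raise ValueError("concentration must be positive")
    if dna_ug_per_ml > 100:
        # Protocol A: concatamers are substrates for lambda packaging (step d1).
        return ("concatameric hybrid", "lambda packaging and transduction")
    if dna_ug_per_ml <= 40:
        # Protocol B: circles are introduced by transformation (step d2).
        return ("circular hybrid", "transformation")
    return None  # not assigned to protocol A or B in the text


for concentration in (20, 40, 70, 150):
    print(concentration, ligation_route(concentration))
```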
A specific embodiment of the invention concerns a method which is characterized in that in steps (a) to (b) BpiI, BsgI or BpmI is preferably selected as the type IIS restriction enzyme.
Further, a specific embodiment of the invention concerns a method which is characterized in that in step (f) and facultatively in step (h) M13K07 is used as the M13-type helper phage.
Further, a specific embodiment of the invention concerns a method which is characterized in that in step (e) selection is made for the presence of an antibiotic resistance gene.
Further, a specific embodiment of the invention concerns a method which is characterized in that in step (g) the multiplicity of infection (MOI) is less than or equal to 1.
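By way of illustration only, the effect of the multiplicity of infection may be estimated with the following minimal sketch. It assumes that the number of particles entering a cell follows a Poisson distribution with mean equal to the MOI, which is a standard simplification and is not stated in the description; the function name is hypothetical. The passage does not give the rationale for the limit, and one plausible reading is that a low MOI limits the share of cells that receive more than one variant.

```python
import math


def multiply_infected_fraction(moi):
    """Among infected cells, the fraction that received two or more particles."""
    if moi <= 0:
        raise ValueError("MOI must be positive")
    p_none = math.exp(-moi)        # P(0 particles)
    p_one = moi * math.exp(-moi)   # P(exactly 1 particle)
    infected = 1.0 - p_none
    return (infected - p_one) / infected


for moi in (0.01, 0.1, 1.0, 5.0):
    print(f"MOI {moi}: {multiply_infected_fraction(moi):.3f}")
```

Under this model the fraction of multiply infected cells rises steadily with the MOI, so an upper limit of 1 keeps the majority of infected cells singly infected.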
Another embodiment of the invention concerns a phage-display library or a phagemid-display library in the form of packaged particles obtainable according to any of the methods as described before.
Another embodiment of the invention concerns a phage-display library or a phagemid-display library in the form of display vectors comprised by Escherichia coli population(s) obtainable according to any of the methods as described before.
Another embodiment of the invention concerns phage-display libraries or phagemid-display libraries which are characterized by a gene (genes) as defined before and obtainable according to the invention, wherein the term “large” as used before is defined as in excess of 10^6 variant clones, preferentially 10^8 to 10^11 variant clones.
Finally, another embodiment of the invention concerns a protein or peptide comprising a peptide sequence encoded by a DNA sequence as defined before and obtainable by affinity selection procedures on a defined target by means of libraries as defined before.
The invention pertains to a novel combination of recombinant DNA technologies to produce large hypervariable gene banks for the selection of novel ligands of pharmaceutical, diagnostic, biotechnological, veterinary, agricultural and biomedical importance with an efficiency higher than was hitherto attainable.
The size of the hypervariable gene bank is presently considered the most essential factor limiting the usefulness of the methodology for such purposes, since, as an empirical method, it depends on the diversity (number of different variants) initially generated in the bank (hypervariable gene library). In contrast to this traditional opinion we consider that, when a highly efficient method is developed, as presented here, to generate a large proportion of the possible combinations of mutated segments of the variants from a preselected subpopulation, a population enriched for the desired structural elements will be generated which would only have been represented in a population approaching N^x, where N is the size of the original population and x is the number of segments to be recombined.
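By way of illustration only, the N^x relation may be evaluated as in the following minimal sketch. The function name and the population sizes are hypothetical examples; the figure of 20^8 for a fully varied octapeptide is the one given in the introductory part of the description.

```python
def equivalent_population(n, x):
    """Size N^x of an unrecombined population holding every combination of x segments."""
    if n < 1 or x < 1:
        raise ValueError("n and x must be positive integers")
    return n ** x


octapeptide_space = 20 ** 8  # all variants of a fully randomized octapeptide

for n in (10**3, 10**4, 10**6):
    for x in (2, 3):
        print(f"N = {n:.0e}, x = {x}: N^x = {equivalent_population(n, x):.0e}")
print(f"octapeptide sequence space: {octapeptide_space:.2e}")
```

The comparison is meant only to convey orders of magnitude: recombining two segments within a preselected population of 10^6 clones corresponds to a combinatorial space larger than the full octapeptide sequence space.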
The first part of the invention pertains to novel sequences which allow recombination within hypervariable DNA sequences encoding regions (domains) of variable peptides or proteins displayed in combinatorial phage/phagemid display libraries using type IIS restriction endonucleases both (a) to introduce a cut at the site of recombination and (b) to generate oriented substrates for a ligation reaction, where the ligation products are then recloned at high efficiency after in vitro packaging in a lambda packaging mix. The entire protocol yields efficiencies (clones per input DNA) in excess of any described technology (>10^8 clones per microgram ligated DNA).
Combinations of (vector) sequences and protocols are claimed for both the production of the initial libraries and for recombinational procedures to generate increased diversity within the library or a selected subpopulation at any time. In particular such sequences and procedures are claimed for the generation and use of phage/phagemid-display combinatorial libraries.
The inventors recognize that the main factor thereby determining the efficient generation of further variation is the efficient production of combinatorial libraries from the initial libraries, via reassortment of smaller elements (specific peptide sequences within the hypervariable region, and/or reassortment of structural domains) which contribute to the properties selected for. The invention presents such a method, which has the unique property that the recombination site may be within the hypervariable region whereby no restriction is imposed on the sequence within the hypervariable region involved. Alternatively the method can be used to reassort domains of proteins or subunits of heteromeric proteins (proteins composed of two or more different variant polypeptide chains), each of which can contain hypervariable regions, without resorting to recloning isolated DNA fragments or generating new libraries containing new synthetic oligonucleotides. It is noted that this method thus offers a saving in both time and materials when optimizing a structure for a predetermined property on the basis of a preselected clone population (subpopulation) and in view of the geometrical increase in possible variability offered may represent a qualitatively novel feature in that some rare structures may be obtainable only by the novel strategy described.
The method, which we designate cosmix-plexing, is based on the design of the cloning vectors, the inserts used and a combination of special recombinant DNA protocols, which in particular use i) cleavage of the phage/phagemid DNA with type IIS restriction enzymes, ii) subsequent ligation to concatamers which are iii) packaged in vitro with a lambda packaging system for iv) efficient transduction into E. coli strains, where they are then v) repackaged in vivo in filamentous phage coats. The use of cosmix-plexing, so defined, on a heterogeneous phage/phagemid population generates an enormous increase in novel variants at any time during further experimentation, e.g. after any enrichment step for structures having the predetermined property or properties.
In particular, subpopulations which are enriched from the original library for a specific property will be enriched for a consensus motif (a degenerate set of related sequences within the varied region(s) which all exhibit the required property to some extent) which may (probably will) include the optimal sequence in terms of the required property. Reassortment of these regions or portions of a single hypervariable sequence by cosmix-plexing will increase the probability of obtaining the optimal sequence. The subpopulations may be isolated by differential affinity-based selection on a defined target, or enrichment procedures based on other desired selectable properties (example 1: substrate properties such as phosphorylation by a particular protein kinase enriched by binding on antibodies which recognize the modified (in this case phosphorylated) substrate; or example 2: cleavage of the variant sequence by an endoprotease, using selective release of the phage or phagemid previously bound via an interaction between a terminal protein structure (anchor) and its ligand immobilized to, or later trapped on, a surface).
The invention further covers the generation of extension libraries in which e.g. a “project-specific cassette” is inserted at the recombination site within the gene bank. Optimisation of ligands can then occur by the generation of further combinatorial libraries from selected clones in which the adjacent regions may be efficiently “shuffled”, either singly or both at a time. As far as we are aware no other system provides this “cassette insertion/exchange” feature.