In this application, we describe the generation of binding proteins and peptides using nucleic acid containing introns with RNA splice sites, such as self-splicing introns, preferably in conjunction with a site-specific recombination system, such as lox P (Hoess et al Proc. Natl. Acad. Sci. USA 79 3398-3402, 1982; Sternberg et al J. Mol. Biol. 150 467-486, 1981). The site-specific recombination allows two sequences of nucleic acid to be cloned separately as libraries and be brought together subsequently by a recombination event (Waterhouse et al Nucleic Acids Res. 21 2265-2266, 1993; A. D. Griffiths et al. EMBO J. in press; WO 92/20791; WO 93/19172). One library of sequences is cloned into a first replicon and a second library of sequences into a second replicon. Recombination between the sites brings together libraries of both sequences on the same replicon. This recombination can be performed in vivo, e.g. by P1 infection or by using a recombinase encoded by a plasmid in E. coli, or in vitro using soluble recombinase. For lox P, the recombinase is Cre. This allows a large library to be made where the limitation is not the cloning efficiency but rather the number of cells which can be grown. Thus the method is particularly powerful in combination with phage display technology, which allows the selection of proteins with desired binding properties from a large library of displayed proteins (WO 92/01047; WO 92/20791; WO 93/06213; WO 93/11236; WO 93/19172; WO 94/13804). The size of the library is significant for the ability to select antibodies or other binding proteins of appropriate affinity and specificity.
WO 93/19172 describes recombining two libraries of nucleic acid using a site-specific recombination system, e.g. lox P, mainly to code for heterodimeric proteins in which two chains encoded by distinct (separate) nucleic acid sequences associate to form a functional binding site. Also described is the bringing together of two polypeptides to form a continuous open reading frame. However, this imposes the use of an amino acid sequence encoded in the site-specific recombination sequence at the junction between the two parts of the sequence, for instance the linker in single chain Fv molecules. A problem with this is that there is only one open reading frame in the lox P sequence, and the amino acids encoded by this may be incompatible with the expression of many proteins in functional form. If alternative lox P sites to the wild-type are used (e.g. see FIG. 4), further different amino acid sequences may be generated, but the possibilities are still restricted.
For instance, functional single chain Fv molecules can be constructed with 15 amino acid linkers encoded in part by the loxP recombination site. The length of the loxP site (34 bp) however means that a minimum of 11 heterologous ("foreign") amino acids must be incorporated into the final expressed protein. This makes the incorporation of a loxP site into a continuous reading frame unsuitable for the construction of a diabody repertoire and also leaves little scope for the modification of scFv linkers to enhance expression.
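The codon arithmetic behind this figure can be checked with a short sketch. The Python below assumes the commonly cited 34 bp wild-type loxP sequence, written in one orientation chosen purely for illustration; the sequence, the helper name `translate` and the variable names are assumptions introduced here and are not taken from this application.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: commonly cited wild-type loxP site (13 bp repeat, 8 bp spacer,
# 13 bp repeat), shown in one of its two possible orientations.
LOXP = "ATAACTTCGTATAATGTATGCTATACGAAGTTAT"


def translate(dna, frame=0):
    """Translate every complete codon of `dna` starting at offset `frame`.

    Stop codons are reported as '*' rather than ending the translation,
    so that closed frames are easy to spot.
    """
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Translate the site in each of the three forward reading frames.
frames = {f: translate(LOXP, f) for f in range(3)}

# A frame is "open" if it contains no stop codon.
open_frames = [f for f, peptide in frames.items() if "*" not in peptide]

# 34 bp gives 11 complete codons with one base left over, which must be
# completed by flanking sequence to keep the fusion in frame.
complete_codons, leftover_bases = divmod(len(LOXP), 3)

for f, peptide in frames.items():
    print(f"frame {f}: {peptide}")
print("open frames:", open_frames)
print("complete codons:", complete_codons, "leftover bases:", leftover_bases)
```

Under these assumptions, only one forward frame reads through the site without a stop codon, and that frame contributes 11 residues. This is consistent with the single open reading frame and the minimum of 11 heterologous amino acids described above.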
The present invention involves RNA splicing, particularly the use of self-splicing introns. This allows the recombination site to be inserted within the intron so that amino acids encoded by nucleotides which are spliced out are not incorporated into the final expressed protein. In such circumstances, the only "foreign" amino-acids which need be incorporated are those derived from the sequences at either end of the self-splicing intron. (Note: the amino acid composition and sequence of the product can be engineered with precision and amino acids inserted, substituted or deleted according to choice and using techniques known in the art.)
When a self-splicing intron is used, the amino acids that are incorporated derive from the P1 sequence at the 5' splice site (5'SS) and the P10 sequence at the 3' splice site (3'SS). These pair with the internal guiding sequence of the intron to form hairpin loops (FIG. 1) and splicing then occurs as indicated.
The use of self-splicing introns allows the use of recombination by lox P to be extended to construction of large libraries of contiguous polypeptide chains where the two parts of the chain separated by the intron are varied.
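The effect of placing the recombination site inside the intron can be illustrated with a toy string model. In the sketch below, the two exon sequences, the placeholder intron flanks and the function names `splice` and `translate` are all hypothetical, the loxP sequence is the commonly cited wild-type site rather than one quoted from this application, and the intron is not a real group I intron. Splicing is modelled simply as excision of a known interval, with the exon ends standing in for the residues derived from the P1 and P10 sequences, and RNA is written in DNA letters for simplicity.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def splice(precursor, intron_start, intron_length):
    """Toy splicing: excise the intron interval and join the flanking exons."""
    return precursor[:intron_start] + precursor[intron_start + intron_length:]


# Assumption: commonly cited 34 bp wild-type loxP site.
LOXP = "ATAACTTCGTATAATGTATGCTATACGAAGTTAT"

EXON1 = "ATGGCTGAA"  # hypothetical first part of the chain: M A E
EXON2 = "GGTTCTTAA"  # hypothetical second part: G S, then a stop codon
INTRON = "CCCC" + LOXP + "CCCC"  # placeholder flanks, not a real intron

# Case 1: loxP sits inside the intron and is removed by splicing.
precursor = EXON1 + INTRON + EXON2
spliced = splice(precursor, len(EXON1), len(INTRON))
protein_spliced = translate(spliced)

# Case 2: loxP is read through in frame; two padding bases are needed
# to bring the 34 bp site up to a multiple of three.
fusion = EXON1 + LOXP + "CC" + EXON2
protein_fusion = translate(fusion)

print("spliced:", protein_spliced)
print("in-frame loxP fusion:", protein_fusion)
print("extra residues in fusion:", len(protein_fusion) - len(protein_spliced))
```

In this toy case the spliced product contains no residues encoded by the recombination site, whereas the in-frame fusion carries the loxP-encoded residues plus one residue from the padding codon.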
In the application EP 93303614.7, priority from which is claimed by PCT/GB93/02492, an example is given of use of a loxP site inserted within a self-splicing intron with a bivalent or bispecific "diabody". A "diabody" is a multivalent or multispecific multimer (e.g. bivalent or bispecific dimer) of polypeptides wherein each polypeptide in the multimer comprises a first domain comprising a binding portion of an immunoglobulin heavy chain variable region linked to a second domain which comprises a binding portion of an immunoglobulin light chain variable region such that the domains of a given polypeptide cannot associate with each other to form an antigen binding site. Antigen binding sites are formed by multimerisation (e.g. dimerisation) of the polypeptides.
The expression of bivalent diabodies from DNA containing a self-splicing intron is shown in FIGS. 1 and 2. Application EP 93303614.7 also shows the use of this system for chain-shuffling. (See also FIG. 3.) WO94/13804 describes splicing out a lox P site using a self-splicing intron for a bispecific diabody (Example 1 of this application). In these two earlier applications the use of self-splicing introns was described for splicing only between the two domains of diabodies. The use of self-splicing introns to bring together two portions of polypeptide chain however has general applicability and can equally well be applied to single chain Fv fragments, peptide libraries or indeed any polypeptide sequence.
The use of systems such as lox P which promote recombination allows one polypeptide sequence to be replaced by another one with a similar or different function, originally encoded on another replicon. This is particularly useful with polypeptide chains such as single chain Fvs which have two or more domains which contribute to function. The invention allows the use of two repertoires of nucleic acid, with a splice site between the two repertoires, and the selection of proteins or peptides thus encoded. In one embodiment, termed "chain shuffling", one nucleic acid sequence is kept constant and the library of other chains is recombined at the lox P site in the intron.
Self-splicing introns have been shown to be functional in E. coli using a system in which the Tetrahymena intervening sequence (a group I self-splicing intron) was inserted into the gene encoding the α-peptide of β-galactosidase (J. V. Price & T. R. Cech Science 228 719-722, 1985; R. B. Waring et al Cell 40 371-380, 1985; M. D. Been & T. R. Cech Cell 47 207-216, 1986). The presence of blue colonies indicated that self-splicing was functional in E. coli, because the α-peptide complemented the β-galactosidase enzyme acceptor. This system has been used to diagnose which intron sequences are compatible with self-splicing.
Although self-splicing introns have been inserted into functional proteins as above, self-splicing introns have not been used for protein engineering strategies or for processes which involve the recombination of two repertoires of nucleic acid.