Protein-nucleic acid recognition is a widespread phenomenon that is central to many of the biomolecular control mechanisms regulating the functioning of eukaryotic and prokaryotic cells. For instance, protein-DNA interactions form the basis of the regulation of gene expression and are thus one of the subjects most widely studied by molecular biologists. Many DNA-binding proteins contain independently folded domains for the recognition of DNA, and these domains in turn belong to a large number of structural families, such as the leucine zipper, the “helix-turn-helix” and the zinc finger families. Despite the great variety of structural domains, the specificity of the interactions observed to date between protein and DNA most often derives from the complementarity of the surfaces of a protein α-helix and the major groove of DNA (Klug, 1993, Gene 135:83-92).
Zinc finger proteins are ubiquitous eukaryotic DNA-binding modules first identified in Xenopus transcription factor IIIA (TFIIIA). Each zinc finger protein consists of a number of autonomous DNA-binding units. For example, the mouse Zif268 zinc finger protein is a protein of 90 amino acid residues belonging to the Cys2-His2 zinc finger family. Zif268 contains three independent zinc finger domains of 24 residues each. Each zinc finger domain (“finger”) consists of a single α-helix joined to two strands of antiparallel β-sheet and held together via chelation of a zinc ion (Pavletich and Pabo, 1991, Science 252, 809-817). Sequence-specific DNA binding is mediated by residues located on the exposed face of the α-helix, which interacts with the major groove of DNA. One zinc finger domain interacts with about three base pairs, so that a number of fingers, joined together by linkers, are required to bind a longer DNA sequence. The linkers of various zinc finger proteins have been compared, and a consensus sequence (the “canonical sequence”) determined, consisting of the four amino acids Gly-Glu-Lys-Pro (SEQ ID NO:56). This canonical linker is termed the “GEKP linker”. However, variants of this sequence are possible, for example, Gly-Gln-Lys-Pro (SEQ ID NO:58), Gly-Glu-Arg-Pro (SEQ ID NO:57) and Gly-Gln-Arg-Pro (SEQ ID NO:59).
It has been suggested that the contacts between particular amino acids and DNA base sequence may be described by a simple set of rules. However, current methods for the design and selection of zinc finger modules are not generally capable of producing zinc finger proteins that can bind to any given DNA sequence. This is because some nucleotide sequences constitute more favourable binding sites for zinc fingers than others. It is known, for example, that DNA sequences which contain G-rich regions are highly specific binding sites for zinc finger proteins. In particular, zinc fingers tend to bind with high specificity to DNA sequences which contain G at every third position. For other sequences, on the other hand, it will be difficult or impossible to design zinc fingers which bind specifically to the sequence in question. Thus, for example, pyrimidine-rich DNA sequences comprise less favourable binding sites for zinc fingers. In order to increase the affinity and specificity of binding, it is therefore desirable to construct zinc fingers which will tolerate gaps between the nucleotide sequences which are contacted by the fingers.
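By way of illustration only, the observation that sequences containing G at every third position are favoured can be expressed as a simple scan. The following Python sketch is not part of any method described here; the function names, the use of a single strand, the scoring by reading frame and the example sequences are all assumptions made for the illustration.

```python
from typing import Dict


def g_every_third_fraction(seq: str) -> Dict[int, float]:
    """For each of the three frames, return the fraction of every-third
    positions (frame, frame + 3, ...) that are occupied by G.

    Assumption: only the given strand is examined; the frame in which a
    finger array would read the site is not known, so all three are scored.
    """
    seq = seq.upper()
    fractions = {}
    for frame in range(3):
        positions = seq[frame::3]  # every third base starting at `frame`
        fractions[frame] = positions.count("G") / len(positions) if positions else 0.0
    return fractions


def best_frame(seq: str) -> int:
    """Return the frame with the highest fraction of G at every third position."""
    fractions = g_every_third_fraction(seq)
    return max(fractions, key=fractions.get)


# Hypothetical example sequences, chosen only to show the two extremes.
g_rich = "GATGCCGTT"        # G at positions 0, 3 and 6
pyrimidine_rich = "CTTCCTTCC"  # no G on this strand

print(g_every_third_fraction(g_rich))
print(g_every_third_fraction(pyrimidine_rich))
```

A sequence scoring 1.0 in some frame has G at every third position in that frame; a pyrimidine-rich sequence such as the second example scores 0.0 in all frames on the strand examined.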
It is known in the prior art to attempt to increase the affinity and specificity of zinc finger binding by linking together separate zinc finger domains with a canonical sequence. Thus, Rebar (1997, PhD Thesis, Massachusetts Institute of Technology, Massachusetts, USA) and Shi (1995, PhD Thesis, Johns Hopkins University, Maryland, USA) describe linking additional fingers to a three-finger protein using a GERP linker, and observe a relatively modest increase in affinity. Furthermore, tandem linkage of two three-finger proteins using a canonical linker has been described by Liu et al. (1997, Proc. Natl. Acad. Sci. USA 94, 5525-5530). The affinity of binding of this six-finger protein is found to be increased approximately 68-74 fold relative to each three-finger peptide, which is a poor result compared to that predicted by theory. A different approach is described by Kim and Pabo (1998, Proc. Natl. Acad. Sci. USA 95, 2812-2817), who use structure-based design to generate a six-finger construct, using flexible linkers comprising 8 or 11 amino acids to link two three-finger peptides (Zif268 and NRE). However, this construct is only capable of spanning a single gap (comprising 0-2 base pairs) in the composite DNA target site. Structure-based design has also been used to construct a fusion protein consisting of zinc fingers from Zif268 and the homeodomain from Oct-1 (Pomerantz et al., 1995, Science 267, 93-6). In summary, several groups have to date created six (or nine)-finger fusion peptides to bind long stretches of DNA with high affinity (Kim, J-S. & Pabo, C. O. (1998) Proc. Natl. Acad. Sci. USA 95, 2812-2817; Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas, C. F. III (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530; Kamiuchi, T., Abe, E., Imanishi, M., Kaji, T., Nagaoka, M. & Sugiura, Y. (1998) Biochemistry 37, 13827-13834). However, the affinities of these constructs vary greatly and have generally been far weaker than expected.
In addition, all of these peptides have targeted either contiguous DNA sequences, or those containing just one or two nucleotides of unbound DNA.
It is therefore an object of the present invention to provide nucleic acid binding polypeptides which are capable of spanning longer gaps between DNA binding subsites. It is a further object of the invention to provide nucleic acid binding polypeptides which are capable of spanning a greater number of gaps between the DNA binding subsites. It is a yet further object of the invention to provide nucleic acid binding polypeptides which are capable of spanning variable gaps between DNA binding subsites.