The construction of artificial transcription factors has been of great interest in recent years. Gene expression can be specifically regulated by polydactyl zinc finger proteins fused to regulatory domains. Because of their modular structure, zinc finger domains of the Cys2-His2 family have been the most promising for the construction of artificial transcription factors. Each domain consists of approximately 30 amino acids and folds into a ββα structure stabilized by hydrophobic interactions and by chelation of a zinc ion by the conserved Cys2-His2 residues. To date, the best characterized protein of this family is the mouse transcription factor Zif268 [Pavletich et al., (1991) Science 252(5007), 809-817; Elrod-Erickson et al., (1996) Structure 4(10), 1171-1180]. Analysis of the Zif268/DNA complex suggested that DNA binding is predominantly achieved through the interaction of amino acid residues at positions −1, 3, and 6 of the α-helix with the 3′, middle, and 5′ nucleotides of a 3 bp DNA subsite, respectively. Positions 1, 2, and 5 have been shown to make direct or water-mediated contacts with the phosphate backbone of the DNA. Leucine is usually found at position 4 and packs into the hydrophobic core of the domain. Position 2 of the α-helix has been shown to interact with other helix residues and, in addition, can contact a nucleotide outside the 3 bp subsite [Pavletich et al., (1991) Science 252(5007), 809-817; Elrod-Erickson et al., (1996) Structure 4(10), 1171-1180; Isalan et al., (1997) Proc Natl Acad Sci USA 94(11), 5617-5621].
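The positional recognition code described above can be summarized in a few lines of code. The following is a minimal Python sketch, not a predictive model: it assumes the conventional helix numbering (positions −1 and 1 to 6, with no position 0), uses hypothetical helper names (`residue_at`, `predicted_contacts`), and deliberately ignores backbone contacts, the contact that position 2 can make outside the subsite, and context effects from neighboring fingers.

```python
# Helix positions as conventionally numbered from the start of the alpha-helix.
# There is no position 0, so a seven-residue string covers positions -1..6.
HELIX_POSITIONS = (-1, 1, 2, 3, 4, 5, 6)

# Primary base-contacting positions mapped to the subsite index they read
# (index 0 = 5' base, 1 = middle base, 2 = 3' base).
BASE_CONTACTS = {-1: 2, 3: 1, 6: 0}


def residue_at(helix: str, position: int) -> str:
    """Return the residue at a helix position (-1 or 1..6) of a 7-residue string."""
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError("expected a seven-residue helix (positions -1 to 6)")
    return helix[HELIX_POSITIONS.index(position)]


def predicted_contacts(helix: str, subsite: str) -> list[tuple[int, str, str]]:
    """Pair each base-contacting helix residue with the subsite base it reads.

    Returns (position, residue, base) tuples ordered by helix position.
    """
    subsite = subsite.upper()
    if len(subsite) != 3 or set(subsite) - set("ACGT"):
        raise ValueError("subsite must be a 3 bp DNA triplet")
    return [
        (pos, residue_at(helix, pos), subsite[idx])
        for pos, idx in sorted(BASE_CONTACTS.items())
    ]
```

For example, `predicted_contacts("RSDELTR", "GCG")` pairs R at position −1 with the 3′ G, E at position 3 with the middle C, and R at position 6 with the 5′ G; the helix string is used here only as an illustrative input.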
The selection of modular zinc finger domains recognizing each of the 5′-(GNN)-3′ DNA subsites with high specificity and affinity, and their refinement by site-directed mutagenesis, has been demonstrated (U.S. Pat. No. 6,140,081, the disclosure of which is incorporated herein by reference). These modular domains can be assembled into zinc finger proteins recognizing extended 18 bp DNA sequences, a length sufficient to specify a unique site within the human genome or other genomes. In addition, these proteins function as transcription factors: when fused to regulatory domains they can alter gene expression, and they can even be made hormone-dependent by fusion to the ligand-binding domains of nuclear hormone receptors. To allow the rapid construction of zinc finger-based transcription factors that bind any DNA sequence, it is important to extend the existing set of modular zinc finger domains to recognize each of the 64 possible DNA triplets that are assigned meaning in the genetic code. This aim can be achieved by phage display selection and/or rational design. Because structural data on zinc finger/DNA interactions are limited, rational design of zinc finger proteins is very time-consuming and may not be possible in many instances. In addition, most naturally occurring zinc finger proteins consist of domains recognizing the 5′-(GNN)-3′ type of DNA sequence. The most promising approach to identifying novel zinc finger domains that bind DNA target sequences of the type 5′-(NNN)-3′ is therefore selection via phage display. The limiting step in this approach is the construction of libraries that allow the specification of a 5′ adenine, cytosine, or thymine in the subsite recognized by each module. Phage display selections have been based on Zif268, in which different fingers of this protein were randomized [Choo et al., (1994) Proc Natl Acad Sci USA 91(23), 11168-11172; Rebar et al., (1994) Science 263(5147), 671-673; Jamieson et al., (1994) Biochemistry 33, 5689-5695; Wu et al., (1995) Proc Natl Acad Sci USA 92, 344-348; Jamieson et al., (1996) Proc Natl Acad Sci USA 93, 12834-12839; Greisman et al., (1997) Science 275(5300), 657-661]. A set of 16 domains recognizing the 5′-(GNN)-3′ type of DNA sequence has previously been reported from a library in which finger 2 of C7, a derivative of Zif268 [Wu et al., (1995) Proc Natl Acad Sci USA 92, 344-348], was randomized [Segal et al., (1999) Proc Natl Acad Sci USA 96(6), 2758-2763]. In such a strategy, selection is limited to domains recognizing 5′-(GNN)-3′ or 5′-(TNN)-3′, because Asp2 of finger 3 contacts the complementary base of a 5′ guanine or thymine in the finger 2 subsite [Pavletich et al., (1991) Science 252(5007), 809-817; Elrod-Erickson et al., (1996) Structure 4(10), 1171-1180].
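The statement that an 18 bp site can be unique in the human genome follows from simple counting: six 3 bp subsites give 4^18 possible sequences, compared with 4^3 = 64 possible triplets per finger. The sketch below makes that arithmetic explicit under two stated assumptions: a haploid human genome of roughly 3.1 × 10^9 bp, and a random genome with equiprobable bases in which both strands are scanned. Real genomes contain repeats and compositional bias, so this estimates an expectation, not a guarantee of uniqueness for any particular site. The function names are hypothetical.

```python
import math

# Assumption: approximate haploid human genome size in base pairs.
GENOME_BP = 3.1e9


def distinct_sites(length_bp: int) -> int:
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length_bp


def expected_random_matches(length_bp: int, genome_bp: float = GENOME_BP) -> float:
    """Expected chance occurrences of one specific site when both strands of a
    random, equiprobable-base genome are scanned (a deliberate simplification)."""
    return 2 * genome_bp / distinct_sites(length_bp)


def min_unique_length(genome_bp: float = GENOME_BP) -> int:
    """Shortest site length whose expected chance occurrences fall below 1."""
    return math.floor(math.log(2 * genome_bp, 4)) + 1
```

Under these assumptions, a given 18 bp site is expected to occur by chance only about 0.09 times, whereas a 9 bp site (a three-finger protein) is expected about 2.4 × 10^4 times, and `min_unique_length()` returns 17.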
Although the strategy described above can in principle select zinc finger domains recognizing sequences of the form 5′-(TNN)-3′, in practice very few such domains have been selected and identified. There is therefore a need to discover zinc finger domains recognizing 5′-(TNN)-3′ sequences, so that a broader “vocabulary” of zinc finger domains is available for the construction of multifinger zinc finger proteins. Such domains would make it possible to prepare artificial transcription factors, and other proteins with nucleic acid recognition functions, that recognize a far greater variety of nucleic acid sequences. The ability to specifically recognize 5′-(TNN)-3′ sequences is particularly important because all three stop codons, TGA, TAG, and TAA, are of this form, and regulatory sequences are frequently located close to chain termination regions. Additionally, stop codons are frequently found in tandem in naturally occurring DNA, and it would be desirable to target these regions. The scarcity of zinc finger domains recognizing 5′-(TNN)-3′ sequences has made this very difficult.
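To make the stop-codon argument concrete, the sketch below enumerates the 16-member 5′-(TNN)-3′ family, lets one confirm that TAA, TAG, and TGA all belong to it, and locates tandem in-frame stop codons, each of which presents two adjacent 5′-(TNN)-3′ subsites. The helpers (`triplet_family`, `tandem_stops`) are hypothetical; the sketch assumes the reading frame is known and scans only the strand it is given.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def triplet_family(first_base: str) -> list[str]:
    """The 16 triplets of the form 5'-(XNN)-3' for a fixed 5' base X."""
    return [first_base + b2 + b3 for b2, b3 in product(BASES, repeat=2)]


def tandem_stops(seq: str, frame: int = 0) -> list[int]:
    """Start indices (0-based) of in-frame stop codons that are immediately
    followed by another stop codon in the same reading frame."""
    seq = seq.upper()
    # Split the chosen frame into complete codons; a trailing partial codon is dropped.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return [
        frame + 3 * k
        for k in range(len(codons) - 1)
        if codons[k] in STOP_CODONS and codons[k + 1] in STOP_CODONS
    ]
```

For `ATGGCTTAATAGCC`, `tandem_stops` returns `[6]`: the TAA at index 6 is followed in frame by TAG, a 6 bp stretch that, with subsites aligned to the codons, would call for two consecutive 5′-(TNN)-3′ domains.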
The present approach relies on the modularity of zinc finger domains, which allows the scientific community to construct zinc finger proteins rapidly, and demonstrates that the limitations imposed by cross-subsite interactions arise only in a limited number of cases. The present disclosure introduces a new strategy for the selection of zinc finger domains specifically recognizing the 5′-(TNN)-3′ type of DNA sequence. The DNA-binding specificity of these domains was evaluated by a multi-target ELISA against all sixteen 5′-(CNN)-3′ triplets. These domains can be readily incorporated into polydactyl proteins containing various numbers of 5′-(TNN)-3′ domains, each protein specifically recognizing an extended 18 bp sequence. Furthermore, proteins built from these domains can specifically alter gene expression when fused to regulatory domains. These results underline the feasibility of constructing polydactyl proteins from predefined building blocks. In addition, the domains characterized here greatly increase the number of DNA sequences that can be targeted with artificial transcription factors.
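The building-block idea can be illustrated by splitting an 18 bp target into six 3 bp subsites and looking each one up in a domain library. In this minimal sketch the library is a hypothetical dictionary of placeholder identifiers, not real domains, and the helper names are likewise hypothetical. The ordering follows the Zif268 arrangement, in which the N-terminal finger 1 binds the 3′-most subsite, so the subsites are read 3′ to 5′. Linker design, fusion to regulatory domains, and the cross-subsite interactions mentioned above are not modeled.

```python
DNA = frozenset("ACGT")


def split_subsites(target: str, n_fingers: int = 6) -> list[str]:
    """Split a target into consecutive 3 bp subsites, listed 5' to 3'."""
    target = target.upper()
    if len(target) != 3 * n_fingers or not set(target) <= DNA:
        raise ValueError(f"target must be {3 * n_fingers} bp of A/C/G/T")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


def assemble(target: str, library: dict[str, str], n_fingers: int = 6) -> list[str]:
    """Return domain identifiers in N- to C-terminal order (finger 1 first).

    Finger 1 binds the 3'-most subsite, so subsites are consumed 3' to 5'.
    """
    subsites = split_subsites(target, n_fingers)
    missing = sorted({s for s in subsites if s not in library})
    if missing:
        raise LookupError("no domain for subsite(s): " + ", ".join(missing))
    return [library[s] for s in reversed(subsites)]


def tnn_count(target: str, n_fingers: int = 6) -> int:
    """How many of the target's subsites are of the 5'-(TNN)-3' type."""
    return sum(s.startswith("T") for s in split_subsites(target, n_fingers))


# Hypothetical library: placeholder identifiers only, not real domains.
EXAMPLE_LIBRARY = {t: "dom_" + t for t in ("TGA", "TAG", "GCG", "TAA", "GGG", "TTG")}
```

With this placeholder library, `assemble("TGATAGGCGTAAGGGTTG", EXAMPLE_LIBRARY)` lists `dom_TTG, dom_GGG, dom_TAA, dom_GCG, dom_TAG, dom_TGA` from finger 1 to finger 6, and `tnn_count` reports that four of the six subsites are of the 5′-(TNN)-3′ type. A missing subsite raises an error rather than silently skipping a finger, since a gap would shift every downstream finger off its subsite.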