The paradigm that gene expression is governed primarily by protein switches that bind DNA in a sequence-specific manner was established in 1967 (Ptashne, M. (1967) Nature (London) 214, 323-4). Diverse structural families of DNA-binding proteins have been described. Despite this wealth of structural diversity, the Cys2-His2 zinc finger motif is the most frequently used nucleic acid binding motif in eukaryotes. This observation is as true for yeast as it is for man. The Cys2-His2 zinc finger motif, identified first in the DNA- and RNA-binding transcription factor TFIIIA (Miller, J., McLachlan, A. D. & Klug, A. (1985) Embo J 4, 1609-14), is perhaps the ideal structural scaffold on which a sequence-specific protein might be constructed. A single zinc finger domain consists of approximately 30 amino acids with a simple ββα fold stabilized by hydrophobic interactions and the chelation of a single zinc ion (Miller, J., McLachlan, A. D. & Klug, A. (1985) Embo J 4, 1609-14, Lee, M. S., Gippert, G. P., Soman, K. V., Case, D. A. & Wright, P. E. (1989) Science 245, 635-7). Presentation of the α-helix of this domain into the major groove of DNA allows for sequence-specific base contacts. Each zinc finger domain typically recognizes three base pairs of DNA (Pavletich, N. P. & Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure (London) 4, 1171-1180, Elrod-Erickson, M., Benson, T. E. & Pabo, C. O. (1998) Structure (London) 6, 451-464, Kim, C. A. & Berg, J. M. (1996) Nature Structural Biology 3, 940-945), though variation in helical presentation can allow for recognition of a more extended site (Pavletich, N. P. & Pabo, C. O. (1993) Science (Washington, D.C., 1883-) 261, 1701-7, Houbaviy, H. B., Usheva, A., Shenk, T. & Burley, S. K. (1996) Proc Natl Acad Sci USA 93, 13577-82, Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D.
(1993) Nature (London) 366, 483-7, Wuttke, D. S., Foster, M. P., Case, D. A., Gottesfeld, J. M. & Wright, P. E. (1997) J. Mol. Biol. 273, 183-206). In contrast to most transcription factors, which rely on dimerization of protein domains to extend protein-DNA contacts to longer DNA sequences or addresses, simple covalent tandem repeats of the zinc finger domain allow this motif to recognize longer asymmetric sequences of DNA. We have recently described polydactyl zinc finger proteins that contain 6 zinc finger domains and bind 18 base pairs of contiguous DNA sequence (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) PNAS 94, 5525-5530). Recognition of 18 bp of DNA is sufficient to describe a unique DNA address within all known genomes, a requirement for using polydactyl proteins as highly specific gene switches. Indeed, control of both gene activation and repression has been shown using these polydactyl proteins in a model system (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) PNAS 94, 5525-5530).
Since each zinc finger domain typically binds three base pairs of sequence, a complete recognition alphabet requires the characterization of 64 domains. Existing information that could guide the construction of these domains has come from three types of studies: structure determination (Pavletich, N. P. & Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure (London) 4, 1171-1180, Elrod-Erickson, M., Benson, T. E. & Pabo, C. O. (1998) Structure (London) 6, 451-464, Kim, C. A. & Berg, J. M. (1996) Nature Structural Biology 3, 940-945, Pavletich, N. P. & Pabo, C. O. (1993) Science (Washington, D.C., 1883-) 261, 1701-7, Houbaviy, H. B., Usheva, A., Shenk, T. & Burley, S. K. (1996) Proc Natl Acad Sci USA 93, 13577-82, Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D. (1993) Nature (London) 366, 483-7, Wuttke, D. S., Foster, M. P., Case, D. A., Gottesfeld, J. M. & Wright, P. E. (1997) J. Mol. Biol. 273, 183-206, Nolte, R. T., Conlin, R. M., Harrison, S. C. & Brown, R. S. (1998) Proc. Natl. Acad. Sci. U.S.A. 95, 2938-2943, Narayan, V. A., Kriwacki, R. W. & Caradonna, J. P. (1997) J. Biol. Chem. 272, 7801-7809), site-directed mutagenesis (Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 5617-5621, Nardelli, J., Gibson, T. J., Vesque, C. & Charnay, P. (1991) Nature 349, 175-178, Nardelli, J., Gibson, T. & Charnay, P. (1992) Nucleic Acids Res. 20, 4137-44, Taylor, W. E., Suruki, H. K., Lin, A. H. T., Naraghi-Arani, P., Igarashi, R. Y., Younessian, M., Katkus, P. & Vo, N. V. (1995) Biochemistry 34, 3222-3230, Desjarlais, J. R. & Berg, J. M. (1992) Proteins: Struct., Funct., Genet. 12, 101-4, Desjarlais, J. R. & Berg, J. M. (1992) Proc Natl Acad Sci USA 89, 7345-9), and phage-display selections (Choo, Y. & Klug, A. (1994) Proc Natl Acad Sci USA 91, 11163-7, Greisman, H. A. & Pabo, C. O. (1997) Science (Washington, D.C.)
275, 657-661, Rebar, E. J. & Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. & Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-33, Wu, H., Yang, W.-P. & Barbas III, C. F. (1995) PNAS 92, 344-348). All have contributed significantly to our understanding of zinc finger/DNA recognition, but each has its limitations. Structural studies have identified a diverse spectrum of protein/DNA interactions but do not reveal whether alternative interactions might be better. Further, while interactions that allow for sequence-specific recognition are observed, little information is provided on how alternate sequences are excluded from binding. These questions have been partially addressed by mutagenesis of existing proteins, but the data are always limited by the number of mutants that can be characterized. Phage display and selection of randomized libraries overcome certain numerical limitations, but it is difficult to provide the appropriate selective pressure to ensure that both specificity and affinity drive the selection. Experimental studies from several laboratories (Choo, Y. & Klug, A. (1994) Proc Natl Acad Sci USA 91, 11163-7, Greisman, H. A. & Pabo, C. O. (1997) Science (Washington, D.C.) 275, 657-661, Rebar, E. J. & Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. & Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-33), including our own (Wu, H., Yang, W.-P. & Barbas III, C. F. (1995) PNAS 92, 344-348), have demonstrated that it is possible to design or select a few members of this recognition alphabet.
However, the specificity and affinity of these domains for their target DNA were rarely investigated in a rigorous and systematic fashion in these early studies.
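The figure of 64 domains follows from the four DNA bases taken three at a time (4^3 = 64). The short Python sketch below illustrates that count and shows how an 18-bp address divides into the six 3-bp subsites that a six-finger protein would cover. The function names and the example sequence are hypothetical, chosen only for illustration; the sketch does not model binding orientation, affinity, or specificity.

```python
from itertools import product

BASES = "ACGT"


def triplet_alphabet():
    """Return every possible 3-bp subsite (4**3 = 64 strings)."""
    return ["".join(p) for p in product(BASES, repeat=3)]


def split_into_triplets(site):
    """Split a DNA site into consecutive 3-bp subsites, one per finger.

    Hypothetical helper: it only partitions the string and makes no
    claim about which finger contacts which subsite.
    """
    site = site.upper()
    if len(site) % 3 != 0:
        raise ValueError("site length must be a multiple of 3")
    if any(base not in BASES for base in site):
        raise ValueError("site must contain only A, C, G, T")
    return [site[i:i + 3] for i in range(0, len(site), 3)]


if __name__ == "__main__":
    alphabet = triplet_alphabet()
    print(len(alphabet), "possible 3-bp subsites")

    # Arbitrary 18-bp sequence used purely as an illustration.
    example_site = "ACGTTGCAAGCTTACGGA"
    print(split_into_triplets(example_site))
```

Under this simple partitioning, any 18-bp target maps to six entries of the 64-member alphabet, which is why a characterized set of triplet-specific domains would, in principle, cover every possible address.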
Since Jacob and Monod questioned the chemical nature of the repressor and proposed a scheme by which the synthesis of individual proteins within a cell might be provoked or repressed, specific experimental control of gene expression has been a tantalizing prospect (Jacob, F. & Monod, J. (1961) J. Mol. Biol. 3, 318-356). It is now well established that genomes are regulated at the level of transcription primarily through the action of proteins known as transcription factors that bind DNA in a sequence-specific fashion. Often these protein factors act in a complex combinatorial manner, allowing temporal, spatial, and environmentally responsive control of gene expression (Ptashne, M. (1997) Nature Medicine 3, 1069-1072). Transcription factors frequently act both through a DNA-binding domain, which localizes the protein to a specific site within the genome, and through accessory effector domains, which act to provoke (activate) or repress transcription at or near that site (Cowell, I. G. (1994) Trends Biochem. Sci. 19, 38-42). Effector domains, such as the activation domain VP16 (Sadowski, I., Ma, J., Triezenberg, S. & Ptashne, M. (1988) Nature 335, 563-564) and the repression domain KRAB (Margolin, J. F., Friedman, J. R., Meyer, W. K.-H., Vissing, H., Thiesen, H.-J. & Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513), are typically modular and retain their activity when they are fused to other DNA-binding proteins. Whereas genes might be readily controlled by directing transcription factors to particular sites within a genome, the design of DNA-binding proteins that might be fashioned to bind any given sequence has been a daunting challenge. The present disclosure is based on the recognition of the structural features unique to the Cys2-His2 class of nucleic acid-binding zinc finger proteins. The Cys2-His2 zinc finger domain consists of a simple ββα fold of approximately 30 amino acids in length.
Structural stability of this fold is achieved by hydrophobic interactions and by chelation of a single zinc ion by the conserved Cys2-His2 residues (Lee, M. S., Gippert, G. P., Soman, K. V., Case, D. A. & Wright, P. E. (1989) Science 245, 635-637). Nucleic acid recognition is achieved through specific amino acid side chain contacts originating from the α-helix of the domain, which typically binds three base pairs of DNA sequence (Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. & Pabo, C. O. (1996) Structure 4, 1171-1180). Unlike other nucleic acid recognition motifs, simple covalent linkage of multiple zinc finger domains allows the recognition of extended asymmetric sequences of DNA. Studies of natural zinc finger proteins have shown that three zinc finger domains can bind 9 bp of contiguous DNA sequence (Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809-17, Swirnoff, A. H. & Milbrandt, J. (1995) Mol. Cell. Biol. 15, 2275-87). Whereas recognition of 9 bp of sequence is insufficient to specify a unique site within even the small genome of E. coli, polydactyl proteins containing six zinc finger domains can specify 18-bp recognition (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530). With respect to the development of a universal system for gene control, an 18-bp address can be sufficient to specify a single site within all known genomes. Although polydactyl proteins of this type are unknown in nature, their efficacy in gene activation and repression within living human cells has recently been shown (Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas III, C. F. (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530).
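The contrast between 9-bp and 18-bp addresses can be illustrated with a back-of-the-envelope estimate. The Python sketch below computes how often a site of a given length would be expected to occur by chance. It rests on assumptions that are not part of the text above: a random-sequence model with equal base frequencies, counting both strands, and approximate genome sizes of about 4.6 million bp for E. coli and about 3 billion bp for the human genome. Real genomes are not random, so the numbers are indicative only.

```python
def distinct_sites(n_bp):
    """Number of distinct DNA sequences of length n_bp (4 bases per position)."""
    return 4 ** n_bp


def expected_occurrences(genome_bp, n_bp, both_strands=True):
    """Expected chance occurrences of one specific n_bp site.

    Assumes a random genome with equal base frequencies; this is a
    simplifying model, not a property of any real genome.
    """
    strands = 2 if both_strands else 1
    return strands * genome_bp / distinct_sites(n_bp)


if __name__ == "__main__":
    # Approximate genome sizes, assumed here for illustration only.
    genomes = {"E. coli": 4.6e6, "human": 3.0e9}
    for n_bp in (9, 18):
        print(f"{n_bp} bp: {distinct_sites(n_bp):,} distinct sequences")
        for name, size in genomes.items():
            hits = expected_occurrences(size, n_bp)
            print(f"  expected chance occurrences in {name}: {hits:.3g}")
```

Under these assumptions a 9-bp site is expected many times over even in a bacterial genome, while an 18-bp site is expected less than once in a genome of human size, which is consistent with the argument made above.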