The field of this invention is zinc finger protein binding to target nucleotides. More particularly, the present invention pertains to amino acid residue sequences within the α-helical domain of zinc fingers that specifically bind to target nucleotides of the formula 5′-(GNN)-3′.
The paradigm that the primary mechanism for governing the expression of genes involves protein switches that bind DNA in a sequence specific manner was established in 1967 (Ptashne, M. (1967) Nature (London) 214, 3234). Diverse structural families of DNA binding proteins have been described. Despite a wealth of structural diversity, the Cys2-His2 zinc finger motif constitutes the most frequently utilized nucleic acid binding motif in eukaryotes. This observation is as true for yeast as it is for man. The Cys2-His2 zinc finger motif, identified first in the DNA and RNA binding transcription factor TFIIIA (Miller, J., McLachlan, A. D. and Klug, A. (1985) Embo J 4, 1609-14), is perhaps the ideal structural scaffold on which a sequence specific protein might be constructed. A single zinc finger domain consists of approximately 30 amino acids with a simple ββα fold stabilized by hydrophobic interactions and the chelation of a single zinc ion (Miller, J., McLachlan, A. D. and Klug, A. (1985) Embo J 4, 1609-14, Lee, M. S., Gippert, G. P., Soman, K. V., Case, D. A. and Wright, P. E. (1989) Science 245, 635-7). Presentation of the α-helix of this domain into the major groove of DNA allows for sequence specific base contacts. Each zinc finger domain typically recognizes three base pairs of DNA (Pavletich, N. P. and Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. and Pabo, C. O. (1996) Structure (London) 4, 1171-1180, Elrod-Erickson, M., Benson, T. E. and Pabo, C. O. (1998) Structure (London) 6, 451-464, Kim, C. A. and Berg, J. M. (1996) Nature Structural Biology 3, 940-945), though variation in helical presentation can allow for recognition of a more extended site (Pavletich, N. P. and Pabo, C. O. (1993) Science (Washington, D.C., 1883-) 261, 1701-7, Houbaviy, H. B., Usheva, A., Shenk, T. and Burley, S. K. (1996) Proc Natl Acad Sci USA 93, 13577-82, Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. and Rhodes, D. (1993) Nature (London) 366, 483-7, Wuttke, D. S., Foster, M. P., Case, D. A., Gottesfeld, J. M. and Wright, P. E. (1997) J. Mol. Biol. 273, 183-206). In contrast to most transcription factors that rely on dimerization of protein domains for extending protein-DNA contacts to longer DNA sequences or addresses, simple covalent tandem repeats of the zinc finger domain allow for the recognition of longer asymmetric sequences of DNA by this motif.
We have recently described polydactyl zinc finger proteins that contain 6 zinc finger domains and bind 18 base pairs of contiguous DNA sequence (Liu, Q., Segal, D. J., Ghiara, J. B. and Barbas III, C. F. (1997) PNAS 94, 5525-5530). Recognition of 18 bp of DNA is sufficient to describe a unique DNA address within all known genomes, a requirement for using polydactyl proteins as highly specific gene switches. Indeed, control of both gene activation and repression has been shown using these polydactyl proteins in a model system (Liu, Q., Segal, D. J., Ghiara, J. B. and Barbas III, C. F. (1997) PNAS 94, 5525-5530).
Since each zinc finger domain typically binds three base pairs of sequence, a complete recognition alphabet requires the characterization of 64 domains. Existing information which could guide the construction of these domains has come from three types of studies: structure determination (Pavletich, N. P. and Pabo, C. O. (1991) Science (Washington, D.C., 1883-) 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. and Pabo, C. O. (1996) Structure (London) 4, 1171-1180, Elrod-Erickson, M., Benson, T. E. and Pabo, C. O. (1998) Structure (London) 6, 451-464, Kim, C. A. and Berg, J. M. (1996) Nature Structural Biology 3, 940-945, Pavletich, N. P. and Pabo, C. O. (1993) Science (Washington, D.C., 1883-) 261, 1701-7, Houbaviy, H. B., Usheva, A., Shenk, T. and Burley, S. K. (1996) Proc Natl Acad Sci USA 93, 13577-82, Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. and Rhodes, D. (1993) Nature (London) 366, 483-7, Wuttke, D. S., Foster, M. P., Case, D. A., Gottesfeld, J. M. and Wright, P. E. (1997) J. Mol. Biol. 273, 183-206, Nolte, R. T., Conlin, R. M., Harrison, S. C. and Brown, R. S. (1998) Proc. Natl. Acad. Sci. U.S.A. 95, 2938-2943, Narayan, V. A., Kriwacki, R. W. and Caradonna, J. P. (1997) J. Biol. Chem. 272, 7801-7809), site-directed mutagenesis (Isalan, M., Choo, Y. and Klug, A. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 5617-5621, Nardelli, J., Gibson, T. J., Vesque, C. and Charnay, P. (1991) Nature 349, 175-178, Nardelli, J., Gibson, T. and Charnay, P. (1992) Nucleic Acids Res. 20, 4137-44, Taylor, W. E., Suruki, H. K., Lin, A. H. T., Naraghi-Arani, P., Igarashi, R. Y., Younessian, M., Katkus, P. and Vo, N. V. (1995) Biochemistry 34, 3222-3230, Desjarlais, J. R. and Berg, J. M. (1992) Proteins: Struct., Funct., Genet. 12, 101-4, Desjarlais, J. R. and Berg, J. M. (1992) Proc Natl Acad Sci USA 89, 7345-9), and phage-display selections (Choo, Y. and Klug, A. (1994) Proc Natl Acad Sci USA 91, 11163-7, Greisman, H. A. and Pabo, C. O. (1997) Science (Washington, D.C.) 275, 657-661, Rebar, E. J. and Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. and Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. and Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. and Choo, Y. (1998) Biochemistry 37, 12026-33, Wu, H., Yang, W.-P. and Barbas III, C. F. (1995) PNAS 92, 344-348). All have contributed significantly to our understanding of zinc finger/DNA recognition, but each has its limitations. Structural studies have identified a diverse spectrum of protein/DNA interactions but do not explain if alternative interactions might be more optimal. Further, while interactions that allow for sequence specific recognition are observed, little information is provided on how alternate sequences are excluded from binding. These questions have been partially addressed by mutagenesis of existing proteins, but the data is always limited by the number of mutants that can be characterized. Phage-display and selection of randomized libraries overcome certain numerical limitations, but providing the appropriate selective pressure to ensure that both specificity and affinity drive the selection is difficult. Experimental studies from several laboratories (Choo, Y. and Klug, A. (1994) Proc Natl Acad Sci USA 91, 11163-7, Greisman, H. A. and Pabo, C. O. (1997) Science (Washington, D.C.) 275, 657-661, Rebar, E. J. and Pabo, C. O. (1994) Science (Washington, D.C., 1883-) 263, 671-3, Jamieson, A. C., Kim, S.-H. and Wells, J. A. (1994) Biochemistry 33, 5689-5695, Jamieson, A. C., Wang, H. and Kim, S.-H. (1996) PNAS 93, 12834-12839, Isalan, M., Klug, A. and Choo, Y. (1998) Biochemistry 37, 12026-33), including our own (Wu, H., Yang, W.-P. and Barbas III, C. F. (1995) PNAS 92, 344-348), have demonstrated that it is possible to design or select a few members of this recognition alphabet. However, the specificity and affinity of these domains for their target DNA were rarely investigated in a rigorous and systematic fashion in these early studies.
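The size of the recognition alphabet follows from simple counting: three positions with four possible bases each give 64 triplets, and fixing the first base as G leaves the 16 members of the 5′-GNN-3′ subset. The short Python sketch below enumerates both sets; the function name `triplets` is illustrative only and does not appear in this disclosure.

```python
from itertools import product

BASES = "ACGT"

def triplets(prefix=""):
    """Return every 3-bp site (written 5'->3') that starts with `prefix`."""
    n_free = 3 - len(prefix)  # number of positions still free to vary
    return [prefix + "".join(p) for p in product(BASES, repeat=n_free)]

all_sites = triplets()     # the complete three-base-pair alphabet
gnn_sites = triplets("G")  # the 5'-GNN-3' subset

print(len(all_sites), len(gnn_sites))
print(gnn_sites)
```

Each entry of `gnn_sites` corresponds to one target triplet for which a dedicated zinc finger domain would need to be characterized.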
Since Jacob and Monod questioned the chemical nature of the repressor and proposed a scheme by which the synthesis of individual proteins within a cell might be provoked or repressed, specific experimental control of gene expression has been a tantalizing prospect (Jacob, F. and Monod, J. (1961) J. Mol. Biol. 3, 318-356). It is now well established that genomes are regulated at the level of transcription primarily through the action of proteins known as transcription factors that bind DNA in a sequence specific fashion. Often these protein factors act in a complex combinatorial manner allowing temporal, spatial, and environmentally-responsive control of gene expression (Ptashne, M. (1997) Nature Medicine 3, 1069-1072). Transcription factors frequently act both through a DNA-binding domain which localizes the protein to a specific site within the genome, and through accessory effector domains which act to provoke (activate) or repress transcription at or near that site (Cowell, I. G. (1994) Trends Biochem. Sci. 19, 38-42). Effector domains, such as the activation domain VP16 (Sadowski, I., Ma, J., Triezenberg, S. and Ptashne, M. (1988) Nature 335, 563-564) and the repression domain KRAB (Margolin, J. F., Friedman, J. R., Meyer, W. K.-H., Vissing, H., Thiesen, H.-J. and Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513), are typically modular and retain their activity when they are fused to other DNA-binding proteins. Whereas genes might be readily controlled by directing transcription factors to particular sites within a genome, the design of DNA binding proteins that might be fashioned to bind any given sequence has been a daunting challenge.
The present disclosure is based on the recognition of the structural features unique to the Cys2-His2 class of nucleic acid-binding, zinc finger proteins. The Cys2-His2 zinc finger domain consists of a simple ββα fold of approximately 30 amino acids in length. Structural stability of this fold is achieved by hydrophobic interactions and by chelation of a single zinc ion by the conserved Cys2-His2 residues (Lee, M. S., Gippert, G. P., Soman, K. V., Case, D. A. and Wright, P. E. (1989) Science 245, 635-637). Nucleic acid recognition is achieved through specific amino acid side chain contacts originating from the α-helix of the domain, which typically binds three base pairs of DNA sequence (Pavletich, N. P. and Pabo, C. O. (1991) Science 252, 809-17, Elrod-Erickson, M., Rould, M. A., Nekludova, L. and Pabo, C. O. (1996) Structure 4, 1171-1180). Unlike other nucleic acid recognition motifs, simple covalent linkage of multiple zinc finger domains allows the recognition of extended asymmetric sequences of DNA. Studies of natural zinc finger proteins have shown that three zinc finger domains can bind 9 bp of contiguous DNA sequence (Pavletich, N. P. and Pabo, C. O. (1991) Science 252, 809-17, Swirnoff, A. H. and Milbrandt, J. (1995) Mol. Cell. Biol. 15, 2275-87). Whereas recognition of 9 bp of sequence is insufficient to specify a unique site within even the small genome of E. coli, polydactyl proteins containing six zinc finger domains can specify 18-bp recognition (Liu, Q., Segal, D. J., Ghiara, J. B. and Barbas III, C. F. (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530). With respect to the development of a universal system for gene control, an 18-bp address can be sufficient to specify a single site within all known genomes. While polydactyl proteins of this type are unknown in nature, their efficacy in gene activation and repression within living human cells has recently been shown (Liu, Q., Segal, D. J., Ghiara, J. B. and Barbas III, C. F. (1997) Proc. Natl. Acad. Sci. USA 94, 5525-5530).
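The contrast between a 9-bp and an 18-bp address can be checked with a rough calculation. The sketch below assumes a random-sequence model with equal base frequencies, counts both DNA strands, and uses approximate genome sizes (about 4.6 million bp for E. coli and about 3 billion bp for a haploid human genome). These figures, the model, and the function names are illustrative assumptions rather than values taken from this disclosure.

```python
def address_space(site_len: int) -> int:
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** site_len

def expected_chance_sites(site_len: int, genome_bp: float, strands: int = 2) -> float:
    """Expected number of chance occurrences of one specific site.

    Assumes a random genome with equal base frequencies, so each position
    on each strand matches with probability 1 / 4**site_len.
    """
    return strands * genome_bp / address_space(site_len)

# Approximate genome sizes, assumed for illustration only.
E_COLI_BP = 4.6e6
HUMAN_BP = 3.0e9

print(address_space(9), address_space(18))
print(expected_chance_sites(9, E_COLI_BP))   # well above 1: a 9-bp site recurs by chance
print(expected_chance_sites(18, HUMAN_BP))   # below 1: an 18-bp site is expected to be unique
```

Under these assumptions a given 9-bp site is expected to occur many times even in E. coli, whereas a given 18-bp site is expected to occur less than once by chance in a genome the size of the human genome. Real genomes are not random, so this is only an order-of-magnitude guide.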
In one aspect, the present invention provides an isolated and purified zinc finger-nucleotide binding polypeptide that contains the amino acid residue sequence of any of SEQ ID NO:1-16. In a related aspect, this invention further provides compositions comprising from two to about 12 such zinc finger-nucleotide binding polypeptides. The composition preferably contains from 2 to about 6 polypeptides. In a preferred embodiment, the zinc finger-nucleotide binding polypeptides are operatively linked, preferably by an amino acid residue linker having the sequence of SEQ ID NO:111. A composition of this invention specifically binds a nucleotide target that contains the sequence 5′-(GNN)n-3′, wherein each N is A, C, G, or T with the proviso that all N's cannot be C and where n is preferably 2 to 6. A polypeptide or composition can be further operatively linked to one or more transcription modulating factors such as transcription activators or transcription suppressors or repressors. The present invention also provides an isolated and purified polynucleotide that encodes a polypeptide or composition of this invention and an expression vector containing such a polynucleotide.
In a still further aspect, the present invention provides a process of regulating the function of a nucleotide sequence that contains the sequence 5′-(GNN)n-3′, where n is an integer from 1 to 6, the process comprising exposing the nucleotide sequence to an effective amount of a composition of this invention operatively linked to one or more transcription modulating factors. The 5′-(GNN)n-3′ sequence can be found in the transcribed region or promoter region of the nucleotide or within an expressed sequence tag. In a preferred embodiment, the nucleotide sequence is part of an oncogene sequence. More preferably, the target nucleotide sequence is contained in a gene that encodes a member of an erbB receptor family. More preferably, the target nucleotide sequence is contained in an erbB gene. Preferred erbB genes are the human erbB-2 and erbB-3 genes.
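As an illustration of how candidate 5′-(GNN)n-3′ targets might be located in a promoter, transcribed region, or expressed sequence tag, the Python sketch below scans one strand of a sequence for every window consisting of n consecutive GNN triplets. The function name and the example sequence are hypothetical, only the given strand is searched, and the proviso regarding C recited above is not applied; this is a minimal sketch under those assumptions, not the procedure of this disclosure.

```python
import re

def find_gnn_sites(seq: str, n: int):
    """Return (0-based start, site) for every 5'-(GNN)n-3' window in `seq`.

    A lookahead is used so that overlapping windows are all reported.
    Only the strand given is scanned.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    pattern = re.compile(r"(?=((?:G[ACGT]{2}){%d}))" % n)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]

# Hypothetical sequence used only to exercise the function.
example = "AAGCTGAAGTCCA"
print(find_gnn_sites(example, 2))
print(find_gnn_sites(example, 3))
```

With n = 3 the function reports candidate 9-bp sites for a three-finger protein, and with n = 6 candidate 18-bp sites for a six-finger protein; scanning the reverse complement as well would cover the opposite strand.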
The present disclosure demonstrates the simplicity and efficacy of a general strategy for the rapid production of gene switches. With a family of defined zinc finger domains recognizing sequences of the 5′-GNN-3′ subset of a 64 member zinc finger alphabet, polydactyl proteins specifically recognizing novel 9- or, for the first time, 18-bp sequences were constructed and characterized. Potent transcription factors were generated and shown to control both gene activation and repression. Gene activation was achieved using the herpes simplex virus VP16 activation domain (Sadowski, I., Ma, J., Triezenberg, S. and Ptashne, M. (1988) Nature 335, 563-564) and a recombinant tetrameric repeat of its minimal activation domain. Gene repression or silencing was achieved using three effector domains of human origin, the Krüppel-associated box (KRAB) (Margolin, J. F., Friedman, J. R., Meyer, W. K.-H., Vissing, H., Thiesen, H.-J. and Rauscher III, F. J. (1994) Proc. Natl. Acad. Sci. USA 91, 4509-4513), the ERF repressor domain (ERD) (Sgouras, D. N., Athanasiou, M. A., Beal, G. J., Jr., Fisher, R. J., Blair, D. G. and Mavrothalassitis, G. J. (1995) EMBO J. 14, 4781-4793), and the mSIN3 interaction domain (SID) (Ayer, D. E., Laherty, C. D., Lawrence, Q. A., Armstrong, A. P. and Eisenman, R. N. (1996) Mol. Cell. Biol. 16, 5772-5781). Using luciferase reporter gene assays in human epithelial cells, the data show that artificial transcriptional regulators, designed to target the promoter of the proto-oncogene erbB-2/HER-2, can ablate or activate gene expression in a specific manner. For the first time, gene activation or repression was achieved by targeting within the gene transcript, suggesting that information obtained from expressed sequence tags (ESTs) may be sufficient for the construction of gene switches. The novel methodology and materials described herein promise diverse applications in gene therapy, transgenic organisms, functional genomics, and other areas of cell and molecular biology.