Nature's use of the protein α-helix for specific DNA recognition is ubiquitous, and it is most fully exploited by the basic region/leucine zipper motif (bZIP), which comprises a pair of short α-helices that recognize the DNA major groove with sequence specificity and high affinity (Struhl, K., Trends Biochem. Sci. 1989, 14, 137-140; Landschulz, W. H. et al., Science 1988, 240, 1759-1764). Crystal structures of the bZIP domain of GCN4 bound to two different DNA sites (König, P. and Richmond, T. J., J. Mol. Biol. 1993, 233, 139-154; Ellenberger, T. E. et al., Cell 1992, 71, 1223-1237; Keller, W. et al., J. Mol. Biol. 1995, 254, 657-667) and of the Jun-Fos heterodimer bZIP bound to DNA (Glover, J. N. M. and Harrison, S. C., Nature 1995, 373, 257-261) show that a continuous α-helix of ~60 amino acids provides both the basic-region interface for binding to specific DNA sites and the leucine zipper coiled-coil dimerization structure. These crystal structures also demonstrate remarkable conservation of protein backbone structure across species, between the yeast GCN4 and avian Jun-Fos structures.
Myc, Max, and Mad Proteins
The basic-region/helix-loop-helix (bHLH) motif, including the subvariant basic-region/helix-loop-helix/leucine-zipper (bHLHZ) motif, is very similar to the bZIP in that a dimer of α-helices binds specific sites in the DNA major groove. Protein dimerization is effected by the helix-loop-helix, a tetramer of α-helices, in the bHLH, or by the helix-loop-helix/leucine-zipper in the bHLHZ, in which dimerization is mediated by both the tetrameric HLH and the adjacent leucine zipper (compare structures in FIG. 1) (Murre, C. Cell, 1989, 56, 777-783). The bHLH family comprises bHLH proteins as well as subfamily variants: the bHLHZ (such as Max and USF) and the bHLH/PAS (such as AhR and Arnt), where the PAS domain assists in efficient protein dimerization. Unlike that of the leucine zipper, the structure of the PAS domain is unknown. The PAS domain has been found in the Per, Arnt, and Sim proteins (hence “PAS”) as well as in AhR and HIF-1α (Gradin, K., et al., Mol. Cell. Biol. 1996, 16, 5221-5231). The PAS domain comprises 200-300 amino acids and contains characteristic repeats termed the “A” and “B” domains.
Like bZIP proteins, the bHLH protein family also regulates transcription. In particular, the Myc, Max, and Mad transcription factor network comprises widely expressed bHLHZ proteins critical for control of normal cell proliferation and differentiation (Amati, B. and Land, H., Curr. Opin. Genet. Dev. 1994, 4, 102-108; Orian, A. et al., Genes Dev., 2003, 17, 1101-1114). Myc is proto-oncogenic; deregulated overexpression of myc genes leads to malignant transformation, and myc genes are suspected of being among the genes most frequently affected in human tumors and disease (Nesbit, C. D. et al., Oncogene 1999, 18, 3004-3016), including Burkitt's lymphoma (Taub, R. et al., Proc. Natl. Acad. Sci. USA 1982, 79, 7837-7841; Dalla-Favera, R. et al., Proc. Natl. Acad. Sci. USA 1982, 79, 7824-7827), neuroblastomas (Schwab, M. et al., Nature 1984, 308, 288-291), and small cell lung cancers (Nau, M. M. et al., Nature 1985, 318, 69-73).
In contrast, Max is a stable, constitutively expressed dimerization partner that heterodimerizes with Myc, Mad, and Mxi, thereby controlling their DNA-binding and gene-regulatory activities (Amati, B. and Land, H., Curr. Opin. Genet. Dev. 1994, 4, 102-108; Orian, A. et al., Genes Dev., 2003, 17, 1101-1114). Myc-Max is a transcriptional activator that binds the Enhancer box (E-box) sequence 5′-CACGTG (Blackwood, E. M. et al., Science 1991, 251, 1211-1217; Blackwell, T. K. et al., Mol. Cell. Biol. 1993, 13, 5216-5224). Myc does not homodimerize in vivo or at physiological concentrations, so its activity depends on heterodimerization with Max. Max, by comparison, can homodimerize, although it preferentially heterodimerizes; Max homodimers can bind the E-box, albeit with lower affinity than the heterodimers (Blackwood, E. M. et al., Science 1991, 251, 1211-1217). Several promoters contain the E-box sequence 5′-CACGTG, including that of the p53 tumor suppressor (Reisman, D. et al., Cell Growth Differ. 1993, 4, 57-65). Mad-Max (Amati, B. et al., Cell 1993, 72, 233-245) and the related Mxi-Max (Zervos, A. S. et al., Cell 1993, 72, 223-232) are transcriptional repressors that antagonize Myc-Max by competing for the same E-box sequence.
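As an illustration only, the E-box sequence 5′-CACGTG discussed above is palindromic (it equals its own reverse complement), so a site found on one DNA strand is also present on the other. The following minimal Python sketch checks that property and locates E-box sites in a sequence. The function names and the example sequence are hypothetical and are not taken from the cited work.

```python
# Minimal sketch: locate E-box (5'-CACGTG) sites in a DNA sequence.
# All names and the example sequence below are illustrative assumptions.

EBOX = "CACGTG"

# Watson-Crick complements for the four standard bases.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sites(seq, motif=EBOX):
    """Return the 0-based start positions of every occurrence of motif in seq."""
    seq = seq.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


if __name__ == "__main__":
    # Because the E-box is palindromic, scanning one strand is sufficient.
    print(reverse_complement(EBOX) == EBOX)
    # Made-up test sequence containing two E-box sites.
    print(find_sites("TTCACGTGAACACGTG"))
```

Because the motif is its own reverse complement, the sketch scans only the given strand; a non-palindromic site would also require scanning the reverse complement.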
The Max network is ubiquitous and highly conserved in vertebrates, including mammals; in Drosophila, for instance, a conservative estimate is that Max network proteins interact with approximately 2000 genes (Orian, A. et al., Genes Dev., 2003, 17, 1101-1114). The transactivation domain mediating the gene-regulatory activities of the Myc-Max heterodimer lies in the amino-terminal region of Myc; Max's role is to allow Myc to bind DNA, thereby mediating its cellular activities (Amati, B. and Land, H., Curr. Opin. Genet. Dev. 1994, 4, 102-108). Therefore, mutant proteins that interfere with Myc-Max recognition of the E-box site may also interfere with Myc's disease-promoting activities.
AhR and Arnt Proteins
The AhR-Arnt system is of interest from a protein-design perspective and is also notable for its possible role in disease pathways. The AhR, also known as the dioxin receptor, mediates signal transduction (Fisher, J. M., et al., Mol. Carcinogen. 1989, 1, 216-221) by dioxins and related polycyclic aromatic hydrocarbons, including benzo[a]pyrenes found in cigarette smoke and smog, heterocyclic amines in cooked meat, and polychlorinated biphenyls (PCBs). By analogy with the glucocorticoid receptor, the latent AhR is found associated with heat-shock protein hsp90 in the cytosol (Cadepond, F. et al., J. Biol. Chem. 1991, 266, 5834-5841). Ligand binding induces nuclear translocation of the AhR (Pollenz, R. S. et al., Mol. Pharmacol. 1995, 45, 428-438), release of hsp90, and dimerization with the nuclear protein Arnt (Reyes, H. et al., Science 1992, 256, 1193-1195); this activated complex (Whitelaw, M. et al., Mol. Cell. Biol. 1993, 13, 2504-2514; Cuthill, S., et al., Mol. Cell. Biol. 1991, 11, 401-411) then binds specific xenobiotic response elements (XRE sites) and activates gene transcription (Wu, L. and Whitlock, J. P. Nucleic Acids Res. 1993, 21, 119-125; Fujisawa-Sehara, A. et al., Nucleic Acids Res. 1987, 15, 4179-4191). The endogenous ligand, if any, for the dioxin receptor has yet to be discovered. During evolution, plant flavones and, later, certain combustion products such as dioxin appear to have appropriated the AhR for stimulating their own metabolism.
AhR and Arnt are bHLH/PAS proteins; they differ from most other bHLH transcription factors in that AhR-Arnt dimerization occurs only in the presence of ligand. The PAS domain is remote from the basic region and, importantly, does not affect DNA binding; it is required only for dimerization and ligand binding. Poellinger and coworkers found that the minimal bHLH domains of AhR and Arnt are by themselves capable of dimerization and recognition of XRE sites (Pongratz, I., et al., Mol. Cell. Biol. 1998, 18, 4079-4088).
Previous work has shown that within the bZIP family, basic regions and leucine zippers from different proteins can be exchanged with no resulting change in α-helical structure or DNA-binding function (Agre, P. et al., Science 1989, 246, 922-926; Lajmi, A. R. et al., J. Am. Chem. Soc. 2000, 122, 5638-5639; Sellers, J. W. et al., Nature 1989, 341, 74-76). Likewise, the bHLH/bHLHZ structure is well conserved and essentially identical among bHLH/bHLHZ family members (Nair, S. K. and Burley, S. K., Cell 2003, 112, 193-205). Protein-DNA crystal structures for the bHLH proteins MyoD (Ma, P. C. et al., Cell 1994, 77, 451-459) and E47 (Ellenberger, T. et al., Genes Dev. 1994, 8, 970-980) and for the bHLHZ proteins Max (Ferre-D'Amare, A. R. et al., Nature 1993, 363, 38-45; Brownlie, P. et al., Structure 1997, 5, 509-520) and USF (Ferre-D'Amare, A. R. et al., EMBO J. 1994, 13, 180-189) show closely related structures and DNA-binding functions. Exchange of basic regions and dimerization elements in the bHLHZ family also yields native-like proteins: Prochownik and coworkers showed that the Max basic region could be fused to the USF HLHZ domain to generate hybrids that could homodimerize and bind the E-box (Yin, X. et al., Oncogene 1998, 16, 2629-2637).
The crystal structures of bZIP and bHLH proteins demonstrate that, although the two are distinct protein structural families, they resemble each other more closely than either resembles other families of DNA-binding proteins: in particular, the α-helical DNA recognition element is highly conserved in the two families (König, P. and Richmond, T. J., J. Mol. Biol. 1993, 233, 139-154; Ellenberger, T. E. et al., Cell 1992, 71, 1223-1237; Keller, W. et al., J. Mol. Biol. 1995, 254, 657-667; Glover, J. N. M. and Harrison, S. C., Nature 1995, 373, 257-261; Ma, P. C. et al., Cell 1994, 77, 451-459; Ellenberger, T. et al., Genes Dev. 1994, 8, 970-980; Ferre-D'Amare, A. R. et al., Nature 1993, 363, 38-45; Brownlie, P. et al., Structure 1997, 5, 509-520; Ferre-D'Amare, A. R. et al., EMBO J. 1994, 13, 180-189). The two families differ, however, in the hinge angles that govern positioning of the basic regions in the major groove. Additionally, the dimerization element in the bHLH is more complicated than the smaller, simpler leucine zipper.
No simple code exists for protein-DNA recognition, and this fact has made design of sequence-specific DNA-binding proteins a major challenge.