Many, perhaps most, physiological and pathophysiological processes can be controlled by the selective up- or down-regulation of gene expression. Examples of pathologies that might be controlled by selective regulation include the inappropriate expression of proinflammatory cytokines in rheumatoid arthritis, under-expression of the hepatic LDL receptor in hypercholesterolemia, and over-expression of proangiogenic factors and under-expression of antiangiogenic factors in solid tumor growth, to name a few. In addition, pathogenic organisms such as viruses, bacteria, fungi, and protozoa could be controlled by altering gene expression of their host cell. Thus, there is a clear unmet need for therapeutic approaches that are simply able to up-regulate beneficial genes and down-regulate disease-causing genes.
In addition, simple methods allowing the selective over- and under-expression of selected genes would be of great utility to the scientific community. Methods that permit the regulation of genes in cell model systems, transgenic animals and transgenic plants would find widespread use in academic laboratories, pharmaceutical companies, genomics companies and in the biotechnology industry.
Gene expression is normally controlled through alterations in the function of sequence-specific DNA binding proteins called transcription factors. These factors influence the efficiency of formation or function of a transcription initiation complex at the promoter. Transcription factors can act in a positive fashion (activation) or in a negative fashion (repression).
Transcription factor function can be constitutive (always “on”) or conditional. Conditional function can be imparted on a transcription factor by a variety of means, but the majority of these regulatory mechanisms depend on the sequestering of the factor in the cytoplasm, followed by its inducible release and subsequent nuclear translocation, DNA binding and activation (or repression). Examples of transcription factors that function this way include progesterone receptors, sterol response element binding proteins (SREBPs) and NF-kappa B. There are also examples of transcription factors that respond to phosphorylation or small molecule ligands by altering their ability to bind their cognate DNA recognition sequence (Hou et al., Science 256:1701 (1994); Gossen & Bujard, Proc. Nat'l Acad Sci 89:5547 (1992); Oligino et al., Gene Ther. 5:491-496 (1998); Wang et al., Gene Ther. 4:432-441 (1997); Neering et al., Blood 88:1147-1155 (1996); and Rendahl et al., Nat. Biotechnol. 16:757-761 (1998)).
Recombinant transcription factors comprising the DNA binding domains from zinc finger proteins (“ZFPs”) have the ability to regulate gene expression of endogenous genes (see, e.g., U.S. Pat. Nos. 6,534,261; 6,599,692; 6,503,717; 6,689,558; 7,067,317; 7,262,054). Clinical trials using these engineered transcription factors containing zinc finger proteins have shown that these novel transcription factors are capable of treating various conditions (see, e.g., Yu et al. (2006) FASEB J. 20:479-481).
Another major area of interest in genome biology, especially in light of the determination of the complete nucleotide sequences of a number of genomes, is the targeted alteration of genome sequences. Targeted cleavage events can be used, for example, to induce targeted mutagenesis, induce targeted deletions of cellular DNA sequences, and facilitate targeted recombination at a predetermined chromosomal locus. See, for example, U.S. Patent Publication Nos. 2003/0232410; 2005/0208489; 2005/0026157; 2005/0064474; 2006/0188987; 2008/015996, and International Publication No. WO 2007/014275, the disclosures of which are incorporated by reference in their entireties for all purposes. See, also, Santiago et al. (2008) Proc Natl Acad Sci USA 105:5809-5814; Perez et al. (2008) Nat Biotechnol 26:808-816.
Artificial nucleases, which link the cleavage domain of a nuclease to a designed DNA-binding protein (e.g., zinc-finger protein (ZFP) linked to a nuclease cleavage domain such as from FokI), have been used for targeted cleavage in eukaryotic cells. For example, zinc finger nuclease-mediated genome editing has been shown to modify the sequence of the human genome at a specific location by (1) creation of a double-strand break (DSB) in the genome of a living cell specifically at the target site for the desired modification, and by (2) allowing the natural mechanisms of DNA repair to “heal” this break.
To increase specificity, the cleavage event is induced using one or more pairs of custom-designed zinc finger nucleases that dimerize upon binding DNA to form a catalytically active nuclease complex. In addition, specificity has been further increased by using one or more pairs of zinc finger nucleases that include engineered cleavage half-domains that cleave double-stranded DNA only upon formation of a heterodimer. See, e.g., U.S. Patent Publication No. 2008/0131962, incorporated by reference herein in its entirety.
The double-stranded breaks (DSBs) created by artificial nucleases have been used, for example, to induce targeted mutagenesis, induce targeted deletions of cellular DNA sequences, and facilitate targeted recombination at a predetermined chromosomal locus. See, for example, U.S. Patent Publication Nos. 2003/0232410; 2005/0208489; 2005/0026157; 2005/0064474; 2006/0188987; 2006/0063231; 2007/0218528; 2007/0134796; 2008/0015164 and International Publication Nos. WO 07/014275 and WO 2007/139982, the disclosures of which are incorporated by reference in their entireties for all purposes. Thus, the ability to generate a DSB at a target genomic location allows for genomic editing of any genome.
There are two major and distinct pathways to repair DSBs: homologous recombination and non-homologous end-joining (NHEJ). Homologous recombination requires the presence of a homologous sequence as a template (known as a “donor”) to guide the cellular repair process, and the results of the repair are error-free and predictable. In the absence of a template (or “donor”) sequence for homologous recombination, the cell typically attempts to repair the DSB via the error-prone process of NHEJ.
The plant pathogenic bacteria of the genus Xanthomonas are known to cause many diseases in important crop plants. Pathogenicity of Xanthomonas depends on a conserved type III secretion (T3S) system which injects more than 25 different effector proteins into the plant cell. Among these injected proteins are transcription activator-like effectors (“TALEs” or “TAL-effectors”) which mimic plant transcriptional activators and manipulate the plant transcriptome (see Kay et al (2007) Science 318:648-651). These proteins contain a DNA binding domain and a transcriptional activation domain. One of the most well characterized TALEs is AvrBs3 from Xanthomonas campestris pv. vesicatoria (see Bonas et al (1989) Mol Gen Genet 218: 127-136 and WO2010079430). TALEs contain a centralized repeat domain that mediates DNA recognition, with each repeat unit containing approximately 33-35 amino acids specifying one target base. TALEs also contain nuclear localization sequences and several acidic transcriptional activation domains (for a review see Schornack S, et al (2006) J Plant Physiol 163(3): 256-272). In addition, in the phytopathogenic bacteria Ralstonia solanacearum, two genes, designated brg11 and hpx17, that are homologous to the AvrBs3 family of Xanthomonas have been found in the R. solanacearum biovar 1 strain GMI1000 and in the biovar 4 strain RS1000 (see Heuer et al (2007) Appl and Envir Micro 73(13): 4379-4384). These genes are 98.9% identical in nucleotide sequence to each other but differ by a deletion of 1,575 bp in the repeat domain of hpx17. However, both gene products have less than 40% sequence identity with AvrBs3 family proteins of Xanthomonas.
DNA-binding specificity of these TALEs depends on the sequences found in the tandem TALE repeat units. The repeated sequence comprises approximately 33-35 amino acids and the repeats are typically 91-100% homologous with each other (Bonas et al, ibid). There appears to be a one-to-one correspondence between the identity of the hypervariable diresidues at positions 12 and 13 and the identity of the contiguous nucleotides in the TALE's target sequence (see Moscou and Bogdanove, (2009) Science 326:1501 and Boch et al (2009) Science 326:1509-1512). These two adjacent amino acids are referred to as the Repeat Variable Diresidue (RVD). Experimentally, the natural code for DNA recognition of these TALEs has been determined such that an HD sequence at positions 12 and 13 leads to binding to cytosine (C), NG binds to T, NI binds to A, and NN binds to G or A. These specificity-determining TALE repeat units have been assembled into proteins with new combinations of the natural TALE repeat units and altered numbers of repeats, to make variant TALE proteins. When in their native architecture, these variants are able to interact with new sequences and activate the expression of a reporter gene in plant cells (Boch et al., ibid.). However, these proteins maintain the native (full-length) TALE protein architecture and only the number and identity of the TALE repeat units within the construct were varied. Entire or nearly entire TALE proteins have also been fused to a nuclease domain from the FokI protein to create a TALE-nuclease fusion protein (“TALEN”), and these TALENs have been shown to cleave an episomal reporter gene in yeast cells (Christian et al. (2010) Genetics 186(2): 757-61; Li et al. (2011a) Nucleic Acids Res. 39(1):359-372).
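The RVD-to-nucleotide correspondence described above can be illustrated with a short sketch. The mapping below contains only the four RVDs listed in this paragraph (HD, NG, NI, NN); the function names, the example repeat array, and the use of the IUPAC ambiguity symbol R (A or G) for NN are assumptions introduced purely for illustration and are not taken from the cited references.

```python
# Illustrative sketch only: the table holds just the RVDs named in the text.
RVD_CODE = {
    "HD": {"C"},
    "NG": {"T"},
    "NI": {"A"},
    "NN": {"G", "A"},  # NN is described as binding G or A
}

# IUPAC symbols for the base sets above (R = A or G); an assumed convention.
IUPAC = {
    frozenset("C"): "C",
    frozenset("T"): "T",
    frozenset("A"): "A",
    frozenset("GA"): "R",
}


def predicted_bases(rvds):
    """Return, for each repeat, the set of bases its RVD is expected to bind."""
    bases = []
    for rvd in rvds:
        if rvd not in RVD_CODE:
            raise KeyError(f"RVD {rvd!r} is not in the code listed in the text")
        bases.append(RVD_CODE[rvd])
    return bases


def consensus(rvds):
    """Write the predicted target as a string, one symbol per repeat."""
    return "".join(IUPAC[frozenset(b)] for b in predicted_bases(rvds))


def matches_target(rvds, target):
    """True if every base of `target` is allowed by the corresponding RVD."""
    target = target.upper()
    if len(rvds) != len(target):
        return False
    return all(base in allowed
               for base, allowed in zip(target, predicted_bases(rvds)))


# Hypothetical five-repeat array, one RVD per repeat unit.
example = ["HD", "NG", "NI", "NN", "NG"]
print(consensus(example))                # CTART
print(matches_target(example, "CTAGT"))  # True
print(matches_target(example, "CTACT"))  # False
```

Because each repeat contributes one RVD and each RVD specifies one position, the predicted target has the same length as the repeat array; rearranging or changing the number of repeats changes the predicted sequence accordingly.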
Such constructs could also modify endogenous genes in yeast cells to quantifiable levels and could modify endogenous genes in mammalian and plant cells to detectable but unquantifiable levels when appropriate sequence amplification schemes are employed. See, Li et al. (2011b) Nucleic Acids Res. epub doi:10.1093/nar/gkr188; Cermak et al. (2011) Nucleic Acids Res. epub doi:10.1093/nar/gkr218. The fact that a two-step enrichment scheme was required to detect activity in plant and animal cells indicates that fusions between nearly entire TALE proteins and the nuclease domain from the FokI protein do not efficiently modify endogenous genes in plant and animal cells. In other words, the peptide used in these studies to link the TALE repeat array to the FokI cleavage domain does not allow efficient cleavage by the FokI domain of endogenous genes in higher eukaryotes. These studies therefore highlight the need to develop compositions that can be used to connect a TALE array with a nuclease domain in a manner that allows for highly active cleavage in endogenous eukaryotic settings.
There remains a need for engineered DNA binding domains to increase the scope, specificity and usefulness of these binding proteins for a variety of applications including engineered transcription factors for regulation of endogenous genes in a variety of cell types and engineered nucleases that can be similarly used in numerous models, diagnostic and therapeutic systems, and all manner of genome engineering and editing applications.