Transcription activator-like effector proteins (TALEs) are encoded by phytopathogenic bacteria of the genera Xanthomonas and Ralstonia to influence gene expression in host plant cells during bacterial infection. These proteins comprise a DNA binding region and an N-terminal domain that appears to interact with the bacterial transport machinery for introducing the protein into the plant cell. The C-terminal domain of the TALE protein seems to interact with the plant host's transcriptional machinery to induce expression of sets of plant genes that are beneficial to the invading bacteria. The DNA binding portion is found in the middle section of the protein and is made of an array of repeat units, each approximately 33-35 amino acids in length, which have been shown to be responsible for interacting with the target DNA.
TALE proteins have been under investigation for several years. The bacteria that harbor such proteins are significant pathogens of many important crop species, and thus the scientific field has sought to understand the mechanisms these bacteria utilize during a successful plant infection. See, e.g., Zhu et al. (1998) MPMI 11(8):824-832; Yang et al. (2000) J. Biol. Chem. 275(27):20734-41; Boch et al. (2009) Science 326:1509; and Moscou and Bogdanove (2009) Science 326:1501.
TALE proteins have now been fused to a nuclease catalytic domain to allow engineering of target-specific nucleases (termed TALE-nucleases or TALENs). Activity of the proteins within the fusion has been increased by truncation of the C-terminal domain of the TALE (see co-owned U.S. Patent Publication 20110301073 as well as Miller et al. (2010) Nature Biotechnology 29(8):731-734 and WO2010079430). Additionally, the TALE DNA binding domains have been fused to transcription activation and repression domains, and these TALE transcription factors (TALE TFs) have been demonstrated to be capable of regulating the expression of an endogenous target gene. Thus, since the DNA binding domains of these proteins can be engineered to recognize a specific sequence and can be fused to a nuclease domain or transcriptional regulatory domain, these engineered proteins are of great interest and hold considerable promise for genome editing.
A major area of interest in genome biology, especially in light of the determination of the complete nucleotide sequences of a number of genomes, is the targeted alteration of genome sequences by genome editing. Targeted cleavage events can be used, for example, to induce targeted mutagenesis, induce targeted deletions of cellular DNA sequences, and facilitate targeted recombination at a predetermined chromosomal locus. See, for example, United States Patent Publications 20030232410; 20050208489; 20050026157; 20050064474; 20060188987; 2008015996, and International Publication WO 2007/014275, the disclosures of which are incorporated by reference in their entireties for all purposes. See, also, Santiago et al. (2008) Proc Natl Acad Sci USA 105:5809-5814; Perez et al. (2008) Nat Biotechnol 26:808-816.
There remains a need for engineered DNA binding domains comprising TALEs with increased activity and/or specificity. Enhancements in activity and/or specificity of these proteins will increase their scope and usefulness for a variety of applications, including engineered transcription factors for regulation of endogenous genes in a variety of cell types, and engineered nucleases that can similarly be used in numerous model, diagnostic, and therapeutic systems, and in all manner of genome engineering and editing applications.