Mammalian genomes constantly suffer from various types of damage, of which double-strand breaks (DSBs) are considered the most dangerous (Haber 2000). DSBs can be repaired through diverse mechanisms, depending on cellular context. Repair via homologous recombination (HR) can restore the original sequence at the break. Because of its strict dependence on extensive sequence homology, HR is thought to be active mainly during the S and G2 phases of the cell cycle, when sister chromatids are in close proximity (Sonoda, Hochegger et al. 2006). Single-strand annealing (SSA) is another homology-dependent process; it repairs DSBs between direct repeats and thereby promotes deletions (Paques and Haber 1999). Finally, non-homologous end joining (NHEJ) is a major DSB repair pathway that can function throughout the cell cycle and, unlike HR, does not require a homologous template (Moore and Haber 1996; Haber 2008). NHEJ seems to comprise at least two components: (i) a pathway that consists mostly of the direct re-joining of DSB ends and depends on the XRCC4, Lig4 and Ku proteins; and (ii) an alternative NHEJ pathway that does not depend on XRCC4, Lig4 and Ku, is especially error-prone, and results mostly in deletions, with junctions occurring at microhomologies (Frank, Sekiguchi et al. 1998; Gao, Sun et al. 1998; Guirouilh-Barbat, Huck et al. 2004; Guirouilh-Barbat, Rass et al. 2007; Haber 2008; McVey and Lee 2008).
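The microhomology-mediated deletions produced by alternative NHEJ can be illustrated with a small sketch. It assumes a deliberately simplified model in which any short identical sequence present on both sides of a break may anneal, with one copy retained in the junction and the intervening bases lost. The function name `microhomology_deletions` and the `min_len` threshold are hypothetical choices for illustration, not part of any published predictor, and nested (overlapping) microhomologies are all listed rather than filtered.

```python
"""Enumerate microhomology-mediated deletion outcomes at a DSB (illustrative sketch)."""


def microhomology_deletions(left: str, right: str, min_len: int = 2):
    """List (microhomology, deletion_length, product) for each candidate junction.

    left is the sequence upstream of the break and right the sequence
    downstream. A microhomology m found at left[i:i+k] and right[j:j+k] is
    assumed to anneal, so the product keeps a single copy of m.
    """
    outcomes = []
    for k in range(min_len, min(len(left), len(right)) + 1):
        for i in range(len(left) - k + 1):
            for j in range(len(right) - k + 1):
                if left[i:i + k] == right[j:j + k]:
                    # Keep left up to and including m, then right after its copy of m.
                    product = left[:i + k] + right[j + k:]
                    deleted = len(left) + len(right) - len(product)
                    outcomes.append((left[i:i + k], deleted, product))
    return outcomes


if __name__ == "__main__":
    # Break between "...GCCA" and "GCCT...": the shared GCC anneals.
    print(microhomology_deletions("TTAGCCA", "GCCTT", min_len=3))
    # [('GCC', 4, 'TTAGCCTT')]
```

In this toy example the junction retains one GCC and loses four bases (the A upstream of the break plus the downstream copy of GCC), which matches the deletion-prone, microhomology-flanked junctions described above.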
Homologous gene targeting (HGT), first described over 25 years ago (Hinnen, Hicks et al. 1978; Orr-Weaver, Szostak et al. 1981; Orr-Weaver, Szostak et al. 1983; Rothstein 1983), was one of the first methods for rational genome engineering and remains to this day a standard for the generation of engineered cells or knock-out mice (Capecchi 2001). Its inherently low efficiency has nevertheless prevented it from becoming a routine protocol in most cell types and organisms. To address this limitation, an extensive assortment of rational approaches has been proposed with the aim of achieving targeted modification frequencies above 1%. Many groups have focused on enhancing the efficacy of HGT, and two main strategies have emerged: (i) so-called “matrix optimization” methods, which essentially consist of modifying the structure of the targeting vector to maximize efficacy; and (ii) methods involving additional effectors that stimulate HR, generally sequence-specific endonucleases. Matrix optimization has covered a wide range of techniques, with varying degrees of success (Russell and Hirata 1998; Inoue, Dong et al. 2001; Hirata, Chamberlain et al. 2002; Taubes 2002; Gruenert, Bruscia et al. 2003; Sangiuolo, Scaldaferri et al. 2008; Bedayat, Abdolmohamadi et al. 2010). Stimulation of HR via nucleases, on the other hand, has repeatedly proven effective (Paques and Duchateau 2007; Carroll 2008).
For DSBs induced by biological reagents, e.g. meganucleases, ZFNs and TALENs (see below), which cleave DNA by hydrolysis of two phosphodiester bonds, the DNA can be rejoined seamlessly by simple re-ligation of the cohesive ends. Alternatively, deleterious insertions or deletions (indels) of various sizes can occur at the break, eventually resulting in gene inactivation (Liang, Han et al. 1998; Lloyd, Plaisier et al. 2005; Doyon, McCammon et al. 2008; Perez, Wang et al. 2008; Santiago, Chan et al. 2008; Kim, Lee et al. 2009; Yang, Djukanovic et al. 2009). Because this process relies on neither site-specific nor homologous recombination, it gives rise to a third targeted approach based on endonuclease-induced mutagenesis. This approach, and its related applications, may be simpler than those based on homologous recombination in that (a) no repair matrix needs to be introduced; and (b) efficacy should be less cell-type dependent, since, in contrast to HR, NHEJ is probably active throughout the cell cycle (Delacote and Lopez 2008). Targeted mutagenesis based on NHEJ has been used to inactivate single or even multiple genes in immortalized cell lines (Cost, Freyvert et al. 2010; Liu, Chan et al. 2010). In addition, this method opens new perspectives for organisms in which classical HR-based gene knock-out methods have proven inefficient, or at least difficult to establish (Doyon, McCammon et al. 2008; Geurts, Cost et al. 2009; Shukla, Doyon et al. 2009; Yang, Djukanovic et al. 2009; Gao, Smith et al. 2010; Mashimo, Takizawa et al. 2010; Menoret, Iscache et al. 2010).
Over the last 15 years, the use of meganucleases to induce gene targeting has been well documented, from straightforward experiments with wild-type I-SceI to more refined work with completely re-engineered enzymes (Stoddard, Scharenberg et al. 2007; Galetto, Duchateau et al. 2009; Marcaida, Munoz et al. 2010; Arnould, Delenda et al. 2011). Meganucleases, also called homing endonucleases (HEs), can be divided into five families based on sequence and structure motifs: LAGLIDADG, GIY-YIG, HNH, His-Cys box and PD-(D/E)XK (Stoddard 2005; Zhao, Bonocora et al. 2007). Structural data are available for at least one member of each family. The best-studied family is that of the LAGLIDADG proteins, for which a considerable body of biochemical, genetic and structural work has established that these endonucleases can be used as molecular tools (Stoddard, Scharenberg et al. 2007; Arnould, Delenda et al. 2011). Member proteins are composed of domains that adopt a similar αββαββα fold; the LAGLIDADG motif forms the terminal region of the first helix and both contributes to a bipartite catalytic center and forms the core subunit/subunit interface (Stoddard 2005). Two such α/β domains assemble to form the functional protein, with the β-strands in each creating a saddle-shaped DNA-binding region. The spatial separation of the catalytic center from the regions that directly contact the DNA has allowed specificity re-engineering (Seligman, Chisholm et al. 2002; Sussman, Chadsey et al. 2004; Arnould, Chames et al. 2006; Doyon, Pattanayak et al. 2006; Rosen, Morrison et al. 2006; Smith, Grizot et al. 2006; Arnould, Perez et al. 2007). In addition, whereas all LAGLIDADG proteins analyzed to date act as “cleavases” that cut both strands of the target DNA, recent progress has been made in generating “mega-nickases” that cleave only one strand (Niu, Tenney et al. 2008; McConnell Smith, Takeuchi et al. 2009). Such enzymes can, in principle, provide similar levels of targeted HR while minimizing the frequency of NHEJ.
Although numerous engineering efforts have focused on LAGLIDADG HEs, members of two other families, GIY-YIG and HNH, are of particular interest. Biochemical and structural studies have established that in both families, member proteins can adopt a bipartite fold with distinct functional domains: (1) a catalytic domain responsible mainly for DNA cleavage; and (2) a DNA-binding domain that provides target specificity (Stoddard 2005; Marcaida, Munoz et al. 2010). The related GIY-YIG HEs I-TevI and I-BmoI have been exploited to demonstrate the interchangeability of the DNA-binding region of these enzymes (Liu, Derbyshire et al. 2006). Analysis of the I-BasI HE revealed that although its N-terminal catalytic domain belongs to the HNH family, its C-terminal DNA-binding region resembles the intron-encoded endonuclease repeat motif (IENR1) found in endonucleases of the GIY-YIG family (Landthaler and Shub 2003). The catalytic head of I-BasI has sequence similarity to those of the HNH HEs I-HmuI, I-HmuII and I-TwoI, all of which function as strand-specific nickases (Landthaler, Begley et al. 2002; Landthaler and Shub 2003; Landthaler, Lau et al. 2004; Shen, Landthaler et al. 2004; Landthaler, Shen et al. 2006).
Whereas the above families of proteins contain sequence-specific nucleases, the HNH motif has also been identified in nonspecific nucleases such as the E. coli colicins (e.g. ColE9 and ColE7), EndA from S. pneumoniae, NucA from Anabaena and CAD (Midon, Schafer et al. 2011). In addition to the HNH motif, several of these nucleases contain the signature DRGH motif and share structural homology with the core elements forming the ββα-Me-finger active site motif. Mutational studies of residues in the HNH/DRGH motifs have confirmed their role in nucleic acid cleavage (Ku, Liu et al. 2002; Doudeva, Huang et al. 2006; Eastberg, Eklund et al. 2007; Huang and Yuan 2007). Furthermore, the DNA binding affinity and sequence preference of ColE7 could be effectively altered (Wang, Wright et al. 2009). Such detailed studies illustrate the potential of re-engineering nonspecific nucleases for targeted purposes.
Zinc-finger nucleases (ZFNs), generated by fusing Zinc-finger-based DNA-binding domains to an independent catalytic domain via a flexible linker (Kim, Cha et al. 1996; Smith, Berg et al. 1999; Smith, Bibikova et al. 2000), represent another type of engineered nuclease commonly used to stimulate gene targeting. The archetypal ZFNs are based on the catalytic domain of the Type IIS restriction enzyme FokI and have been successfully used to induce gene correction, gene insertion and gene deletion. Zinc Finger-based DNA-binding domains are made of strings of 3 or 4 individual Zinc Fingers, each recognizing a DNA triplet (Pabo, Peisach et al. 2001). In theory, one of the major advantages of ZFNs is that they are easy to design by combinatorial assembly of preexisting Zinc Fingers with known recognition patterns (Choo and Klug 1994; Choo and Klug 1994; Kim, Lee et al. 2009). However, close examination of high-resolution structures shows that there is in fact cross-talk between units (Elrod-Erickson, Rould et al. 1996), and several methods have been used to assemble ZF proteins by choosing individual Zinc Fingers in a context-dependent manner (Greisman and Pabo 1997; Isalan and Choo 2001; Maeder, Thibodeau-Beganny et al. 2008; Ramirez, Foley et al. 2008) to achieve higher success rates and better-quality reagents.
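The naive combinatorial assembly described above can be sketched as a lookup from DNA triplets to pre-characterized fingers. This is a minimal sketch under stated assumptions: the module library and its finger names (`F_GGG`, etc.) are hypothetical placeholders rather than real recognition helices, and the code deliberately assumes the strict context independence that the structural data call into question. It also assumes the orientation generally described for Cys2His2 arrays, in which the N-terminal finger contacts the 3'-most triplet of the target strand.

```python
"""Naive modular assembly of a Zinc Finger array (illustrative sketch).

Assumes strict context independence: each finger reads one DNA triplet,
which is exactly the simplification that context-dependent methods address.
"""

# Hypothetical module library: triplet -> placeholder finger name.
# Real libraries map triplets to specific recognition-helix sequences.
FINGER_LIBRARY = {
    "GGG": "F_GGG",
    "GCG": "F_GCG",
    "GAA": "F_GAA",
    "TGG": "F_TGG",
}


def assemble_zf_array(target: str) -> list[str]:
    """Return finger modules ordered N- to C-terminus for a 9 or 12 bp target."""
    target = target.upper()
    if len(target) not in (9, 12):
        raise ValueError("expected 3 or 4 fingers, i.e. a 9 or 12 bp target")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    fingers = []
    # The N-terminal finger binds the 3'-most triplet, so read triplets backwards.
    for triplet in reversed(triplets):
        if triplet not in FINGER_LIBRARY:
            raise KeyError(f"no module available for triplet {triplet}")
        fingers.append(FINGER_LIBRARY[triplet])
    return fingers


if __name__ == "__main__":
    print(assemble_zf_array("GAAGCGGGG"))  # ['F_GGG', 'F_GCG', 'F_GAA']
```

A context-dependent method would replace the single dictionary lookup with a selection that also considers the neighboring triplets or fingers.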
Recently, a new class of chimeric nuclease using a FokI catalytic domain has been described (Christian, Cermak et al. 2010; Li, Huang et al. 2011). The DNA-binding domain of these nucleases is derived from Transcription Activator-Like Effectors (TALEs), a family of proteins used in the infection process by plant pathogens of the Xanthomonas genus. In these DNA-binding domains, sequence specificity is driven by a series of 33-35-amino-acid repeats that differ essentially at two positions (Boch, Scholze et al. 2009; Moscou and Bogdanove 2009). Each base pair in the DNA target is contacted by a single repeat, with specificity resulting from the two variant amino acids of the repeat (the so-called repeat variable diresidue, RVD). The apparent modularity of these DNA-binding domains has been confirmed to a certain extent by the modular assembly of designed TALE-derived proteins with new specificities (Boch, Scholze et al. 2009; Moscou and Bogdanove 2009). However, some context dependence of individual repeat/base recognition patterns, as was observed for Zinc Finger proteins (see above), cannot yet be ruled out. Furthermore, natural TAL effectors have been shown to dimerize (Gurlebeck, Szurek et al. 2005), and how this would affect a “dimerization-based” TALE-derived nuclease is currently unknown.
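The one-repeat-one-base recognition scheme lends itself to a simple lookup, sketched below. The sketch assumes the commonly reported RVD preferences NI for A, HD for C, NG for T and NN for G (NN also tolerates A, which the sketch ignores), and, like the modular-assembly results cited above, it assumes no context dependence between repeats. The function names are hypothetical.

```python
"""Sketch of the one-repeat-one-base TALE recognition code (illustrative only)."""

# Commonly reported RVD preferences; NN also accepts A, ignored here for simplicity.
BASE_TO_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
RVD_TO_BASE = {rvd: base for base, rvd in BASE_TO_RVD.items()}


def design_rvds(target: str) -> list[str]:
    """Return one RVD per base pair of the target, assuming modular recognition."""
    return [BASE_TO_RVD[base] for base in target.upper()]


def predict_target(rvds: list[str]) -> str:
    """Read an RVD array back into the DNA sequence it is expected to bind."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds)


if __name__ == "__main__":
    rvds = design_rvds("TCAG")
    print(rvds)                  # ['NG', 'HD', 'NI', 'NN']
    print(predict_target(rvds))  # TCAG
```

Because each repeat maps to exactly one base, the designed array length equals the target length, in contrast to the triplet-per-module arithmetic of Zinc Finger arrays.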
The functional layout of a FokI-based TALE-nuclease (TALEN) is essentially that of a ZFN, with the Zinc-finger DNA-binding domain replaced by the TALE domain (Christian, Cermak et al. 2010; Li, Huang et al. 2011). As such, DNA cleavage by a TALEN requires two DNA recognition regions flanking a nonspecific central region. This central “spacer” DNA region is essential for catalysis by the dimerizing FokI catalytic domain, and extensive effort has gone into optimizing the distance between the DNA-binding sites (Christian, Cermak et al. 2010; Miller, Tan et al. 2011). The spacer length has been varied from 14 to 30 base pairs, and cleavage efficiency depends on both spacer length and TALE scaffold construction (i.e. the nature of the fusion construct used). It is still unknown whether differences in the repeat region (i.e. the type and number of RVDs used) affect the DNA “spacer” requirements or the efficiency of DNA cleavage by TALENs. Nevertheless, TALE-nucleases have been shown to be active, to various extents, in cell-based assays in yeast, mammalian cells and plants (Christian, Cermak et al. 2010; Li, Huang et al. 2011; Mahfouz, Li et al. 2011; Miller, Tan et al. 2011).
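The dual-site requirement can be made concrete by scanning a sequence for two binding sites separated by an acceptable spacer. This minimal sketch assumes the usual tail-to-tail arrangement in which the left TALEN binds the top strand and the right TALEN binds the bottom strand, so the right site (written 5' to 3' on its own strand) appears on the top strand as its reverse complement. The 14 to 30 bp default window is taken from the spacer range cited above; the function names and the toy sequences are hypothetical.

```python
"""Locate TALEN-compatible binding-site pairs separated by a spacer (sketch)."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_talen_sites(dna: str, left_site: str, right_site: str,
                     min_spacer: int = 14, max_spacer: int = 30) -> list[tuple[int, int]]:
    """Return (left_start, spacer_length) for each valid site configuration.

    left_site is read on the top strand; right_site is given 5'->3' on the
    bottom strand, so it is searched for as its reverse complement.
    """
    dna = dna.upper()
    left_site = left_site.upper()
    right_top = reverse_complement(right_site.upper())
    hits = []
    start = dna.find(left_site)
    while start != -1:
        spacer_start = start + len(left_site)
        for spacer in range(min_spacer, max_spacer + 1):
            right_start = spacer_start + spacer
            if dna[right_start:right_start + len(right_top)] == right_top:
                hits.append((start, spacer))
        start = dna.find(left_site, start + 1)
    return hits


if __name__ == "__main__":
    example = "AA" + "TGACCT" + "G" * 15 + reverse_complement("ACCGTA") + "AA"
    print(find_talen_sites(example, "TGACCT", "ACCGTA"))  # [(2, 15)]
```

Real TALEN binding sites are much longer than these 6 bp toy sites; short sites are used here only to keep the example readable. A compact, non-dimerizing design would need only a single site, making the spacer scan unnecessary.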
The inventors have developed a new type of TALEN that can be engineered to specifically recognize and efficiently process target DNA. These novel “compact TALENs” (cTALENs) do not require dimerization for DNA processing activity, thereby eliminating the need for “dual” target sites with intervening DNA “spacers”. Furthermore, the invention allows the generation of several distinct types of enzymes that can enhance separate DNA repair pathways (HR vs. NHEJ).