Genome engineering requires the ability to insert, delete, substitute and otherwise manipulate specific genetic sequences within a genome, and has numerous therapeutic and biotechnological applications. The development of effective means for genome modification remains a major goal in gene therapy, agrotechnology, and synthetic biology (Porteus et al. (2005), Nat. Biotechnol. 23: 967-73; Tzfira et al. (2005), Trends Biotechnol. 23: 567-9; McDaniel et al. (2005), Curr. Opin. Biotechnol. 16: 476-83). A common method for inserting or modifying a DNA sequence involves introducing a transgenic DNA sequence flanked by sequences homologous to the genomic target and selecting or screening for a successful homologous recombination event. Recombination with the transgenic DNA occurs rarely, but can be stimulated by a double-stranded break in the genomic DNA at the target site. Numerous methods have been employed to create DNA double-stranded breaks, including irradiation and chemical treatments. Although these methods efficiently stimulate recombination, they produce double-stranded breaks that are randomly dispersed throughout the genome, which can be highly mutagenic and toxic. At present, the inability to target gene modifications to unique sites within a chromosomal background is a major impediment to successful genome engineering.
One approach to achieving this goal is stimulating homologous recombination at a double-stranded break in a target locus using a nuclease with specificity for a sequence that is sufficiently large to be present at only a single site within the genome (see, e.g., Porteus et al. (2005), Nat. Biotechnol. 23: 967-73). The effectiveness of this strategy has been demonstrated in a variety of organisms using chimeric fusions between an engineered zinc finger DNA-binding domain and the non-specific nuclease domain of the FokI restriction enzyme (Porteus (2006), Mol Ther 13: 438-46; Wright et al. (2005), Plant J. 44: 693-705; Urnov et al. (2005), Nature 435: 646-51). Although these artificial zinc finger nucleases stimulate site-specific recombination, they retain residual non-specific cleavage activity resulting from under-regulation of the nuclease domain and frequently cleave at unintended sites (Smith et al. (2000), Nucleic Acids Res. 28: 3361-9). Such unintended cleavage can cause mutations and toxicity in the treated organism (Porteus et al. (2005), Nat. Biotechnol. 23: 967-73).
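The requirement that a target sequence be "sufficiently large to be present at only a single site within the genome" can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes an idealized genome with uniform, independent base composition (real genomes are neither, and contain repeats) and a genome size of roughly 3 × 10^9 bp, approximately human-sized; both are illustrative assumptions, not figures from this document. Under these assumptions, the expected number of chance matches to one fixed N-bp site in a genome of G bp, counting both strands, is about 2(G − N + 1)/4^N. The function name is hypothetical.

```python
def expected_random_occurrences(site_length: int, genome_size_bp: float,
                                both_strands: bool = True) -> float:
    """Expected chance matches to one fixed site of `site_length` bp.

    Assumes (idealization) every base is A, C, G or T with probability 1/4,
    independently of its neighbours, so a given start position matches the
    site with probability 4 ** -site_length.
    """
    positions = genome_size_bp - site_length + 1  # start positions per strand
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** site_length


GENOME_BP = 3.0e9  # assumption: roughly human-sized haploid genome

for n in (9, 15, 22, 40):
    # n = 9 corresponds to a single I-CreI half-site; n = 22 to a full site
    print(f"{n:>2} bp: {expected_random_occurrences(n, GENOME_BP):.3g} expected chance matches")
```

Under this model a 9-bp sequence is expected to occur tens of thousands of times by chance, whereas a 22-bp sequence is expected to occur far less than once. For a palindromic site the two strands read identically, so the two-strand count overstates the expectation by up to a factor of two, which does not change the order-of-magnitude conclusion.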
A group of naturally-occurring nucleases that are commonly found in the genomes of plants and fungi, and that recognize 15-40 base-pair cleavage sites, may provide a less toxic alternative for genome engineering. Such “meganucleases” or “homing endonucleases” are frequently associated with parasitic DNA elements, such as group I self-splicing introns and inteins. They naturally promote homologous recombination or gene insertion at specific locations in the host genome by producing a double-stranded break in the chromosome, which recruits the cellular DNA-repair machinery (Stoddard (2006), Q. Rev. Biophys. 38: 49-95). Meganucleases are commonly grouped into four families: the LAGLIDADG (SEQ ID NO: 24) family, the GIY-YIG family, the His-Cys box family and the HNH family. These families are characterized by conserved structural motifs that affect catalytic activity and recognition sequence. For instance, members of the LAGLIDADG (SEQ ID NO: 24) family are characterized by having either one or two copies of the conserved LAGLIDADG (SEQ ID NO: 24) motif (see Chevalier et al. (2001), Nucleic Acids Res. 29(18): 3757-3774). The LAGLIDADG (SEQ ID NO: 24) meganucleases with a single copy of the LAGLIDADG (SEQ ID NO: 24) motif form homodimers, whereas members with two copies of the LAGLIDADG (SEQ ID NO: 24) motif are found as monomers.
Natural meganucleases, primarily from the LAGLIDADG (SEQ ID NO: 24) family, have been used to effectively promote site-specific genome modification in plants, yeast, Drosophila, mammalian cells and mice, but this approach has been limited either to the modification of homologous genes that conserve the meganuclease recognition sequence (Monnat et al. (1999), Biochem. Biophys. Res. Commun. 255: 88-93) or to pre-engineered genomes into which a recognition sequence has been introduced (Rouet et al. (1994), Mol. Cell. Biol. 14: 8096-106; Chilton et al. (2003), Plant Physiol. 133: 956-65; Puchta et al. (1996), Proc. Natl. Acad. Sci. USA 93: 5055-60; Rong et al. (2002), Genes Dev. 16: 1568-81; Gouble et al. (2006), J. Gene Med. 8(5): 616-622).
Systematic implementation of nuclease-stimulated gene modification requires the use of engineered enzymes with customized specificities to target DNA breaks to existing sites in a genome and, therefore, there has been great interest in adapting meganucleases to promote gene modifications at medically or biotechnologically relevant sites (Porteus et al. (2005), Nat. Biotechnol. 23: 967-73; Sussman et al. (2004), J. Mol. Biol. 342: 31-41; Epinat et al. (2003), Nucleic Acids Res. 31: 2952-62).
I-CreI (SEQ ID NO: 1) is a member of the LAGLIDADG (SEQ ID NO: 24) family which recognizes and cleaves a 22 base pair recognition sequence in the chloroplast chromosome, and which presents an attractive target for meganuclease redesign. Genetic selection techniques have been used to modify the wild-type I-CreI recognition site preference (Sussman et al. (2004), J. Mol. Biol. 342: 31-41; Chames et al. (2005), Nucleic Acids Res. 33: e178; Seligman et al. (2002), Nucleic Acids Res. 30: 3870-9; Arnould et al. (2006), J. Mol. Biol. 355: 443-58). More recently, a method of rationally designing mono-LAGLIDADG (SEQ ID NO: 24) meganucleases was described which is capable of comprehensively redesigning I-CreI and other such meganucleases to target widely-divergent DNA sites, including sites in mammalian, yeast, plant, bacterial, and viral genomes (WO 2007/047859).
The DNA sequences recognized by I-CreI are 22 base pairs in length. One example of a naturally-occurring I-CreI recognition site is provided in SEQ ID NO: 2 and SEQ ID NO: 3, but the enzyme will bind to a variety of related sequences with varying affinity. The enzyme binds DNA as a homodimer in which each monomer makes direct contacts with a nine base pair “half-site”, and the two half-sites are separated by four base pairs that are not directly contacted by the enzyme (FIG. 1a). Like all LAGLIDADG (SEQ ID NO: 24) family meganucleases, I-CreI produces a staggered double-strand break at the center of its recognition sequence, which results in the production of a four base pair 3′ overhang (FIG. 1a). The present invention concerns the central four base pairs in the I-CreI recognition sequences (i.e., the four base pairs that become the 3′ overhang following I-CreI cleavage, or “center sequence”, FIG. 1b). In the case of the native I-CreI recognition sequence in the Chlamydomonas reinhardtii 23S rRNA gene, this four base pair sequence is 5′-GTGA-3′. In the interest of producing genetically-engineered meganucleases which recognize DNA sequences that deviate from the wild-type I-CreI recognition sequences, it is desirable to know the extent to which the four base pair center sequence can deviate from the wild-type sequences. A number of published studies concerning I-CreI or its derivatives evaluated the enzyme, either wild-type or genetically-engineered, using DNA substrates that employed either the native 5′-GTGA-3′ center sequence or the palindromic sequence 5′-GTAC-3′. Recently, Arnould et al. (Arnould et al. (2007), J. Mol. Biol. 371: 49-65) reported that a set of genetically-engineered meganucleases derived from I-CreI cleaved DNA substrates with varying efficiencies depending on whether the substrate sequences were centered around 5′-GTAC-3′, 5′-TTGA-3′, 5′-GAAA-3′, or 5′-ACAC-3′ (cleavage efficiency: GTAC>ACAC>>TTGA≈GAAA).
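The layout of a 22 base pair I-CreI-type site described above (two nine base pair half-sites flanking a four base pair center sequence that becomes the 3′ overhang) can be sketched in Python. This is an illustrative sketch only: the function names are hypothetical, and the example site is an arbitrary placeholder, not the native sequence of SEQ ID NO: 2. Because the left cleavage product carries its 3′ overhang on the top strand and the right product carries it on the bottom strand, the two overhangs read 5′→3′ are the center sequence and its reverse complement; they are identical only when the center sequence is palindromic.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dissect_site(site: str) -> dict:
    """Split a hypothetical 22-bp site into 9-bp half-sites and a 4-bp center."""
    site = site.upper()
    if len(site) != 22 or set(site) - set("ACGT"):
        raise ValueError("expected a 22-bp sequence of A/C/G/T")
    left, center, right = site[:9], site[9:13], site[13:]
    return {
        "left_half_site": left,
        # right half-site read 5'->3' on the bottom strand, for comparison with left
        "right_half_site_bottom": reverse_complement(right),
        "center": center,
        # 3' overhang on the left cleavage product (top strand)
        "top_strand_overhang": center,
        # 3' overhang on the right cleavage product (bottom strand), read 5'->3'
        "bottom_strand_overhang": reverse_complement(center),
        "center_is_palindromic": center == reverse_complement(center),
    }


# Placeholder site with the native 5'-GTGA-3' center (half-sites are arbitrary).
print(dissect_site("AAAAAAAAA" + "GTGA" + "TTTTTTTTT"))

# Center sequences named in the text; only GTAC equals its own reverse complement.
for c in ("GTGA", "GTAC", "TTGA", "GAAA", "ACAC"):
    print(c, reverse_complement(c), c == reverse_complement(c))
```

Representing each center sequence alongside its reverse complement makes explicit why 5′-GTAC-3′ is described as palindromic while the native 5′-GTGA-3′ is not.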