Homing.
‘Homing’ is a widespread process involving the transfer of an intervening sequence (e.g., a group I or group II intron, or an intein) to a homologous allele that lacks the sequence, leading to gene conversion and dominant transmission and inheritance of the mobile element. Intervening sequences capable of homing are found in all branches of life (e.g., phage, Eubacteria, Archaea, and eukaryotes), and within eukaryotes are found, for example, within nuclear, mitochondrial and chloroplast genomes. Homing is initiated by an endonuclease (homing endonuclease; HE), encoded within the intervening sequence (intron or intein), which recognizes a DNA target site and generates a single- or double-strand break. HEs are normally expressed in the cytosol and targeted to DNA-containing organelles post-translationally.
Group I and group II introns are distinguished based on their respective transfer mechanisms. Transfer of group I introns is completed by cellular mechanisms that repair the strand breaks via homologous recombination. Homing of group II introns involves a more complex process comprising strand cleavage, reverse splicing to generate a DNA-RNA hybrid intermediate, and reverse transcription using the inserted RNA as template, where the sequential activities are encoded within the intron on a single multifunctional polypeptide chain. The homing mechanism of inteins is similar to that of group I introns, but the system comprises functional (in-frame) fusion of the endonuclease with the intein host to provide a polypeptide chain harboring the activities of the homing endonuclease, the intein peptide ligase and the host protein, and wherein portions of the intein's surface participate in DNA recognition and binding by the endonuclease. In all cases, the homing endonuclease gene is duplicated into the target site (e.g., non-disruptive sites such as introns and inteins, etc.).
Homing Endonucleases and Classes Thereof.
Homing endonucleases are highly specific DNA-cutting enzymes that recognize DNA target sites ranging from about 14 to about 40 base pairs. While being highly specific, to promote precise transfer of introns or inteins and avoid genomic toxicity, the homing endonucleases must retain sufficient site recognition flexibility (sufficient infidelity) to permit lateral transfer in the face of sequence variation in diverging targets and hosts. There are five known families of homing endonucleases (LAGLIDADG, HNH, His-Cys box, GIY-YIG and I-SspI-type) that differ in their conserved nuclease active-site core motifs and catalytic mechanisms.
LAGLIDADG motif (SEQ ID NO:25) homing endonucleases (LHE) are the largest family of homing endonucleases, and are typically encoded within introns (as free-standing enzymes) or inteins (as in-frame fusion proteins) of mitochondrial or chloroplast genomes in single-cell eukaryotes (e.g., yeast). LHE homing endonucleases were first defined in the early 1990s with the discovery that the “homing” of a mobile intron to intron-less alleles in S. cerevisiae involved the induction of a specific double-strand break in the intron-less alleles of the gene, the break being generated by a nuclease protein encoded by the mobile intron. The created double-strand break catalyzes homologous recombination between the intron-containing and intron-less alleles, resulting in the copying of the intron into the intron-less allele. The intron-encoded protein, I-SceI, and related proteins, were subsequently designated as “homing” endonucleases. Because of a recognizable motif present in two central alpha helices of I-SceI, this homing endonuclease family, including I-SceI, became known as the LAGLIDADG motif (SEQ ID NO:25) homing endonuclease (LHE) family. LHE proteins are formed as homodimers or pseudosymmetric monomers that generally recognize DNA sequences 18-24 base pairs in length (Chevalier & Stoddard, Nucleic Acids Res, 29:3757-3774, 2001). Homodimers recognize consensus DNA targets that are constrained to palindromic or near-palindromic symmetry, whereas monomeric enzymes having two copies of the consensus LAGLIDADG motif (SEQ ID NO:25) possess a pair of structurally similar nuclease domains on a single polypeptide chain, and are not constrained to symmetric DNA targets. Generally, the molecular structures are built around two conserved alpha helices that contain the LAGLIDADG motif (SEQ ID NO:25), and which form the center of the interface between enzyme subunits or domains, as the case may be (Heath, et al., Nat Struct Biol, 4:468-476, 1997).
The final acidic residues from the central alpha helices form part of each domain's active site that cleaves one strand of the double-stranded DNA target sequence. The DNA binding interface of each domain is made up of a four-stranded antiparallel beta-sheet that is supported by a series of framework alpha helices which form the core of the domain. Unlike art-recognized ‘restriction endonucleases,’ which form densely packed and almost completely saturated DNA-protein interfaces, the DNA binding interface of LHEs makes fewer hydrogen bonds per target sequence base pair (Galburt & Stoddard, Biochemistry, 41:13851-13860, 2002). These structural properties account for the ability of LHEs to withstand moderate variability in target sequence recognition (e.g., see Jurica, et al., Mol Cell, 2:469-476, 1998; Chevalier, et al., J Mol Biol, 329:253-26, 2003; Moure, et al., J Mol Biol, 334:685-695, 2003; and Moure, et al., Nat Struct Biol, 9:764-77, 2002), a characteristic that has been essential in maintaining their genetic mobility and horizontal proliferation (Burt & Koufopanou, Curr Opin Genet Dev, 14:609-615, 2004) and which makes LHEs ideal substrates for engineering altered DNA binding interfaces with novel endonucleolytic specificities (Duan, et al., 89:555-56, 1997; Chevalier, et al., Mol Cell, 10:895-905, 2002; Epinat, et al., Nucleic Acids Res, 31:2952-2962, 2003; Arnould, et al., J Mol Biol, 355:443-458, 2006; and Steuer, et al., Chembiochem, 5:206-213, 2004). The combination of high target sequence specificity and adaptable DNA binding interfaces makes LHEs attractive tools for genome engineering applications which require the introduction of a double-stranded break at a precise genomic location (Steuer, et al., Chembiochem, 5:206-213, 2004; Storici, F., Durham, et al., Proc Natl Acad Sci USA, 100:14994-14999, 2003; Tzfira, et al., Plant Physiol, 133:1011-1023, 2003; and Miller, et al., Mol Cell Biol, 23:3550-3557, 2003).
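The palindromic or near-palindromic symmetry that constrains homodimer target sites can be illustrated with a short check of a site against its own reverse complement. This is a minimal sketch for illustration only: the function names, the `max_mismatches` tolerance, and the example sequences are assumptions introduced here, not part of any method described above, and the sequences are not actual LHE target sites.

```python
# Illustrative sketch: count how far a DNA site departs from perfect
# reverse-complement (palindromic) symmetry. All names are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def symmetry_mismatches(site: str) -> int:
    """Count base pairs that break reverse-complement symmetry.

    Position i is compared with the complement of position n-1-i.
    For odd-length sites the central base is ignored, because a base
    can never equal its own complement.
    """
    s = site.upper()
    n = len(s)
    return sum(1 for i in range(n // 2) if s[i] != COMPLEMENT[s[n - 1 - i]])


def is_near_palindromic(site: str, max_mismatches: int = 0) -> bool:
    """True if the site is palindromic within the allowed mismatch count."""
    return symmetry_mismatches(site) <= max_mismatches


if __name__ == "__main__":
    for example in ("GAATTC", "GAATTA"):
        print(example, symmetry_mismatches(example))
```

With `max_mismatches=0` the check demands a perfect palindrome, as expected for an idealized homodimer site; raising the tolerance models near-palindromic sites. A monomeric enzyme with two distinct domains would not be expected to satisfy either condition.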
DNA binding by intein-associated LHEs (e.g., PI-SceI) involves recruitment of adjacent protein domains (adjacent intein domains). For example, the PI-SceI endonuclease-intein combination binds a 31 bp site, and the majority of the energetic contribution to binding is derived from interactions with the intein peptide splicing domain; the endonuclease domain contains the active sites, but exhibits relatively weak, non-specific DNA binding.
Despite little primary sequence homology among the LHEs outside of the LAGLIDADG motif (SEQ ID NO:25) itself, the topologies of the endonuclease domains and the shapes of their DNA-bound β-sheets are remarkably similar, and the structure of the central core of β-sheets is well conserved. The conserved positions correspond to residues that make contacts to base pairs in each DNA half-site. Alignments of intein-associated endonuclease domains indicate a somewhat more diverged structure of the β-sheet motifs. In particular instances, the core fold of LHE enzymes can be tethered to additional functional domains (e.g., NUMODs; nuclease-associated modular DNA binding domains) involved in DNA binding.
Like most nucleases, LHEs require divalent cations for activity. Two metals (calcium and copper) fail to support cleavage, two (nickel and zinc) display reduced cleavage, and three (magnesium, cobalt and manganese) display full activity under all tested conditions. The use of manganese in place of magnesium allows recognition and cleavage of a broader repertoire of DNA target sequences than observed with magnesium.
The HNH and His-Cys box homing endonucleases appear to be derived from a common ancestor built around a consensus nuclease active site architecture known as a ‘ββα-metal’ motif. The HNH homing endonuclease family is generally found in phage introns, and its members possess an extended, modular monomeric structure, in which the relatively non-specific nuclease domain at the N-terminus is tethered to additional structural motifs that confer and restrict DNA binding specificity. Prototypical members (e.g., I-HmuI) recognize asymmetric DNA sites of about 24 bp or longer. In contrast, the His-Cys box homing endonucleases are generally encoded in nucleolar introns within rDNA host genes, have compact homodimeric structures, and recognize shorter symmetric DNA target sites with higher overall homing in a manner similar to the LHE systems.
The GIY-YIG motif (SEQ ID NO:26) endonuclease family members are also encoded within phage introns and possess modular structures similar to the HNH endonucleases. The GIY-YIG motif (SEQ ID NO:26) endonuclease catalytic domain is quite non-specific in its inherent cleavage activity, again (as for the HNH family) being restricted to target sites that are dictated by the appended DNA-binding modules.
The fifth family, represented by the prototypical enzyme I-SspI found in Synechocystis, is responsible for the presence and persistence of introns in cyanobacterial tRNA genes. I-SspI displays limited homology to known nuclease superfamilies, and is currently represented by only a limited number of identified open reading frames.
Molecular Biology and Genome Engineering Applications.
Because of their relatively long recognition sequences, homing endonucleases (e.g., LHEs) induce a very low frequency of cleavage, even in large vertebrate genomes. Homing endonucleases are therefore regarded as having possible utility as rare-cutter endonucleases for use in molecular biology and genome engineering applications, particularly those applications which mimic their well-known natural function of catalyzing homologous recombination via induction of a DNA double-strand break, such as those related to targeted recombination, gene repair and gene conversion.
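The link between recognition sequence length and cleavage frequency can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a random genome with equal base frequencies and an enzyme that tolerates no mismatches; both are simplifying assumptions made here for illustration (real genomes are not random, and homing endonucleases tolerate some target variation, as noted above). The function name, the `strands` parameter, and the genome size of 3e9 bp are likewise illustrative choices.

```python
# Illustrative sketch: expected number of exact matches to one specific
# N-bp site in a random genome. Names and parameters are hypothetical.

def expected_occurrences(genome_bp: float, site_len: int, strands: int = 2) -> float:
    """Expected exact occurrences of one specific site of length site_len.

    Assumes each base is A, C, G or T with probability 1/4, and
    approximates the number of start positions per strand by genome_bp.
    strands=2 counts both orientations of an asymmetric site; use
    strands=1 for a perfectly palindromic site.
    """
    return strands * genome_bp / (4 ** site_len)


if __name__ == "__main__":
    genome = 3e9  # assumed order of magnitude for a large vertebrate genome
    for n in (6, 8, 14, 18, 24):
        print(f"{n:>2} bp site: {expected_occurrences(genome, n):.3g} expected matches")
```

Under these assumptions, a 6 bp site (typical of a restriction endonuclease) is expected more than a million times in a genome of this size, whereas a site of 18 bp or longer is expected less than once, which is the sense in which such enzymes act as rare cutters.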
Engineering and Directed Evolution of Alternative Systems.
Some efforts have been directed to tethering non-specific nuclease domains to sequence-specific DNA binding modules such as zinc fingers (resulting in so-called zinc finger nucleases, or ZFNs) for in vivo use in stimulating homologous recombination (Bibikova et al., 2001, 2003) and to drive sequence correction of a disease-causing allele associated with a severe genetic disorder (Urnov et al., 2005). However, despite the ease of designing such highly specific ZFN reagents, comparison of their properties to those of homing endonucleases indicates that both are worthy of development. For example, the nuclease domains of ZFN constructs appear to display significant non-specific DNA nicking and cleaving activity in the engineered chimeras, and these constructs can generate multiple adjacent phosphate cleavage events within a single bound DNA target site, which may enhance non-conservative break repair outcomes. By contrast, LHE cleavage is tightly coupled to cognate site binding, and the enzyme action, by virtue of tight product binding properties, appears to strongly enhance the ratio of homologous recombination relative to undesirable, non-conservative double-strand break repair events such as non-homologous end-joining. Additionally, ZFN chimeras have the disadvantage that they require expression of two separate chains to generate double-strand breaks, and more total coding sequence to generate the active enzyme. Efforts have been made to increase or alter the specificity of type II restriction endonucleases, but have been generally unsuccessful. Group II homing endonucleases are promising for targeted gene disruptions because they are easily engineered for novel specificities by altering the cognate intron sequences (DNA specificity being dictated by base pairing with the RNA component of the intron-protein complex, rather than by only the protein contacts to DNA).
However, these systems are more appropriate for gene disruption by insertion of a mobile element than for gene conversion, and require the presence and packaging of significant amounts of genetic information, including a large multifunctional reading frame (RT, endonuclease and maturase) and the cognate intron sequence for the generation of reactive RNP for reverse splicing and gene insertion.
Engineering and Directed Evolution of Homing Endonucleases.
One strategy in the art to alter homing endonuclease specificity for intein-associated enzymes has been to exchange entire intein-binding domains or portions thereof. Experiments of this type have shown, for example, that the PI-SceI protein splicing domain can be used as a site-specific DNA-binding module in chimeric protein constructs (domain-swapped chimeras between PI-SceI and a homolog from Candida tropicalis (PI-CtrIP) were constructed) (Steuer et al., 2004).
Additionally, several studies have demonstrated that domains from unrelated free-standing LAGLIDADG enzymes can be structurally fused to create fully active, chimeric homing endonucleases that recognize corresponding chimeric target sites (Chevalier et al., 2002; Epinat et al., 2003; Steuer et al., 2004). For example, using computational redesign, an artificial, highly specific chimeric endonuclease, H-DreI, was generated by fusing domains of the homing endonucleases I-DmoI and I-CreI. H-DreI binds a long chimeric DNA target site with nanomolar affinity. A related experiment showed that a single-chain monomeric endonuclease can be generated from a homodimeric predecessor by fusing the genes encoding each subunit via an artificial linker (Epinat et al., 2003). Specifically, a linker from I-DmoI was used to join two copies of the I-CreI gene to generate a pseudo-symmetric single-chain enzyme that cleaves DNA with the same specificity as native I-CreI, and that was shown to initiate homologous recombination in both yeast and mammalian cells.
Moreover, the role and mutability of interfacial residues between LAGLIDADG (SEQ ID NO:25) helices have been examined by grafting side-chains from the homodimeric I-CreI into the corresponding positions in the monomeric I-DmoI enzyme, resulting in enzymes with novel nicking activities and oligomeric properties (Silva & Belfort, 2004).
Additionally, several methods have been used to alter homing endonuclease specificity primarily at the level of individual base-pair alterations in the cognate target site, and these methods are divided into (i) those that select or screen for DNA binding activity, and (ii) those that select or screen for cleavage. For example, an adaptation of a bacterial two-hybrid strategy was used to select for variants of the intein-encoded PI-SceI endonuclease (Gimble et al., 2003), and the selected DNA binding specificities ranged from relaxed (cleaving WT and mutant targets equally) to dramatically shifted toward the selection targets, but none of the variants displayed the same degree of specificity as WT PI-SceI.
A strategy for isolating I-CreI derivatives with increased affinities for altered target sites has been described (Seligman et al., 2002; Sussman et al., 2004). Endonuclease mutants with single amino acid substitutions at positions predicted to make base-specific DNA contacts were assayed against DNA target site mutants in an E. coli-based system where cleavage of target sites results in cells being converted from Lac+ to Lac−, and where undesirable activity (cleavage of the original WT site) can be suppressed through a secondary ‘negative’ screen for elimination of an essential reporter (e.g., an antibiotic resistance marker). Using these methods, enzyme variants with shifted, rather than completely altered, specificities were obtained (see also Gruen et al., 2002).
Finally, an assay system designed to report on the generation of double-strand break-induced homologous recombination in eukaryotic cells has been described (Perez et al., 2005; see also US 2006/0206949 and US 2006/0153826 to Arnould et al.; both incorporated by reference herein in their entirety).
However, such prior art-based screening methods, whether based on domain swapping, domain fusion, enzyme fusion, grafting of side-chains, or base-pair alterations in the cognate target site (whether based on selecting or screening for DNA binding activity, or selecting or screening for cleavage activity), are fundamentally limited or compromised in their screening throughput by the fact that they require the generation of combinatorial endonuclease mutant libraries and the variant endonucleases must be well tolerated by the host's genomic DNA; that is, these prior art methods all require intracellular expression of the generated homing endonuclease during the screening or selection, and thereby preclude the effective expression, selection and identification of any variant endonuclease specificities associated with genomic toxicity (e.g., those that cut in and mediate alteration of essential genomic positions). An additional limitation of the prior art is that the intracellular cleavage system must be redesigned and generated for each sequence targeted for selection.
Furthermore, while ‘phage display’ methods (Chames, et al., Nucleic Acids Research 33:e178, pages 1-10, 2005) have been described for selecting variants of a homodimeric I-CreI enzyme, this system has several fundamental disadvantages. First, such phage display systems have not been demonstrated to provide for display of a single-chain monomeric I-CreI enzyme form, most likely because expression of an active single-chain monomeric I-CreI is either toxic to the host bacteria (e.g., using bacterial hosts, phage display of a whole monomeric enzyme would not segregate the active enzyme from the bacterial host cell DNA, as bacteria do not have a sequestered protein secretion pathway), or is disruptive of phage assembly (presumably, the use of a monomer of the homodimeric form generates an inactive fusion protein inside the cell, which would avoid toxicity, and/or was small enough to allow for phage assembly). In any case, no full-length single-chain monomeric active HEs or LHEs have been surface displayed using phage display, or any other type of display including cell surface display. Moreover, additional disadvantages of phage display systems are that phage are relatively small (e.g., compared to cells), and are too small to sort by some methods. Furthermore, in many instances it may not be possible to phage display enough molecules to achieve an adequate signal strength (e.g., depending on the protein, there may be only a few molecules per phage), so separation methods are limited to those comprising matrix/panning approaches, which substantially limits utility and screening throughput.
Pronounced Need in the Art.
There is, therefore, a pronounced need in the art for novel site-specific DNA binding and cutting enzymes, and more particularly for novel homing endonucleases (HEs) with novel DNA binding and cutting specificities, for novel methods of generation, selection and isolation of same, for novel compositions and uses comprising same, and for novel nucleic acid molecules encoding same. There is a pronounced need for novel LHEs with novel DNA binding and cutting specificities, for novel methods of generation, selection and isolation of same, for novel compositions and uses comprising same, and for novel nucleic acid molecules encoding same. There is a pronounced need for methods of variant homing endonuclease expression, selection, screening and identification that are not limited to intracellular expression of the generated homing endonuclease during the screening or selection, so as to allow for generation and identification of a more diverse set of homing endonuclease binding and cleavage specificities.