Site-specific nucleases are powerful reagents for specifically and efficiently targeting and modifying a DNA sequence within a complex genome. The double-stranded DNA breaks caused by site-specific nucleases are commonly repaired through one of two distinct mechanisms: homologous recombination or non-homologous end joining (NHEJ). Whereas homologous recombination typically uses the sister chromatid of the damaged DNA as a donor template from which to perform perfect repair of the genetic lesion, NHEJ is an imperfect repair process that often changes the DNA sequence at the site of the double-strand break. NHEJ mechanisms involve rejoining what remains of the two DNA ends, either through direct re-ligation (Critchlow and Jackson 1998) or via so-called microhomology-mediated end joining (Ma, Kim et al. 2003). Repair via NHEJ often results in small insertions or deletions and can be used to create specific gene knockouts. Applications of genome engineering by site-specific nucleases are numerous, extending from basic research to bioindustrial applications and human therapeutics. Re-engineering a DNA-binding protein for this purpose has been mainly limited to the naturally occurring LAGLIDADG homing endonucleases (LHEs), artificial zinc finger proteins (ZFPs), transcription activator-like effector nucleases (TALE-nucleases), and the recently described CRISPR-Cas system.
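The link between small NHEJ indels and gene knockouts rests on reading-frame arithmetic: within a coding sequence, a net insertion or deletion whose length is not a multiple of three shifts every downstream codon, which typically scrambles the encoded protein and often introduces a premature stop codon. The Python sketch below illustrates only that arithmetic. The function names and the toy sequence are hypothetical, and the sketch ignores in-frame indels, which may still disrupt protein function.

```python
def is_frameshift(net_indel_bp: int) -> bool:
    """True if a net insertion (+n) or deletion (-n) shifts the reading frame."""
    # Python's % returns a non-negative remainder, so -4 % 3 == 2.
    return net_indel_bp % 3 != 0


def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete codons, dropping trailing bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


# Hypothetical toy coding sequence (not a real gene).
cds = "ATGGCCAAAGGGTTTTAA"
mutant = cds[:4] + cds[5:]  # a 1-bp deletion at index 4, as NHEJ might leave

print([n for n in (-4, -3, -2, -1, 1, 2, 3, 6) if is_frameshift(n)])
print(codons(cds))
print(codons(mutant))
```

In this toy case the 1-bp deletion changes the downstream codons (GCC AAA GGG become GCA AAG GGT), and the original TAA stop codon is no longer read in frame.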
Homing endonucleases (HEs), also known as meganucleases, are sequence-specific endonucleases with large (>14 bp) cleavage sites that can deliver DNA double-strand breaks at specific loci (Thierry and Dujon 1992). A handful of homing endonuclease families are known; they are demarcated on the basis of canonical motifs and the structural features that comprise them. However, all share the property of recognizing and cleaving long DNA targets. Homing endonucleases were the first, and to date the only, naturally occurring endonucleases with specificities at or approaching ‘genome level’, meaning that their putative target sequences occur very infrequently, or perhaps only once, in their host genome. As a general property, HEs have a moderate degree of fidelity to their DNA target sequences, such that most base-pair substitutions in a target sequence reduce or eliminate the ability of the HE to bind or cleave it. HEs are therefore the most specific naturally occurring endonucleases yet discovered, and this property is critical to the natural life cycle of the genetic elements in which they are encoded.
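The notion of ‘genome level’ specificity can be made concrete with a back-of-the-envelope estimate. In a random sequence with equal, independent base frequencies, an exact match to one particular n-bp site is expected about 2G/4^n times in a genome of G bp, where the factor of 2 accounts for both strands. The Python sketch below assumes a haploid human genome size of roughly 3.1 × 10^9 bp purely for illustration. Real genomes have biased base composition and repeats, so this is only a baseline, and it slightly overcounts perfectly palindromic sites, which read identically on both strands.

```python
def expected_random_matches(site_bp: int, genome_bp: float) -> float:
    """Expected exact matches to one specific site in a random genome.

    Assumes equal, independent base frequencies and counts both strands.
    """
    return 2 * genome_bp / 4 ** site_bp


HUMAN_HAPLOID_BP = 3.1e9  # approximate, for illustration only

for site_bp in (14, 18, 22):
    print(site_bp, expected_random_matches(site_bp, HUMAN_HAPLOID_BP))
```

Under these assumptions, a 14-bp site would be expected about 23 times by chance, an 18-bp site about 0.09 times, and a 22-bp site (the target length of the LHE family) on the order of 10^-4 times.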
Homing endonuclease genes (HEGs) are classified as a type of selfish genetic element, because their DNA recognition and cleavage activity can trigger a DNA repair event that copies the HEG into the cleavage site. This mechanism of horizontal gene transfer, referred to as ‘homing’, results in a super-Mendelian inheritance pattern. Through this mechanism, HEGs and their endonuclease gene products can spread rapidly within the populations of their host species, and over evolutionary time they have spread throughout all kingdoms of life. HEGs are most commonly found at highly conserved genomic locations where they impose no fitness cost on their host organisms, such as within introns or as non-disruptive N- or C-terminal fusions to host proteins.
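Why homing yields super-Mendelian inheritance can be seen in a deliberately idealized model. Assume random mating, no fitness cost, and that in heterozygotes a fraction e of the wild-type alleles in the germ line is converted to the HEG-containing allele; e is a hypothetical ‘homing efficiency’ parameter, not a measured value. A heterozygote then transmits the HEG to (1 + e)/2 of its gametes instead of the Mendelian 1/2, so the HEG allele frequency p changes each generation as p' = p^2 + 2p(1 - p)(1 + e)/2. The Python sketch below iterates this recurrence. With e = 0 the frequency stays constant, as expected under Mendelian inheritance.

```python
def next_frequency(p: float, homing_efficiency: float) -> float:
    """One generation of HEG allele-frequency change (idealized model)."""
    # Heterozygotes transmit the HEG to (1 + e) / 2 of gametes rather than 1/2.
    het_transmission = (1 + homing_efficiency) / 2
    # Homozygotes (p^2) always transmit it; heterozygotes (2pq) at het_transmission.
    return p * p + 2 * p * (1 - p) * het_transmission


def trajectory(p0: float, homing_efficiency: float, generations: int) -> list[float]:
    """Allele frequencies from generation 0 through `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], homing_efficiency))
    return freqs


mendelian = trajectory(0.01, 0.0, 10)  # no homing
homing = trajectory(0.01, 0.9, 10)     # hypothetical 90% conversion
print(round(mendelian[-1], 4), round(homing[-1], 4))
```

Under these assumptions, a HEG starting at 1% frequency exceeds 95% within ten generations at e = 0.9, whereas the Mendelian case stays at 1%.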
The LAGLIDADG homing endonuclease (LHE) family comprises a group of compact (<320 amino acids) nucleases whose structural and mechanistic features have been studied extensively owing to their attractive properties for genome engineering applications. LHEs operate either as dimers or as pseudo-dimeric monomers, with the DNA-cleaving active site located at the DNA-facing end of the interface between the two subunits (in dimeric LHEs) or domains (in monomeric LHEs). The LAGLIDADG consensus motifs for which LHEs are named are found in the two central alpha helices that form this interface. At the base of each LAGLIDADG helix are the residues that together coordinate the hydrolysis reaction when the appropriate conditions are met, for example when the LHE finds and binds an appropriate DNA target sequence. The active site spans the ‘central-4’ base pairs of the DNA target sequence.
On either side of the active site are the two DNA-binding domains that LHEs use to recognize their DNA target sequences. Each domain comprises an antiparallel beta sheet that wraps around nearly a complete turn of DNA and contacts 9 base pairs of DNA sequence. Members of the LHE family thus recognize 22-base-pair DNA target sequences (9 base pairs for each domain plus the 4 base pairs covered by the active site), which are partially palindromic in the case of dimeric LHEs but can be entirely asymmetric for monomeric LHEs. The amino acid side chains that make up the DNA recognition interface project from each antiparallel beta sheet. While the non-DNA-interfacing residues show considerable amino acid conservation across the LHE family, the amino acid composition of the DNA recognition interface varies significantly. This is because, for each LHE, the DNA recognition interface comprises an extensive network of side chain-to-side chain and side chain-to-DNA contacts, most of which are necessarily unique to that LHE's DNA target sequence. The amino acid composition of the DNA recognition interface (and its correspondence to a particular DNA sequence) is therefore the defining feature of any natural or engineered LHE. The DNA recognition interface determines which DNA target sequence can be accommodated and hydrolyzed, as well as the affinity and specificity properties that define the quality of the LHE for a given application.
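The 9 + 4 + 9 = 22 bp layout described above, and the partial palindromy of targets for dimeric LHEs, can be expressed directly in code. The Python sketch below splits a target into its two 9-bp half-sites and the central-4, and scores half-site symmetry as the fraction of positions at which the left half-site matches the reverse complement of the right half-site (1.0 for a perfect palindrome, as when two identical subunits bind in inverted orientation). The function names and the example sequence are hypothetical illustrations, not a real LHE target.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_lhe_target(target: str) -> tuple[str, str, str]:
    """Split a 22-bp LHE target into (left half-site, central-4, right half-site)."""
    if len(target) != 22:
        raise ValueError(f"expected a 22-bp target, got {len(target)} bp")
    return target[:9], target[9:13], target[13:]


def half_site_symmetry(target: str) -> float:
    """Fraction of the 9 half-site positions that are palindromic."""
    left, _central4, right = split_lhe_target(target)
    matches = sum(a == b for a, b in zip(left, reverse_complement(right)))
    return matches / len(left)


# Hypothetical, perfectly palindromic target: left + central-4 + revcomp(left).
left = "GCCTCCAAA"
target = left + "GTAC" + reverse_complement(left)
print(split_lhe_target(target))
print(half_site_symmetry(target))
```

A target for a monomeric LHE would simply score lower, down to chance-level agreement for an entirely asymmetric site.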
Owing to their small size and exquisite specificity, LHEs have been the subject of numerous efforts to engineer their DNA recognition properties, with the goal of cleaving and altering genes of interest in research, biotechnology, crop science, global health, and human therapeutics. However, the extensive networks of residues that form the DNA recognition interface have generally hindered the development of efficient methods for re-addressing LHEs to DNA target sequences of interest. This has driven continued innovation in the field of gene-specific nuclease engineering: three alternative endonuclease platforms have now been validated as capable of targeting DNA sequences with varying (but generally high) levels of specificity, and new and improved methods have emerged for overcoming the challenges of engineering the DNA recognition interfaces of LHEs.
Zinc finger nucleases (ZFNs), generated by fusing a plurality of zinc finger-based DNA-binding domains to an independent catalytic domain (Kim, Cha et al. 1996; Smith, Berg et al. 1999; Smith, Bibikova et al. 2000), are another type of engineered nuclease commonly used to stimulate gene targeting, and they have been successfully used to induce gene correction, gene insertion, and gene deletion in research and therapeutic applications. The archetypal ZFNs are based on the catalytic domain of the Type IIS restriction enzyme FokI and on zinc finger-based DNA-binding domains made of strings of 3 or 4 individual zinc fingers, each recognizing a DNA triplet (Pabo, Peisach et al. 2001). Two zinc finger-FokI monomers must bind to their respective zinc finger DNA recognition sites on opposite strands, in an inverted orientation, to form a catalytically active dimer that catalyzes double-strand cleavage (Bitinaite, Wah et al. 1998).
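The arithmetic implied here is simple: with one triplet per finger, a 3- or 4-finger array recognizes 9 or 12 bp, so a ZFN pair reads 18 to 24 bp in total, split across opposite strands. The Python sketch below lays out a hypothetical ZFN pair target on the forward strand as left site, spacer, and right site, and reports the right site 5'-to-3' on the opposite strand (the reverse complement), reflecting the inverted orientation of the second monomer. The spacer length is a free parameter here because the text does not specify one; the 6-bp value in the example is an illustrative assumption, as are all function names and sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def zf_site_bp(n_fingers: int) -> int:
    """Base pairs recognized by one zinc finger array (one triplet per finger)."""
    return 3 * n_fingers


def zfn_pair_layout(target: str, fingers_left: int, fingers_right: int,
                    spacer_bp: int) -> tuple[str, str, str]:
    """Partition a forward-strand target into (left site, spacer, right site).

    The right site is returned as read on the opposite strand (reverse
    complement), because the second monomer binds in inverted orientation.
    """
    left_bp, right_bp = zf_site_bp(fingers_left), zf_site_bp(fingers_right)
    expected = left_bp + spacer_bp + right_bp
    if len(target) != expected:
        raise ValueError(f"expected {expected} bp, got {len(target)} bp")
    left = target[:left_bp]
    spacer = target[left_bp:left_bp + spacer_bp]
    right_forward = target[left_bp + spacer_bp:]
    return left, spacer, reverse_complement(right_forward)


# Hypothetical 3-finger + 3-finger pair with an assumed 6-bp spacer.
site = "GATGCTGAC" + "TTAACC" + "GCATCGTAA"
print(zfn_pair_layout(site, 3, 3, 6))
```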
Transcription activator-like effectors (TALEs) were the next artificial endonuclease platform. TALEs, derived from a family of proteins used during infection by plant pathogens of the genus Xanthomonas or Ralstonia, are repetitive proteins characterized by 14-20 repeats of 33-35 amino acids that differ essentially at two positions. Each base pair in the DNA target is contacted by a single repeat, with specificity conferred by the two variable amino acids of that repeat (the so-called repeat variable dipeptide, RVD). The apparent modularity of these DNA-binding domains has been confirmed, to a certain extent, by the modular assembly of designed TALE-derived proteins with new specificities (Boch, Scholze et al. 2009; Moscou and Bogdanove 2009). Much like ZFNs, TALEs were readily adapted into site-specific nucleases by arraying TALE repeats with RVDs corresponding to the target sequence of choice and fusing the resulting array to a FokI domain. As such, DNA cleavage by a TALE-nuclease requires two DNA recognition regions flanking a non-specific central region. TALE-nucleases have proliferated widely since 2010 owing to their ease of production and improved efficiency in generating double-strand breaks.
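Because each repeat contacts one base pair through its RVD, designing a TALE array for a chosen site amounts to a per-base lookup. The Python sketch below assumes the commonly used RVD assignments NI for A, HD for C, NN for G, and NG for T (NN can also accept A, and other RVDs exist). It then pairs a left array on the forward strand with a right array on the opposite strand, flanking a non-specific spacer. The table and function names are hypothetical, the short sites and spacer length in the example are illustrative assumptions (natural arrays carry 14-20 repeats), and the sketch does not model other sequence constraints that practical TALE-nuclease design may consider.

```python
# Commonly used RVD for each base (assumption; NN also binds A).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_rvd_array(site: str) -> list[str]:
    """One repeat per base pair: map a 5'-to-3' binding site to its RVDs."""
    site = site.upper()
    unknown = set(site) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in site]


def design_tale_nuclease_pair(target: str, left_bp: int,
                              right_bp: int) -> tuple[list[str], list[str]]:
    """RVD arrays for a left site (forward strand) and a right site (opposite
    strand); the bases between the two sites form the non-specific spacer."""
    target = target.upper()
    if left_bp + right_bp >= len(target):
        raise ValueError("sites must leave room for a spacer")
    left_site = target[:left_bp]
    right_site = reverse_complement(target[len(target) - right_bp:])
    return design_rvd_array(left_site), design_rvd_array(right_site)


# Hypothetical target: 5-bp left site, 6-bp spacer, 5-bp right site.
print(design_tale_nuclease_pair("TACGA" + "TTTTTT" + "GGATC", 5, 5))
```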
Given these distinct technologies, it is important to distinguish the advantages of each and to find innovative ways to capture those advantages for the appropriate genome engineering applications. One of the most powerful applications of site-specific nuclease technology is in the field of human therapeutics. In one prominent genome engineering strategy to treat human immunodeficiency virus type 1 (HIV-1), site-specific nucleases have been developed to target the CCR5 gene. The CCR5 gene encodes the primary co-receptor that HIV-1 uses to enter human T cells. Longstanding genetic and experimental evidence has shown that individuals homozygous for a disrupted allele of CCR5 (the CCR5Δ32 allele) are almost completely resistant to HIV-1 infection. Moreover, a recent clinical case demonstrated that an HIV-1-infected patient transplanted with bone marrow from a donor homozygous for the CCR5Δ32 allele was cleared of HIV-1 infection, the first confirmed case of an HIV-1 cure. These findings motivate the development of improved, scalable genome engineering strategies targeting the CCR5 gene.
ZFN reagents have been evaluated in early-phase clinical trials focused on disrupting the CCR5 gene in the T cells of HIV-1 patients. Early proof-of-concept results have shown that nuclease-mediated CCR5 gene disruption leads to promising clinical responses. Unfortunately, these results have been tempered by the low efficiency of disruption, which makes it difficult to manufacture T cells with biallelic CCR5 disruption, and by reports of poor ZFN specificity, which call into question the safety of these particular nuclease reagents. Improvements in the efficiency, specificity, and manufacturability of nuclease-based genome engineering strategies targeting the CCR5 gene will be necessary if this approach is to produce ‘functional cures’ for HIV-1 infection.