Genetic engineering techniques to introduce targeted modifications into a host cell genome find use in a variety of fields. Fundamentally, the determination of how genotype influences phenotype relies on the ability to introduce targeted insertions or deletions to impair or abolish native gene function. In the field of synthetic biology, the fabrication of genetically modified microbes capable of producing compounds of interest requires the insertion of customized DNA sequences into a chromosome of the host cell; industrial scale production generally requires the introduction of dozens of genes, e.g., whole biosynthetic pathways, into a single host genome. In a therapeutic context, the ability to introduce precise genome modifications has enormous potential to address diseases resulting from single-gene defects, e.g., X-linked severe combined immune deficiency (SCID), hemophilia B, beta-thalassemia, cystic fibrosis, muscular dystrophy and sickle-cell disease.
Recent advances in genome engineering have enabled the manipulation and/or introduction of virtually any gene across a diverse range of cell types and organisms. In particular, the advent of site-specific designer nucleases has enabled site-specific genetic modifications by introducing targeted breaks into a host cell genome, i.e., genome editing. These nucleases include zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas (CRISPR-associated)-based RNA-guided endonucleases. ZFNs have been utilized, inter alia, to modify target loci in crops (Wright et al., Plant J 44:693-705 (2005)), to improve mammalian cell culture lines expressing therapeutic antibodies (Malphettes et al., Biotechnol Bioeng 106(5):774-783 (2010)), and to edit the human genome to evoke resistance to HIV (Urnov et al., Nat Rev Genet 11(9):636-646 (2010)). Similarly, TALENs have been utilized to modify a variety of genomes, including those of crop plants (Li et al., Nat. Biotechnol. 30: 390-392 (2012)) and of humans, cattle, and mice (Xu et al., Molecular Therapy—Nucleic Acids 2, e112 (2013)). More recently, CRISPRs have been successfully utilized to edit the genomes of bacteria (e.g., Jiang et al., Nature Biotechnology 31(3):233-239 (2013); Qi et al., Cell, 5, 1173-1183 (2013)); yeast (e.g., DiCarlo et al., Nucleic Acids Res., 7, 4336-4343 (2013)); zebrafish (e.g., Hwang et al., Nat. Biotechnol., 3, 227-229 (2013)); fruit flies (e.g., Gratz et al., Genetics, 194, 1029-1035 (2013)); human cells (e.g., Cong et al., Science 6121, 819-823 (2013); Mali et al., Science, 6121, 823-826 (2013); Cho et al., Nat. Biotechnol., 3, 230-232 (2013)); and plants (e.g., Jiang et al., Nucleic Acids Research 41(20):e188 (2013); Belhaj et al., Plant Methods 9(39) (2013)).
Site-specific nucleases induce breaks in chromosomal DNA that stimulate the host cell's DNA repair mechanisms, including non-homologous end joining (NHEJ), single-strand annealing (SSA), and homology-directed repair (HDR). In NHEJ, the broken ends of the same molecule are rejoined by a multi-step enzymatic process that does not involve another DNA molecule. NHEJ-mediated repair of a nuclease-induced double-strand break (DSB) introduces small deletions or insertions at the targeted site, which can impair or abolish gene function, e.g., via frameshift mutations. NHEJ is error prone and imprecise, producing mutant alleles with different and unpredictable insertions and deletions of variable size at the break site during the repair. Similarly, SSA occurs when complementary strands from sequence repeats flanking the DSB anneal to each other, resulting in repair of the DSB but deletion of the intervening sequence. In contrast, HDR typically leads to an accurately restored molecule, as it relies on a separate undamaged molecule with homologous sequence to help repair the break. There are two major sources of homologous donor sequence native to the cell: the homologous chromosome, available throughout the cell cycle, and the sister chromatid of the broken molecule (which is only available after the DNA is replicated). However, genome engineering techniques routinely introduce exogenous donor DNAs that comprise regions homologous with the target site of the DSB, and can recombine with the target site. By including desired modifications to the target sequence within the exogenous donor, these modifications can be integrated into and replace the original target sequence via HDR.
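The link between NHEJ-derived insertions or deletions and frameshift mutations can be illustrated with a minimal sketch. The coding sequence below is hypothetical and invented purely for illustration, as are the helper names `codons`, `delete`, and `causes_frameshift`; the sketch only shows that an indel whose length is not a multiple of three changes every downstream codon, whereas an in-frame indel does not.

```python
def codons(seq):
    """Split a coding sequence into complete triplet codons (reading frame 0)."""
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq, pos, length):
    """Return seq with `length` bases removed starting at 0-based index `pos`."""
    return seq[:pos] + seq[pos + length:]


def causes_frameshift(indel_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


# Hypothetical 18-base coding sequence (assumption, not taken from any real gene).
wild_type = "ATGGCTGAAACCGGTTAA"

# A 1-base deletion, as might arise from imprecise end joining.
one_base_deletion = delete(wild_type, 4, 1)

# A 3-base deletion that removes exactly one codon.
three_base_deletion = delete(wild_type, 3, 3)

for label, seq in [
    ("wild type", wild_type),
    ("1-base deletion", one_base_deletion),
    ("3-base deletion", three_base_deletion),
]:
    print(f"{label:16s} {' '.join(codons(seq))}")
```

In this toy case the single-base deletion alters every codon after the start codon, while the three-base deletion removes one codon and leaves the rest of the reading frame intact.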
Upon nuclease-induced breakage of DNA, the host cell's choice of repair pathways depends on a number of factors, and the outcome can dictate the precision of a desired genomic modification. Such factors include the DNA damage signaling pathways of the host cell, the nature of the break, chromatin remodeling, transcription of specific repair proteins, and cyclin-dependent kinase activities present in later phases of the cell cycle. See, e.g., Beucher et al., EMBO J 28:3413-27 (2009); Sorensen et al., Nat Cell Biol 7:195-201 (2005); Jazayeri et al., Nat Cell Biol 8:37-45 (2006); Huertas et al., Nature 455:689-92 (2008); Moyal et al., Mol Cell 41:529-42 (2011); and Chernikova et al., Radiat Res 174:558-65 (2010). If a donor DNA with strong homology to the cleaved DNA is present, the chances of integration of the donor by homologous recombination increase significantly. See, e.g., Moehle et al., Proc. Natl Acad. Sci. USA, 9:3055-3060 (2007); Chen et al., Nat. Methods, 9, 753-755 (2011). However, the overall frequency at which a homologous donor DNA is integrated via HDR into a cleaved target site, as opposed to non-integrative repair of the target site via NHEJ, can still be quite low. Recent studies suggest that HDR-mediated editing is generally a low-efficiency event, and the less precise NHEJ can predominate as the mechanism of repair for DSBs.
For example, Mali et al. (Science 339:823-826 (2013)) attempted gene modification in human K562 cells using CRISPR (guide RNA and Cas9 endonuclease) and a concurrently supplied single-stranded donor DNA, and observed an HDR-mediated gene modification at the AAVS1 locus at a frequency of 2.0%, whereas NHEJ-mediated targeted mutagenesis at the same locus was observed at a frequency of 38%. Li et al. (Nat Biotechnol. (8):688-91 (2013)) attempted gene replacement in the plant Nicotiana benthamiana using CRISPR (guide RNA and Cas9 endonuclease) and a concurrently supplied double-stranded donor DNA, and observed an HDR-mediated gene replacement at a frequency of 9.0%, whereas NHEJ-mediated targeted mutagenesis was observed at a frequency of 14.2%. Kass et al. (Proc Natl Acad Sci USA. 110(14): 5564-5569 (2013)) studied HDR in primary normal somatic cell types derived from diverse lineages, and observed that mouse embryonic and adult fibroblasts as well as cells derived from mammary epithelium, ovary, and neonatal brain underwent HDR at I-SceI endonuclease-induced DSBs at frequencies of approximately 1% (0.65-1.7%). Kass and others have reported higher HDR activity when cells are in S and G2 phases of the cell cycle. Li et al. (Nat Biotechnol. (8):688-91 (2013)) tested the possibility of enhancing HDR in Nicotiana benthamiana by triggering ectopic cell division, via co-expression of Arabidopsis CYCD3 (Cyclin D-Type 3), a master activator of the cell cycle; however, this barely increased the rate of HDR (11.1% with CYCD3, compared with 9% without it). Strategies to improve HDR rates have also included knocking out the antagonistic NHEJ repair mechanism. For example, Qi et al. (Genome Res 23:547-554 (2013)) reported an increase of 5-16 fold in HDR-mediated gene targeting in Arabidopsis for the ku70 mutant and 3-4 fold for the lig4 mutant. However, the overall rates were observed to be no higher than ~5%, with most less than 1%. Furthermore, once the desired gene-targeting event was produced, the ku70 or lig4 mutations had to be crossed out of the mutant plants.
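The HDR and NHEJ frequencies reported above for Mali et al. and Li et al. can be put on a common footing by computing the share of edited events attributable to HDR. The following is a rough sketch only: it assumes that the two reported frequencies describe non-overlapping outcomes measured against the same denominator, which the cited studies may not strictly satisfy, and the helper name `hdr_share` is hypothetical.

```python
def hdr_share(hdr_percent, nhej_percent):
    """Fraction of observed editing events that were HDR rather than NHEJ.

    Assumes (for illustration) that the two percentages are disjoint
    outcomes measured on the same population of cells or alleles.
    """
    total = hdr_percent + nhej_percent
    if total <= 0:
        raise ValueError("at least one frequency must be positive")
    return hdr_percent / total


# (HDR %, NHEJ %) as quoted in the text above.
reported = {
    "Mali et al., human K562, AAVS1": (2.0, 38.0),
    "Li et al., N. benthamiana": (9.0, 14.2),
}

shares = {label: hdr_share(h, n) for label, (h, n) in reported.items()}

for label, share in shares.items():
    print(f"{label}: HDR share of edited events = {share:.1%}")
```

Under these assumptions, HDR accounts for about 5% of editing events in the K562 experiment and about 39% in the Nicotiana benthamiana experiment, so NHEJ is the majority outcome in both.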
Given the relatively low rate of HDR-mediated integration in most cell types, insertion of exogenous DNA into the chromosome typically requires the concomitant integration of a selectable marker, which enables enrichment for transformed cells that have undergone the desired integration event. However, this introduces extraneous sequences into the genome which may not be compatible with downstream applications, and prolonged expression of the marker may also have deleterious effects. For example, integration of the neomycin resistance gene into human cell genomes, followed by extended culturing times in G418, has been reported to cause changes to the cell's characteristics, and expression of enhanced green fluorescent protein (EGFP) and other fluorescent proteins has been reported to cause immunogenicity and toxicity. See, e.g., Barese et al., Human Gene Therapy 22:659-668 (2011); Morris et al., Blood 103:492-499 (2004); and Hanazono et al., Human Gene Therapy 8:1313-1319 (1997). Additionally, the integration of selectable-marker genes in genetically modified (GM) plants has raised concerns of horizontal transfer to other organisms; in the case of antibiotic resistance markers, there is particular concern that these markers could lead to an increase in antibiotic-resistant bacterial strains. A similar concern relates to the integration of herbicide-resistance markers and the possible creation of new aggressive weeds. At a minimum, removal of integrated marker sequences at later stages is time- and labor-intensive. This is particularly problematic where only a limited cache of selectable markers is available in a given host, and markers must be recycled to enable additional engineering steps. Thus, certain applications warrant introducing only the minimum exogenous sequences needed to effect a desired phenotype, e.g., for safety and/or regulatory compliance, and may ultimately require the avoidance of marker integration altogether.
Thus, there exists a need for methods and compositions that improve the efficiency and/or selection of HDR-mediated integration of one or more exogenous nucleic acids into a host cell genome. Moreover, there exists a need for genome engineering strategies that do not require co-integration of coding sequences for selectable markers. These and other needs are met by the compositions and methods provided herein.