Field of the Invention
The present invention relates to the field of molecular biology. More specifically, it relates to methods and nucleic acid constructs for engineering long nucleic acid sequences in vivo using a combination of a recombineering system and a CRISPR/Cas system.
Description of Related Art
Numerous organisms are known in the art that have one or more characteristics, features, or capabilities that have been engineered into them to achieve a certain goal. For example, in view of the potential exhaustion of natural resources, foremost the dwindling petroleum reserves, certain organisms have been genetically modified to quickly and efficiently produce compounds that can replace petrochemicals (for example, “biofuels”) in order to provide potential alternatives. Likewise, certain plants have been engineered to be resistant to herbicides, molds, or viruses. As yet another example, plants and microorganisms have been engineered to increase nutritional value or produce bioactive agents, such as pharmaceuticals and biologics.
Recent advances in chemical synthesis of DNA oligomers and their assembly into larger double-stranded DNA (dsDNA) structures allow generation of DNA sequences and subsequent controlled manipulation of target organisms essentially at will, a process often referred to in the art as “synthetic biology”. Assembly of dsDNA from single-stranded DNA oligomers is usually limited to about 1 kilobase (kb) in length due to the low fidelity of the chemical DNA synthesis process. These segments of about 1 kb (sometimes referred to in the art as “parts”) are then assembled into larger functional elements of up to about 10 kb in length (sometimes referred to in the art as “devices”). Even larger assemblies of up to about 100 kb (sometimes referred to in the art as “systems”) are envisioned. However, due to the difficulties in manipulating DNA molecules of 20 kb or greater in length, in vivo host-based technologies will have to be developed or refined to assemble and manipulate “systems”.
Aside from the challenges of assembly of large synthetic constructs, the major hurdle in controlled manipulation is the targeted integration and modification of a host genome by the synthetic DNA constructs. Integration into the host genome can be achieved through homologous recombination, a process by which dsDNA is integrated into the host genome at a pre-determined site by virtue of matching sequences (usually several hundred base pairs) between the ends of a linear DNA construct and the host genome. With the notable exception of yeast, homologous recombination is an extremely inefficient process. Reasonable homologous recombination frequencies in the bacterial host E. coli require the use of the λ-red/gam or the recE/recT systems. The bacteriophage-derived λ-red/gam system consists of three components: a 5′-3′ exonuclease (λ-exo), a single-strand binding protein (beta), and an inhibitor of the host exonuclease recBCD (gam). The recE and recT genes are encoded in an integrated prophage in E. coli and perform functions analogous to those of λ-exo and beta, respectively. In a recA+ host cell, integration efficiencies of about 1 in 10^4 cells can be achieved, depending on the length of the homologous flanking sequences. The process of λ-red/gam or recET assisted homologous recombination is generally referred to as “recombineering”. Unless the recombination event generates a directly selectable phenotype, a selectable marker (usually a drug resistance marker) has to be included in the recombined DNA segment to select for the rare recombinants. The selection marker can be removed at a later stage using a site-specific recombinase, such as Flp, if the marker is flanked by site-specific recombination target sites. However, the removal of the selection marker leaves a scar behind (e.g., the site-specific recombination site). A popular recombineering system employing these tools has been described by Datsenko and Wanner (1).
Despite these improvements, the procedure remains cumbersome because it requires, in succession, curing of the λ-red/gam expression plasmid, introduction of the site-specific recombinase plasmid, verification of the loss of the selection marker, and finally curing of the site-specific recombinase plasmid.
A variant of the recombineering procedure has been developed in recent years based on the discovery that, in the presence of λ-beta (a ssDNA binding protein), single-stranded DNA oligomers up to 90 nucleotides in length are incorporated into the lagging strand during DNA replication, essentially acting as Okazaki fragments (2). The λ-beta mediated incorporation of the ssDNA oligomers is much more efficient, with rates of greater than 1% achievable, depending on the degree of homology with the template strand (2, 3). However, because only a short sequence is modified, the oligomer-directed modifications are only selectable if the resulting mutation generates a directly selectable phenotype.
Even though λ-red/gam and recET are derived from E. coli phages, their utility seems to be transferable to at least some other bacterial hosts, making them potentially universal tools for application in prokaryotes (4, 5).
A new class of nucleic acid targeting systems called CRISPR/Cas has been discovered in prokaryotes that somewhat resembles the siRNA/miRNA systems found in eukaryotes. The system consists of an array of short repeats with intervening variable sequences of constant length (clustered regularly interspaced short palindromic repeats, or CRISPRs) and CRISPR-associated proteins (Cas). The variable sequences located between the short repeat sequences are derived from infecting viruses (i.e., phages) or foreign plasmids and have been incorporated into the host genome between the short repeat sequences. The RNA of the transcribed CRISPR arrays is processed by a subset of the Cas proteins into small guide RNAs containing the viral or plasmid sequences, which direct Cas-mediated cleavage of viral or plasmid nucleic acid sequences corresponding to the small guide RNAs. CRISPR/Cas systems are fairly ubiquitous in prokaryotes and seem to be distributed by lateral gene transfer, as some bacteria contain CRISPR/Cas systems but other closely related bacteria do not (for example, E. coli K12 strains carry a CRISPR/Cas system, whereas E. coli B strains do not). The primary function of the CRISPR/Cas system appears to be viral immunity, as most CRISPR encoded targets correspond to bacteriophage genomes (6).
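The repeat-spacer layout of a CRISPR array described above can be illustrated with a minimal sketch. The repeat and spacer sequences below are made-up toy values, not sequences from any real organism, and the function name `extract_spacers` is hypothetical. The sketch also assumes that the array begins and ends with a repeat and that every repeat copy is identical.

```python
def extract_spacers(array_seq, repeat):
    """Return the variable sequences (spacers) found between repeat copies.

    Assumes every repeat copy is identical and that the array starts and
    ends with a repeat; real arrays can deviate from both assumptions.
    """
    # Splitting on the repeat leaves the sequences between repeats; empty
    # strings appear where the array starts or ends with a repeat.
    return [s for s in array_seq.split(repeat) if s]


# Toy data (hypothetical sequences, much shorter than real repeats/spacers).
REPEAT = "GTTTTAGAGC"
SPACERS = ["ACGTTGCAAC", "TTGACCATGA"]
toy_array = REPEAT + REPEAT.join(SPACERS) + REPEAT

recovered = extract_spacers(toy_array, REPEAT)
print(recovered)
```

In this simplified picture, each recovered spacer corresponds to one guide RNA target sequence.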
The majority of the known CRISPR/Cas systems guide cleavage of RNA. However, in Streptococcus thermophilus, a CRISPR/Cas system (CRISPR3) has been described that directs cleavage of DNA, resulting in double strand breaks within the sequence targeted by the guide RNA (7, 8, 9). The only additional requirement is a 3-4 base pair consensus sequence (GGNG according to (8); TGG according to (9)) located 1 nucleotide 3′ (GGNG) or immediately adjacent (TGG) to the guide RNA matching sequence. This sequence is called the protospacer-adjacent motif (PAM). This arrangement prevents self-cleavage of the CRISPR arrays. CRISPR3 can thus act as a programmable restriction enzyme, cleaving selectively at any GGNG (or TGG) sequence located in the appropriate place near a pre-defined target sequence that is complementary to the guide RNA sequence. In an organism with a 50% GC content, CRISPR3-targetable sites are expected on average about every 32 base pairs when both strands are considered, since each motif has three fixed bases. This system is transferable to other hosts, as it is functional in E. coli (8). The general structure of the S. thermophilus CRISPR/Cas9 system is shown in FIG. 1.
The CRISPR/Cas9 system has recently been combined with a recombineering system in Streptococcus pneumoniae and Escherichia coli to produce mutations at target sites with a high yield of mutants (10). The authors show that two simultaneous mutations at two different target sites can be created using two different guide RNA sequences. According to the authors, the selection of successful dual mutants and the high efficiency of genome editing result from co-selection of two selectable markers, enhancement of recombination by the CRISPR/Cas system, and selection against unmutated cells by the CRISPR/Cas system.