Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated proteins (Cas) constitute the CRISPR-Cas system. The CRISPR-Cas system provides adaptive immunity against foreign DNA in bacteria (see, e.g., Barrangou, R., et al., Science 315:1709-1712 (2007); Makarova, K. S., et al., Nature Reviews Microbiology 9:467-477 (2011); Garneau, J. E., et al., Nature 468:67-71 (2010); Sapranauskas, R., et al., Nucleic Acids Research 39:9275-9282 (2011)).
CRISPR-Cas systems have recently been reclassified into two classes, comprising five types and sixteen subtypes (see Makarova, K., et al., Nature Reviews Microbiology 13:1-15 (2015)). This classification is based upon identifying all Cas genes in a CRISPR-Cas locus and determining the signature genes in each CRISPR-Cas locus, ultimately placing the CRISPR-Cas systems in either Class 1 or Class 2 based upon the genes encoding the effector module, i.e., the proteins involved in the interference stage. Recently a sixth CRISPR-Cas system (Type VI) has been identified (see Abudayyeh O., et al., Science 353(6299):aaf5573 (2016)). Certain bacteria possess more than one type of CRISPR-Cas system.
Class 1 systems have a multi-subunit crRNA-effector complex, whereas Class 2 systems have a single protein, such as Cas9, Cpf1, C2c1, C2c2, or C2c3, as the crRNA-effector complex. Class 1 systems comprise Type I, Type III, and Type IV systems. Class 2 systems comprise Type II, Type V, and Type VI systems.
Type II systems have cas1, cas2, and cas9 genes. The cas9 gene encodes a multi-domain protein that combines the functions of the crRNA-effector complex with DNA target sequence cleavage. Type II systems are further divided into three subtypes, subtypes II-A, II-B, and II-C. Subtype II-A contains an additional gene, csn2. Examples of organisms with a subtype II-A system include, but are not limited to, Streptococcus pyogenes, Streptococcus thermophilus, and Staphylococcus aureus. Subtype II-B lacks the csn2 gene, but has the cas4 gene. An example of an organism with a subtype II-B system is Legionella pneumophila. Subtype II-C is the most common Type II system found in bacteria and has only three proteins, Cas1, Cas2, and Cas9. An example of an organism with a subtype II-C system is Neisseria lactamica.
Type V systems have a cpf1 gene and cas1 and cas2 genes (see Zetsche, B., et al., Cell 163:1-13 (2015)). The cpf1 gene encodes a protein, Cpf1, that has a RuvC-like nuclease domain that is homologous to the respective domain of Cas9, but lacks the HNH nuclease domain that is present in Cas9 proteins. Type V systems have been identified in several bacteria including, but not limited to, Parcubacteria bacterium, Lachnospiraceae bacterium, Butyrivibrio proteoclasticus, Peregrinibacteria bacterium, Acidaminococcus spp., Porphyromonas macacae, Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi, Smithella spp., Leptospira inadai, Francisella tularensis, Francisella novicida, Candidatus Methanoplasma termitum, and Eubacterium eligens. Recently it has been demonstrated that Cpf1 also has RNase activity and is responsible for pre-crRNA processing (see Fonfara, I., et al., Nature 532(7600):517-521 (2016)).
In Class 2 systems, the crRNA is associated with a single protein and achieves interference by combining nuclease activity with RNA-binding domains and base-pair formation between the crRNA and a nucleic acid target sequence.
In Type II systems, nucleic acid target sequence binding involves Cas9 and the crRNA, as does the nucleic acid target sequence cleavage. In Type II systems, the RuvC-like nuclease (RNase H fold) domain and the HNH (McrA-like) nuclease domain of Cas9 each cleave one of the strands of the double-stranded nucleic acid target sequence. The Cas9 cleavage activity of Type II systems also requires hybridization of crRNA to a tracrRNA to form a duplex that facilitates the crRNA and nucleic acid target sequence binding by the Cas9 protein.
In Type V systems, nucleic acid target sequence binding involves Cpf1 and the crRNA, as does the nucleic acid target sequence cleavage. In Type V systems, the RuvC-like nuclease domain of Cpf1 cleaves one strand of the double-stranded nucleic acid target sequence, and a putative nuclease domain cleaves the other strand of the double-stranded nucleic acid target sequence in a staggered configuration, producing 5′ overhangs, which is in contrast to the blunt ends generated by Cas9 cleavage. These 5′ overhangs may facilitate insertion of DNA.
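The difference between blunt ends and staggered ends with 5′ overhangs can be illustrated with a minimal Python sketch. The function name, the coordinate convention, and the cut positions used in the usage note are assumptions made only for illustration; the text above does not specify cut positions. Each cut is given in top-strand coordinates as the index of the first base to the right of the cut.

```python
def describe_ends(top_cut, bottom_cut):
    """Classify the ends left by a double-strand break.

    Both cut positions are in top-strand coordinates (top strand written
    5' to 3', left to right). This is an illustrative convention, not one
    taken from the text.
    """
    overhang = abs(top_cut - bottom_cut)
    if overhang == 0:
        # Both strands are cut at the same position: blunt ends.
        return ("blunt", 0)
    if top_cut < bottom_cut:
        # The bottom strand of the left fragment extends past the top-strand
        # cut, and that protruding end is a 5' end: 5' overhangs.
        return ("5' overhang", overhang)
    # Otherwise the protruding ends are 3' ends.
    return ("3' overhang", overhang)
```

For example, with hypothetical positions, `describe_ends(17, 17)` classifies the break as blunt, whereas `describe_ends(18, 23)` classifies it as staggered with 5′ overhangs.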
The Cpf1 cleavage activity of Type V systems does not require hybridization of crRNA to tracrRNA to form a duplex; rather, Type V systems use a single crRNA that has a stem-loop structure forming an internal duplex. Cpf1 binds the crRNA in a sequence- and structure-specific manner that recognizes the stem loop and sequences adjacent to the stem loop, most notably the nucleotides 5′ of the spacer sequence that hybridizes to the nucleic acid target sequence. This stem-loop structure is typically in the range of 15 to 19 nucleotides in length. Substitutions that disrupt this stem-loop duplex abolish cleavage activity, whereas other substitutions that do not disrupt the stem-loop duplex do not abolish cleavage activity. Nucleotides 5′ of the stem loop adopt a pseudo-knot structure further stabilizing the stem-loop structure with non-canonical Watson-Crick base pairing, triplex interaction, and reverse Hoogsteen base pairing (see Yamano, T., et al., Cell 165(4):949-962 (2016)). In Type V systems, the crRNA forms a stem-loop structure in the 5′-end sequences, and the 3′-end sequence is complementary to a sequence in a nucleic acid target sequence.
Other proteins associated with Type V crRNA and nucleic acid target sequence binding and cleavage include Class 2 candidate 1 (C2c1) and Class 2 candidate 3 (C2c3). C2c1 and C2c3 proteins are similar in length to Cas9 and Cpf1 proteins, ranging from approximately 1,100 amino acids to approximately 1,500 amino acids. C2c1 and C2c3 proteins also contain RuvC-like nuclease domains and have an architecture similar to Cpf1. C2c1 proteins are similar to Cas9 proteins in requiring a crRNA and a tracrRNA for nucleic acid target sequence binding and cleavage but have an optimal cleavage temperature of 50° C. C2c1 proteins target an AT-rich protospacer adjacent motif (PAM), similar to the PAM of Cpf1, which is 5′ of the nucleic acid target sequence (see, e.g., Shmakov, S., et al., Molecular Cell 60(3):385-397 (2015)).
Class 2 candidate 2 (C2c2) does not share sequence similarity with other CRISPR effector proteins and was recently identified as a Type VI system (see Abudayyeh, O., et al., Science 353(6299):aaf5573 (2016)). C2c2 proteins have two HEPN domains and demonstrate single-stranded RNA cleavage activity. C2c2 proteins are similar to Cpf1 proteins in requiring a crRNA for nucleic acid target sequence binding and cleavage, although not requiring tracrRNA. Also, similar to Cpf1, the crRNA for C2c2 proteins forms a stable hairpin, or stem-loop structure, that aids in association with the C2c2 protein. Type VI systems have a single polypeptide RNA endonuclease that utilizes a single crRNA to direct site-specific cleavage. Additionally, after hybridizing to the target RNA complementary to the spacer, C2c2 becomes a promiscuous RNA endonuclease exhibiting non-specific endonuclease activity toward any single-stranded RNA in a sequence independent manner (see East-Seletsky, A., et al., Nature 538(7624):270-273 (2016)).
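The Class 2 effector proteins described above can be summarized in a small lookup structure. The following Python sketch is illustrative only: the class name, field names, and helper function are assumptions, and a value of None marks a property that the text above does not state.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Effector:
    name: str
    crispr_type: str
    needs_tracrrna: Optional[bool]  # None: not stated in the text above
    target: Optional[str]           # None: not stated in the text above

# Entries follow the descriptions in the preceding paragraphs.
EFFECTORS = [
    Effector("Cas9", "Type II", True, "double-stranded nucleic acid"),
    Effector("Cpf1", "Type V", False, "double-stranded nucleic acid"),
    Effector("C2c1", "Type V", True, None),
    Effector("C2c3", "Type V", None, None),
    Effector("C2c2", "Type VI", False, "single-stranded RNA"),
]

def effectors_of_type(crispr_type):
    """Return the names of the listed effectors belonging to a given type."""
    return [e.name for e in EFFECTORS if e.crispr_type == crispr_type]

def effectors_without_tracrrna():
    """Return the names of effectors described as not requiring tracrRNA."""
    return [e.name for e in EFFECTORS if e.needs_tracrrna is False]
```

In this sketch, `effectors_without_tracrrna()` selects Cpf1 and C2c2, matching the description that both use a single crRNA without a tracrRNA.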
Regarding Class 2 Type II CRISPR-Cas systems, a large number of Cas9 orthologs are known in the art as well as their associated polynucleotide components (tracrRNA and crRNA) (see, e.g., Fonfara, I., et al., Nucleic Acids Research 42(4):2577-2590 (2014), including all Supplemental Data; Chylinski K., et al., Nucleic Acids Research 42(10):6091-6105 (2014), including all Supplemental Data). In addition, Cas9-like synthetic proteins are known in the art (see U.S. Published Patent Application No. 2014-0315985, published 23 Oct. 2014).
Cas9 is an exemplary Type II CRISPR Cas protein. Cas9 is an endonuclease that can be programmed by the tracrRNA/crRNA to cleave, in a site-specific manner, a DNA target sequence using two distinct endonuclease domains (HNH and RuvC/RNase H-like domains) (see U.S. Published Patent Application No. 2014-0068797, published 6 Mar. 2014; see also Jinek, M., et al., Science 337:816-821 (2012)).
Typically, each wild-type CRISPR-Cas9 system includes a crRNA and a tracrRNA. The crRNA has a region of complementarity to a potential DNA target sequence and a second region that forms base-pair hydrogen bonds with the tracrRNA to form a secondary structure, typically to form at least one stem structure. The region of complementarity to the DNA target sequence is the spacer. The tracrRNA and a crRNA interact through a number of base-pair hydrogen bonds to form secondary RNA structures. Complex formation between tracrRNA/crRNA and Cas9 protein results in conformational change of the Cas9 protein that facilitates binding to DNA, endonuclease activities of the Cas9 protein, and crRNA-guided site-specific DNA cleavage by the endonuclease Cas9. For a Cas9 protein/tracrRNA/crRNA complex to cleave a double-stranded DNA target sequence, the DNA target sequence is adjacent to a cognate PAM. By engineering a crRNA to have an appropriate spacer sequence, the complex can be targeted to cleave at a locus of interest, e.g., a locus at which sequence modification is desired.
A variety of Type II CRISPR-Cas system crRNA and tracrRNA sequences, as well as predicted secondary structures, are known in the art (see, e.g., Ran, F. A., et al., Nature 520(7546):186-191 (2015), including all Supplemental Data, in particular Extended Data FIG. 1; Fonfara, I., et al., Nucleic Acids Research 42(4):2577-2590 (2014), including all Supplemental Data, in particular Supplemental Figure S11). Predicted tracrRNA secondary structures were based on the Constraint Generation RNA folding model (Zuker, M., Nucleic Acids Research 31:3406-3415 (2003)). RNA duplex secondary structures were predicted using RNAcofold of the Vienna RNA package (Bernhart, S. H., et al., Algorithms for Molecular Biology 1(1):3 (2006); Hofacker, I. L., et al., Journal of Molecular Biology 319:1059-1066 (2002)) and RNAhybrid (bibiserv.techfak.uni-bielefeld.de/rnahybrid/). The structure predictions were visualized using VARNA (Darty, K., et al., Bioinformatics 25:1974-1975 (2009)). Fonfara, I., et al., show that the crRNA/tracrRNA complex for Campylobacter jejuni does not have the bulge region; however, the complex retains a stem structure located 3′ of the spacer that is followed in the 3′ direction by another stem structure.
The spacer of Class 2 CRISPR-Cas systems can hybridize to a nucleic acid target sequence that is located 5′ or 3′ of a PAM, depending upon the Cas protein to be used. A PAM can vary depending upon the Cas polypeptide to be used. For example, if Cas9 from S. pyogenes is used, the PAM can be a sequence in the nucleic acid target sequence that comprises the sequence 5′-NRR-3′, wherein R can be either A or G, N is any nucleotide, and N is immediately 3′ of the nucleic acid target sequence targeted by the nucleic acid target binding sequence. A Cas protein may be modified such that a PAM may be different compared with a PAM for an unmodified Cas protein. For example, if Cas9 from S. pyogenes is used, the Cas9 protein may be modified such that the PAM no longer comprises the sequence 5′-NRR-3′, but instead comprises the sequence 5′-NNR-3′, wherein R can be either A or G, N is any nucleotide, and N is immediately 3′ of the nucleic acid target sequence targeted by the nucleic acid target binding sequence.
Other Cas proteins recognize other PAMs, and one of skill in the art is able to determine the PAM for any particular Cas protein. For example, Cpf1 has a thymine-rich PAM site that targets, for example, a TTTN sequence (see Fagerlund, R., et al., Genome Biology 16:251 (2015)).
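The relationship between a PAM and an adjacent nucleic acid target sequence can be illustrated with a minimal Python sketch that scans one strand of a DNA sequence for candidate sites. The function names, the 20-nucleotide target length, and the restriction to the IUPAC codes A, C, G, T, R, and N are assumptions made only for illustration. The sketch handles a PAM located 3′ of the target (as described for the 5′-NRR-3′ PAM) or 5′ of the target (as described for the TTTN PAM of Cpf1), and it does not model the complementary strand or which strand the spacer hybridizes to.

```python
import re

# Minimal IUPAC table covering only the codes used in the text above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}

def pam_to_regex(pam):
    """Translate a PAM written with IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())

def find_sites(dna, pam, pam_side, target_len=20):
    """List (start, target, pam) tuples for candidate sites on the given strand.

    pam_side is "3prime" if the PAM lies immediately 3' of the target
    sequence, or "5prime" if it lies immediately 5' of the target sequence.
    target_len=20 is an illustrative assumption.
    """
    if pam_side not in ("5prime", "3prime"):
        raise ValueError("pam_side must be '5prime' or '3prime'")
    dna = dna.upper()
    pattern = re.compile(pam_to_regex(pam))
    window = target_len + len(pam)
    sites = []
    for i in range(len(dna) - window + 1):
        if pam_side == "3prime":
            target = dna[i:i + target_len]
            pam_seq = dna[i + target_len:i + window]
        else:
            pam_seq = dna[i:i + len(pam)]
            target = dna[i + len(pam):i + window]
        if pattern.fullmatch(pam_seq):
            sites.append((i, target, pam_seq))
    return sites
```

For example, `find_sites(seq, "NRR", "3prime")` reports windows in a hypothetical sequence `seq` that are followed by a PAM matching 5′-NRR-3′, whereas `find_sites(seq, "TTTN", "5prime")` reports windows preceded by a PAM matching TTTN.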
The RNA-guided Cas9 endonuclease has been widely used for programmable genome editing in a variety of organisms and model systems (see, e.g., Jinek M., et al., Science 337:816-821 (2012); Jinek M., et al., eLife 2:e00471. doi: 10.7554/eLife.00471 (2013); U.S. Published Patent Application No. 2014-0068797, published 6 Mar. 2014).
Genome engineering includes altering the genome by deleting, inserting, mutating, or substituting specific nucleic acid sequences. The alteration can be gene- or location-specific. Genome engineering can use site-directed nucleases, such as Cas proteins and their cognate polynucleotides, to cut DNA, thereby generating a site for alteration. In certain cases, the cleavage can introduce a double-strand break (DSB) in the DNA target sequence. DSBs can be repaired, e.g., by non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ), or homology-directed repair (HDR). HDR relies on the presence of a template for repair. In some examples of genome engineering, a donor polynucleotide or portion thereof can be inserted into the break.