Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated proteins (Cas) constitute CRISPR-Cas systems. The CRISPR-Cas systems provide adaptive immunity against foreign polynucleotides in bacteria and archaea (see, e.g., Barrangou, R., et al., Science 315:1709-1712 (2007); Makarova, K. S., et al., Nature Reviews Microbiology 9:467-477 (2011); Garneau, J. E., et al., Nature 468:67-71 (2010); Sapranauskas, R., et al., Nucleic Acids Research 39:9275-9282 (2011); Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78 (2017)). Various CRISPR-Cas systems in their native hosts are capable of DNA targeting (Class 1 Type I; Class 2 Type II and Type V), RNA targeting (Class 2 Type VI), and joint DNA and RNA targeting (Class 1 Type III) (see, e.g., Makarova, K. S., et al., Nat. Rev. Microbiol. 13(11):722-736 (2015); Shmakov, S., et al., Nat. Rev. Microbiol. 15:169-182 (2017); Abudayyeh, O. O., et al., Science 353:1-17 (2016)).
The classification of CRISPR-Cas systems has had many iterations. Koonin, E. V., et al. (Curr. Opin. Microbiol. 37:67-78 (2017)) proposed a classification system that takes into consideration the signature cas genes specific for individual types and subtypes of CRISPR-Cas systems. The classification also considered sequence similarity between multiple shared Cas proteins, the phylogeny of the best conserved Cas protein, gene organization, and the structure of the CRISPR array. This approach provided a classification scheme that divides CRISPR-Cas systems into two distinct classes: Class 1 comprising a multiprotein effector complex (Type I (CRISPR-associated complex for antiviral defense (“Cascade”) effector complex), Type III (Cmr/Csm effector complex), and Type IV); and Class 2 comprising a single effector protein (Type II (Cas9), Type V (Cas12a, previously referred to as Cpf1), and Type VI (Cas13a, previously referred to as C2c2)). In the Class 1 systems, Type I is the most common and diverse, Type III is more common in archaea than bacteria, and Type IV is least common.
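The two-class scheme described above can be written down as a small lookup table. The following Python sketch is illustrative only: the dictionary layout and the function name class_of are assumptions made for this example and are not part of any published classification tool. The entries restate only what the preceding paragraph says.

```python
from typing import Optional

# Two-class scheme as summarized in the text (after Koonin et al., 2017).
# The data layout is a hypothetical choice made for this illustration.
CRISPR_CLASSES: dict[str, dict] = {
    "Class 1": {
        "effector": "multiprotein effector complex",
        "types": {
            "Type I": "Cascade effector complex",
            "Type III": "Cmr/Csm effector complex",
            "Type IV": None,  # the text names no effector complex for Type IV
        },
    },
    "Class 2": {
        "effector": "single effector protein",
        "types": {
            "Type II": "Cas9",
            "Type V": "Cas12a (previously Cpf1)",
            "Type VI": "Cas13a (previously C2c2)",
        },
    },
}


def class_of(crispr_type: str) -> str:
    """Return the class ("Class 1" or "Class 2") that a given type belongs to."""
    for class_name, info in CRISPR_CLASSES.items():
        if crispr_type in info["types"]:
            return class_name
    raise KeyError(f"unknown CRISPR-Cas type: {crispr_type}")


def effector_of(crispr_type: str) -> Optional[str]:
    """Return the effector named in the text for a type, or None if none is named."""
    return CRISPR_CLASSES[class_of(crispr_type)]["types"][crispr_type]
```

For example, class_of("Type I") returns "Class 1" and effector_of("Type II") returns "Cas9".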
The Type I systems comprise the signature Cas3 protein. The Cas3 protein has helicase and DNase domains responsible for DNA target sequence cleavage. To date, seven subtypes of the Type I system have been identified (i.e., Type I-A, I-B, I-C, I-D, I-E, I-F (and variants of I-F, e.g., I-Fv1 and I-Fv2), and I-U) that have a variable number of cas genes. Type I cas genes include, but are not limited to, the following: cas7, cas5, cas8, cse2, csa5, cas3, cas2, cas4, cas1, and cas6. Examples of organisms having Type I systems are as follows: I-A, Archaeoglobus fulgidus; I-B, Clostridium kluyveri; I-C, Bacillus halodurans; I-U, Geobacter sulfurreducens; I-D, Cyanothece sp. 8802; I-E, Escherichia coli K12; I-F, Yersinia pseudotuberculosis; I-F variant, Shewanella putrefaciens CN-32 (Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78 (2017)).
Type I systems typically encode proteins that combine with a CRISPR RNA (crRNA or “guide RNA”) to form a Cascade complex. These complexes comprise multiple proteins and a crRNA, which are expressed from the CRISPR locus. In Type I systems, primary processing of a pre-crRNA is catalyzed by Cas6. This typically results in a crRNA with a 5′ handle of 8 nucleotides, a spacer region, and a 3′ handle; both the 5′ and 3′ handles are derived from the repeat sequence. In some systems, the 3′ handle forms a stem-loop structure; in other systems, secondary processing of the 3′ end of the crRNA is catalyzed by ribonuclease(s) (van der Oost, J., et al., Nature Reviews Microbiology 12:479-492 (2014)).
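The crRNA layout described above (an 8-nucleotide 5′ handle, a spacer, and a 3′ handle, with both handles derived from the repeat) can be illustrated with a short Python sketch. It assumes, as a simplification, a single cleavage inside each repeat located 8 nucleotides upstream of the repeat's 3′ end, and it ignores any secondary 3′ trimming. The repeat and spacer sequences, the CrRNA type, and the function name process_pre_crrna are hypothetical placeholders chosen for this example, not sequences from any organism.

```python
from typing import List, NamedTuple


class CrRNA(NamedTuple):
    five_prime_handle: str   # last 8 nt of the upstream repeat
    spacer: str
    three_prime_handle: str  # remainder of the downstream repeat

    @property
    def sequence(self) -> str:
        return self.five_prime_handle + self.spacer + self.three_prime_handle


def process_pre_crrna(repeat: str, spacers: List[str],
                      five_prime_handle_len: int = 8) -> List[CrRNA]:
    """Split a repeat-spacer array into crRNAs by cutting once inside each repeat.

    Assumption: the cut site lies `five_prime_handle_len` nucleotides upstream
    of the 3' end of every repeat, so each spacer keeps the tail of the repeat
    before it (5' handle) and the head of the repeat after it (3' handle).
    """
    if not 0 < five_prime_handle_len < len(repeat):
        raise ValueError("handle length must be shorter than the repeat")
    five_handle = repeat[-five_prime_handle_len:]
    three_handle = repeat[:-five_prime_handle_len]
    return [CrRNA(five_handle, spacer, three_handle) for spacer in spacers]


# Toy input: a made-up 29-nt repeat and two made-up spacers.
TOY_REPEAT = "GGGAAACCCUUUGGGAAACCCAUAUAUAU"
TOY_SPACERS = ["ACGUACGUACGU", "UUGGCCAAUUGG"]
toy_crrnas = process_pre_crrna(TOY_REPEAT, TOY_SPACERS)
```

Each element of toy_crrnas begins with the final 8 nucleotides of the toy repeat and ends with its first 21 nucleotides, mirroring the 5′ handle, spacer, 3′ handle arrangement in the text.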
The Cascade effector complexes of the Type I CRISPR-Cas systems comprise a backbone having paralogous Repeat-Associated Mysterious Proteins (RAMPs; e.g., Cas7 and Cas5 proteins) containing the RNA Recognition Motif (RRM) fold and additional “large” and “small” subunit proteins (see, e.g., Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78, FIG. 2 (2017)). These Cascade effector complexes typically have a Cas5 subunit protein and several Cas7 subunit proteins. Such Cascade effector complexes also comprise the guide RNA. The Cascade effector complexes comprise the various subunit proteins arranged in an asymmetric fashion along the length of the guide RNA. The Cas5 subunit protein and the large subunit protein (Cas8 protein) are positioned at one end of the complex, enveloping the 5′ end of the guide RNA. Several copies of the small subunit protein interact with the guide RNA backbone, which is bound to multiple copies of the Cas7 subunit protein. The Cas6 subunit protein, another RAMP protein, is associated with the Cascade effector complex primarily through association with the 3′ handle (repeat region) of the crRNA. The Cas6 subunit protein usually functions as the repeat-specific RNase involved in pre-crRNA processing; however, in Type I-C systems, Cas5 functions as the repeat-specific RNase and there is no Cas6.
The primary sequences of the CRISPR-Cas Type I Cascade subunit proteins have little sequence identity; however, the presence of homologous RAMP modules and the overall structural similarity of the multiprotein effector complexes support a common origin of these effector complexes (Koonin, E. V., et al., Curr. Opin. Microbiol. 37:67-78 (2017)).
The adaptive immunity mechanism of action in the Type I CRISPR-Cas systems involves essentially three phases: adaptation, expression, and interference. In the adaptation phase, a foreign DNA or RNA infects the host and proteins encoded by various cas genes bind regions of the infecting DNA or RNA. Such regions are called protospacers. A protospacer adjacent motif (PAM) is a short nucleotide sequence (e.g., a 2 to 6 base pair DNA sequence) that is adjacent to the protospacer. PAM sequences are typically recognized by a Cas1 subunit protein/Cas2 subunit protein complex, wherein the active PAM-sensing site is associated with the Cas1 subunit proteins (Jackson, S. A., et al., Science 356:356(6333) (2017)).
In the expression phase, the CRISPR array comprising multiple spacer-repeat elements is transcribed as a single transcript. Individual spacer-repeat elements are processed by an endonuclease (e.g., a Cas6 protein in most Type I systems, and a Cas5 protein in Type I-C systems) into individual crRNAs. Cas subunit proteins are expressed and associate with the crRNA to form a Cascade effector complex.
In the interference phase, the Cascade effector complex scans foreign polynucleotides infecting the host to identify DNA complementary to the spacer. In Type I systems, interference occurs when the effector complex identifies a sequence that is complementary to the spacer and adjacent to a PAM; the Cas3 protein is then recruited to the DNA-bound Cascade effector complex to cleave and progressively digest the foreign polynucleotide.
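The search for a PAM-adjacent target can be illustrated with a deliberately simplified Python sketch. It is a string-matching toy, not a model of Cascade binding: it scans only one DNA strand, requires an exact match to the protospacer, and treats the PAM as a fixed literal sequence. The text states only that the PAM is adjacent to the protospacer, so the side on which the PAM is checked is exposed as a parameter, and the default chosen here is an assumption. The function name find_pam_adjacent_sites and the sequences are hypothetical.

```python
from typing import List


def find_pam_adjacent_sites(dna: str, protospacer: str, pam: str,
                            pam_side: str = "5prime") -> List[int]:
    """Return start indices of protospacer matches that are flanked by the PAM.

    Assumptions (not from the source text): exact matching on a single
    strand, a literal PAM string, and a caller-selected PAM side.
    """
    if pam_side not in ("5prime", "3prime"):
        raise ValueError("pam_side must be '5prime' or '3prime'")
    hits: List[int] = []
    start = dna.find(protospacer)
    while start != -1:
        end = start + len(protospacer)
        if pam_side == "5prime":
            flank = dna[max(0, start - len(pam)):start]
        else:
            flank = dna[end:end + len(pam)]
        if flank == pam:
            hits.append(start)
        # Continue one position later so overlapping matches are not missed.
        start = dna.find(protospacer, start + 1)
    return hits


# Toy sequence: the protospacer occurs twice, but only the first copy has
# the toy PAM "AAG" immediately upstream.
TOY_DNA = "TT" + "AAG" + "ACGTACGT" + "CC" + "ACGTACGT"
toy_hits = find_pam_adjacent_sites(TOY_DNA, "ACGTACGT", "AAG")
```

In this toy input, only the first copy of the protospacer is reported, because the second copy lacks the flanking PAM. This mirrors the requirement in the text that a complementary sequence must also be adjacent to a PAM for interference to occur.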
Makarova, K. S., et al. (Cell 168:946 (2017)) provide a summary of genes, homologs, Cascade complexes, and mechanisms of action for Type I CRISPR-Cas systems.
Although CRISPR-Cas systems have been used for genome editing, there remains a need to improve editing efficiency and editing fidelity of these systems.