A CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a distinctive DNA locus (i.e., an array or cluster of DNA sequences) found in the genomes of many bacteria and archaea (for a recent review, see e.g., Sorek et al., “CRISPR—a widespread system that provides acquired resistance against phages in bacteria and archaea,” Nature Reviews Microbiology, AOP, published online 24 Dec. 2007; doi:10.1038/nrmicro1793).
Recently, it has been shown that CRISPR sequences can function as a type of “immune system” that helps bacteria defend against phage infections (see e.g., Barrangou et al., “CRISPR Provides Acquired Resistance Against Viruses in Prokaryotes,” Science 315:1709-12 (March 2007); Deveau et al., J. Bacteriol. 190(4):1390-1400 (February 2008); Horvath et al., J. Bacteriol. 190(4):1401-12 (February 2008)). At least eight distinct CRISPR loci have been identified in the genomes of lactic acid bacteria (see Horvath et al., “Comparative analysis of CRISPR loci in lactic acid bacteria genomes,” Int. J. Food Microbiol., Epub Jul. 15, 2008).
Furthermore, it has been shown that phage resistance in bacteria can be modified by introducing CRISPR sequences into the bacterial genome. For example, removal of particular CRISPR sequences from, or addition of particular CRISPR sequences to, S. thermophilus strains resulted in a modified phage-resistance phenotype (see e.g., Barrangou et al. 2007 supra; Deveau et al., 2008 supra). Int'l Publ. No. WO 2007/025097 A2, published Mar. 1, 2007 (which is hereby incorporated by reference herein), discloses inter alia the use of CRISPR loci to modulate the resistance of a bacterial strain against an exogenous nucleic acid (e.g., phage infection).
The structure of a CRISPR locus includes a number of short repeating sequences referred to as “repeats.” The repeats occur in clusters, with up to 249 repeats identified in a single CRISPR locus (see e.g., Sorek et al., 2007, supra), and are usually regularly spaced by unique intervening sequences referred to as “spacers.” Typically, CRISPR repeats vary from about 24 to 47 bp in length and are partially palindromic (see Sorek et al., 2007, supra). The repeats are generally arranged in clusters of repeated units, with up to about 20 or more such clusters per genome (see Sorek et al., 2007, supra). The spacers are located between two repeats, and typically each spacer has a unique sequence of about 20-72 bp in length (see Sorek et al., 2007, supra). Many spacers are identical to or have high homology with known phage sequences. It has been shown that the insertion of a spacer sequence from a specific phage into a bacterial CRISPR can confer resistance to that phage (see e.g., Barrangou et al., “CRISPR Provides Acquired Resistance Against Viruses in Prokaryotes,” Science 315:1709-12 (March 2007)).
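The repeat-spacer organization described above can be illustrated with a short Python sketch that recovers the spacers lying between consecutive copies of a known repeat and checks them against the approximate 20-72 bp spacer range. This is a minimal sketch, assuming the repeat sequence is already known and that every copy matches it exactly; real arrays often contain degenerate copies (notably the terminal repeat), which would call for approximate matching. The function names and the synthetic sequences in the usage example are hypothetical and are not taken from the cited references.

```python
from typing import List

# Approximate spacer length range given in the text (about 20-72 bp).
SPACER_MIN_BP = 20
SPACER_MAX_BP = 72


def split_crispr_array(array_seq: str, repeat: str) -> List[str]:
    """Return the spacers found between consecutive exact copies of `repeat`.

    Hypothetical helper: assumes every repeat copy matches `repeat` exactly,
    so degenerate repeats (common at the trailer end) are not detected.
    """
    seq, rep = array_seq.upper(), repeat.upper()
    # Collect the start position of each non-overlapping repeat copy.
    positions = []
    pos = seq.find(rep)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(rep, pos + len(rep))
    # Each spacer runs from the end of one repeat to the start of the next.
    return [seq[a + len(rep):b] for a, b in zip(positions, positions[1:])]


def spacers_in_typical_range(spacers: List[str]) -> List[str]:
    """Keep only spacers whose length falls within about 20-72 bp."""
    return [s for s in spacers if SPACER_MIN_BP <= len(s) <= SPACER_MAX_BP]


# Usage with synthetic sequences (not from any real strain):
repeat = "GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC"
array = repeat + "ACGT" * 6 + repeat + "TTGCA" * 6 + repeat
spacers = split_crispr_array(array, repeat)
print([len(s) for s in spacers])  # [24, 30]
```

Splitting on exact matches keeps the sketch short; a practical tool would tolerate mismatches in repeat copies and would also have to locate the repeat in the first place.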
In addition to repeats and spacers, a CRISPR locus also includes a leader sequence and often a set of two to six associated cas genes. The leader sequence typically is an AT-rich sequence of up to 550 bp directly adjoining the 5′ end of the first repeat (see Sorek et al., 2007, supra). New repeat-spacer units are almost always added to the CRISPR locus between the leader and the first repeat (see e.g., Sorek et al., 2007, supra). However, it has been found that acquisition of phage resistance can also be associated with new spacer addition and concomitant spacer deletion away from the CRISPR leader sequence (see e.g., Deveau et al., supra).
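The leader-proximal addition of new units, and the occasional loss of spacers elsewhere in the array, can be illustrated by comparing the spacer content of a parent strain with that of a phage-resistant derivative. The following Python sketch is illustrative only: it assumes both arrays are already available as ordered lists of spacer sequences (leader end first) and that spacers are compared by exact identity; the function name and the placeholder spacer labels are hypothetical.

```python
from typing import List, Tuple


def diff_spacer_arrays(parent: List[str], derivative: List[str]) -> Tuple[List[str], List[str]]:
    """Compare two spacer arrays, each ordered from the leader end outward.

    Returns (new_leader_spacers, lost_spacers):
      new_leader_spacers: spacers at the leader end of `derivative` that precede
        its first spacer also found in `parent` (presumed new acquisitions);
      lost_spacers: spacers of `parent` that no longer occur in `derivative`.
    """
    parent_set = set(parent)
    new_leader = []
    for spacer in derivative:
        if spacer in parent_set:
            break  # reached the ancestral portion of the array
        new_leader.append(spacer)
    derivative_set = set(derivative)
    lost = [s for s in parent if s not in derivative_set]
    return new_leader, lost


# Usage with placeholder labels standing in for spacer sequences:
parent = ["S1", "S2", "S3"]
derivative = ["N1", "S1", "S3"]  # one new leader-end spacer, S2 deleted
print(diff_spacer_arrays(parent, derivative))  # (['N1'], ['S2'])
```

By design, this sketch reports only additions at the leader end; a new spacer inserted elsewhere in the array would not be listed, so a fuller comparison would need an alignment of the two arrays.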
It is believed that the proteins encoded by the associated cas genes act as a bacterial “immune system” that confers resistance against phages. It has been suggested that the array of repeat-spacer sequences is transcribed into a long RNA, and that the repeats assume a secondary structure which the Cas proteins recognize and process to form small RNAs that function via an RNA-interference-like mechanism (see Sorek et al., 2007, supra). Brouns et al. (2008) have reported that a complex of five Cas proteins (CasA, CasB, CasC, CasD, and CasE) in the E. coli K12 CRISPR/cas system, referred to as “Cascade,” cleaves a CRISPR RNA precursor within each repeat and retains the cleavage product containing a virus-derived sequence. It is proposed that, assisted by the Cas3 helicase, these mature CRISPR RNAs then serve as small guide RNAs that enable Cascade to interfere with virus proliferation (see e.g., Brouns et al., “Small CRISPR RNAs Guide Antiviral Defense in Prokaryotes,” Science 321: 960-964 (2008)).
CRISPR sequences are among the most rapidly evolving genomic structures in bacteria. Because of this, and because of their relative sequence simplicity (i.e., repeat-spacer-repeat), CRISPR sequences provide an ideal genomic system for detecting, typing, and tracking specific strains of bacteria. Methods for using CRISPR sequences to detect, type, and track bacterial strains have been disclosed in e.g., U.S. published application 2006/01990190 A1, published Sep. 7, 2006, which is hereby incorporated by reference herein.
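As one simple illustration of how spacer content might be used for typing, the Python sketch below groups strains whose ordered spacer arrays are identical and scores pairwise relatedness as the fraction of spacers shared (a Jaccard index). This is a sketch under stated assumptions, not the method of the cited application: it presumes the spacers of each strain have already been extracted and are compared by exact sequence identity, and the function names and strain labels are hypothetical.

```python
from typing import Dict, List, Tuple


def group_by_spacer_profile(strains: Dict[str, List[str]]) -> List[List[str]]:
    """Group strain names whose ordered spacer arrays are identical."""
    groups: Dict[Tuple[str, ...], List[str]] = {}
    for name, spacers in strains.items():
        groups.setdefault(tuple(spacers), []).append(name)
    # Sort for a stable, readable result.
    return sorted(sorted(names) for names in groups.values())


def shared_spacer_fraction(a: List[str], b: List[str]) -> float:
    """Jaccard index of two spacer sets: 0.0 = none shared, 1.0 = identical sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Usage with placeholder spacer labels:
strains = {
    "strain_A": ["s1", "s2"],
    "strain_B": ["s1", "s2"],
    "strain_C": ["s3", "s1", "s2"],  # one additional leader-end spacer
}
print(group_by_spacer_profile(strains))  # [['strain_A', 'strain_B'], ['strain_C']]
```

Exact-identity grouping is deliberately strict: closely related strains that differ by a single acquired spacer fall into separate groups, yet still score highly on the shared-fraction measure (2/3 for strain_A versus strain_C here).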
A CRISPR locus also provides a very convenient, durable, natural, and easy-to-detect genomic tagging system that does not impact other physiological properties of the tagged prokaryote. Methods for using known phage to induce a CRISPR tag (e.g., addition of a repeat-spacer unit) in a bacterial strain have been disclosed in e.g., U.S. published application 2008/0124725 A1, published May 29, 2008, which is hereby incorporated by reference herein.