A CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a distinctive DNA locus (i.e., an array or cluster of repeated DNA sequences) found in the genomes of many bacteria and archaea (for recent reviews see e.g., Horvath and Barrangou, 2010; Karginov and Hannon, 2010).
Recently, it has been shown that CRISPR sequences can function as a type of “immune system” that helps bacteria defend themselves against phage infections (see e.g., Barrangou et al., 2007; Deveau et al., 2008; Horvath et al., 2008). At least eight distinct CRISPR loci have been identified in the genomes of lactic acid bacteria (see Horvath et al., 2009).
Furthermore, it has been shown that phage resistance in bacteria can be modified by introducing CRISPR sequences into the bacterial genome. For example, removing particular CRISPR sequences from, or adding them to, Streptococcus thermophilus strains resulted in a modified phage-resistance phenotype (see e.g., Barrangou et al., 2007; Deveau et al., 2008). International Publ. No. WO 2007/025097 A2, published Mar. 1, 2007 (which is hereby incorporated by reference herein), discloses inter alia the use of CRISPR loci to modulate the resistance of a bacterial strain against an exogenous nucleic acid (e.g., phage infection).
The structure of a CRISPR array includes a number of short repeating sequences referred to as “repeats.” The repeats occur in clusters, with up to 249 repeats identified in a single CRISPR array (see e.g., Horvath and Barrangou, 2010), and are usually regularly spaced by unique intervening sequences referred to as “spacers.” Typically, CRISPR repeats range from about 24 to 47 bp in length and are partially palindromic (Horvath and Barrangou, 2010). The repeats are generally arranged in clusters (up to about 20 or more per genome) of repeated units (Horvath and Barrangou, 2010). Each spacer is located between two repeats and typically has a unique sequence of about 21-72 bp in length (Horvath and Barrangou, 2010). Many spacers are identical or highly similar to known phage sequences. It has been shown that inserting a spacer sequence from a specific phage into a bacterial CRISPR array can confer resistance to that phage (see e.g., Barrangou et al., 2007).
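To make the repeat-spacer layout concrete, the following Python sketch splits an idealized array at every copy of a known repeat and keeps the interior pieces as spacers, then checks which spacers occur verbatim on either strand of a phage sequence. It assumes all repeats are identical and that matches are exact (real arrays can contain degenerate repeats, and "high similarity" would need an alignment rather than a substring test). The function names and toy sequences are hypothetical, not a published tool.

```python
from typing import List

# Maps each uppercase DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_spacers(array: str, repeat: str) -> List[str]:
    """Return the sequences lying between consecutive copies of `repeat`.

    Assumes an idealized array: (leader) R S R S ... R, with identical repeats.
    """
    pieces = array.split(repeat)
    # pieces[0] precedes the first repeat (e.g. the leader) and pieces[-1]
    # follows the last repeat, so only the interior pieces are spacers.
    return pieces[1:-1]


def spacers_matching_phage(spacers: List[str], phage: str) -> List[int]:
    """Return indices of spacers found verbatim on either strand of `phage`."""
    rc_phage = reverse_complement(phage)
    return [i for i, s in enumerate(spacers) if s in phage or s in rc_phage]


# Toy, made-up sequences (hypothetical; not from any real genome).
repeat = "GATC" * 7                      # 28-bp placeholder repeat
s1 = "AAACCCGGGTTTAAACCCGGGTTTA"         # 25-bp placeholder spacer
s2 = "CCCCCAAAAAGGGGGTTTTTCCCCC"         # 25-bp placeholder spacer
array = "ATATATAT" + repeat + s1 + repeat + s2 + repeat
phage = "GG" + reverse_complement(s2) + "GG"

spacers = extract_spacers(array, repeat)
print(spacers == [s1, s2])                        # True
print(spacers_matching_phage(spacers, phage))     # [1]
```

Here only the second spacer matches, and it does so on the phage's opposite strand, which is why both strands are searched.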
In addition to repeats and spacers, a CRISPR array may also include a leader sequence and, often, a set of two to six associated cas genes. Typically, the leader sequence is an AT-rich sequence of up to 550 bp directly adjoining the 5′ end of the first repeat (Horvath and Barrangou, 2010). New repeat-spacer units are almost always added to the CRISPR array between the leader and the first repeat (see e.g., Horvath and Barrangou, 2010). However, it has been found that acquisition of phage resistance can also be associated with new spacer addition and concomitant spacer deletion away from the CRISPR leader sequence (see e.g., Deveau et al., 2008).
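The leader-proximal addition of new units can be modelled with a small data structure in which the spacer list is ordered from the leader outward, so acquiring a unit is an insertion at index 0 and a deletion elsewhere removes a spacer together with one repeat copy. This Python sketch assumes identical repeats and uses a hypothetical class name (`CrisprArray`); it models sequence layout only, not the biology of acquisition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrisprArray:
    """Toy model: leader, then R S1 R S2 ... R with identical repeats.

    `spacers` is ordered leader-proximal first (index 0 is next to the leader).
    """
    leader: str
    repeat: str
    spacers: List[str] = field(default_factory=list)

    def acquire(self, new_spacer: str) -> None:
        """Add a repeat-spacer unit between the leader and the first repeat."""
        self.spacers.insert(0, new_spacer)

    def delete_spacer(self, index: int) -> str:
        """Remove one spacer (and, implicitly, one repeat copy) and return it."""
        return self.spacers.pop(index)

    def sequence(self) -> str:
        """Assemble the array as leader + (R + S) for each spacer + terminal R."""
        units = "".join(self.repeat + s for s in self.spacers)
        return self.leader + units + self.repeat


arr = CrisprArray(leader="ATAT", repeat="GG", spacers=["AAA", "CCC"])
arr.acquire("TTT")                 # new unit lands next to the leader
print(arr.sequence())              # ATATGGTTTGGAAAGGCCCGG
arr.delete_spacer(2)               # deletion away from the leader
print(arr.spacers)                 # ['TTT', 'AAA']
```

Ordering the list leader-first keeps the newest spacers at the front, so the history of acquisition can be read from left to right.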
It is believed that the proteins encoded by the associated cas genes act as a bacterial “immune system” that confers resistance against phages. It has been suggested that the array of repeat-spacer sequences is transcribed into a long RNA and that the repeats assume a secondary structure which the Cas proteins recognize and process to form small RNAs that function via an RNA-interference-like mechanism (see Karginov and Hannon, 2010). Brouns et al. (2008) have reported that a complex of five Cas proteins (CasA, CasB, CasC, CasD, and CasE) in the Escherichia coli K12 CRISPR-Cas system, referred to as “Cascade,” cleaves a CRISPR RNA precursor within each repeat and retains the cleavage product containing a virus-derived sequence. It is proposed that, assisted by the Cas3 helicase, these mature CRISPR RNAs then serve as small guide RNAs that enable Cascade to interfere with virus proliferation (see e.g., Brouns et al., 2008).
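As a rough illustration of precursor processing, the Python sketch below transcribes a toy array (T to U), cuts the precursor inside every repeat copy at a fixed offset, and keeps only the products that span a spacer, each flanked by repeat fragments. The cut offset is a placeholder parameter, not the measured Cascade cleavage site, and the function names and the deliberately short 8-bp repeat are hypothetical.

```python
from typing import List


def transcribe(dna: str) -> str:
    """Return the RNA copy of a coding-strand DNA sequence (T -> U)."""
    return dna.replace("T", "U")


def process_pre_crrna(pre_crrna: str, repeat_rna: str, cut_offset: int) -> List[str]:
    """Cut inside every repeat copy, `cut_offset` nt from the repeat's 5' end,
    and keep only the products that contain a spacer."""
    if not 0 < cut_offset < len(repeat_rna):
        raise ValueError("cut_offset must fall strictly inside the repeat")
    cut_sites = []
    start = pre_crrna.find(repeat_rna)
    while start != -1:
        cut_sites.append(start + cut_offset)
        start = pre_crrna.find(repeat_rna, start + len(repeat_rna))
    # Each product between consecutive cuts is: 3' part of one repeat,
    # one spacer, then the 5' part of the next repeat. The leader-side and
    # trailing fragments are discarded.
    return [pre_crrna[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


repeat = "GGGGTTTT"                              # placeholder 8-bp repeat
array = "AT" + repeat + "AAACCC" + repeat + "CCCAAA" + repeat
crrnas = process_pre_crrna(transcribe(array), transcribe(repeat), cut_offset=4)
print(crrnas)   # ['UUUUAAACCCGGGG', 'UUUUCCCAAAGGGG']
```

Discarding the fragments outside the first and last cuts mirrors the description of retaining only the cleavage products that carry a spacer.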
CRISPR sequences are among the most rapidly evolving genomic structures in bacteria. Because of this, and because of their relative sequence simplicity (i.e., repeat-spacer-repeat), CRISPR sequences provide an ideal genomic system for detecting, typing, and tracking specific strains of bacteria. Methods for using CRISPR sequences to detect, type, and track bacterial strains have been disclosed in e.g., U.S. published application 2006/01990190 A1, published Sep. 7, 2006, which is hereby incorporated by reference herein.
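One simple way to compare strains by CRISPR content, sketched here in Python, is to build a spacer presence/absence profile across strains and score pairwise overlap with the Jaccard index. This assumes spacers have already been extracted from each strain's array and are compared by exact sequence; the Jaccard score and the function names are illustrative choices, not the method of the cited application.

```python
from typing import Dict, List, Tuple


def jaccard(a: List[str], b: List[str]) -> float:
    """Fraction of distinct spacers shared by two arrays (1.0 if both are empty)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def spacer_profile(strains: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (all spacers, per-strain 0/1 presence vector over those spacers)."""
    all_spacers = sorted({s for spacers in strains.values() for s in spacers})
    profile = {}
    for name, spacers in strains.items():
        present = set(spacers)
        profile[name] = [int(s in present) for s in all_spacers]
    return all_spacers, profile


# Hypothetical strains; spacer labels stand in for real spacer sequences.
strains = {"A": ["s3", "s2", "s1"], "B": ["s4", "s2", "s1"], "C": ["s5"]}
columns, profile = spacer_profile(strains)
print(columns)                              # ['s1', 's2', 's3', 's4', 's5']
print(profile["A"])                         # [1, 1, 1, 0, 0]
print(jaccard(strains["A"], strains["B"]))  # 0.5
```

A set-based score ignores spacer order; since new spacers are usually added at the leader end, an order-aware comparison may separate closely related strains more finely.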
A CRISPR array also provides a very convenient, durable, natural, and easy-to-detect genomic tagging system that does not impact other physiological properties of the tagged host. Methods for using known phage to induce a CRISPR tag (e.g., addition of a repeat-spacer unit) in a bacterial strain have been disclosed in e.g., U.S. published application 2008/0124725 A1, published May 29, 2008, which is hereby incorporated by reference herein.
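Because new repeat-spacer units are typically added at the leader end, a tag can be read off by comparing a derivative strain's spacers against its parent's: everything ahead of the parent's leader-proximal spacer is new. The Python sketch below assumes both spacer lists are ordered leader-proximal first and that the parent's first spacer is retained in the derivative; `new_leader_spacers` is a hypothetical helper, not a function from the cited application.

```python
from typing import List


def new_leader_spacers(parent: List[str], derivative: List[str]) -> List[str]:
    """Return the spacers of `derivative` that precede the parent's
    leader-proximal spacer (both lists ordered leader-proximal first).

    Assumes the parent's leader-proximal spacer survives in the derivative.
    """
    if not parent:
        return list(derivative)
    anchor = parent[0]
    if anchor not in derivative:
        raise ValueError("parent's leader-proximal spacer not found in derivative")
    # Everything before the anchor was acquired after the parent was sampled.
    return derivative[: derivative.index(anchor)]


parent = ["s2", "s1"]
tagged = ["t1", "s2", "s1"]
print(new_leader_spacers(parent, tagged))   # ['t1']
print(new_leader_spacers(parent, parent))   # []
```

Anchoring on the parent's leader-proximal spacer means that deletions further from the leader do not affect which spacers are reported as the tag.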