Small, non-coding (nc)RNAs are found in all domains of life and function in a wide array of essential cellular processes. In eukaryotes, small ncRNAs including siRNAs and microRNAs have been shown to function in post-transcriptional gene silencing by targeting exogenous or endogenous RNAs, respectively, in a process called RNA interference, or RNAi (Hannon, 2002, Nature, 418:244-251). Another class of small RNAs, referred to as piRNAs (piwi-associated) or rasiRNAs (repeat-associated small interfering), regulates the spread of selfish genetic elements such as transposons or repeat elements in organisms including mammals, plants and flies (Kim V. N., 2006, Genes Dev, 20:1993-1997; Nishida and Siomi, 2006, Tanpakushitsu Kakusan Koso, 51:2450-2455; Aravin et al., 2007, Science 318:761-764; Hartig et al., 2007, Genes Dev, 21:1707-1713; Lin H., 2007, Science, 316:397).
An RNAi-like system that functions in genome defense has recently been proposed to exist in prokaryotes (Makarova et al., 2006, Biol Direct, 1:7; Deveau et al., 2008, J Bacteriol, 190:1390-1400; Sorek et al., 2008, Nat Rev Microbiol, 6:181-186; Tyson and Banfield, 2008, Environ Microbiol, 10:200-207). The hallmark of the proposed prokaryotic RNAi (or pRNAi) system is the CRISPR (clustered regularly interspaced short palindromic repeat) locus, a cluster of short direct repeats that separate short variable sequences. A number of the variable sequences (also sometimes called “spacers”) found in CRISPR loci display complementarity (or identity) to known prokaryotic viruses, plasmids and transposons (Bolotin et al., 2005, Microbiology, 151:2551-2561; Mojica et al., 2005, J Mol Evol, 60:174-182; Pourcel et al., 2005, Microbiology, 151:653-663; Lillestol et al., 2006, Archaea, 2:59-72; Makarova et al., 2006, Biol Direct, 1:7). The other signature component of the hypothesized pRNAi system is a set of protein-coding genes, referred to as CRISPR-associated or Cas genes, that are found in CRISPR-containing genomes (Jansen et al., 2002, Mol Microbiol, 43:1565-1575; Makarova et al., 2002, Nucleic Acids Res, 30:482-496; Haft et al., 2005, PLoS Comput Biol, 1:e60; Makarova et al., 2006, Biol Direct, 1:7). The Cas genes are predicted to encode nucleases, helicases, RNA-binding proteins and a polymerase (Jansen et al., 2002, Mol Microbiol, 43:1565-1575; Makarova et al., 2002, Nucleic Acids Res, 30:482-496; Haft et al., 2005, PLoS Comput Biol, 1:e60; Makarova et al., 2006, Biol Direct, 1:7).
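The spacer-to-invader correspondence described above (identity or complementarity) can be illustrated with a minimal sketch. It assumes exact, ungapped matching over plain DNA strings; the function names and the toy sequences are hypothetical and are not taken from the cited studies, which used their own search methods.

```python
# Illustrative sketch only: exact-match search of a CRISPR variable sequence
# ("spacer") against a candidate invader sequence, in both orientations.
# All names and sequences here are hypothetical placeholders.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(needle, haystack):
    """Return the 0-based start positions of every occurrence of needle."""
    hits = []
    i = haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits


def match_spacer(spacer, target):
    """Report where the spacer matches the given target strand.

    'identity'        : the spacer sequence itself occurs in the target.
    'complementarity' : the spacer's reverse complement occurs in the target.
    """
    spacer = spacer.upper()
    target = target.upper()
    return {
        "identity": find_all(spacer, target),
        "complementarity": find_all(reverse_complement(spacer), target),
    }


if __name__ == "__main__":
    toy_spacer = "ACGGTCAT"            # made-up sequence
    toy_target = "TTTTACGGTCATGGGG"    # made-up sequence containing the spacer
    print(match_spacer(toy_spacer, toy_target))
```

A real search would typically tolerate mismatches and use an alignment tool; exact matching is used here only to keep the two orientations easy to see.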
These bioinformatically predicted properties of the CRISPR and Cas gene products led to the hypothesis that they comprise an RNAi-like system of genome defense in prokaryotes, in which RNAs derived from the variable regions of CRISPR loci (prokaryotic silencing or psiRNAs) guide the silencing (e.g., degradation) of genome invaders by Cas proteins (Bolotin et al., 2005, Microbiology, 151:2551-2561; Lillestol et al., 2006, Archaea, 2:59-72; Makarova et al., 2006, Biol Direct, 1:7). The Cas proteins are also expected to function in the processing of the psiRNAs and in the integration of new psiRNA genes (directed against newly encountered pathogens) into the genome.
Recent studies have provided strong evidence for a role of CRISPR loci in viral resistance in prokaryotes. Several groups have observed that virus exposure leads to the appearance of new virus-derived sequence elements within the CRISPR loci of surviving (resistant) isolates (Barrangou et al., 2007, Science 315:1709-1712; Deveau et al., 2008, J Bacteriol, 190:1390-1400; Horvath et al., 2008, J Bacteriol, 190:1401-1412). In addition, Barrangou et al. showed that an alteration of an organism's CRISPR sequences that generates or destroys correspondence with a viral sequence results in viral resistance and viral sensitivity, respectively (Barrangou et al., 2007, Science 315:1709-1712). However, the pathway by which CRISPR loci confer viral resistance remains hypothetical and undefined.
CRISPR loci are present in about half of bacterial genomes and nearly all archaeal genomes (Godde and Bickerton, 2006, J Mol Evol, 62:718-729; Makarova et al., 2006, Biol Direct, 1:7). A given locus can contain as few as two and as many as several hundred repeat-psiRNA units (Grissa et al., 2007, BMC Bioinformatics, 8:172; Sorek et al., 2008, Nat Rev Microbiol, 6:181-186). The repeat sequences are generally 25 to 45 nucleotides long and are often weakly palindromic at the 5′ and 3′ termini (Jansen et al., 2002, Mol Microbiol, 43:1565-1575). Interspersed between the repeats are the variable, putative psiRNA-encoding sequences, which are usually similar in length to the repeats. RNAs arising from CRISPR loci have been detected by RNA cloning and/or Northern blotting in three archaeal species: Archaeoglobus fulgidus, Sulfolobus solfataricus and Sulfolobus acidocaldarius (Tang et al., 2002, Proc Natl Acad Sci USA, 99:7536-7541; Tang et al., 2005, Mol Microbiol, 55:469-481; Lillestol et al., 2006, Archaea, 2:59-72). These studies provided convincing evidence that entire CRISPR loci are transcribed from the predicted transcriptional leader sequences found at one end of the loci, and that the transcripts give rise to a discrete series of smaller RNAs whose lengths correspond to multiples of the repeat-psiRNA unit (e.g. ~70, 140, 210 and 280 nts; Tang et al., 2002, Proc Natl Acad Sci USA, 99:7536-7541; Tang et al., 2005, Mol Microbiol, 55:469-481). These findings, along with RNA sequence analysis, led to a hypothesized biogenesis pathway in which primary CRISPR transcripts are endonucleolytically cleaved within repeat sequences to produce psiRNAs flanked by repeat sequence at both the 5′ and 3′ ends (Tang et al., 2002, Proc Natl Acad Sci USA, 99:7536-7541; Tang et al., 2005, Mol Microbiol, 55:469-481).
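The hypothesized biogenesis pathway can be made concrete with a small simulation. The sketch below is illustrative only: it assumes a single cut at a fixed position inside every repeat, and the 30-nt repeat, the 40-nt variable sequences and the cut offset of 22 are arbitrary placeholders chosen so that one repeat-psiRNA unit is 70 nt. None of these values, and none of the function names, come from the cited studies.

```python
# Illustrative sketch only: model a primary CRISPR transcript as alternating
# repeat and variable (psiRNA) sequences, then cut once inside every repeat.
# Sequences, lengths and the cut position are hypothetical placeholders.


def build_transcript(repeat, spacers):
    """Concatenate repeat + spacer units and close with a terminal repeat."""
    return "".join(repeat + s for s in spacers) + repeat


def cleave_within_repeats(repeat, spacers, cut_offset):
    """Cut cut_offset nt into each repeat; return fragments in 5'->3' order."""
    if not 0 < cut_offset < len(repeat):
        raise ValueError("cut_offset must fall strictly inside the repeat")
    transcript = build_transcript(repeat, spacers)
    cut_sites = []
    pos = 0  # start coordinate of the current repeat
    for spacer in spacers:
        cut_sites.append(pos + cut_offset)
        pos += len(repeat) + len(spacer)
    cut_sites.append(pos + cut_offset)  # cut in the terminal repeat
    bounds = [0] + cut_sites + [len(transcript)]
    return [transcript[a:b] for a, b in zip(bounds, bounds[1:])]


def intermediate_lengths(unit_len, max_units):
    """Lengths expected when only some repeats have been cut."""
    return [n * unit_len for n in range(1, max_units + 1)]


if __name__ == "__main__":
    repeat = "ACGT" * 7 + "AC"               # placeholder 30-nt repeat
    spacers = ["A" * 40, "C" * 40, "T" * 40]  # placeholder 40-nt spacers
    fragments = cleave_within_repeats(repeat, spacers, cut_offset=22)
    # Internal fragments are spacers flanked by repeat sequence on both sides.
    for frag in fragments[1:-1]:
        print(len(frag), frag[:8] + "..." + frag[-22:])
    print(intermediate_lengths(len(repeat) + len(spacers[0]), 4))
```

Under these assumptions each internal fragment is one unit long and carries the 3′ portion of one repeat upstream of the variable sequence and the 5′ portion of the next repeat downstream, while incomplete cleavage would leave products at integer multiples of the unit length.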