Small interfering RNAs (siRNAs) are double-stranded RNAs that specifically destroy any RNA within a cell containing a matching sequence. In this manner, siRNAs are effective suppressors of various genes, including oncogenes and tumor suppressor genes, via a phenomenon known as RNA interference (RNAi). RNA interference disrupts gene expression via a cellular system utilizing double-stranded RNAs. This phenomenon was first identified in Caenorhabditis elegans (see, e.g., Fire et al., Nature 391, 806-811 (1998)). More recently, 21- or 22-nucleotide double-stranded RNAs with 2-nucleotide 3′ overhangs have been reported to show RNAi activity in mammalian cells (see, e.g., Elbashir et al., Nature 411, 494-98 (2001) and Caplen et al., Proc. Natl. Acad. Sci. USA 98, 9742-47 (2001)). Miyagishi et al. described the construction of an siRNA expression vector employing two U6 RNA polymerase III promoters separately driving transcription of either a sense or an antisense version of a short DNA sequence. After transcription, an siRNA was derived from duplex formation between the sense and antisense strands of the RNA transcripts (see Miyagishi et al., Nature Biotech. 19, 497-500 (2002)). Brummelkamp et al. constructed a mammalian expression vector using a polymerase III H1-RNA gene promoter linked to a gene-specific insert of 19 nucleotides (sense) separated by a short spacer sequence from the antisense sequence of the same 19-nucleotide gene-specific insert. The complementary nature of the resulting RNA caused the transcripts to form a 19-base-pair stem-loop structure (see Brummelkamp et al., Science 296, 550-553 (2002)).
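The stem-loop layout described above (a 19-nucleotide sense sequence, a short spacer, then the antisense sequence of the same insert) can be sketched in a few lines of Python. This is only an illustration of the layout: the function names, the example target sequence and the spacer are hypothetical placeholders and are not taken from the cited reports.

```python
# Illustrative sketch only. The example target and spacer below are
# hypothetical placeholders, not sequences from the cited reports.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_insert(sense: str, spacer: str) -> str:
    """Build sense + spacer + antisense, written 5' to 3'.

    When transcribed, the sense and antisense arms can base-pair with
    each other, leaving the spacer as the loop of a stem-loop structure.
    """
    sense = sense.upper()
    return sense + spacer.upper() + reverse_complement(sense)


example_sense = "GACTCCAGTGGTAATCTAC"  # hypothetical 19-nt gene-specific insert
example_spacer = "TTCAAGAGA"            # hypothetical short spacer (loop)

insert = hairpin_insert(example_sense, example_spacer)
print(len(insert), insert)
```

The two arms of the resulting insert are exact reverse complements of one another, which is the property that lets a single transcript fold back on itself.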
RNAi could serve as a powerful tool for functional genomics. For example, Kamath et al. described an interfering RNA system for attempting phenotypic identification of genes in C. elegans (see Kamath et al., Nature 421, 231-237 (2003)). Their particular approach used double-stranded RNA sequences hundreds of base pairs long. This approach, however, would not be suitable for application in higher organisms, such as mammalian systems, in which double-stranded RNAs over 30 base pairs in length induce a host defense response via the interferon pathway, which nonspecifically inhibits mRNA and protein translation. Additionally, the system of Kamath et al. was not generated from random combinations of nucleotides and thus would not be broadly applicable for functional genomics applications.
For functional genomics of higher organisms, a set of short double-stranded RNAs, 12-25 nucleotides in length, would be useful. Double-stranded RNAs of this length do not induce a host defense response. Optimally, functional genomics requires a large set of siRNAs, each with a unique sequence, such that all the genes of an organism are statistically represented by at least one siRNA. Such an “siRNA library” (i.e., comprising a random or semirandom set of siRNAs) could facilitate the identification of specific genes or viruses responsible for resulting phenotypes in higher organisms, such as mammalian cells.
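What "statistically represented" implies for library size can be estimated with simple arithmetic. The sketch below assumes an idealized library in which every clone is an independent, uniformly random draw from the 4^n possible sequences of length n, and it counts only exact matches. Both are simplifying assumptions made for illustration, and the function names are hypothetical. Under these assumptions, the probability that a given sequence appears at least once among N clones is 1 - (1 - 4^-n)^N.

```python
import math


def sequence_space(length: int) -> int:
    """Number of distinct sequences of the given length over A, C, G, U."""
    return 4 ** length


def prob_represented(length: int, clones: int) -> float:
    """Probability that one specific sequence occurs at least once among
    `clones` independent, uniformly random draws (idealized assumption)."""
    p_single = 1.0 / sequence_space(length)
    # 1 - (1 - p)^N, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(clones * math.log1p(-p_single))


def clones_needed(length: int, target_prob: float) -> int:
    """Smallest number of clones giving at least `target_prob` chance that
    one specific sequence is present (same idealized assumption)."""
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must be strictly between 0 and 1")
    p_single = 1.0 / sequence_space(length)
    return math.ceil(math.log1p(-target_prob) / math.log1p(-p_single))


# A 19-nucleotide sequence space holds 4**19 (about 2.7e11) members.
print(sequence_space(19))
print(clones_needed(19, 0.95))
```

Because the sequence space grows fourfold with each added nucleotide, the choice of insert length within the 12-25 nucleotide range has a very large effect on how many clones a library would need under this model.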
The desirability of siRNA libraries for drug discovery and disease modification has prompted two recent approaches to library construction. The first, “brute-force” approach uses traditional methods to generate siRNA sequences directed against one gene at a time and pools these to form an siRNA library (Boutros et al., Science 303(5659), 832-35 (2004); Paddison et al., Nature 428(6981), 427-31 (2004)). The second approach derives siRNA libraries from pooled cDNAs by using the enzyme MmeI to digest cDNAs into 20-bp templates for siRNA (Sen et al., Nat. Genet. 36(2), 183-89 (2004); Shirane et al., Nat. Genet. 36(2), 190-96 (2004)).
The “brute-force” method of generating siRNA libraries has a number of limitations. There is a tremendous workload involved in this approach, and such libraries have only been reported for a few organisms. In addition, numerous gaps are expected in both “brute-force” and cDNA-derived libraries for several reasons. Firstly, such libraries attenuate only coding RNA. The coding RNAs used in these approaches correspond to only roughly 4% of the genome. It is increasingly recognized that noncoding RNAs play important functional roles in the cell, including regulation of pathways and sequestration of proteins. For example, the noncoding RNA SCA8 has been shown to cause neurodegeneration (Mutsuddi et al., Curr. Biol. 14, 302-08 (2004)). All functional noncoding RNAs will be missed both by directed (“brute-force”) libraries and by libraries derived from cDNAs. Secondly, siRNAs can trigger heterochromatin formation in specific areas of genomic DNA, resulting in promoter silencing and possibly other effects (Volpe et al., Science 297(5588), 1833-37 (2002)). Any such genomic effects, which could be quite durable, will be missed both by directed (“brute-force”) libraries and by libraries derived from cDNAs. Thirdly, directed (“brute-force”) libraries and libraries derived from cDNAs are not well suited to attenuating genes of tissues and organisms other than those for which they were designed. Thus, for example, such libraries will be of little use in attenuating genes of emerging and unknown viruses. Fourthly, brute-force cloning is likely to collect only sequences that are over 75% suppressive. Some phenotypes shift when a protein is attenuated by 50% and shift again when the protein is even more strongly suppressed (see, e.g., Muraoka, R. S., Mol. Cell Biol. 22(7), 2204-19 (2002)), so partial-suppression phenotypes may go undetected. A fifth limitation of these approaches is that neither is very efficient. 
For example, the brute-force approach is extremely time-intensive and will contain holes related to splice variants. Key physiologic genes (e.g., p53, p73, cyclins) are expressed in many spliced forms that vary in their function. Some spliced forms have opposing effects to each other. Interfering RNA that targets all splice variants of a gene may not have a large effect if it decreases both positive- and negative-acting splice forms. A sixth major drawback, specific to cDNA-based libraries, is that they contain a vast overrepresentation of siRNAs or shRNAs directed against common messages, whereas many of the most interesting target genes (e.g., transcriptional regulators, phosphatases) are expressed at low levels.
Each of the limitations of directed (“brute-force”) libraries and cDNA-based libraries could be addressed by a library composed of random siRNA sequences. However, the synthesis of random siRNAs is not well established, and the construction of a random siRNA library has not heretofore been demonstrated. The creation of such a library has been frustrated by the difficulty of joining random DNA oligomers of unknown sequence to their exact complementary sequences, and of producing enough such constructs to cover all combinations of nucleotides in oligomers of a desired length. The necessary formation of double-stranded RNA duplexes from the template DNA oligomers is an additional hurdle to overcome.
Miyagishi et al. (Nature Biotech. 19, 497-500 (2002)) speculated that an opposing U6 promoter system may allow the production of randomized siRNA libraries. However, the Miyagishi report does not disclose the construction of any such library, nor does it disclose any additional technical features or provide any instruction concerning the construction of such a library.
United States Patent Application Publication 20040005593 (Lorens) speculates that random libraries of interfering RNA molecules may be constructed by synthesizing a pool of oligonucleotides comprising a restriction site, a randomized siRNA sequence, a complementarity region sequence, and a hairpin-forming linker sequence (optionally a U-turn motif, a ribozyme and/or two complementary sequences that form a hairpin or stem-loop structure). According to the published Lorens application, the oligonucleotides will adopt a hairpin structure, which will function as a substrate for a DNA polymerase, facilitating the synthesis of a complement sequence of the randomized siRNA sequence. The hairpin structure is then denatured and hybridized to a primer at the 3′ end, allowing the conversion of the total sequence to double-stranded DNA by a DNA polymerase. The double-stranded oligonucleotides, supposedly encoding a random assortment of siRNA sequences, are then cloned into a retroviral vector to generate an siRNA-expression vector library. The published Lorens application, however, does not disclose the construction of any random siRNA library. Moreover, the approach proposed by Lorens would not result in a random library. A technical hurdle to the successful application of the Lorens approach is that the hairpin structure is highly self-complementary and will tend to self-anneal. Thus, it will be necessary to use an amount of primer in great excess of the template to facilitate complementary strand priming, which is not taught by Lorens. Another consequence of the high self-complementarity of the hairpin structure is that self-annealing actually leads to a non-random library if the approach disclosed by Lorens is followed. Self-annealing of the hairpin structure will prevent polymerization of the complementary strand. 
Regions rich in GC content are more likely to self-anneal than are regions rich in AT content; hence, unless polymerization occurs under conditions that prevent self-annealing, GC-rich sequences will be selected against in the process. Lorens calls for denaturation of the hairpin prior to polymerization (which is necessary in any event to permit binding of the primer); however, Lorens does not specify that the polymerization occur under conditions that maintain the denaturation of the hairpin structure. Accordingly, the distribution of sequences in the library produced by the Lorens approach will be non-random with respect to GC-rich vis-à-vis non-GC-rich sequences.
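The GC-dependence of self-annealing can be illustrated with a rough calculation. The sketch below uses the Wallace rule of thumb for short oligonucleotides (about 2 °C per A/T pair and 4 °C per G/C pair). This heuristic is introduced here purely for illustration and is not part of the Lorens disclosure or of any cited report; the two example stems and the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude melting-temperature estimate in degrees C (Wallace rule):
    2 per A/T plus 4 per G/C. A rule of thumb for short oligos only."""
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


# Two hypothetical 19-nt stems at opposite extremes of GC content.
gc_rich_stem = "GCGCGCGCGCGCGCGCGCG"
at_rich_stem = "ATATATATATATATATATA"

for name, stem in [("GC-rich", gc_rich_stem), ("AT-rich", at_rich_stem)]:
    print(name, round(gc_fraction(stem), 2), wallace_tm(stem))
```

By this estimate the GC-rich stem is considerably more stable than the AT-rich one, which is consistent with the argument that GC-rich hairpins re-anneal more readily and so would be underrepresented unless denaturing conditions are maintained during polymerization.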
In light of the deficiencies of existing technology, new synthetic methods are needed that can produce random siRNA libraries (including siRNA- and shRNA-encoding libraries). Such methods and the resulting libraries would facilitate a multitude of tasks, such as identification of genes of interest, analysis of gene function, and identification of therapeutically useful siRNA sequences.