The present invention is generally in the field of biologically active nucleic acid molecules, such as external guide sequences (EGSs), ribozymes, and antisense RNA, and in the broad field of functional analysis of complex genomes, and more specifically relates to the use of biologically active nucleic acids (RNAs) to specifically down-modulate the expression of messenger RNAs that encode proteins essential for viability.
Recent advances in automated DNA sequencing technologies and their application to the genomes of multiple organisms have resulted in the accumulation of a vast amount of nucleotide sequence information. At present, the genomic sequences of sixteen bacteria, including several important pathogens, most recently the causative agents of tuberculosis and syphilis, have been determined in their entirety. In addition, the complete sequence of a "simple" eukaryote (Saccharomyces cerevisiae) is known and a similar analysis of the nematode C. elegans will be released shortly. It is likely that many more genomic sequences, both prokaryotic and eukaryotic, will be revealed in the near term.
A major challenge involves the functional analysis of the available and forthcoming genomic information, i.e., determination of the biological role of the genes revealed by sequencing. It is particularly important to identify those genes that encode proteins essential for viability. Such proteins are of clear significance in the development of effective chemotherapeutic agents targeted to pathogenic organisms. Currently, there are several strategies available, alone or in combination, for functional genomic analysis, including bioinformatics, expression analysis, and targeted gene disruption. Informatics alone is unlikely to provide definitive new insight into gene function. For example, although E. coli is by far the best-studied organism, its genomic sequence revealed that approximately forty percent of its genes were of unknown function. Expression profiling provides primarily inferential information, and targeted gene disruption, although definitive, is labor-intensive and time-consuming.
Accordingly, it is highly desirable to have a robust, high-throughput method that identifies all or most essential genes in a particular organism.
It is therefore an object of the present invention to provide an efficient method and compositions for the identification of genes in bacteria and eukaryotic cells that encode proteins essential for survival.
It is a further object of the present invention to provide methods and compositions for reducing or inactivating expression of such genes.
Ribonucleic acid (RNA) molecules can serve not only as carriers of genetic information, for example genomic retroviral RNA and messenger RNA (mRNA) molecules, and as structures essential for protein synthesis, for example transfer RNA (tRNA) and ribosomal RNA (rRNA) molecules, but also as enzymes that specifically cleave nucleic acid molecules. Such catalytic RNA molecules are called ribozymes. Although ribozymes can theoretically cleave any desired site in an RNA molecule, in reality not all sites are efficiently cleaved by the ribozymes designed to cleave them. This is especially true in vivo, where numerous examples have been described of sites that are inefficiently cleaved by targeted ribozymes. The problem is not a total lack of sites in an RNA molecule of interest, but rather determining which sites, among the many possible sites, can be cleaved most efficiently. This is important since it is often desirable to identify the most efficient sites of cleavage and not just any site that can be cleaved. Targeting one or a few sites on an RNA molecule essentially at random and then testing for cleavage is not likely to identify the most efficient sites. Comprehensive testing of all sites is not practical because of the amount of labor involved in making and testing each ribozyme or external guide sequence ("EGS"). WO 96/21731 by Innovir describes selection of efficiently cleaved sites in this manner by making and testing 80 different EGSs targeted to different sites. However, this represented only a fraction of the possible sites. Techniques using similar labor-intensive methods for identifying sites that are accessible for cleavage are described in U.S. Pat. Nos. 5,525,468 and 5,496,698.
Kawasaki et al., Nucl. Acids Res. 24(15):3010-3016 (1996), describe the use of a transcript encoding a fusion between adenovirus E1A-associated 300 kDa protein (p300) and luciferase to assess the efficiency with which sites in the p300 RNA are cleaved by hammerhead ribozymes in vivo. A few hammerhead ribozymes targeted to sites having GUX triplets (which are required for cleavage by a hammerhead ribozyme) were designed and expressed from a vector in cells. A separate vector expressed the p300-luciferase fusion RNA. Cleavage of sites in the p300 portion of the transcript was assessed by measuring luciferase activity. Because Kawasaki et al. tested each ribozyme separately, their method also does not meet the need for a rapid, efficient selection process.
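By way of illustration only, the enumeration of candidate GUX triplets in a target RNA can be sketched in a few lines of Python. The sketch is not taken from Kawasaki et al. or from any cited reference. The function name `find_gux_sites`, the parameter `x_bases`, and the example sequence are hypothetical, and the default assumption that X is A, C, or U follows the commonly stated hammerhead rule rather than anything in the text above. The sketch lists candidate sites only; it says nothing about how efficiently any of them is cleaved.

```python
import re


def find_gux_sites(rna, x_bases="ACU"):
    """Return (position, triplet) pairs for every GUX triplet in `rna`.

    `position` is the 0-based index of the G. `x_bases` lists the
    nucleotides accepted at X (assumption: A, C, or U by default).
    DNA-style input is accepted; T is read as U.
    """
    seq = rna.upper().replace("T", "U")
    # A lookahead is used so that overlapping triplets are all reported.
    pattern = re.compile(r"(?=(GU[" + x_bases + r"]))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical example sequence, not a real transcript.
example = "AUGGUCAAGUAGGUGCGUU"
sites = find_gux_sites(example)
for position, triplet in sites:
    print(position, triplet)
```

Even a short transcript typically contains many such triplets, which is why testing a separate ribozyme at each candidate site quickly becomes impractical.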
As an alternative to actually testing for individual cleavable sites, or preliminary to such testing, attempts have also been made to predict which sites will be accessible, either from theoretical considerations or by empirically testing for the presence or absence of secondary or tertiary structure at sites in RNA molecules. For example, Ruffner et al., Biochemistry 29:10695-10702 (1990), Zoumadakis and Tabler, Nucl. Acids Res. 23:1192-1196 (1995), Shimayama et al., Biochemistry 34:3649-3654 (1995), Haseloff and Gerlach, Nature 334:585-591 (1988), and Lieber and Strauss, Mol. Cell. Biol. 8:466-472 (1995), describe attempts to use rules of structure formation in RNA to predict cleavable sites. However, the structure of RNA molecules cannot be accurately predicted from theoretical considerations, and the determination of the actual secondary and tertiary structure of an RNA molecule requires extensive experimentation. It can also be difficult to identify ribozymes and other biologically active molecules that will function inside cells, since molecules that are functional in vitro are not always functional in cells, where they may be, for example, improperly localized, sequestered, or bound by intracellular proteins.
It is therefore an object of the present invention to provide a method and compositions for identifying biologically active RNA molecules, such as ribozymes, EGSs for ribozymes, and antisense RNA, that alter expression of an RNA molecule efficiently in vivo.
It is a further object of the present invention to provide a method and compositions for identifying sites in an RNA, or nucleotide molecules involved in expression of a target RNA, that are most accessible as target sites for alteration of expression in vivo.
It is a further object of the present invention to provide functional oligonucleotide molecules directed to sites identified as accessible.