The present invention relates to the quantification of target nucleotide sequences in a wide variety of nucleic acid samples, and more specifically to methods employing the design and use of oligonucleotide probes that are useful for detecting and quantifying target nucleotide sequences, especially RNA target sequences such as microRNA and siRNA target sequences of interest, and for detecting differences between nucleic acid samples (e.g., samples from a cancer patient and a healthy patient).
MicroRNAs
The expanding inventory of international sequence databases and the concomitant sequencing of nearly 200 genomes representing all three domains of life (bacteria, archaea and eukaryota) have been the primary drivers in the process of deconstructing living organisms into comprehensive molecular catalogs of genes, transcripts and proteins. The importance of the genetic variation within a single species has become apparent, extending beyond the completion of genetic blueprints of several important genomes, culminating in the publication of the working draft of the human genome sequence in 2001 (Lander, Linton, Birren et al., 2001 Nature 409: 860-921; Venter, Adams, Myers et al., 2001 Science 291: 1304-1351; Sachidanandam, Weissman, Schmidt et al., 2001 Nature 409: 928-933). On the other hand, the increasing number of detailed, large-scale molecular analyses of transcription originating from the human and mouse genomes, along with the recent identification of several types of non-protein-coding RNAs, such as small nucleolar RNAs, siRNAs, microRNAs and antisense RNAs, indicates that the transcriptomes of higher eukaryotes are much more complex than originally anticipated (Wong et al. 2001, Genome Research 11: 1975-1977; Kampa et al. 2004, Genome Research 14: 331-342).
As a result of the Central Dogma: ‘DNA makes RNA, and RNA makes protein’, RNAs have been considered as simple molecules that just translate the genetic information into protein. Recently, it has been estimated that although most of the genome is transcribed, almost 97% of the genome does not encode proteins in higher eukaryotes, but putative, non-coding RNAs (Wong et al. 2001, Genome Research 11: 1975-1977). The non-coding RNAs (ncRNAs) appear to be particularly well suited for regulatory roles that require highly specific nucleic acid recognition. Therefore, the view of RNA is rapidly changing from the merely informational molecule to comprise a wide variety of structural, informational and catalytic molecules in the cell.
Recently, a large number of small non-coding RNA genes have been identified and designated as microRNAs (miRNAs) (for review, see Ke et al. 2003, Curr. Opin. Chem. Biol. 7:516-523). The first miRNAs to be discovered were the lin-4 and let-7 that are heterochronic switching genes essential for the normal temporal control of diverse developmental events (Lee et al. 1993, Cell 75:843-854; Reinhart et al. 2000, Nature 403: 901-906) in the roundworm C. elegans. miRNAs have been evolutionarily conserved over a wide range of species and exhibit diversity in expression profiles, suggesting that they occupy a wide variety of regulatory functions and exert significant effects on cell growth and development (Ke et al. 2003, Curr. Opin. Chem. Biol. 7:516-523). Recent work has shown that miRNAs can regulate gene expression at many levels, representing a novel gene regulatory mechanism and supporting the idea that RNA is capable of performing similar regulatory roles as proteins. Understanding this RNA-based regulation will help us to understand the complexity of the genome in higher eukaryotes as well as understand the complex gene regulatory networks.
miRNAs are 21-25 nucleotide (nt) RNAs that are processed from longer endogenous hairpin transcripts (Ambros et al. 2003, RNA 9: 277-279). To date more than 719 microRNAs have been identified in humans, worms, fruit flies and plants according to the miRNA registry database hosted by Sanger Institute, UK, and many miRNAs that correspond to putative genes have also been identified. Some miRNAs have multiple loci in the genome (Reinhart et al. 2002, Genes Dev. 16: 1616-1626) and occasionally, several miRNA genes are arranged in tandem clusters (Lagos-Quintana et al. 2001, Science 294: 853-858). The fact that many of the miRNAs reported to date have been isolated just once suggests that many new miRNAs will be discovered in the future. A recent in-depth transcriptional analysis of the human chromosomes 21 and 22 found that 49% of the observed transcription was outside of any known annotation, and furthermore, that these novel transcripts were both coding and non-coding RNAs (Kampa et al. 2004, Genome Research 14: 331-342). The identified miRNAs to date represent most likely the tip of the iceberg, and the number of miRNAs might turn out to be very large.
The combined characteristics of microRNAs characterized to date (Ke et al. 2003, Curr. Opin. Chem. Biol. 7:516-523; Lee et al. 1993, Cell 75:843-854; Reinhart et al. 2000, Nature 403: 901-906) can be summarized as:
    1. miRNAs are single-stranded RNAs of about 21-25 nt.
    2. They are cleaved from a longer endogenous double-stranded hairpin precursor by the enzyme Dicer.
    3. miRNAs match precisely the genomic regions that can potentially encode precursor RNAs in the form of double-stranded hairpins.
    4. miRNAs and their predicted precursor secondary structures are phylogenetically conserved.
Several lines of evidence suggest that the enzymes Dicer and Argonaute are crucial participants in miRNA biosynthesis, maturation and function (Grishok et al. 2001, Cell 106: 23-24). Mutations in genes required for miRNA biosynthesis lead to genetic developmental defects, which are, at least in part, derived from the role of generating miRNAs. The current view is that miRNAs are cleaved by Dicer from the hairpin precursor in the form of duplex, initially with 2 or 3 nt overhangs in the 3′ ends, and are termed pre-miRNAs. Cofactors join the pre-miRNP and unwind the pre-miRNAs into single-stranded miRNAs, and pre-miRNP is then transformed to miRNP. miRNAs can recognize regulatory targets while part of the miRNP complex. There are several similarities between miRNP and the RNA-induced silencing complex, RISC, including similar sizes and both containing RNA helicase and the PPD proteins. It has therefore been proposed that miRNP and RISC are the same RNP with multiple functions (Ke et al., 2003, Curr. Opin. Chem. Biol. 7:516-523). Different effectors direct miRNAs into diverse pathways. The structure of pre-miRNAs is consistent with the observation that 22 nt RNA duplexes with 2 or 3 nt overhangs at the 3′ ends are beneficial for reconstitution of the protein complex and might be required for high affinity binding of the short RNA duplex to the protein components (for review, see Ke et al., 2003, Curr. Opin. Chem. Biol. 7:516-523).
Growing evidence suggests that miRNAs play crucial roles in eukaryotic gene regulation. The first miRNA genes to be discovered, lin-4 and let-7, base-pair incompletely to repeated elements in the 3′ untranslated regions (UTRs) of other heterochronic genes, and regulate translation directly and negatively by antisense RNA-RNA interaction (Lee et al. 1993, Cell 75:843-854; Reinhart et al., 2000, Nature 403: 901-906). Other miRNAs are thought to interact with target mRNAs by limited complementarity and to suppress translation as well (Lagos-Quintana et al., 2001, Science 294: 853-858; Lee and Ambros 2001, Science 294: 858-862). Many studies have shown, however, that perfect complementarity between miRNAs and their target RNA could lead to target RNA degradation rather than inhibition of translation (Hutvágner and Zamore 2002, Science 297: 2056-2060), suggesting that the degree of complementarity determines their function. By identifying sequences with near complementarity, several targets have been predicted, most of which appear to be potential transcription factors that are crucial in cell growth and development. The high percentage of predicted miRNA targets acting as developmental regulators and the conservation of target sites suggest that miRNAs are involved in a wide range of organism development and behaviour and cell fate decisions (for review, see Ke et al. 2003, Curr. Opin. Chem. Biol. 7:516-523).
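By way of illustration only, the notion of a "degree of complementarity" between a miRNA and a candidate target site can be sketched as a simple count of Watson-Crick base pairs across the antiparallel duplex. The sketch below is not a method taken from the cited references: the function names and the example sequence are hypothetical, G:U wobble pairs, bulges and gaps are deliberately ignored, and no cut-off separating cleavage from translational repression is implied.

```python
# Watson-Crick partners for RNA bases (assumption: G:U wobble pairs are not counted).
WC_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the perfectly complementary strand, written 5'->3'."""
    return "".join(WC_PAIR[base] for base in reversed(rna.upper()))


def complementarity(mirna: str, target_site: str) -> float:
    """Fraction of miRNA positions that base-pair with the target site.

    Both sequences are written 5'->3'. Because the duplex is antiparallel,
    the miRNA is compared against the reversed target site. Only ungapped,
    equal-length alignments are handled in this simplified sketch.
    """
    if not mirna or len(mirna) != len(target_site):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = zip(mirna.upper(), reversed(target_site.upper()))
    paired = sum(WC_PAIR.get(m) == t for m, t in pairs)
    return paired / len(mirna)


# Hypothetical 22-nt sequence, made up for illustration (not a real miRNA).
mirna = "ACGUACGUACGUACGUACGUAC"
perfect_site = reverse_complement(mirna)
# Introduce two mismatches at the 5' end of the site.
imperfect_site = "AA" + perfect_site[2:]

print(complementarity(mirna, perfect_site))
print(round(complementarity(mirna, imperfect_site), 3))
```

A perfectly matched site scores 1.0, while each mismatch lowers the score by one over the miRNA length; a realistic target-prediction scheme would also need to account for wobble pairs, bulges and positional weighting.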
MicroRNAs and Human Disease
Analysis of the genomic location of miRNAs indicates that they play important roles in human development and disease. Several human diseases have already been pinpointed in which miRNAs or their processing machinery might be implicated. One of them is spinal muscular atrophy (SMA), a paediatric neurodegenerative disease caused by reduced protein levels or loss-of-function mutations of the survival of motor neurons (SMN) gene (Paushkin et al. 2002, Curr. Opin. Cell Biol. 14: 305-312). Two proteins (Gemin3 and Gemin4) that are part of the SMN complex are also components of miRNPs, although it remains to be seen whether miRNA biogenesis or function is dysregulated in SMA and what effect this has on pathogenesis. Another neurological disease linked to mi/siRNAs is fragile X mental retardation (FXMR), caused by absence or mutations of the fragile X mental retardation protein (FMRP) (Nelson et al. 2003, TIBS 28: 534-540), and there are additional clues that miRNAs might play a role in other neurological diseases. Yet another interesting finding is that the miR-224 gene locus lies within the minimal candidate region of two different neurological diseases: early-onset Parkinsonism and X-linked mental retardation (Dostie et al. 2003, RNA 9: 180-186). Links between cancer and miRNAs have also been recently described. The most frequent single genetic abnormality in chronic lymphocytic leukaemia (CLL) is a deletion localized to chromosome 13q14 (50% of the cases). A recent study determined that two different miRNA genes (miR15 and miR16) are clustered and located within the intron of LEU2, which lies within the deleted minimal region of the B-cell chronic lymphocytic leukaemia (B-CLL) tumour suppressor locus, and both genes are deleted or down-regulated in the majority of CLL cases (Calin et al. 2002, Proc. Natl. Acad. Sci. U.S.A. 99: 15524-15529).
It has been anticipated that connections between miRNAs and human diseases will only strengthen in parallel with the knowledge of miRNAs and the gene networks that they control. Moreover, the understanding of the regulation of RNA-mediated gene expression is leading to the development of novel therapeutic approaches that are likely to revolutionize the practice of medicine (Nelson et al. 2003, TIBS 28: 534-540).
Small Interfering RNAs and RNAi
Some of the recent attention paid to small RNAs in the size range of 21 to 25 nt is due to the phenomenon of RNA interference (RNAi), in which double-stranded RNA leads to the degradation of any homologous RNA (Fire et al. 1998, Nature 391: 806-811). RNAi relies on a complex and ancient cellular mechanism that has probably evolved for protection against viral attack and mobile genetic elements. A crucial step in the RNAi mechanism is the generation of short interfering RNAs (siRNAs), double-stranded RNAs that are each about 22 nt long. The siRNAs lead to the degradation of homologous target RNA and the production of more siRNAs against the same target RNA (Lipardi et al. 2001, Cell 107: 297-307). The present view of the mRNA degradation pathway of RNAi is that antiparallel Dicer dimers cleave long double-stranded RNAs (dsRNAs) to form siRNAs in an ATP-dependent manner. The siRNAs are then incorporated in the RNA-induced silencing complex (RISC), and ATP-dependent unwinding of the siRNAs activates RISC (Zhang et al. 2002, EMBO J. 21: 5875-5885; Nykänen et al. 2001, Cell 107: 309-321). The active RISC complex is thus guided to degrade the specific target mRNAs.
Detection and Analysis of microRNAs and siRNAs
The current view that miRNAs may represent a newly discovered, hidden layer of gene regulation has resulted in high interest among researchers around the world in the discovery of miRNAs, their targets and mechanism of action. Detection and analysis of these small RNAs is, however, not trivial. Thus, the discovery of more than 700 miRNAs to date has required taking advantage of their special features. First, the research groups have used the small size of the miRNAs as a primary criterion for isolation and detection. Standard cDNA libraries would lack miRNAs, primarily because RNAs that small are normally excluded by size selection in the cDNA library construction procedure. Total RNA from fly embryos, worms or HeLa cells has therefore been size fractionated so that only molecules of 25 nucleotides or smaller would be captured (Moss 2002, Curr. Biology 12: R138-R140). Synthetic oligomers have then been ligated directly to the RNA pools using T4 RNA ligase. Then the sequences have been reverse-transcribed, amplified by PCR, cloned and sequenced (Moss 2002, Curr. Biology 12: R138-R140). The genome databases have subsequently been queried with the sequences, confirming the origin of the miRNAs from these organisms as well as placing the miRNA genes physically in the context of other genes in the genome. The vast majority of the cloned sequences have been located in intronic regions or between genes, occasionally in clusters, suggesting that the tandemly arranged miRNAs are processed from a single transcript to allow coordinate regulation. Furthermore, the genomic sequences have revealed the fold-back structures of the miRNA precursors (Moss 2002, Curr. Biology 12: R138-R140).
The small size and sometimes low level of expression of different miRNAs require the use of sensitive and quantitative analysis tools. Due to their small size of 21-25 nt, the use of quantitative real-time PCR for monitoring expression of mature miRNAs is excluded. Therefore, most miRNA researchers currently use Northern blot analysis combined with polyacrylamide gels to examine expression of both the mature and pre-miRNAs (Reinhart et al. 2000, Nature 403: 901-906; Lagos-Quintana et al. 2001, Science 294: 853-858; Lee and Ambros 2001, Science 294: 862-864). Primer extension has also been used to detect the mature miRNA (Zeng and Cullen 2003, RNA 9: 112-123). The disadvantages of all the gel-based assays (Northern blotting, primer extension, RNase protection assays etc.) as tools for monitoring miRNA expression include low throughput and poor sensitivity. DNA microarrays would appear to be a good alternative to Northern blot analysis to quantify miRNAs, since microarrays have excellent throughput. However, the drawbacks of microarrays are the requirement of high concentrations of input target for efficient hybridization and signal generation, poor sensitivity for rare targets, and the necessity for post-array validation using more sensitive assays such as real-time quantitative PCR, which, as noted above, is not feasible for mature miRNAs. A recent report used cDNA microarrays to monitor the expression of miRNAs during neuronal development with 5 to 10 μg aliquots of input total RNA as target, but the mature miRNAs had to be separated from the miRNA precursors using micro concentrators prior to microarray hybridizations (Krichevsky et al. 2003, RNA 9: 1274-1281). A PCR approach has also been used to determine the expression levels of mature miRNAs (Grad et al. 2003, Mol. Cell. 11: 1253-1263). This method is useful for cloning miRNAs, but highly impractical for routine miRNA expression profiling, since it involves gel isolation of small RNAs and ligation to linker oligonucleotides. Schmittgen et al. (2004, Nucleic Acids Res. 32: e43) describe an alternative method to Northern blot analysis, in which they use real-time PCR assays to quantify the expression of miRNA precursors. The disadvantage of this method is that it only allows quantification of the precursor miRNAs, which does not necessarily reflect the expression levels of mature miRNAs. In order to fully characterize the expression of large numbers of miRNAs, it is necessary to quantify the mature miRNAs, such as those expressed in human disease, where alterations in miRNA biogenesis produce levels of mature miRNAs that are very different from those of the precursor miRNA. For example, the precursors of 26 miRNAs were equally expressed in non-cancerous and cancerous colorectal tissues from patients, whereas the expression of mature human miR143 and miR145 was greatly reduced in cancer tissues compared with non-cancer tissues, suggesting altered processing for specific miRNAs in human disease (Michael et al. 2003, Mol. Cancer. Res. 1: 882-891). On the other hand, recent findings with miR166 in maize and miR165 in Arabidopsis thaliana indicate that microRNAs act as signals to specify leaf polarity in plants and may even form movable signals that emanate from a signalling centre below the incipient leaf (Juarez et al. 2004, Nature 428: 84-88; Kidner and Martienssen 2004, Nature 428: 81-84).
In conclusion, the biggest challenge in measuring mature miRNAs as well as siRNAs using real-time quantitative PCR is their small size of 21-25 nt. The method of the invention addresses the aforementioned practical problems in the detection and quantification of small RNA molecules, miRNAs as well as siRNAs, and aims at ensuring the development of flexible, convenient and inexpensive assays for accurate and specific quantification of miRNA and siRNA transcripts.
RNA Editing and Alternative Splicing
The term RNA editing is used to describe any specific change in the primary sequence of an RNA molecule, excluding other mechanistically defined processes such as alternative splicing or polyadenylation. RNA alterations due to editing fall into two broad categories, depending on whether the change happens at the base or nucleotide level (Gott 2003, C. R. Biologies 326: 901-908). RNA editing is quite widespread, occurring in mammals, viruses, marsupials, plants, flies, frogs, worms, squid, fungi, slime molds, dinoflagellates, kinetoplastid protozoa, and other unicellular eukaryotes. The current list most likely represents only the tip of the iceberg; based on the distribution of homologues of known editing enzymes, RNA editing almost certainly occurs in many other species, including all metazoa. Since RNA editing can be regulated in a developmental or tissue-specific manner, it is likely to play a significant role in the etiology of human disease (Gott 2003, C. R. Biologies 326: 901-908).
A common feature for eukaryotic genes is that they are composed of protein-encoding exons and introns. Introns are characterized by being excised from the pre-mRNA molecule in RNA splicing, as the sequences on each side of the intron are spliced together. RNA splicing not only provides functional mRNA, but is also responsible for generating additional diversity. This phenomenon is called alternative splicing, which results in the production of different mRNAs from the same gene. The mRNAs that represent isoforms arising from a single gene can differ by the use of alternative exons or retention of an intron that disrupts two exons. This process often leads to different protein products that may have related or drastically different, even antagonistic, cellular functions. There is increasing evidence indicating that alternative splicing is very widespread (Croft et al. Nature Genetics, 2000). Recent studies have revealed that at least 80% of the roughly 35,000 genes in the human genome are alternatively spliced (Kampa et al. 2004, Genome Research 14: 331-342). Clearly, by combining different types of modifications and thus generating different possible combinations of transcripts of different genes, alternative splicing together with RNA editing are potent mechanisms for generating protein diversity. Analysis of the alternative splice variants and RNA editing, in turn, represents a novel approach to functional genomics, disease diagnostics and pharmacogenomics.
Misplaced Control of Alternative Splicing as a Causative Agent for Human Disease
The detection of the detailed structure of the transcriptional output is an important goal for molecular characterization of a cell or tissue. Without the ability to detect and quantify the splice variants present in a tissue, neither the transcript content nor the protein content can be described accurately. Molecular medical research shows that many cancers result in altered levels of splice variants, so an accurate method to detect and quantify these transcripts is required. Mutations that produce an aberrant splice form can also be the primary cause of severe diseases such as spinal muscular dystrophy and cystic fibrosis.
Much of the study of human disease, indeed much of genetics is based upon the study of a few model organisms. The evolutionary stability of alternative splicing patterns and the degree to which splicing changes according to mutations and environmental and cellular conditions influence the relevance of these model systems. At present, there is little understanding of the rates at which alternative splicing patterns or RNA editing change, and the factors influencing these rates.
Previously, other analysis methods have been performed with the aim of detecting either splicing of RNA transcripts per se in yeast, or putative exon-skipping splicing events in rat tissues, but neither of these approaches had sufficient resolution to estimate quantities of splice variants, a factor that could be essential to an understanding of the changes in cell life cycle and disease. Thus, improved methods are needed for nucleic acid amplification, hybridization, and quantification. The method of the present invention makes it possible to distinguish between mRNA splice variants as well as RNA-edited transcripts and to quantify the amount of each variant in a nucleic acid sample, such as a sample derived from a patient.
Antisense Transcription in Eukaryotes
RNA-mediated gene regulation is widespread in higher eukaryotes, and complex genetic phenomena like RNA interference, co-suppression, transgene silencing, imprinting, methylation, and possibly position-effect variegation and transvection all involve intersecting pathways based on or connected to RNA signalling (Mattick 2001, EMBO reports 2(11): 986-991). Recent studies indicate that antisense transcription is a very common phenomenon in the mouse and human genomes (Okazaki et al. 2002, Nature 420: 563-573; Yelin et al., 2003, Nature Biotechnol.). Thus, antisense modulation of gene expression in eukaryotic cells, e.g. human cells, appears to be a common regulatory mechanism. In light of this, the present invention provides a method for quantification of non-coding antisense RNAs, as well as a method for highly accurate mapping of the overlapping regions between sense-antisense transcriptional units.