The invention relates to nucleic acids and methods for expression profiling of mRNAs, identifying and profiling of particular mRNA splice variants, and detecting mutations, deletions, or duplications of particular exons, e.g., alterations associated with a disease such as cancer, in a nucleic acid sample, e.g., a patient sample. The invention furthermore relates to methods for detecting nucleic acids by fluorescence in situ hybridization.
The field of the invention is oligonucleotides (e.g., oligonucleotide arrays) that are useful for detecting nucleic acids of interest and for detecting differences between nucleic acid samples (e.g., samples from a cancer patient and a healthy patient).
DNA chip technology utilizes miniaturized arrays of DNA molecules immobilized on solid surfaces for biochemical analyses. The power of DNA microarrays as experimental tools relies on specific molecular recognition via complementary base-pairing, which makes them highly useful for massively parallel analyses. In the post-genomic era, microarray technology has thus become the method of choice for many hybridization-based assays, such as expression profiling, SNP detection, DNA re-sequencing, and genotyping on a genomic scale.
Expression microarrays are capable of profiling gene expression patterns of tens of thousands of genes in a single experiment. Hence, this technology provides a powerful tool for deciphering complex biological systems, and thereby greatly facilitates research in basic biology and living processes, as well as disease diagnostics, theranostics, and drug development. In a typical cell, the mRNAs are distributed in three frequency classes: (i) superprevalent (10-20% of the total mRNA mass); (ii) intermediate (40-45%); and (iii) low-abundant mRNAs (40-45%). It is therefore of utmost importance that the dynamic range and sensitivity of the expression arrays are optimal, especially when analyzing expression levels of messages or mRNA splice variants belonging to the low-abundant class.
The recent explosion of interest in DNA microarray technology has been sparked by two key innovations. The first was the use of non-porous solid support, such as glass or polymer as opposed to nylon or nitrocellulose filters, which has facilitated miniaturization and fluorescence-based detection. Roughly 20,000 cDNAs can be robotically spotted onto a microscope slide and hybridized with a double-labeled probe. The second was the development of methods for high-density spatial synthesis of oligonucleotides. The two key array technologies are outlined in the following.
Oligonucleotide Arrays
An efficient strategy for oligonucleotide microarray manufacturing involves DNA synthesis on solid surfaces using combinatorial chemistry. Most of the current technology has been developed by Affymetrix and Rosetta Inpharmatics. Glass is currently preferred as the synthesis support because of its inert chemical properties and low level of intrinsic fluorescence as well as the ability to chemically derivatize the surface. Of the three approaches currently used to manufacture oligonucleotide arrays, the light-directed deprotection method is the most effective one in generating high density microchips. A single round of synthesis involves light-directed deprotection, followed by nucleotide coupling. Photolithographic masking is used to control the regions of the chip designated for illumination. Affymetrix uses a combination of photolithography and combinatorial chemistry to manufacture its GeneChip Arrays. Using technologies adapted from the semiconductor industry, GeneChip manufacturing begins with a 5-inch square quartz wafer. Initially the quartz is washed to ensure uniform hydroxylation across its surface. The wafer is placed in a bath of silane, which reacts with the hydroxyl groups of the quartz and forms a matrix of covalently linked molecules. The distance between these silane molecules determines the probes' packing density, allowing arrays to hold over 500,000 features within 1.28 square centimeters. The principal disadvantage of this method is that a significant amount of chip design work and cost is associated with the mask design. Once a set of masks has been made, a large number of chips can be produced at a reasonable cost. Oligonucleotide arrays available from Affymetrix are currently priced at roughly 5 to 10 times the cost of cDNA microarrays.
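The packing density quoted above can be turned into an approximate feature size. The following is a minimal sketch, assuming (purely for illustration) that the features are identical squares that tile the 1.28 square centimeter area with no gaps; the function name is hypothetical.

```python
import math


def feature_pitch_um(n_features: int, area_cm2: float) -> float:
    """Side length in micrometers of one square feature.

    Assumption: features are identical squares that tile the
    whole area with no gaps between them.
    """
    area_um2 = area_cm2 * 1e8  # 1 cm = 1e4 um, so 1 cm^2 = 1e8 um^2
    return math.sqrt(area_um2 / n_features)


# Figures taken from the text: 500,000 features in 1.28 cm^2.
pitch = feature_pitch_um(500_000, 1.28)
print(f"approximate feature pitch: {pitch:.1f} um")
```

Under these assumptions each feature is about 16 micrometers on a side, which gives a sense of the scale at which the photolithographic masks must be aligned.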
DNA-DNA hybridization using oligonucleotide chips is clearly different from that of cDNA microarrays. Hybridizations involving oligos are much more sensitive to the GC content of individual heteroduplexes. In addition, single base mismatches have a pronounced effect on the hybridization reassociation of short oligos, and point mutations can thus be readily detected using oligo chips.
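The GC sensitivity of short oligonucleotides can be illustrated with a rough calculation. The sketch below is not part of the described methods: it uses the Wallace rule of thumb (2 °C per A/T pair, 4 °C per G/C pair), which is only a crude estimate for short oligos, and the probe sequences and function names are made up for illustration.

```python
def gc_content(oligo: str) -> float:
    """Fraction of G and C bases in the oligonucleotide."""
    s = oligo.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(oligo: str) -> int:
    """Rule-of-thumb melting temperature in degrees C for a short oligo.

    Wallace rule: 2 * (A + T) + 4 * (G + C). A rough estimate only.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def mismatch_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


# Two hypothetical 20-mer probes of equal length but different GC content.
at_rich = "ATTATAGCTTAATATTCAAT"
gc_rich = "GCGGCCATGGCCGCGTCCGC"
for probe in (at_rich, gc_rich):
    print(probe, f"GC={gc_content(probe):.2f}", f"Tm~{wallace_tm(probe)} C")
```

Two probes of the same length can therefore differ considerably in estimated duplex stability, which is one reason hybridization conditions that suit one oligo on a chip may not suit another.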
cDNA Microarrays
cDNA microarrays containing large DNA segments such as cDNAs are generated by physically depositing small amounts of each DNA of interest onto known locations on glass surfaces. Two technologies for printing microarrays are (1) mechanical microspotting, and (2) ink-jetting. Mechanical microspotting has been extensively used at, e.g., Stanford University, and it utilizes pins or capillaries to deposit small quantities of DNA onto known addresses using motion control systems. Recent advances in microspotting technology using modern arraying robots allow for the preparation of 100 microarrays containing over 10,000 features in less than 12 hours. A DNA arrayer is relatively easy to set up, and the cost is usually low compared to on-chip oligoarrayers. cDNA microarrays are capable of profiling gene expression patterns of tens of thousands of genes in a single experiment. To compare the relative abundance of the arrayed gene sequences in two DNA or RNA samples, e.g., the total mRNA isolated from two different cell populations, the two samples are first labeled using two different fluorescent dyes such as Cy-3 and Cy-5. The labeled samples are mixed and hybridized to the clones on the array slide. After the hybridization, laser excitation of the incorporated, fluorescent target molecules yields an emission with a characteristic spectrum, which is measured with a confocal laser scanner. The monochrome images from the scanner are imported into software in which the images are pseudo-colored and merged. Data from a single hybridization are viewed as a normalized ratio of the two dye signals, in which significant deviations from a ratio of 1 are indicative of either increased or decreased expression levels relative to the reference sample. Data from multiple experiments can be examined using any number of data mining tools.
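The normalized ratio described above can be sketched in a few lines. This is an illustrative example only, under several assumptions not taken from the text: Cy-5 is treated as the test sample and Cy-3 as the reference, normalization is done by centering the median log2 ratio at zero, and a two-fold change is used as the calling threshold. The intensities are invented.

```python
import math
from statistics import median


def normalized_log_ratios(cy5, cy3):
    """log2(Cy5/Cy3) per feature, shifted so the median is zero.

    Assumption: most arrayed genes are unchanged, so the median
    log ratio reflects dye and labeling bias rather than biology.
    """
    raw = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    center = median(raw)
    return [x - center for x in raw]


def call_changes(log_ratios, threshold=1.0):
    """Label each feature; threshold=1.0 is a two-fold change (assumed)."""
    calls = []
    for x in log_ratios:
        if x >= threshold:
            calls.append("up")
        elif x <= -threshold:
            calls.append("down")
        else:
            calls.append("unchanged")
    return calls


# Invented background-corrected intensities for five features.
cy5 = [200, 400, 1600, 100, 800]
cy3 = [100, 200, 200, 400, 400]
ratios = normalized_log_ratios(cy5, cy3)
print(list(zip(ratios, call_changes(ratios))))
```

In this toy data set the Cy-5 channel is uniformly about twice as bright, so median centering removes that bias and only the third and fourth features are called as changed.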
Current Status of Array Technology
It has now become clear that cDNA microarrays, originally developed by Pat Brown and co-workers at Stanford University, are sensitive, but may not be sufficiently specific with respect to, e.g., discrimination of homologous transcripts in gene families and alternatively spliced isoforms. On the other hand, the Affymetrix GeneChip system is specific, but may not be sensitive enough. This lack of sensitivity may explain why Affymetrix uses 16×25-mer perfect match capture probes together with 16×25-mer mismatch probes per transcript in its expression profiling chips, resulting in enormous data sets in genome-wide arrays. Therefore, the functional genomics field is in the process of switching, as stocks of existing PCR-amplified cDNA fragment libraries for microarraying run out, to custom longmer oligonucleotide arrays comprising transcript-specific oligonucleotide capture probes typically in the range of 30-mers to 80-mers, thus addressing both specificity and sensitivity.
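To see why 16 perfect match and 16 mismatch probes per transcript lead to large data sets, and how such probe pairs might be summarized, consider the following sketch. The average of the perfect match minus mismatch differences is used here only as a simple illustrative summary and is not claimed to be the vendor's algorithm; the transcript count, the intensities, and the function names are hypothetical.

```python
from statistics import mean

PAIRS_PER_TRANSCRIPT = 16  # perfect match / mismatch pairs, as stated in the text


def features_needed(n_transcripts: int) -> int:
    """Total probe features when every transcript gets a full probe set."""
    return n_transcripts * 2 * PAIRS_PER_TRANSCRIPT


def average_difference(pm, mm):
    """Mean of (perfect match - mismatch) over the probe pairs.

    Illustrative summary: the mismatch signal is treated as an
    estimate of non-specific hybridization and subtracted pairwise.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must contain the same number of probes")
    return mean([p - m for p, m in zip(pm, mm)])


# Hypothetical array covering 10,000 transcripts.
print(features_needed(10_000), "features")

# Invented intensities for one transcript (4 of the 16 pairs shown).
print(average_difference([820, 790, 905, 760], [310, 295, 350, 280]))
```

Each transcript thus contributes 32 measurements that must be acquired, stored, and reduced to a single expression value, whereas a single transcript-specific longmer probe yields one measurement per transcript.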
Alternative Splicing
As the field of genomics research is shifting from the acquisition of genome sequences to high-throughput functional genomics, there is an increasing need to understand the dynamics within the genetic regulation as well as RNA and protein sequences in order to elucidate gene expression in all its complexity. A common feature for eukaryotic genes is that they are composed of protein-encoding exons and introns. Introns (intra-genic-regions) are non-coding DNA which interrupt the exons. Introns are characterized by being excised from the pre-mRNA molecule in RNA splicing, as the sequences on each side of the intron are spliced together. RNA splicing not only provides functional mRNA, but is also responsible for generating additional diversity. This phenomenon is called alternative splicing, which results in the production of different mRNAs from the same gene. The mRNAs that represent isoforms arising from a single gene can differ by the use of alternative exons or retention of an intron that disrupts two exons. This process often leads to different protein products that may have related or drastically different, even antagonistic, cellular functions. There is increasing evidence indicating that alternative splicing is very widespread (Croft et al. Nature Genetics, 2000). Recent studies have revealed that at least 60% of the roughly 35,000 genes in the human genome are alternatively spliced. Clearly, by combining different types of modifications and thus generating different possible combinations of transcripts of different genes, alternative splicing is a potent mechanism for generating protein diversity. Analysis of the spliceome, in turn, represents a novel approach to both functional genomics and pharmacogenomics.
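The combinatorial potential of alternative splicing can be made concrete with a small enumeration. The sketch below assumes a hypothetical five-exon gene in which two exons are cassette exons that are each independently included or skipped; real genes are subject to additional constraints (mutually exclusive exons, alternative splice sites, intron retention) that are not modeled here.

```python
from itertools import product


def enumerate_isoforms(exons):
    """List every isoform of a gene with independent cassette exons.

    exons: list of (name, is_constitutive) tuples in 5' to 3' order.
    Constitutive exons are always included; each remaining exon is
    either included or skipped (simplifying assumption).
    """
    optional = [name for name, constitutive in exons if not constitutive]
    isoforms = []
    for keep in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, keep))
        isoforms.append(
            [name for name, constitutive in exons if constitutive or included[name]]
        )
    return isoforms


# Hypothetical gene: E2 and E4 are cassette exons.
gene = [("E1", True), ("E2", False), ("E3", True), ("E4", False), ("E5", True)]
for isoform in enumerate_isoforms(gene):
    print("-".join(isoform))
```

With n independent cassette exons the number of possible mRNAs is 2 to the power n, so even a handful of alternative exons per gene yields many transcripts that an array must be able to tell apart.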
Antisense Transcription in Eukaryotes
RNA-mediated gene regulation is widespread in higher eukaryotes, and complex genetic phenomena like RNA interference, co-suppression, transgene silencing, imprinting, methylation, and possibly position-effect variegation and transvection all involve intersecting pathways based on or connected to RNA signalling (Mattick 2001; EMBO reports 2, 11: 986-991). Recent studies indicate that antisense transcription is a very common phenomenon in the mouse and human genomes (Okazaki et al. 2002; Nature 420: 563-573; Yelin et al. 2003, Nature Biotechnol.). Thus, antisense modulation of gene expression in, e.g., human cells may be a common regulatory mechanism. In light of this, the present invention provides novel tools in which non-naturally occurring nucleic acids, such as LNA oligonucleotides, can be designed to silence or modulate the regulation of a given mRNA by non-coding antisense RNA, by designing a complementary sense LNA oligonucleotide for the regulatory antisense RNA. This has a high potential in target identification, target validation and therapeutic use of LNA oligonucleotides as modulating and silencing sense nucleic acid agents.
Misplaced Control of Alternative Splicing can Cause Disease
The detection of the detailed structure of all transcripts is an important goal for molecular characterization of a cell or tissue. Without the ability to detect and quantify the splice variants present in one tissue, the transcript content or the protein content cannot be described accurately. Molecular medical research shows that many cancers result in altered levels of splice variants, so an accurate method to detect and quantify these transcripts is required. Mutations that produce an aberrant splice form can also be the primary cause of severe diseases such as spinal muscular atrophy and cystic fibrosis.
Much of the study of human disease, indeed much of genetics, is based upon the study of a few model organisms. The evolutionary stability of alternative splicing patterns and the degree to which splicing changes according to mutations and environmental and cellular conditions influence the relevance of these model systems. At present, there is little understanding of the rates at which alternative splicing patterns change, and the factors influencing these rates. Table 1 shows a set of genes that are known to be alternatively spliced and that are orthologs of known human disease genes.
TABLE 1
C. elegans disease orthologs that are known to be differentially spliced in C. elegans.

Disease                              C. elegans gene      BLAST E value
brABLl                               M79.1A               1.00E−162
X-Linked Lymphoprol.-SH2D1A          M79.1A               2.00E−58
Cyclin Dep. Kinase 4-CDK4            F18H3.5A             1.00E−124
HNPCC*-PMS2                          H12C20.2A            1.00E−123
Neurofibromatosis 2-NF2              C01G8.5A             5.00E−163
Duchenne MD+-DMD                     F32B4.3A             0.00E+00
Coffin-Lowry-RPS6KA3                 T01H8.1A             2.00E−13
Septooptic Dysplasia-HESX1           Y113G7A.6A           1.00E−152
Non-Insulin Dep. Diabet.-PCSK1       F11A6.1A             1.00E−166
Bartter's-SLC12A1                    Y37A1C.1A            1.00E−167
Gitelmans-SLC12A3                    Y37A1C.1A            0.00E+00
Hered. Spherocytosis-ANK1            B0350.2A             1.00E−09
Darier-White-SERCA                   K11D9.2A             0.00E+00
Spondyloepip. Dysp.-COL2A1           F01G12.5A/let-2      9.00E−20
Previously, other microarray analyses have been performed with the aim of detecting either splicing of RNA transcripts per se in yeast, or of detecting putative exon skipping splicing events in rat tissues, but neither of these approaches had sufficient resolution to estimate quantities of splice variants, a factor that could be essential to an understanding of the changes in cell life cycle and disease.
Thus, improved methods are needed for nucleic acid amplification, hybridization, and classification. Desirable methods can distinguish between mRNA splice variants and quantitate the amount of each variant in a sample. Other desirable methods can detect differences in expression patterns between patient nucleic acid samples and nucleic acid standards.