Many diseases, in particular cancers, are accompanied by modified gene expression. This may be related to a mutation of the genes themselves, which leads to an expression of modified proteins or to an inhibition or over-expression of the proteins or enzymes. A modulation of gene expression may, however, also occur by epigenetic modifications, and in particular by DNA methylation. Such epigenetic modifications do not alter the actual DNA coding sequence, but nonetheless have substantial health implications. Knowledge of methylation processes, and of modifications of methylation-related metabolism and DNA methylation, is therefore essential for the understanding, prophylaxis, diagnosis and therapy of diseases.
The precise control of genes, which themselves represent but a small part of the complete mammalian genome, is a regulatory challenge, given that the bulk of genomic DNA is non-coding. The existence of such non-coding ‘trunk’ DNA, containing introns, repetitive elements and potentially actively transposable elements, necessitates effective mechanisms for its durable suppression (silencing). Cytosine methylation by S-adenosylmethionine (SAM)-dependent DNA methyltransferases, which forms 5-methylcytosine, represents one such mechanism for modification of DNA-protein interactions. Genes can be transcribed from methylation-free promoters, even when adjacent transcribed or non-transcribed regions are widely methylated. This permits the use and regulation of promoters of functional genes, whereas the trunk DNA including the transposable elements is suppressed. Methylation is also involved in the long-term suppression of X-linked genes, and may lead to either a reduction or an increase of the degree of transcription, depending on where the methylation in the transcription unit occurs.
Nearly all natural DNA methylation in mammals is restricted to cytosine-guanosine (CpG) dinucleotide palindrome sequences, which are controlled by DNA methyltransferases. CpG dinucleotides represent about 1 to 2% of all dinucleotides and are concentrated in so-called CpG islands. A generally accepted definition describes a CpG island as a DNA region of about 200 bp having a G+C content of at least 50%, in which the ratio of the number of observed CG dinucleotides to the number of expected CG dinucleotides is larger than 0.6 (Gardiner-Garden, M., Frommer, M. (1987) J. Mol. Biol. 196, 261-282). Typically, CpG islands have at least 4 CG dinucleotides in a sequence having a length of 100 base pairs.
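The CpG island criteria can be illustrated with a minimal sketch. It reads the 50% criterion as G+C content, which is how Gardiner-Garden and Frommer state it, and computes the expected CG count as (number of C × number of G) / length. The function names and default thresholds below are hypothetical and chosen only for illustration; they are not part of any described method.

```python
def cpg_island_stats(seq: str) -> dict:
    """Return length, G+C content and observed/expected CpG ratio of a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    observed_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    # Expected CpG count under independence of C and G: (#C * #G) / length
    expected_cpg = (c * g) / n if n else 0.0
    return {
        "length": n,
        "gc_content": (c + g) / n if n else 0.0,
        "obs_exp_ratio": observed_cpg / expected_cpg if expected_cpg else 0.0,
    }


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the three criteria: length, G+C content, observed/expected ratio."""
    s = cpg_island_stats(seq)
    return (s["length"] >= min_len
            and s["gc_content"] >= min_gc
            and s["obs_exp_ratio"] > min_ratio)


if __name__ == "__main__":
    print(cpg_island_stats("CG" * 100))
    print(looks_like_cpg_island("CG" * 100))
    print(looks_like_cpg_island("AT" * 100))
```

In practice such a test would be applied to a window sliding along a genomic sequence; the sketch evaluates a single candidate region only.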
If CpG islands are present in promoter areas, they often have a regulatory function for the expression of the respective gene. If the CpG island is hypomethylated, expression can take place. Hypermethylation often leads to the suppression of the expression. In the normal state, a tumor suppressor gene is hypomethylated. If a hypermethylation takes place, this will lead to a suppression of the expression of the tumor suppressor gene, which is frequently observed in cancer tissues. In contrast thereto, oncogenes are hypermethylated in healthy tissue, whereas in cancer tissue they are frequently hypomethylated.
Cytosine methylation typically prevents the binding of proteins regulating transcription. This leads to a modification of associated gene expression. In the context of cancer, for example, the expression of cell division regulating genes is thereby affected (e.g., the expression of apoptosis genes is down-regulated, whereas oncogene expression is up-regulated). DNA hypermethylation also has a long-term influence on gene regulation. Via cytosine methylation, histone de-acetylation proteins can bind to the DNA by their 5-methyl cytosine-specific domain. Consequently, histones are de-acetylated, leading to a tighter DNA compaction, whereby regulatory proteins are precluded from DNA binding.
Consequently, the efficient detection of DNA methylation patterns is an important tool for developing new approaches for prevention, diagnosis and treatment of diseases and for target screening. In particular, individualized methylation profiles can be prepared, and a tailored therapy thereby deduced. Additionally, the effects of a therapy can be monitored.
There is, therefore, a pronounced need in the art for novel and efficient methods for identifying and characterizing unknown methylation patterns.
Differential Methylation Hybridization (DMH; Huang et al, Hum Mol Genet, 8:459-470, 1999; U.S. patent application Ser. No. 09/497,855, both incorporated by reference in their entirety) is an art-recognized method for determining methylation patterns or for determining hypermethylated CpG islands. In DMH applications, DNA fragments obtained by digestion with restriction enzymes are hybridized on a DNA microarray that carries cloned CpG islands. DNA, originating from a tissue sample, is initially cut with a single non-methylation-specific restriction enzyme (e.g., MseI). The resulting fragments are then ligated with linkers, and the linker-ligated fragment mixture is cut with methylation-specific endonucleases (e.g., BstUI and/or HpaII), and amplified by means of PCR. The resulting amplified fragment mixtures are also referred to herein as DMH ‘amplificates’ or ‘amplicons.’ After a purification step, the amplicons (amplificates) are coupled with a fluorescence dye. Typically, the preceding steps are performed on the one hand with diseased tissue DNA and on the other hand with DNA from adjacent healthy tissue of the same tissue type, and the respective fragments are labeled with different fluorescence dyes. Both fragment solutions are then co-hybridized on a DNA microarray having immobilized CpG island sequences. After washing steps, a picture of the DNA microarray is taken with a commercial scanner that is sensitive to fluorescence radiation. The picture or pattern of fluorescent dots visible therein is analyzed to determine differences in methylation between and among CpG clones (see, e.g., Wei et al, Clinical Cancer Research, 8:2246-2252, 2002; Yan et al, Cancer Res. 61:8375-80, 2001; see also WO 2003/087774 (PCT/US03/11598), and U.S. Pat. No. 6,605,432).
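The enzymatic selection underlying DMH can be sketched in silico. The following is a simplified model, not the published protocol: it cuts a sequence at MseI sites (T^TAA), then discards every fragment containing a BstUI (CGCG) or HpaII (CCGG) site that is not fully methylated, on the assumption that such a fragment is cleaved and therefore cannot be amplified between its linkers. Methylation is represented as a set of 0-based positions of methylated cytosines. The function names, the example sequence, and the rule that a site is protected only when all of its CpG cytosines are methylated are illustrative assumptions.

```python
import re

MSEI_SITE = "TTAA"                    # MseI, cuts T^TAA, not methylation-sensitive
SENSITIVE_SITES = ("CGCG", "CCGG")    # BstUI and HpaII, blocked by CpG methylation


def msei_fragments(seq: str) -> list:
    """Return half-open (start, end) intervals produced by cutting at every T^TAA."""
    cuts = [m.start() + 1 for m in re.finditer(f"(?={MSEI_SITE})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def cpg_cytosines(site: str, start: int) -> list:
    """Genomic positions of the CpG cytosines inside a recognition site."""
    return [start + k for k in range(len(site) - 1) if site[k:k + 2] == "CG"]


def amplifiable_fragments(seq: str, methylated: set) -> list:
    """Fragments that survive the methylation-sensitive digest (simplified model)."""
    seq = seq.upper()
    survivors = []
    for a, b in msei_fragments(seq):
        fragment = seq[a:b]
        cleaved = False
        for site in SENSITIVE_SITES:
            for m in re.finditer(f"(?={site})", fragment):
                # Assumption: a site is protected only if all its CpGs are methylated.
                if not all(p in methylated for p in cpg_cytosines(site, a + m.start())):
                    cleaved = True
        if not cleaved:
            survivors.append((a, b))
    return survivors


if __name__ == "__main__":
    # Hypothetical 34-bp sequence: one HpaII site, one BstUI site, two MseI sites.
    example = "AAACCGGAAATTAAGGGCGCGAAATTAACCCAAA"
    print(amplifiable_fragments(example, methylated=set()))
    print(amplifiable_fragments(example, methylated={4}))
    print(amplifiable_fragments(example, methylated={4, 17, 19}))
```

Note that in this model a fragment without any methylation-sensitive site always survives, regardless of its methylation state.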
In DMH applications, the immobilized nucleic acids are composed of clones from so-called “CpG island libraries (CGI libraries)”; that is, from libraries of clones having typical lengths of 200-700 base pairs and being enriched for CpG islands. Typically, clones including repeat sequences are also present (see, e.g., WO 2003/087774 (PCT/US03/11598)). Unfortunately, the relatively high production expenses of the CGI clone libraries are an inherent drawback of the method.
Additionally, the utility of DMH is to a significant extent limited to general genome analysis (discovery analysis), where only a broad analysis of the genome sequence is desired. This is because: (i) the number of coupling positions on the microarray is limited; (ii) the presence of repeat sequences reduces the usable capacity of the DNA microarray; and (iii) the limited number of coupling positions on the microarray is therefore not used in an optimum manner by different partial sequences.
Further drawbacks of DMH are that sequences may be redundantly present in CGI clone libraries, that cross-contamination of the clones leads to a mixing of the library, that cross-hybridizations are possible, and that production is expensive. Sequence redundancy can be explained by the presence of partially overlapping clones, or by multiple recurrences of the same clone. Additionally, because of the length of the clones, the possibility of cross-hybridization events cannot be excluded, and with increasing length, the probability of repeats becomes higher. The large production expenses are caused, among other factors, by the necessity to sequence all clones of the library.
A further problem in DMH applications is that the mixture of fragments to be tested is enormously complex, leading to unstable signals, increased cross-hybridization and increased occurrence of non-specific hybridization. The reason for the high complexity is that, in the art-recognized DMH method, all fragments that are not cut by methylation-specific restriction enzymes are amplified in the last step. Because the number of fragments that simultaneously have a restriction recognition sequence and are down-methylated is very small, only few fragments are removed; the complexity of the mixture is therefore extremely high, and effectively reflects amplification of a substantial portion of the entire genome. A specific reduction of fragment complexity would be particularly desirable here, because a very large number of different fragments leads to comparatively small amplification factors; that is, individual fragments per se are only slightly amplified, and the difference in copy number between methylated and unmethylated fragments is small. Even if the amplification factor could be increased, detection of individual fragments from a very large population of different fragments would not be possible, or would be substantially problematic, because of cross-hybridization effects. With regard to such excessive complexity, reference is made to the document Lucito, et al., Genetic Research (2000).
There is, therefore, a pronounced need in the art for simpler methods that effectively reduce the complexity of the DNA fragment solutions obtained in DMH applications, preferably while simultaneously retaining the potentially interesting fragments.
A method referred to as “MSO” (methylation-specific oligonucleotide microarray) has also been described by Gitan, et al (Gitan R. S., Shi H., Yan P. S., Huang T. H-M., Methylation-specific oligonucleotide microarray: A new potential for high-throughput methylation analysis. Genome Res., 12:158-164, 2001). The method described by Gitan et al is based on the analysis of bisulfite-treated DNA and investigates methylation sites within a defined region, such as a specific CpG island.
The drawbacks of methods based on analysis of bisulfite-transformed DNA are the additional expense of the method and the relatively high loss of DNA that occurs during the bisulfite treatment. Further, the design of the requisite oligonucleotides becomes more difficult, because the sequence complexity of the investigated nucleic acids is reduced by the substantial elimination of the cytosines (unmethylated cytosines are converted into uracils, which are read as thymines after amplification).
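The loss of sequence complexity caused by bisulfite treatment can be illustrated with a short in silico sketch. It models the net effect after amplification, in which every unmethylated cytosine is read as thymine while methylated cytosines remain cytosine. The function name and the example sequences are hypothetical; methylation is given as a set of 0-based cytosine positions.

```python
def bisulfite_convert(seq: str, methylated=frozenset()) -> str:
    """Replace every unmethylated C by T; methylated C positions are kept."""
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


if __name__ == "__main__":
    # The C at position 1 is methylated and survives; the other cytosines read as T.
    print(bisulfite_convert("ACGTCCGA", methylated={1}))
    # Two sequences that differ before conversion become indistinguishable after it.
    print(bisulfite_convert("ACGT") == bisulfite_convert("ATGT"))
    # A fully unmethylated strand is reduced to a three-letter alphabet.
    print(sorted(set(bisulfite_convert("ACGTCCGA"))))
```

Because distinct genomic sequences can collapse onto the same converted sequence, oligonucleotides designed against converted DNA have fewer distinguishing positions available.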
Furthermore, the detection of SNPs (single nucleotide polymorphisms) becomes considerably more difficult and more vulnerable to cross-hybridization.
In other contexts, microarrays carrying oligonucleotides are in principle known, and these oligonucleotides can be synthesized on the substrate of the microarray, which makes this kind of detection generally advantageous for high-throughput methods.