Cytosine methylation occurs after DNA synthesis by enzymatic transfer of a methyl group from an S-adenosylmethionine donor to the carbon-5 position of cytosine. The reaction is performed by one of a family of enzymes known as DNA methyltransferases. The predominant sequence recognition motif for mammalian DNA methyltransferases is 5′-CpG-3′, although non-CpG methylation has also been reported. Because methylcytosine-to-thymine transition mutations occur at a high rate, the CpG dinucleotide is severely under-represented and unequally distributed across the human genome. Vast stretches of DNA are depleted of CpGs, and these are interspersed with CpG clusters known as CpG islands. About 50-60% of known genes contain CpG islands in their promoter regions. These islands are maintained in a largely unmethylated state except during normal developmental control of gene expression, gene imprinting, X chromosome silencing, and ageing, or when aberrantly methylated in cancer and some other pathological conditions. The patterns of DNA methylation are a critical point of interest for genomic studies of cancer, disease, and ageing. DNA methylation has been investigated in terms of cellular, global, and site-specific methylation patterns. The goal of methylation analysis is to develop discovery tools that increase our understanding of the mechanisms of cancer progression, and diagnostic tools that allow the early detection, diagnosis, and treatment of cancers and other diseases. In recent years it has become apparent that the transcriptional silencing associated with 5-methylcytosine is important in mammalian development, genome imprinting, X chromosome inactivation, mental health, and cancer, as well as for protection against intragenomic parasites.
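The under-representation of CpG described above is usually expressed as an observed/expected CpG ratio. The following Python sketch computes that ratio and applies thresholds that are commonly used to flag CpG islands (length of at least 200 bp, GC content above 50%, observed/expected CpG above 0.6). The thresholds and the function names are illustrative assumptions, not part of the methods described here.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return cpg * len(s) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply commonly used (assumed) CpG-island thresholds to one window."""
    return (len(seq) >= min_len
            and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)


if __name__ == "__main__":
    # 2 CpGs * 10 bp / (2 C * 2 G) = 5.0
    print(cpg_obs_exp("ACGTTTACGA"))
    print(looks_like_cpg_island("CG" * 150))
```

A genome-scale scan would slide such a window along the sequence; the fixed-window test here only shows the arithmetic.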
Methylation in Cancer
Epigenetics is the study of heritable changes in DNA structure that affect gene expression but are not due to a change in the DNA sequence. One major focus of epigenetic studies is the role of methylation in silencing gene expression. Both increased methylation (hypermethylation) and loss of methylation (hypomethylation) have been implicated in the development and progression of cancer and other diseases. Hypermethylation of gene promoter and upstream coding regions results in decreased expression of the corresponding genes. It has been proposed that hypermethylation serves as a cellular mechanism not only to decrease expression of genes not being used by the cell, but also to silence transposons and other viral and bacterial genes that have been incorporated into the genome. Genomic regions that are actively expressed within cells are often hypomethylated in the promoter and upstream coding regions. In contrast, downstream regions of actively transcribed genes are typically kept hypermethylated, but become hypomethylated in cancer (Jones and Baylin, 2002; Baylin and Herman, 2000). Thus, there appears to be a cellular balance between the silencing of genes by hypermethylation and the hypomethylation of promoter and upstream coding regions of genes that are actively being expressed.
Hypermethylation of tumor suppressor genes has been correlated with the development of many forms of cancer (Jain, 2003). The genes most commonly hypermethylated in various cancers include: 14-3-3 sigma, ABL1 (P1), ABO, APC, AR (Androgen Receptor), BLT1 (Leukotriene B4 Receptor), BRCA1, CALCA (Calcitonin), CASP8 (Caspase 8), Caveolin 1, CD44, CDH1 (E-Cadherin), CFTR, GNAL, COX2, CSPG2 (Versican), CX26 (Connexin 26), Cyclin A1, DAPK1, DBCCR1, DCIS-1, Endothelin Receptor B, EPHA3, EPO (Erythropoietin), ER (Estrogen Receptor), FHIT, GALNR2, GATA-3, COL9A1, GPC3 (Glypican 3), GST-pi, GTP-binding protein (olfactory subunit), H19, H-Cadherin (CDH13), HIC1, hMLH1, HOXA5, IGF2 (Insulin-Like Growth Factor II), IGFBP7, IRF7, KAI1, LKB1, LRP-2 (Megalin), MDGI (Mammary-derived growth inhibitor), MDR1, MDR3 (PGY3), MGMT (O6-methylguanine-DNA methyltransferase), MINT, MT1a (Metallothionein 1), MUC2, MYOD1, N33, NEP (Neutral Endopeptidase 24.11)/CALLA, NF-L (light-neurofilament-encoding gene), NIS (Sodium-Iodide Symporter gene), OCT-6, P14/ARF, P15 (CDKN2B), P16 (CDKN2A), P27KIP1, p57KIP2, p73, PAX6, PgR (Progesterone Receptor), RAR-Beta2, RASSF1, RB1 (Retinoblastoma), RPA2 (Replication Protein A2), SIM2, TERT, TESTIN, TGFBR1, THBS1 (Thrombospondin-1), TIMP3, TLS3 (T-Plastin), TMEFF2, Urokinase (uPA), VHL (Von Hippel-Lindau), WT1, and ZO2 (Zona Occludens 2).
While a small set of commonly hypermethylated sites is routinely screened in many cancers, there is currently a lack of methodologies for discovering new sites that may play critical roles in the development and/or progression of cancer. There is also a lack of rapid and accurate methodologies for determining the methylation status of specific genes for use as diagnostic, treatment, and prognostic tools for cancer patients.
Hypomethylation has also been implicated as a mechanism responsible for tumor progression (Dunn, 2003). Several genes have been characterized as hypomethylated in colon carcinoma and/or leukemia, including growth hormone, c-myc, gamma globulin, gamma crystallin, alpha and beta chorionic gonadotropin, insulin, proopiomelanocortin, platelet-derived growth factor, c-Ha-ras, c-fos, bcl-2, erb-A1, and ornithine decarboxylase. Most of these genes are involved in growth and cell cycle regulation, and it has been proposed that loss of methylation in these genes contributes to unchecked cell proliferation in these and other cancer types.
While both hypermethylation and hypomethylation have been implicated in the development and progression of several cancers, their specific roles have not been fully elucidated. For instance, does hypermethylation of tumor suppressor genes lead to hypomethylation of cell cycle regulatory genes leading to unchecked cellular proliferation? In order to answer these and other important questions, rapid, accurate, and sensitive technologies for the analysis of DNA methylation patterns within normal and cancer cells are required.
Genome-Wide DNA Methylation Patterns
The analysis of global levels of DNA methylation has proven useful in the study of cancer, disease, and ageing. Changes in global methylation levels have been directly correlated with the development of several types of cancer, including lung, colon, liver, and breast cancer and leukemia (Fruhwald and Plass, 2002). Global methylation levels have been measured by several distinct technologies: Southern blotting, high-pressure liquid chromatography (HPLC), high-performance capillary electrophoresis (HPCE), MALDI mass spectrometry, and chemical or enzymatic incorporation of radiolabeled methyl groups (Fraga and Esteller, 2002).
Southern blotting techniques involve traditional two-dimensional gel electrophoresis of DNA digested with a methylation-insensitive restriction endonuclease (first dimension), followed by a methylation-sensitive restriction endonuclease (second dimension) (Fanning et al., 1985). This procedure resolves the banding patterns of two samples so that their relative methylation patterns can be compared. HPLC and HPCE methods both require the breakdown of DNA into individual nucleotides, which are then separated by either chromatography (HPLC) or electrophoresis (HPCE). For HPLC, the resulting methylcytosine and cytosine peaks can be resolved and quantified by comparison to known standards (Tawa et al., 1994; Ramsahoye, 2002). Although peaks can be identified by HPCE, there are currently no protocols for quantifying methylcytosine with this method (Fraga et al., 2000). Both methods are hampered by the requirement for a large amount of starting material, 2.5 μg for HPLC and 1 μg for HPCE. Furthermore, both require specialized, expensive equipment.
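As a minimal sketch of the HPLC quantification step, assuming that peak areas are converted to amounts with response factors measured from methylcytosine and cytosine standards, the global methylation level can be expressed as the percentage of methylcytosine among all cytosines. The function names and the response-factor approach are illustrative assumptions.

```python
def response_factor(standard_area: float, standard_amount: float) -> float:
    """Peak area per unit amount, measured from one known standard."""
    return standard_area / standard_amount


def percent_5mc(area_5mc: float, area_c: float,
                rf_5mc: float = 1.0, rf_c: float = 1.0) -> float:
    """Percent methylcytosine of total cytosine from HPLC peak areas.

    rf_5mc and rf_c are response factors assumed to come from standards.
    """
    mc = area_5mc / rf_5mc   # amount of methylcytosine
    c = area_c / rf_c        # amount of unmethylated cytosine
    total = mc + c
    if total <= 0:
        raise ValueError("peak areas must be positive")
    return 100.0 * mc / total


if __name__ == "__main__":
    rf = response_factor(500.0, 250.0)    # 2.0 area units per unit amount
    print(percent_5mc(4.0, 96.0))         # equal response factors assumed
    print(percent_5mc(2.0, 48.0, rf_5mc=0.5))
```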
Recently, additional variations on the basic HPLC method have been developed. These methods combine HPLC techniques with primer extension and ion-pair reverse-phase (IP RP) HPLC (Matin et al., 2002), or with electrospray ionization mass spectrometry (Friso et al., 2002). Both methods seek to improve on the accuracy and sensitivity of the earlier HPLC technique. The IP RP HPLC method combines bisulfite conversion of DNA with a primer extension reaction, followed by analysis of the resulting products by HPLC.
The technique of matrix-assisted laser desorption/ionization (MALDI) mass spectrometry has also been utilized for the accurate quantification of methylation in cancer samples (Tost et al., 2003).
Enzymatic and chemical labeling methods have also been used to quantify global methylation levels. The enzymatic methods involve the addition of a radiolabeled methyl group to unmethylated cytosines, so that the amount of incorporated label is inversely correlated with the amount of methylation in the sample (Duthie et al., 2000). A chemical labeling method has also been developed based on fluorescent labeling of adenine and cytosine residues by chloroacetaldehyde (Oakeley et al., 1999). This method relies on bisulfite conversion of unmethylated cytosines to uracil so that, among cytosines, only methylcytosine is fluorescently labeled.
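Because the enzymatic label can only be added to unmethylated sites, the methylation level is read inversely from the incorporated label. The Python sketch below assumes a fully unmethylated reference DNA of the same mass that defines maximal incorporation, plus an optional background count from a no-enzyme control; both calibrations and the function name are hypothetical.

```python
def fraction_methylated(cpm_sample: float, cpm_unmethylated_ref: float,
                        cpm_background: float = 0.0) -> float:
    """Estimate the fraction of CpG sites that were already methylated.

    Label is incorporated only at unmethylated sites, so incorporation
    falls as methylation rises. cpm_unmethylated_ref is the (assumed)
    incorporation of an equal mass of fully unmethylated reference DNA.
    """
    sample = cpm_sample - cpm_background
    reference = cpm_unmethylated_ref - cpm_background
    if reference <= 0:
        raise ValueError("reference incorporation must exceed background")
    # Clamp to [0, 1] so counting noise cannot give impossible values.
    return min(max(1.0 - sample / reference, 0.0), 1.0)


if __name__ == "__main__":
    print(fraction_methylated(3000, 10000))       # about 0.7
    print(fraction_methylated(1100, 10100, 100))  # about 0.9
```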
To study global methylation, Pogribny et al. (1999) developed an assay based on the methylation-sensitive restriction endonucleases HpaII, AciI, and BssHII, which leave 5′ guanine overhangs after DNA cleavage, followed by single radiolabeled nucleotide extension. These enzymes were applied selectively to screen for alterations in genome-wide methylation and in CpG island methylation. The extent of radioactive label incorporation was found to be proportional to the number of unmethylated (cleaved) CpG sites.
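The proportionality between label incorporation and unmethylated sites can be illustrated by counting which recognition sites in a sequence would be cleaved. This sketch assumes that a site is protected if any CpG cytosine within it is methylated and represents methylation as a set of top-strand indices of methylated CpG cytosines. The enzyme table and function names are hypothetical simplifications; hemimethylation and strand effects are ignored.

```python
# Top-strand recognition sequences and the offsets of their CpG cytosines.
# AciI (CCGC) can appear on the top strand as CCGC or as GCGG.
ENZYMES = {
    "HpaII": [("CCGG", (1,))],
    "AciI": [("CCGC", (1,)), ("GCGG", (1,))],
    "BssHII": [("GCGCGC", (1, 3))],
}


def cleavable_sites(seq: str, enzyme: str, methylated_cpg: set) -> list:
    """Start indices of sites that are cut (no CpG cytosine methylated)."""
    s = seq.upper()
    hits = []
    for site, cpg_offsets in ENZYMES[enzyme]:
        i = s.find(site)
        while i != -1:
            if not any(i + k in methylated_cpg for k in cpg_offsets):
                hits.append(i)
            i = s.find(site, i + 1)
    return sorted(hits)


def relative_incorporation(seq: str, enzyme: str, methylated_cpg: set) -> float:
    """Cut sites / all sites: proportional to label in this simple model."""
    total = len(cleavable_sites(seq, enzyme, set()))
    if total == 0:
        return 0.0
    return len(cleavable_sites(seq, enzyme, methylated_cpg)) / total
```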
In Situ Analysis of DNA Methylation
Another method for investigating genome-wide levels of methylation uses methylcytosine-specific antibodies (Miller et al., 1974). This method also allows levels of methylation to be compared between chromosomes and even between different parts of a single chromosome (Barbin et al., 1994). Furthermore, in situ hybridization can be used to analyze the differential methylation patterns of adjacent cells in tissue sections.
Site-Specific DNA Methylation Analysis
Analysis of site-specific methylation patterns can be divided into two distinct groups: bisulfite conversion methods and non-bisulfite based methods. The bisulfite conversion method relies on treatment of DNA samples with sodium bisulfite, which converts unmethylated cytosine to uracil while leaving methylated cytosines unchanged (Furuichi et al., 1970). This conversion changes the sequence of the original DNA, and analysis of the resulting sequence reveals which cytosines in the DNA were methylated. Several methodologies are used for the analysis of bisulfite converted DNA, including sequencing, methylation-specific PCR, COBRA (COmbined Bisulfite Restriction Analysis), methylation-sensitive single nucleotide primer extension, and methylation-sensitive single-strand conformation analysis.
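The sequence change produced by bisulfite treatment can be simulated in a few lines. This Python sketch assumes a single top strand, represents methylation as a set of indices of methylated cytosines, and writes uracil as T, as it is read after PCR amplification. The function names are hypothetical.

```python
def cpg_cytosines(seq: str) -> list:
    """Indices of cytosines that are followed by G (the C of each CpG)."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def bisulfite_convert(seq: str, methylated_c: set) -> str:
    """Simulate bisulfite treatment of one strand.

    Unmethylated C becomes U, written as T (its reading after PCR);
    cytosines whose index is in methylated_c are left unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_c else base
        for i, base in enumerate(seq.upper())
    )


if __name__ == "__main__":
    ref = "ACGTCCGA"
    sites = cpg_cytosines(ref)                  # [1, 5]
    print(bisulfite_convert(ref, set(sites)))   # ACGTTCGA: CpGs methylated
    print(bisulfite_convert(ref, set()))        # ATGTTTGA: nothing methylated
```

Note that the non-CpG cytosine at index 4 is converted in both cases; only the CpG cytosines report methylation.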
The major drawback to bisulfite conversion of DNA is that it degrades up to 96% of the DNA sample (Grunau et al., 2001). The harsh bisulfite treatment, combined with the need to convert all unmethylated cytosines, requires a substantial amount of input DNA in order to obtain enough usable DNA after conversion. Furthermore, the high level of degradation complicates the detection of differences in methylation patterns in DNA samples from mixed cell populations, for example cancer cells in a background of normal cells. Changing the incubation conditions to minimize DNA degradation can result in incomplete conversion and the identification of false positives.
Bisulfite DNA Conversion Methods for Methylation Analysis
The most direct method for analysis of bisulfite converted DNA is direct sequencing (Frommer et al., 1992). Amplification of fragments of interest followed by sequencing will quickly and accurately identify all cytosines that were methylated, as all unmethylated cytosines will have been converted to uracil (read as thymine after PCR). One drawback to direct sequencing is the need to design amplification and sequencing primers that account for all of the possible converted sequences, which depend on the methylation state of the template. The conversion of cytosine to uracil alters the priming sequences as well as the target sequences. Furthermore, sequencing is labor intensive and time consuming when large numbers of sequences and/or samples are investigated.
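Once a converted fragment has been sequenced, each CpG can be called by comparing the read with the reference: a C indicates a methylated cytosine and a T an unmethylated one. Non-CpG cytosines, which should all be converted, give an estimate of conversion efficiency, which is how incomplete conversion shows up as false positives. The sketch below assumes a read already aligned to the reference without insertions or deletions; the function names are illustrative.

```python
def call_cpg_methylation(reference: str, read: str) -> dict:
    """Call each reference CpG as 'M' (read C), 'U' (read T) or '?' (other)."""
    ref, rd = reference.upper(), read.upper()
    if len(ref) != len(rd):
        raise ValueError("read must be aligned to the reference without gaps")
    calls = {}
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":
            calls[i] = {"C": "M", "T": "U"}.get(rd[i], "?")
    return calls


def conversion_rate(reference: str, read: str) -> float:
    """Fraction of non-CpG reference cytosines that the read shows as T."""
    ref, rd = reference.upper(), read.upper()
    non_cpg = [i for i, b in enumerate(ref)
               if b == "C" and ref[i + 1:i + 2] != "G"]
    if not non_cpg:
        return float("nan")
    return sum(rd[i] == "T" for i in non_cpg) / len(non_cpg)


if __name__ == "__main__":
    print(call_cpg_methylation("ACGTCCGA", "ACGTTTGA"))  # {1: 'M', 5: 'U'}
    print(conversion_rate("ACGTCCGA", "ACGTTTGA"))       # 1.0
```

An unconverted read ("ACGTCCGA") would be called fully methylated but would also show a conversion rate of 0.0, flagging it for exclusion.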
Methylation-Specific PCR (MS-PCR) is the most commonly used technique for analysis of methylation. MS-PCR determines the methylation status of specific cytosines following conversion of unmethylated cytosines to uracil by bisulfite treatment (Herman et al., 1996), using primers that are specific for the cytosine of interest. Because the converted sequences differ, different primer sets can determine whether the initial sequence was methylated. Melting curve Methylation-Specific PCR (McMS-PCR) replaces sequence analysis of the resulting PCR products with the more efficient process of melting curve analysis (Akey et al., 2002; Guldberg et al., 2002). Differences in the melting temperature of the products arise from the sequence differences produced by bisulfite conversion of methylated versus unmethylated DNA samples. Another method for analyzing MS-PCR products by their melting characteristics uses denaturing high-performance liquid chromatography (Baumer, 2002). In this method, MS-PCR is carried out under conditions that amplify both alleles (converted and unconverted cytosines). The products of MS-PCR are analyzed by HPLC under denaturing conditions, allowing the resolution of different products based on the sequence differences due to bisulfite conversion.
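The primer logic of MS-PCR can be sketched as follows. Assuming a fully methylated or fully unmethylated template and primers against the converted top strand, the "methylated" primer keeps CpG cytosines as C while the "unmethylated" primer has them as T; all non-CpG cytosines become T in both. The rough GC-based melting temperature formula is a common approximation (an assumption here), included only to show why methylated products, which retain more C/G, melt at higher temperatures. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert_top_strand(seq: str, methylated: bool) -> str:
    """Bisulfite-convert a top-strand sequence as read after PCR (U -> T).

    Non-CpG cytosines always become T; CpG cytosines stay C only if the
    template is assumed fully methylated. A C at the 3' end of `seq` is
    treated as non-CpG because the following base is unknown.
    """
    s = seq.upper()
    out = []
    for i, base in enumerate(s):
        if base == "C":
            in_cpg = s[i + 1:i + 2] == "G"
            out.append("C" if (in_cpg and methylated) else "T")
        else:
            out.append(base)
    return "".join(out)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def msp_primers(fwd_site: str, rev_site: str, methylated: bool):
    """(forward, reverse) primers for the converted top strand.

    fwd_site and rev_site are the original top-strand sequences (5'->3')
    of the two primer-binding regions.
    """
    fwd = convert_top_strand(fwd_site, methylated)
    rev = reverse_complement(convert_top_strand(rev_site, methylated))
    return fwd, rev


def rough_tm(primer: str) -> float:
    """Rough Tm estimate in deg C: 64.9 + 41 * (nGC - 16.4) / N."""
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)
```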
One version of MS-PCR, called MethyLight (Eads et al., 2000), uses fluorescence-based real-time quantitative PCR to allow both detection and quantitation of the converted products in one step. The major drawback of these techniques is the need to design primers specific for each methylation site, based on the different possible converted sequences. A further modification of the MethyLight protocol uses an additional fluorescent probe directed against unconverted DNA. This protocol, ConLight-MSP, was developed to address overestimation of methylation due to incomplete bisulfite conversion of DNA (Rand et al., 2002). A second method aimed at the problem of incomplete bisulfite conversion is bisulfite conversion-specific Methylation-Specific PCR (BS-MSP) (Sasaki et al., 2003). In this technique, two rounds of PCR are carried out following bisulfite conversion of DNA. In the first round, the primers contain no CpGs but do contain positions corresponding to non-CpG cytosines at their 3′ ends. Thus, only fully converted DNA is amplified in the first round. A second, conventional MSP amplification is then carried out to amplify the CpGs of interest. This results in less background amplification of incompletely converted templates and a more accurate determination of the level of methylation in the sample.
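Real-time quantitation in MethyLight-type assays is commonly expressed relative to a bisulfite-independent control reaction and a fully methylated reference DNA. The sketch below is a generic ΔCt-style calculation rather than the published protocol: it assumes ideal amplification (doubling per cycle unless `efficiency` is changed), and the function names are hypothetical.

```python
def relative_quantity(ct_target: float, ct_control: float,
                      efficiency: float = 1.0) -> float:
    """Target amount relative to the control reaction (Delta-Ct method).

    efficiency=1.0 assumes perfect doubling every cycle.
    """
    return (1.0 + efficiency) ** (-(ct_target - ct_control))


def percent_methylated_reference(ct_gene_sample: float, ct_ctrl_sample: float,
                                 ct_gene_ref: float, ct_ctrl_ref: float,
                                 efficiency: float = 1.0) -> float:
    """Sample gene/control ratio divided by that of methylated reference, x100."""
    sample = relative_quantity(ct_gene_sample, ct_ctrl_sample, efficiency)
    reference = relative_quantity(ct_gene_ref, ct_ctrl_ref, efficiency)
    return 100.0 * sample / reference


if __name__ == "__main__":
    # Sample Delta-Ct = 5, reference Delta-Ct = 1: 2**-4 of the reference.
    print(percent_methylated_reference(30, 25, 26, 25))  # 6.25
```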
Other methods for site-specific methylation analysis include COBRA, methylation-sensitive single nucleotide primer extension (MS-SNuPE), and methylation-sensitive single-strand conformation analysis (MS-SSCA). COBRA combines bisulfite conversion with methylation-sensitive restriction endonuclease analysis (described below) to enable highly specific, highly sensitive quantitation of methylation at sites contained within recognition sequences for methylation-sensitive restriction enzymes (Xiong and Laird, 1997). Melting curve combined bisulfite restriction analysis (McCOBRA) was developed to allow analysis of bisulfite converted DNA without gel electrophoresis (Akey et al., 2002). In this procedure, bisulfite converted DNA is amplified by PCR with specific primer pairs surrounding a potential methylation site. The resulting PCR products are digested with a restriction enzyme that recognizes and cuts only DNA that was originally methylated. Melting curve analysis yields two peaks, based on the size difference between cut and uncut DNA, and allows determination of the methylation status of that site in the original DNA. Another variation of COBRA, termed Pyrosequencing methylation analysis (PyroMethA), uses the Pyrosequencing reaction in place of the restriction analysis used in COBRA to determine methylation status (Colella et al., 2003; Tost et al., 2003). MS-SNuPE combines MS-PCR amplification of bisulfite converted DNA with single nucleotide extension of the MS-PCR products to incorporate radiolabeled C (methylated) or T (unmethylated), which can be detected using a phosphorimager (Gonzalgo and Jones, 1997). The ratio of C to T incorporation indicates the level of methylation at a particular site.
Finally, MS-SSCA combines bisulfite conversion of DNA with single-strand conformation polymorphism (SSCP) analysis to detect sequence differences through changes in the electrophoretic migration of the molecules (Burri and Chaubert, 1999; Suzuki et al., 2000).
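The COBRA principle can be sketched by converting a sequence in silico and checking whether a recognition site survives. In this Python sketch, TaqI (TCGA) and BstUI (CGCG) serve as example enzymes whose sites are retained after bisulfite treatment only when the CpG cytosines were methylated. The methylation level is then estimated as the fraction of digested product, assuming the measured signals are proportional to DNA amount. The enzyme choice and function names are assumptions.

```python
SITES = {"TaqI": "TCGA", "BstUI": "CGCG"}  # example enzymes only


def bisulfite_convert(seq: str, methylated_c: set) -> str:
    """Unmethylated C -> T (as read after PCR); methylated C unchanged."""
    return "".join("T" if b == "C" and i not in methylated_c else b
                   for i, b in enumerate(seq.upper()))


def site_positions(seq: str, site: str) -> list:
    """Start indices of all occurrences of `site`, overlaps included."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def cobra_sites(seq: str, methylated_c: set, enzyme: str) -> list:
    """Recognition sites present in the converted, amplified sequence."""
    return site_positions(bisulfite_convert(seq, methylated_c), SITES[enzyme])


def cobra_methylation_level(cut_signal: float, uncut_signal: float) -> float:
    """Fraction of product digested.

    cut_signal is the summed signal of all digestion fragments; both
    signals are assumed proportional to DNA amount.
    """
    total = cut_signal + uncut_signal
    if total <= 0:
        raise ValueError("signals must be positive")
    return cut_signal / total
```

With the C at index 3 methylated, "AATCGAAA" keeps its TaqI site after conversion; unmethylated, it becomes "AATTGAAA" and is no longer cut.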
Another method for analyzing the methylation status of specific sites is based on changes in restriction endonuclease recognition sites following bisulfite conversion of DNA (Sadri and Hornsby, 1996). In this procedure, DNA is bisulfite converted and a specific region of interest is amplified by PCR. Following amplification, the products are digested with either a restriction endonuclease that cleaves only the sequence generated by conversion of an unmethylated CpG, or a restriction endonuclease that cleaves the same site only if it was originally methylated and therefore not converted by bisulfite treatment. Comparison of the digestion products indicates the methylation status of the site of interest and, potentially, the relative level of methylation of the site in a mixed population of cells. This method improves on conventional MSP by not relying on differences in PCR amplification between converted and unconverted DNA. However, it is also susceptible to incomplete conversion of the starting DNA. Furthermore, it depends on bisulfite conversion creating a different restriction endonuclease recognition site. The authors estimated that only approximately 25% of CpG sites could be analyzed by this method, leaving the majority of CpG sites unanalyzed. A newly developed technique, HeavyMethyl, uses real-time PCR analysis of unconverted DNA (Cottrell et al., 2004). Specificity for methylated sites is achieved by using a methylation-sensitive oligonucleotide blocker. This blocker binds only to unmethylated DNA, blocking annealing of the primer and preventing amplification. Methylated sequences do not bind the blocker and are primed and extended, resulting in cleavage of the probe and fluorescent detection.
The advantages of this system include lower background, higher specificity of signal, and a decreased requirement for starting material due to the lack of a bisulfite conversion step. However, development of each assay requires the design and optimization of 5 oligonucleotides: 2 primers, 2 blocking oligonucleotides, and a probe. This requirement greatly increases the difficulty and cost of developing site-specific assays. Furthermore, small samples of DNA will yield enough material for only a few assays and will not allow analysis of large numbers of potential methylation sites.
All of the aforementioned methods for analyzing bisulfite-converted DNA require several nanograms of converted DNA per assay and are thus impractical for genome-wide methylation analysis. To allow genome-wide methylation analysis by these methods, techniques must be used that can efficiently amplify small quantities of converted DNA.
Non-Bisulfite Based Methods of Methylation Analysis
Non-bisulfite based methods for analysis of DNA methylation rely on methylation-sensitive and methylation-insensitive restriction endonucleases (Cedar et al., 1979). Following digestion of sample DNA with either methylation-sensitive or methylation-insensitive restriction enzymes (e.g., HpaII and MspI, respectively), the DNA can be analyzed by methods such as Southern blotting and PCR. Southern blot analysis involves electrophoretic separation of the resulting DNA fragments and hybridization with a labeled probe adjacent to the CpG of interest. If the methylation-sensitive and methylation-insensitive digests produce hybridization bands of different sizes, then the site of interest was methylated. In contrast, PCR analysis involves amplification across the CpG of interest; the expected band is observed in the methylation-sensitive digest only if the site of interest is methylated. The disadvantages of the Southern blotting assay are that specific probes must be developed for every site of interest and that large amounts of starting DNA (e.g., 10 μg) are required. The PCR assay requires much less DNA for each site of interest (e.g., 1-10 ng), but necessitates the design and testing of specific primer pairs for every site. Furthermore, although each individual assay requires only nanogram quantities of DNA, analysis of hundreds or even thousands of potential methylation sites still involves μg quantities of DNA. The overall limitation of these technologies is their dependence on the presence of a methylation-sensitive restriction site at the CpG of interest. Thus, although these assays are relatively quick and simple, they cannot be used to test all potential methylation sites. Furthermore, these methods can only be used for sites that have been previously identified and have had detection assays designed for them; they do not allow the discovery of new sites of interest.
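The expected results of an HpaII/MspI comparison can be predicted from the sequence. This sketch cuts at C^CGG, treats HpaII as blocked when the internal CpG cytosine is methylated, and treats MspI as unaffected. Methylation of the outer cytosine (which blocks MspI) is not modeled, and the linear-fragment model and function names are simplifying assumptions.

```python
def site_positions(seq: str, site: str) -> list:
    """Start indices of all occurrences of `site`, overlaps included."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def ccgg_cuts(seq: str, methylated_cpg: set, enzyme: str) -> list:
    """Cut coordinates for C^CGG (each cut falls before the returned index)."""
    if enzyme not in ("HpaII", "MspI"):
        raise ValueError("expected 'HpaII' or 'MspI'")
    cuts = []
    for p in site_positions(seq.upper(), "CCGG"):
        if enzyme == "HpaII" and (p + 1) in methylated_cpg:
            continue  # HpaII blocked by methylation of the internal CpG C
        cuts.append(p + 1)
    return cuts


def fragment_lengths(seq_len: int, cuts: list) -> list:
    """Lengths of the linear fragments produced by the given cuts."""
    bounds = [0] + sorted(cuts) + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def amplicon_survives(amp_start: int, amp_end: int, cuts: list) -> bool:
    """PCR across [amp_start, amp_end) succeeds only if no cut lies inside."""
    return not any(amp_start < c < amp_end for c in cuts)
```

For a PCR readout, an amplicon spanning the first CCGG survives the HpaII digest only when that site's internal cytosine is in `methylated_cpg`.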
Ligation-mediated PCR (LM-PCR) was developed to increase the sensitivity of methylation analysis by restriction endonuclease digestion (Steigerwald et al., 1990). In this method, DNA is digested with a methylation-sensitive restriction endonuclease that cleaves the site of interest, along with a methylation-insensitive restriction endonuclease that cuts in fairly close proximity to that site. Following digestion, a primer extension reaction is performed using a previously characterized primer that is upstream from both digestion sites. A linker is ligated to the resulting end of the extended sequence. A second primer extension step is performed using a primer based on the linker sequence, and PCR amplification is performed using the linker primer and a nested primer downstream from the primer used in the first primer extension reaction. The products of amplification are analyzed by gel electrophoresis. Two potential bands are produced: a full-length amplicon indicating methylation of the target sequence, and a shorter amplicon indicating a lack of methylation. A mixture of both products indicates partial methylation in the sample, and the amount of methylation can be estimated from the ratio of the two products. This method greatly improved the sensitivity of PCR-based analysis, but is greatly hindered by the need to design 2 primers for each locus of interest and by the restriction to analyzing 1 specific site per reaction.
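Estimating partial methylation from the two LM-PCR products amounts to comparing their molar amounts. The sketch below assumes equal amplification efficiency for both products. If the bands are detected with a mass-dependent stain, each signal is first divided by fragment length; end-labeled products are compared directly. The function and parameter names are hypothetical.

```python
def methylated_fraction(full_signal: float, full_len: int,
                        short_signal: float, short_len: int,
                        mass_stain: bool = True) -> float:
    """Fraction of templates yielding the full-length (methylated) product."""
    if mass_stain:
        # Signal scales with mass, i.e. molecules x length.
        full_mol = full_signal / full_len
        short_mol = short_signal / short_len
    else:
        # End-labeled products: signal scales with molecule count.
        full_mol, short_mol = full_signal, short_signal
    total = full_mol + short_mol
    if total <= 0:
        raise ValueError("signals must be positive")
    return full_mol / total


if __name__ == "__main__":
    print(methylated_fraction(400, 200, 100, 100))                    # about 2/3
    print(methylated_fraction(400, 200, 100, 100, mass_stain=False))  # 0.8
```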
The technique of Differential Methylation Hybridization (DMH) has been utilized to screen CpG island arrays to determine methylation status of a large number of sites at a time (Huang et al., 1999). In this procedure, DNA is digested with a frequent cutting restriction endonuclease to generate small DNA fragments. Linkers are ligated to the products of digestion and repetitive DNA is subtracted. The resulting molecules are digested with a methylation-sensitive restriction endonuclease. PCR of the digestion products with a primer complementary to the linkers results in amplification of all molecules that contain either methylated restriction sites or no restriction sites. The products of amplification are then hybridized to a CpG island array consisting of clones containing multiple restriction endonuclease sites for the enzyme used to digest the DNA. Hybridization to a clone indicates that the site was methylated in the starting DNA. This method requires the generation of a large number of clones for creation of the array and is limited by the ability to amplify the products of the original digestion. Many fragments will be either too large to be amplified, or be so small as to result in suppression of amplification or poor hybridization to the array. Furthermore, there will be a high level of background of products that do not contain methylation sites of interest that will affect the signal to noise ratio of the array hybridization.
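The DMH selection step can be simulated to show why both methylated fragments and fragments lacking the methylation-sensitive site are amplified, which is the source of the background noted above. This sketch assumes MseI (T^TAA) as the frequent cutter and BstUI (CGCG) as the methylation-sensitive enzyme, and treats a BstUI site as protected if either of its CpG cytosines is methylated. The enzyme choices, the omission of fragment-size limits, and the function names are assumptions.

```python
def site_positions(seq: str, site: str) -> list:
    """Start indices of all occurrences of `site`, overlaps included."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def mse_fragments(seq: str) -> list:
    """(start, end) of fragments after cutting every T^TAA site."""
    cuts = [p + 1 for p in site_positions(seq, "TTAA")]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def amplifiable_fragments(seq: str, methylated_cpg: set) -> list:
    """Fragments surviving BstUI digestion, hence amplifiable via linkers.

    A fragment survives if every CGCG in it is protected (either CpG C
    methylated). Fragments with no CGCG at all also survive (background).
    """
    s = seq.upper()
    kept = []
    for start, end in mse_fragments(s):
        sites = [start + j for j in site_positions(s[start:end], "CGCG")]
        if all(p in methylated_cpg or p + 2 in methylated_cpg for p in sites):
            kept.append((start, end))
    return kept
```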
Yan et al. (2001) and Chen et al. (2003) developed a closely related method, referred to as Methylation Target Arrays (MTA) and derived from the concept of tissue microarrays, for simultaneous analysis of DNA hypermethylation in multiple samples. In MTA, target DNA is digested with four-base restriction endonucleases, such as MseI, BfaI, NlaIII, or Tsp509I, that cut DNA into short fragments but leave CpG islands relatively intact. The GC-rich fragments are then isolated on an affinity column containing the methyl-binding protein MeCP2. Linkers are ligated to the overhangs of the CpG island fragments, which are then digested with the methylation-sensitive restriction enzymes BstUI and HpaII. Finally, the fragments are amplified with flanking primers. CpG sites that are methylated are protected from cleavage and are amplified, whereas unmethylated CpG islands are lost to restriction. Initially, a microarray containing 7,776 short GC-rich tags tethered to glass slide surfaces was used to study 17 paired breast tumor and normal control tissues. Amplicons, representing differential pools of methylated DNA fragments between tumors and normal controls, were co-hybridized to the microarray panel. Hypermethylation of multiple CpG island loci was then detected in a two-color fluorescence system. Hierarchical clustering segregated these tumors based on their methylation profiles and identified a group of CpG island loci that corresponds to the hormone-receptor status of breast cancer. A panel of 468 MTA amplicons, representing the whole repertoire of methylated CpG islands in 93 breast tumors, 20 normal breast tissues, and 4 breast cancer cell lines, was arrayed on a nylon membrane for probe hybridization. Hybridization was performed with PCR-generated probes for 10 promoters, labeled with 32P-dCTP.
Positive hybridization signals detected in tumor amplicons, but not in normal amplicons, were indicative of aberrant hypermethylation in tumor samples. This was attributed to aberrant sites that were protected from methylation-sensitive restriction digestion and were amplified by PCR in tumor samples, while the same sites were restriction digested and could not be amplified in normal samples. Hypermethylation frequencies of the 10 genes GPC3, RASSF1A, 3OST3B, HOXA5, uPA, WT1, BRCA1, DAPK1, and KL were tested in breast tumors and cancer cell lines.
The aforementioned DMH and MTA technologies are described in U.S. Pat. No. 6,605,432, PCT WO03/087774A2, and U.S. Patent Application US20030129602A1 by Huang (see below). Drawbacks of these methods include incomplete coverage of the genome during the initial restriction digest, false positive results due to incomplete cleavage by a methylation-sensitive restriction enzyme, inability to analyze nicked, degraded, or partially double-stranded DNA from body fluids, lack of quantitation, and relatively low sensitivity. Thus, these techniques are limited to applications in which large quantities of DNA are readily available and methylated DNA represents a high percentage of the total DNA. Therefore, there is still a need for a sensitive diagnostic method that is capable of amplifying all regions of the genome and detecting methylation in samples in which methylated DNA is only a small fraction of a vast majority of unmethylated DNA.
Several techniques have been developed in order to identify unknown methylation hotspots, including restriction landmark genomic scanning (RLGS), methylation-sensitive representational difference analysis (MS-RDA), methylated CpG island amplification-representational difference analysis (MCA-RDA), methylation-sensitive arbitrarily primed PCR (MS-AP-PCR), methylation-spanning linker libraries (MSLL), differential methylation hybridization (DMH, see above), methylation-sensitive amplification polymorphism (MSAP), affinity capture of CpG islands, and CpG island microarray analysis (see above).
RLGS involves the digestion of high molecular weight DNA by a methylation sensitive restriction endonuclease, such as NotI, that targets CpG islands (Hayashizaki et al., 1993). The products of digestion are differentiated by two dimensional gel electrophoresis involving 2nd and 3rd digestions with non-methylation sensitive restriction endonucleases (Rush and Plass, 2002). The pattern of banding between two samples can be compared to determine changes in methylation status. Subsequently, these techniques have been expanded to include cloning of specific bands from the 2-D gel in order to identify methylated sequences. Recently, computer based RLGS systems have been developed to predict banding patterns based on digestion of genomic DNA with methylation-sensitive restriction endonucleases (Masuyama et al., 2003; Rouillard et al., 2001; Akiyoshi et al., 2000). The drawbacks of these techniques include a requirement for a large amount of starting material, the difficulty of resolving complex samples containing cells with different methylation patterns, and the large amount of work necessary to identify all of the bands of interest. Furthermore, although this technique is reproducible, sequence variations between samples can result in gain or loss of cleavage sites, resulting in changes in the banding pattern that are not related to changes in methylation.
Methylation-sensitive representational difference analysis (MS-RDA) was developed to determine differences in methylation status between control and cancer samples and thereby identify methylated regions in cancer (Ushijima et al., 1997; Kaneda et al., 2003). In this method, two DNA samples (tester and driver) are digested with a methylation-sensitive restriction endonuclease. An adaptor is ligated to the digestion products of each sample, and the products are amplified by PCR. Following amplification, the adaptors are removed and a second adaptor is ligated to the 5′ ends of the tester sample. The two samples are mixed, with the driver in large excess over the tester. Denaturing and annealing steps produce mostly driver/driver or driver/tester molecules for sites that were methylated in both the driver and the tester DNA, and tester/tester molecules for sites that were methylated only in the tester DNA sample. The resulting 3′ ends are filled in, producing molecules with the second adaptor at both ends only in the case of tester/tester hybridization. PCR amplification of the tester/tester hybrids using the second adaptor sequence isolates those sites methylated only in the tester sample. The enriched molecules can then be analyzed by a number of techniques known in the art, including PCR, microarray hybridization, and sequencing. Although this protocol has been useful in identifying specific methylation differences between cancer and normal samples, it has inherent limitations, including the requirement for two restriction endonuclease sites close enough to allow PCR amplification, but not so close as to result in suppression of the resulting products.
Furthermore, RDA produces only enrichment of sequences and does not completely select against sites that are methylated as some tester/tester hybrids are formed even in the presence of a large excess of driver.
Another related procedure, methylated CpG island amplification-representational difference analysis (MCA-RDA), was developed to amplify and enrich the methylated CpG islands present in the tester DNA (Toyota et al., 1999; Toyota and Issa, 2002). In this method, tester and driver are first digested with a methylation-sensitive restriction endonuclease that leaves blunt ends (e.g., SmaI). The methylated restriction sites, which the first enzyme cannot cut, are subsequently cleaved with a methylation-insensitive isoschizomer of the first endonuclease (e.g., XmaI) that produces overhanging ends. Adaptors are ligated to the overhanging ends but not to the blunt ends. Molecules carrying an adaptor at both ends are amplified by PCR, and RDA is then performed as described above to select the molecules present only in the tester population. This protocol improves on MS-RDA by amplifying entire CpG islands. However, it is even more limited than MS-RDA, because appropriate isoschizomers for the methylated restriction sites are required to produce the libraries.
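The two-enzyme selection in MCA can be sketched as a rule on consecutive CCCGGG sites, the recognition sequence shared by SmaI and XmaI. This is a simplified model under stated assumptions: SmaI cuts every unmethylated site (blunt), XmaI then cuts every remaining methylated site (overhang), adaptors ligate only to overhangs, and fragment boundaries are approximated by site start positions. `methylated_sites` is a hypothetical input listing the start positions of methylated sites.

```python
import re

# SmaI and XmaI share the recognition sequence CCCGGG (one CpG in the middle).
SITE = "CCCGGG"


def mca_fragments(seq: str, methylated_sites: set[int]) -> list[tuple[int, int]]:
    """Return (start, end) of fragments with adaptor-ligatable ends on both sides.

    Model (assumption): unmethylated sites end up blunt (SmaI) and methylated
    sites end up with overhangs (XmaI). Only fragments bounded by two
    overhangs receive adaptors at both ends and can be PCR-amplified.
    """
    sites = [m.start() for m in re.finditer(SITE, seq)]
    sticky = [pos in methylated_sites for pos in sites]
    return [
        (a, b)
        for (a, a_sticky), (b, b_sticky) in zip(
            zip(sites, sticky), zip(sites[1:], sticky[1:])
        )
        if a_sticky and b_sticky
    ]


if __name__ == "__main__":
    toy = "A" * 5 + SITE + "T" * 10 + SITE + "T" * 10 + SITE + "A" * 5
    # Sites start at 5, 21 and 37; the sites at 5 and 21 are methylated.
    print(mca_fragments(toy, {5, 21}))
```

Only the fragment between the two methylated sites qualifies; if the middle site were unmethylated, its blunt end would exclude both neighbouring fragments.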
The method of methylation-sensitive arbitrarily primed PCR (MS-AP-PCR) was developed to identify genomic regions with altered patterns of methylation (Gonzalgo et al., 1997). In this method, DNA is digested with methylation-sensitive and methylation-insensitive restriction endonucleases. Following digestion, arbitrarily primed PCR is performed using short primers under low-stringency conditions for a few cycles, followed by high-stringency amplification. The products are separated by high-resolution polyacrylamide gel electrophoresis, and bands that differ between control and test samples are isolated and sequenced. The banding patterns are fairly reproducible between reactions because a single defined primer sequence is used in each reaction. Random primed PCR differs in that it uses degenerate primers that represent a large number of primer sequences.
Epigenetic boundaries have been identified in corn by creating methylation-spanning linker libraries (MSLL) (Yuan et al., 2002). In this method, genomic DNA is digested with a methylation-sensitive restriction endonuclease and ligated into BAC vectors. The resulting libraries are end-sequenced and analyzed for methylated DNA sites. This technique allows methylated sequences to be identified without a priori knowledge, and improves the cloning and sequencing of genomic regions that are resistant to shotgun cloning. However, MSLL is a low-throughput technology, limited by the need to sequence large numbers of clones, many of which contain repeats of the same insert sequences.
Methylation-sensitive amplification polymorphism (MSAP) has been used to detect changes in methylation patterns in banana plants (Peraza-Echeverria et al., 2001). In this technique, a double digest is performed on each of two aliquots of DNA. A common methylation-insensitive restriction endonuclease is used in both digests. The second restriction endonuclease is methylation-sensitive in one digest (e.g., HpaII) and is a methylation-insensitive isoschizomer (e.g., MspI) in the other. Adaptors are ligated to the digestion products, which are then amplified under various selective conditions. The amplicons are separated by gel electrophoresis and detected. The patterns obtained with the methylation-sensitive and methylation-insensitive enzymes are compared within and between samples, and changes in the banding patterns are recorded as changes in methylation. This technique allows the amplification and analysis of specific sites of methylation, but depends on the existence of methylation-sensitive and methylation-insensitive isoschizomer pairs.
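The HpaII/MspI comparison at the heart of MSAP can be sketched per CCGG site. This is a deliberately simplified model: it considers only methylation of the internal (CpG) cytosine of CCGG and assumes HpaII is blocked by it while MspI is not. Real MSAP scoring also distinguishes outer-cytosine and hemimethylated states, which this sketch omits. Function names and the `methylated_cpg` input (0-based positions of methylated CpG cytosines) are hypothetical.

```python
import re


def ccgg_sites(seq: str) -> list[int]:
    """0-based start positions of CCGG, the site shared by HpaII and MspI."""
    return [m.start() for m in re.finditer("(?=CCGG)", seq)]


def cut_sites(seq: str, methylated_cpg: set[int], enzyme: str) -> set[int]:
    """Sites cut by each isoschizomer.

    Only methylation of the internal CpG cytosine (site start + 1) is
    modelled. Assumption: HpaII is blocked by it, MspI is not.
    """
    sites = ccgg_sites(seq)
    if enzyme == "MspI":
        return set(sites)
    if enzyme == "HpaII":
        return {s for s in sites if s + 1 not in methylated_cpg}
    raise ValueError(f"unknown enzyme: {enzyme}")


def classify(seq: str, methylated_cpg: set[int]) -> dict[int, str]:
    """Interpret each site from the presence or absence of HpaII cleavage."""
    hpa = cut_sites(seq, methylated_cpg, "HpaII")
    msp = cut_sites(seq, methylated_cpg, "MspI")
    return {
        s: "unmethylated" if s in hpa else "internal CpG methylated"
        for s in sorted(msp)
    }


if __name__ == "__main__":
    # Two CCGG sites (starting at 2 and 10); the CpG cytosine at 11 is methylated.
    print(classify("AACCGGTTTTCCGGAA", {11}))
```

A site cut by MspI but not by HpaII is scored as methylated, which is the comparison recorded as a band difference on the gel.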
The Methylation-Dependent Restriction Endonuclease McrBC
McrBC is an E. coli protein complex that cleaves DNA containing RmC sequences (a purine followed by a methylated cytosine) separated by 40 to 3000 bp (Sutherland et al., 1992; Stewart and Raleigh, 1998). After McrBC binds at the RmC recognition sites, it translocates DNA, and cleavage occurs when two bound McrBC complexes interact (Dryden et al., 2001). Thus, McrBC does not always cleave at the same location between methylated sites, and DNA with multiple methylated sites at varying distances from each other can show different cleavage patterns depending on the number and density of those sites. Whether McrBC requires the two recognition sites to lie on the same strand (cis) or on opposite strands (trans) is not clear. One report describes successful cleavage of both cis-methylated and trans-methylated DNA (Sutherland et al., 1992), but further clarification of this issue is required.
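The spacing requirement described above can be expressed as a search for pairs of RmC half-sites. This is a minimal sketch under simplifying assumptions: only one strand is scanned (sidestepping the unresolved cis/trans question), spacing is measured between the methylated cytosines, and the 40 to 3000 bp window is taken directly from the text. It identifies candidate substrates only; it does not predict where McrBC actually cuts. Function names are hypothetical.

```python
def rmc_half_sites(seq: str, methylated: set[int]) -> list[int]:
    """Positions of methylated cytosines preceded by a purine (R = A or G).

    `methylated` holds 0-based positions of methylated cytosines; only the
    given strand is scanned (simplifying assumption).
    """
    return sorted(
        p for p in methylated
        if 0 < p < len(seq) and seq[p] == "C" and seq[p - 1] in "AG"
    )


def candidate_pairs(seq: str, methylated: set[int],
                    min_gap: int = 40, max_gap: int = 3000) -> list[tuple[int, int]]:
    """Pairs of RmC half-sites whose spacing falls within [min_gap, max_gap]."""
    sites = rmc_half_sites(seq, methylated)
    pairs = []
    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            gap = b - a
            if gap > max_gap:
                break  # sites are sorted, so later partners are even farther away
            if gap >= min_gap:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    toy = "A" * 10 + "GC" + "T" * 50 + "AC" + "T" * 10 + "GC" + "T" * 5
    # Methylated cytosines at 11 (GmC), 63 (AmC) and 75 (GmC).
    print(candidate_pairs(toy, {11, 63, 75}))
```

The pair (63, 75) is excluded because the half-sites are only 12 bp apart, below the minimum spacing.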
McrBC has also been used to identify methylated regions of interest (PCT WO 03/035860). This method involves digesting two sources of DNA: one sample with an enzyme such as McrBC, and the other with a methylation-sensitive restriction endonuclease. Hybridization of the two samples provides a screen for the sequences that were cut by McrBC. The hybridized products are isolated and sequenced to identify the methylated regions of interest. Although this protocol aims at universal detection of global methylation patterns through the use of McrBC, it relies on a subtractive procedure and does not allow amplification of the products after subtraction and isolation.
Other reported uses of McrBC include the use of McrBC-expressing bacterial strains to digest plasmids carrying genomic DNA, in order to subtract repetitive (i.e., heavily methylated) elements and isolate genomic regions of interest from plants (U.S. Patent Application US20010046669). The specific steps involve fragmenting DNA, inserting the DNA fragments into a suitable vector, and introducing the library into McrBC-expressing bacteria. The bacteria cleave any plasmid whose genomic insert carries multiple methylated sites, so only plasmids with non-methylated inserts remain intact and propagate. The resulting colonies therefore carry inserts from hypomethylated regions. This method was used to increase the cloning of gene-coding regions from plant genomes.
Methylation patterns in simple genomes have been investigated using McrBC cleavage (Badal et al., 2003). In this work, the methylation patterns of HPV were investigated in cervical cancer. Viral genomic DNA was digested with McrBC, and the resulting fragments were subjected to bisulfite sequencing. The small size of the HPV genome (about 7900 bp) allows repeated sequencing to identify all sequences and methylation sites within the genome quickly. This methodology has limited application to human DNA because of the large size of the human genome. Furthermore, it provides no mechanism for amplifying or selecting molecules based on their methylation status.
Patents and Patent Applications Related to Methylation Detection and Analysis
U.S. Pat. No. 6,214,556 B1 and the corresponding PCT WO99/28498, issued to Olek et al., describe a method of methylation analysis in which DNA is fragmented by mechanical shearing or by digestion with a restriction endonuclease and then treated with sodium bisulfite to convert non-methylated cytosine to uracil. The converted DNA is amplified by one of two methods. In the first method, double-stranded adaptor molecules of known sequence are ligated to the DNA fragments before bisulfite conversion, and the fragments are then amplified by polymerization using primers complementary to the adaptor sequences as they exist after bisulfite treatment. In some versions of the method, the amplification primers also carry 3′-extensions one to four bases long that extend into the unknown sequence and represent different base permutations. In the second method, a modification of the DOP-PCR technique, primers that contain a constant 5′ region and a degenerate 3′ region are used to amplify the converted DNA fragments or subsets of them. Both amplification methods use two types of sequences. Type one sequences lack cytosine entirely or contain it only in the context of the CpG dinucleotide, and type two sequences lack guanine entirely or contain it only in the context of the CpG dinucleotide. These two types of sequences are used to target specifically the strands of DNA that are rich in guanine or rich in cytosine, respectively, after bisulfite conversion. The quantity of the remaining cytosines on the G-rich strand, or of the remaining guanines on the C-rich strand, is then determined by hybridization or by polymerization. In one version of the method, the target DNA is cleaved with a methylation-sensitive restriction enzyme before bisulfite conversion to reduce the amount of non-methylated DNA.
The method described above suffers from the drawbacks inherent in all techniques based on bisulfite conversion: reduced sensitivity owing to significant loss of DNA during conversion, which compromises the analysis of clinical samples in which methylated DNA makes up only a small percentage of a large background of non-methylated DNA, and difficulty implementing the method to assay methylation in clinical settings because of its multiple, complex preparation steps.
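The relationship between bisulfite conversion and the "type one" and "type two" sequences described for the Olek method can be made concrete with a toy model. The sketch assumes complete conversion and shows the top strand as read after PCR (unmethylated C appears as T, methylated C remains C). The classifiers simply encode the definitions given in the text; they are not taken from the patent, and all names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Top strand after complete bisulfite conversion and PCR.

    Unmethylated C -> U -> read as T; methylated C (0-based positions in
    `methylated`) is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def is_type_one(seq: str) -> bool:
    """No C at all, or C only as the C of a CpG dinucleotide."""
    return all(seq[i + 1 : i + 2] == "G" for i, b in enumerate(seq) if b == "C")


def is_type_two(seq: str) -> bool:
    """No G at all, or G only as the G of a CpG dinucleotide."""
    return all(i > 0 and seq[i - 1] == "C" for i, b in enumerate(seq) if b == "G")


if __name__ == "__main__":
    original = "ACGTCCGA"
    converted = bisulfite_convert(original, {1})  # only the first CpG is methylated
    print(converted, is_type_one(converted))
```

Because every surviving C on the converted strand sits in a methylated CpG, the converted strand itself satisfies the type one definition, which is why such sequences can report on residual (methylated) cytosines.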
U.S. Patent Applications 20030099997A1 and 20030232371A1 and the corresponding PCT WO 03/035860A1 by Bestor disclose methods for detecting methylated promoters and identifying genes based on differential hybridization of test and control DNA samples, one of which has been treated with the methylation-dependent endonuclease McrBC and the other with a methylation-sensitive restriction endonuclease (HpaII, HhaI, MaeII, BstUI, or AciI). The two samples are modified so as to prevent the formation of duplexes between homologous DNA fragments from the same source. The samples from the two sources are then denatured and hybridized to form heteroduplexes. At least one of the samples is modified in a way that facilitates isolation of the resulting heteroduplexes, which are then analyzed by sequencing to determine the positions of methylated cytosines. Although this technology can accurately determine the methylation status of a gene promoter and allows the discovery of new sites of interest, it is limited by the requirement for a significant amount of starting DNA, the inability to process multiple samples simultaneously, and its dependence on a methylation-sensitive restriction site being present at the CpG of interest.
PCT WO 03/027259A2 by Wang describes a method for analyzing the methylation status of test and control DNA samples that comprises cleaving the DNA with one or more methylation-sensitive restriction enzymes, ligating linkers to the generated overhangs, PCR-amplifying and labeling the fragments that received ligated linkers, hybridizing the fragments to a solid support carrying immobilized target DNA sequences, and comparing the signals produced after hybridization of the test and control samples, thereby detecting the extent of methylation of one or more regions of DNA. This method is limited by its dependence on a methylation-sensitive restriction site being present at the CpG site(s) of interest, and by the fact that it can be used only for sites that have been previously identified. Thus, it does not allow the discovery of new methylation sites of interest.
PCT WO 03/025215A1 by Carrol et al. describes a method for analyzing DNA methylation patterns by digesting DNA with a methylation-sensitive restriction enzyme, followed by amplification with primers that anneal to the non-cleaved form of the recognition sequence. The results of the amplification reaction are then compared with those of an identical reaction, run in parallel with the same primers, on another aliquot of the DNA sample that has not been cleaved with the restriction enzyme. This method is limited by the availability of suitable restriction sites and requires significant amounts of input DNA to analyze multiple restriction sites. In addition, it depends on the complicated design and empirical testing of primers, each with very high GC content, for each of the thousands of potentially methylated sites required for successful profiling.
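The text does not say how the digested and undigested reactions are compared. One common way to quantify such a comparison, swapped in here as an assumption rather than taken from the Carrol application, is real-time PCR: if the digested and mock-digested aliquots cross threshold at cycles Ct_digested and Ct_mock, and each cycle multiplies product by (1 + E), then the fraction of templates that escaped cleavage (and were therefore protected by methylation) is roughly (1 + E) raised to the power -(Ct_digested - Ct_mock). The function name and the efficiency parameter are hypothetical.

```python
def fraction_uncut(ct_digested: float, ct_mock: float, efficiency: float = 1.0) -> float:
    """Estimate the fraction of templates that survived methylation-sensitive digestion.

    Assumptions: both reactions amplify with the same per-cycle efficiency
    (1.0 means perfect doubling), and the only difference between them is
    the loss of templates cleaved by the enzyme.
    """
    delta_ct = ct_digested - ct_mock
    return (1.0 + efficiency) ** -delta_ct


if __name__ == "__main__":
    # The digested aliquot reaches threshold 3 cycles later than the mock digest.
    print(fraction_uncut(25.0, 22.0))
```

Note that incomplete digestion would also leave uncut templates, so in practice this estimate is an upper bound on the methylated fraction.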
PCT WO 03/080862A1 to Berlin discloses a method and devices for amplifying nucleic acids while retaining the methylation pattern of the original template. The method comprises denaturing genomic DNA, annealing specific primers and extending them with DNA polymerase, and incubating the resulting double-stranded DNA with a methyltransferase in the presence of a labeled methyl group donor to restore the methylation pattern encoded in the original template. These steps are repeated several times, resulting in linear amplification that retains the methylation status of the target DNA. The amplified DNA is then digested with a methylation-sensitive restriction enzyme or subjected to bisulfite conversion, and the resulting products are analyzed by methods capable of retrieving the methylation information. Although this method can amplify specific regions of DNA while retaining the methylation information at pre-designed sites, linear amplification is slow and inefficient compared with exponential amplification. Furthermore, the procedure still requires a significant amount of input DNA. In addition, the method is limited to regions whose methylation sites are already known, so it cannot be applied to genome-wide screening of methylation patterns.
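The cost of linear amplification can be shown with simple arithmetic. In linear mode, each cycle copies only the original templates, so n0 templates yield n0 × (1 + n) strands after n cycles; in exponential mode, every product also serves as a template, giving n0 × (1 + E)^n. The sketch below assumes idealized, loss-free cycling; the function names and the efficiency parameter are illustrative.

```python
def linear_copies(n0: int, cycles: int) -> int:
    """Originals plus one new copy of each original per cycle (idealized)."""
    return n0 * (1 + cycles)


def exponential_copies(n0: int, cycles: int, efficiency: float = 1.0) -> float:
    """Every strand is copied each cycle; efficiency 1.0 means perfect doubling."""
    return n0 * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, linear_copies(100, n), exponential_copies(100, n))
```

After 20 cycles, 100 templates give 2,100 strands linearly but about 10^8 exponentially, which is why the linear scheme needs substantial input DNA.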
U.S. Pat. No. 6,300,071B1 issued to Vuylsteke et al. describes a method for detecting DNA methylation using the technique of Amplified Fragment Length Polymorphisms (AFLP). A test and a control DNA sample are digested with one or more specific restriction endonucleases to fragment the DNA into a series of restriction fragments. The resulting restriction fragments are ligated to one or more double-stranded synthetic oligonucleotide adaptors. A combination of methylation-sensitive and methylation-insensitive restriction enzymes is used to produce amplifiable fragments that originate from either methylated or non-methylated DNA. Primers complementary to specific promoter sequences are combined with primers complementary to the adaptor sequences for PCR amplification, and the resulting fragments are analyzed by gel electrophoresis for restriction patterns. This method can be used to analyze methylation at multiple promoters simultaneously, but it requires prior knowledge of sequences and empirical testing of multiple primers for compatibility, and it has limited application in clinical diagnostics.
U.S. Patent Application US 2005/0009059A1 to Shapero et al. provides a method for determining whether a cytosine in a target DNA sequence is methylated, comprising the steps of: fragmenting the DNA with a restriction enzyme; ligating a double-stranded adaptor with a common priming sequence; converting non-methylated cytosines to uracils by treatment with sodium bisulfite; and hybridizing a capture probe comprising a second common sequence, a tag sequence, a recognition sequence for a Type IIS restriction enzyme, and a region complementary to a region of the target sequence 3′ of the cytosine. The capture probe is extended and amplified with first and second common sequence primers to generate a double-stranded extended capture probe, which is then digested with the Type IIS restriction enzyme. The resulting fragments are extended by one base with a labeled nucleotide and analyzed using an array of oligonucleotide probes. Like other methods in the art based on sodium bisulfite conversion, this method requires relatively large amounts of input DNA, and it requires the design of complex oligonucleotide probes that are difficult to make compatible in a multiplex reaction.
U.S. Pat. No. 6,605,432, PCT WO03/087774 A2, and U.S. Patent Application US20030129602A1 by Huang describe the previously discussed Differential Methylation Hybridization (DMH) and Methylation Target Arrays (MTA) technologies (see Yan et al., 2001; Chen et al., 2003; Huang et al., 1999). One to two micrograms of genomic DNA isolated from tumor or control samples are digested overnight with MseI, a restriction enzyme with a four-base recognition sequence (TTAA) that cuts frequently in the bulk of the genome but less frequently in CpG islands, leaving promoter regions relatively intact. The digestion products are purified and ligated to a double-stranded linker of known sequence. The ligated DNA fragments are then purified and digested overnight with the methylation-sensitive restriction enzyme BstUI. After purification and buffer exchange, the samples are digested overnight again with another methylation-sensitive restriction enzyme, HpaII. The samples are amplified by PCR using a primer complementary to the known linker sequence. The resulting products are labeled and hybridized to microarrays comprising CpG island clones or other CpG-rich genomic probes.
The methods described in these patents require microgram quantities of DNA and involve multiple steps, including three overnight digestions and three purification steps. They also suffer from additional drawbacks, such as incomplete coverage of the genome by the initial restriction digest: regions with a low density of cleavage sites will not be amplified, and their methylation status cannot be determined with this technology. Incomplete cleavage by the methylation-sensitive restriction enzymes will produce false-positive results. Also, if the DNA is nicked, degraded, or only partially double-stranded, as is often the case for DNA in the blood circulation or other body fluids, cleavage with restriction enzymes will be inefficient and the method will perform poorly. In addition, the microarray hybridization used for detection in these techniques is not quantitative and has a limited dynamic range and low sensitivity. Thus, the methods described in these patents are limited to applications in which large quantities of DNA are readily available and methylated DNA represents a high percentage of the total DNA.
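The first step of DMH, MseI digestion, explains both why GC-rich CpG islands stay largely intact and why regions with few cleavage sites drop out. The sketch below performs a complete in silico MseI digest (cut between the first and second base of TTAA) of one strand and returns fragment lengths. It is an illustration under the assumption of complete digestion; it does not model linker ligation or the subsequent BstUI and HpaII steps, and the function name is hypothetical.

```python
import re


def msei_fragments(seq: str) -> list[int]:
    """Fragment lengths after complete MseI digestion (T^TAA) of one strand."""
    # Lookahead finds overlapping TTAA sites; the cut falls after the first T.
    cuts = [m.start() + 1 for m in re.finditer("(?=TTAA)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # An AT-rich flank on each side of a short GC-rich stretch.
    toy = "GGTTAACCCGGCGCGCCGTTAAGG"
    print(msei_fragments(toy))
```

The GC-rich middle stretch contains no TTAA and survives as one 16 bp fragment; a long region lacking TTAA would likewise yield a fragment too long to amplify efficiently.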
The aforementioned methods in the art that employ adaptor ligation to DNA fragments are suitable for high molecular weight DNA samples and for partially degraded DNA, but not for circulating, cell-free DNA samples from serum, plasma, and urine, which are heavily degraded and consist largely of mono-, di-, and tri-nucleosomal sized fragments shorter than 500 bp. First, a restriction enzyme with a 4-bp recognition sequence cleaves on average only once every 256 (4^4) base pairs, so methods that rely on such cleavage before adaptor ligation will be applicable to few, if any, mononucleosomal sized fragments and to only a minority of dinucleosomal sized fragments. Second, the art does not describe how to convert heavily damaged DNA containing nicks or single-stranded gapped regions into amplifiable molecules that retain methylation information. These limitations of the art preclude effective methylation analysis of DNA from non-invasive clinical sources such as serum, plasma, and urine, since a majority of the DNA may remain in an unamplifiable form. Thus, there exists a need for methods that can amplify substantially all the DNA from such sources, both to increase the sensitivity of methylation assays and to reduce the quantity of such DNA required for analysis. Such methods will be of particular importance for diagnostic applications, in which methylated markers indicative of a condition may make up only a minor (<1%) fraction of the sample.
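The 256 bp figure can be turned into a rough estimate of how many short fragments a 4-cutter actually reaches. The sketch assumes random sequence with equal base frequencies, so that sites occur as a Poisson process at a rate of 1/256 per bp, and uses 160 bp and 330 bp as illustrative mono- and dinucleosomal lengths (these lengths are assumptions, not values from the text). It reports the probability that a fragment contains at least two sites, the minimum needed to release an internal piece with two restriction-generated ends. Real CpG-containing sites such as CCGG are rarer than this in the human genome because of CpG depletion, so the estimate is optimistic.

```python
import math


def site_rate(site_len: int = 4) -> float:
    """Expected cut sites per bp in random, equiprobable sequence: 1 / 4**k."""
    return 1.0 / 4 ** site_len


def prob_at_least(n_sites: int, fragment_len: int, site_len: int = 4) -> float:
    """Poisson probability that a fragment carries at least `n_sites` sites."""
    lam = fragment_len * site_rate(site_len)
    p_fewer = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(n_sites))
    return 1.0 - p_fewer


if __name__ == "__main__":
    for length in (160, 330):  # illustrative mono- and dinucleosomal sizes
        expected = length * site_rate()
        print(length, round(expected, 2), round(prob_at_least(2, length), 3))
```

Under these assumptions about 13% of 160 bp fragments and about 37% of 330 bp fragments carry two or more sites, consistent with the point that most cell-free fragments are inaccessible to cleavage-then-ligation approaches.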