Analysis of the primary structure of nucleic acids (such as DNA and RNA), including epigenetic modifications (e.g. DNA methylation), can be addressed with different techniques commonly termed “sequencing”.
None of the currently available methods analyses the original material directly. They require processing or conversion of the original template, generation of a replica and, often, amplification of that replica. The resulting replicas (termed genomic libraries) are suitable for sequencing with one or more of the currently available sequencing technologies (e.g. Illumina, Roche or IonTorrent sequencing platforms).
Sequencing can be performed either at low scale, which consists of the analysis of selected fragments, or at high throughput (also termed genome-scale), which consists of the massive analysis of all, or a large representation, of the whole material. The length of the fragment that can be analysed depends on the sequencing methodology used. Current state-of-the-art genome-scale sequencing techniques, and most locus-specific ones, assess the two DNA strands separately.
The current gold standard for the assessment of DNA methylation involves chemical treatment of the nucleic acids with bisulfite. This treatment generates ambiguity: non-methylated cytosines are converted to uracils and read as thymines, which makes them indistinguishable from genuine thymines in every sequencing method. This loss of information is a challenge for genome-scale approaches, since several drawbacks remain unsolved and limit their application, for example:
    1) independent processes must be used to determine the primary sequence (e.g. for the detection of mutations or genetic variants) and the epigenetic modifications (e.g. methylation of cytosines);
    2) the generated ambiguity limits the efficiency (a large proportion of sequence reads are discarded as ambiguous) and the coverage (some regions cannot be analysed), and requires demanding computational processing;
    3) large amounts of starting material are required to perform studies with high coverage;
    4) uncontrolled biases limit quantitative determination; and
    5) sequencing errors are difficult for the system to detect.
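The ambiguity introduced by bisulfite treatment can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a laboratory protocol or an existing tool: conversion is assumed to be complete and error-free, and the function name `bisulfite_convert`, the example strands and the methylated position are hypothetical and chosen only for illustration.

```python
def bisulfite_convert(strand, methylated_positions=frozenset()):
    """Return the read obtained after an idealised bisulfite conversion.

    Assumption: conversion is complete and error-free. Cytosines whose
    0-based index is listed in methylated_positions are protected.
    """
    read = []
    for i, base in enumerate(strand):
        if base == "C" and i not in methylated_positions:
            # Non-methylated C becomes U, which is read as T.
            read.append("T")
        else:
            read.append(base)
    return "".join(read)


# Two hypothetical strands that differ only at index 1 (C versus T).
strand_a = "ACGTCG"
strand_b = "ATGTCG"

# In both strands the cytosine at index 4 is assumed to be methylated.
read_a = bisulfite_convert(strand_a, methylated_positions={4})
read_b = bisulfite_convert(strand_b, methylated_positions={4})

# Both reads are identical, so the read alone cannot tell whether
# index 1 was a non-methylated cytosine or a genuine thymine.
print(read_a, read_b, read_a == read_b)
```

In this sketch the two distinct strands produce the same read, which is why a separate determination of the primary sequence is needed to interpret the converted read.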
Another method is the so-called hairpin-bisulfite PCR method (see Laird et al., 2004, Proc. Natl. Acad. Sci. USA 101, 204-209; Riggs and Xiong, 2004, Proc. Natl. Acad. Sci. USA 101, 4-5). In this method, the two complementary strands are covalently linked by a hairpin loop sequence prior to the bisulfite treatment. However, this method is suitable only for a specific double-stranded molecule, not for determining the sequence of a population of double-stranded DNA molecules and, in particular, not for identifying methylated cytosines in such a population.
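Why linking the two strands helps can be sketched in Python. This is an illustrative toy model, not the procedure of the cited publications: it assumes complete, error-free conversion, assumes that the two strand halves of the hairpin read have already been separated and are of equal length, and uses hypothetical names (`convert`, `decode_hairpin`, `PAIR_TO_BASE`). Because guanine is not affected by bisulfite, the base read on the opposite strand shows whether a T on one strand was originally a cytosine or a thymine.

```python
# Each aligned pair (top-strand read base, bottom-strand read base)
# maps to the original top-strand base.
PAIR_TO_BASE = {
    ("A", "T"): "A",
    ("T", "A"): "T",
    ("C", "G"): "C",  # methylated cytosine on the top strand
    ("T", "G"): "C",  # non-methylated cytosine, read as T
    ("G", "C"): "G",  # methylated cytosine on the bottom strand
    ("G", "T"): "G",  # non-methylated cytosine on the bottom strand
}


def convert(strand, methylated=frozenset()):
    """Idealised bisulfite conversion: non-methylated C is read as T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )


def decode_hairpin(top_read, bottom_read):
    """Recover the original top strand and its methylated cytosines.

    Both reads are given 5'->3'. The bottom read is reversed so that
    index i of each string refers to the same base pair.
    """
    bottom_aligned = bottom_read[::-1]
    original = []
    methylated_top = []
    for i, pair in enumerate(zip(top_read, bottom_aligned)):
        original.append(PAIR_TO_BASE[pair])
        if pair == ("C", "G"):
            methylated_top.append(i)
    return "".join(original), methylated_top


# Hypothetical duplex: top strand ACGTCG, bottom strand CGACGT.
# The CpG at top index 4 is assumed methylated on both strands
# (top index 4, bottom index 0).
top_read = convert("ACGTCG", methylated={4})
bottom_read = convert("CGACGT", methylated={0})

sequence, methylated_positions = decode_hairpin(top_read, bottom_read)
print(sequence, methylated_positions)
```

With the hypothetical inputs above, the sketch recovers both the original top-strand sequence and the position of its methylated cytosine from the two converted reads. A pair outside the table (for example from a sequencing error) would raise a `KeyError`, which is left unhandled here for brevity.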
It has therefore been of interest to develop further methods for determining the sequence of a population of double-stranded DNA molecules, and particularly for identifying methylated cytosines in such a population, that solve all or some of the above-mentioned drawbacks of the state-of-the-art methods.
WO2010/048337 discloses a method for identifying methylated cytosines comprising the steps of generating a complementary copy of a template nucleic acid using a bisulfite-resistant cytosine analog, optionally pairing the template nucleic acid and the complementary copy, converting non-methylated cytosine residues in the template nucleic acid and the complementary copy to uracil residues, and determining the nucleotide sequence of the bisulfite-converted template nucleic acid and the non-converted complementary copy. However, since both the bisulfite-converted template nucleic acid and the non-converted complementary copy are rich in methylated cytosines, these strands are difficult to process.
The present invention addresses these problems.