Genomes of mammals and other vertebrates display a pattern of post-synthetic chemical modification. Only one kind of modification is known in these organisms, namely the methylation of cytosine to give 5-methylcytosine. The target DNA sequence of this methylation modification is the dinucleotide 5′CpG3′, also referred to as the CpG dinucleotide, which is self-complementary and occurs in symmetrical pairs.
In the human genome, approximately 70% of CpG dinucleotides are methylated in most cell types. Clusters of non-methylated CpG dinucleotides, called CpG islands, occur at the transcriptional start sites of 56% of human genes. These CpG islands are generally between 1 and 2 kbp in length and together account for about 2% of the genome. CpG dinucleotides in the remaining 98% of the genome are sparsely distributed, and approximately 80% of them are methylated. Because CpG islands have a high cytosine and guanine content, and a high frequency of CpG dinucleotides in particular, they can be identified from sequence alone, without knowledge of the methylation pattern of the DNA. Using this bioinformatic criterion, the human genome project has estimated that there are about 30,000 CpG islands per genome.
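The sequence-only identification of CpG islands described above can be sketched as a sliding-window scan. The source does not state which thresholds the human genome project used; the values below (a 200 bp window, C+G content above 50% and an observed/expected CpG ratio above 0.6) are the widely cited Gardiner-Garden and Frommer criteria and are an assumption here. The function names are hypothetical, and this is a minimal illustration rather than a production CpG island caller.

```python
from typing import List, Tuple

# Assumed thresholds (Gardiner-Garden & Frommer style); not taken from the source text.
WINDOW = 200
MIN_GC = 0.5
MIN_OBS_EXP = 0.6


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are C or G."""
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    return cpg * len(seq) / (c * g)


def find_cpg_islands(seq: str, window: int = WINDOW, step: int = 1) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) intervals of windows passing both thresholds."""
    seq = seq.upper()
    hits: List[Tuple[int, int]] = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_fraction(w) > MIN_GC and cpg_obs_exp(w) > MIN_OBS_EXP:
            end = start + window
            if hits and start <= hits[-1][1]:
                # Overlapping passing window: extend the current island.
                hits[-1] = (hits[-1][0], max(hits[-1][1], end))
            else:
                hits.append((start, end))
    return hits
```

For a synthetic sequence with a 300 bp run of CG flanked by AT repeats, `find_cpg_islands("AT" * 300 + "CG" * 150 + "AT" * 300)` reports a single merged interval, (501, 999): every window overlapping the CG run by more than 100 bp passes both tests. The merged interval therefore extends beyond the CG-rich core, so a fuller implementation would need some boundary refinement.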
The promoters and transcription start sites of most mammalian genes contain CpG islands. The CpG dinucleotides within a CpG island are normally non-methylated, and as a result the gene associated with the CpG island is transcribable, though not necessarily transcribed. In cells derived from female humans, almost all of the CpG islands on the condensed, inactive X chromosome are heavily methylated; the exception is the Xist gene, which remains unmethylated at its 5′ CpG island. This corresponds to the activity of the Xist gene, which is required for the initial spread of the inactive state.
Because the methylation status of these CpG islands is copied during DNA replication, a gene silenced by heavy methylation of its CpG dinucleotides may remain silenced through an indefinite number of cell divisions.
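How a methylation pattern is copied through replication can be illustrated with an idealised model. Because 5′CpG3′ is self-complementary, each site carries one cytosine on each strand; semi-conservative replication leaves each daughter duplex hemimethylated, and a maintenance methyltransferase (DNMT1 is the enzyme usually assigned this role) methylates the new strand opposite a methylated parental cytosine. This sketch assumes perfect maintenance and no de novo methylation, which real cells do not achieve; all names are hypothetical.

```python
from typing import List, Tuple

# A CpG site as (top-strand C methylated, bottom-strand C methylated).
# 5'CpG3' is self-complementary, so each site has one cytosine on each strand.
Site = Tuple[bool, bool]


def replicate(sites: List[Site]) -> Tuple[List[Site], List[Site]]:
    """Semi-conservative replication: each daughter keeps one parental strand
    and gets a newly synthesised, unmethylated partner strand."""
    keeps_top = [(top, False) for top, _ in sites]      # parental top + new bottom
    keeps_bottom = [(False, bot) for _, bot in sites]   # new top + parental bottom
    return keeps_top, keeps_bottom


def maintenance_methylate(sites: List[Site]) -> List[Site]:
    """Idealised maintenance methylation: a hemimethylated site becomes fully
    methylated; fully unmethylated sites stay unmethylated (no de novo activity)."""
    return [(top or bot, top or bot) for top, bot in sites]


def propagate(sites: List[Site], generations: int) -> List[Site]:
    """Follow one cell lineage (the daughter keeping the parental top strand)."""
    for _ in range(generations):
        daughter, _ = replicate(sites)
        sites = maintenance_methylate(daughter)
    return sites
```

Starting from `[(True, True), (False, False), (True, True)]`, `propagate` returns the same pattern after any number of generations. Without the maintenance step, replicating a hemimethylated duplex such as `(True, False)` yields one daughter with no methylation at that site, so the silenced state would be diluted rather than inherited.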
The association between high CpG dinucleotide density and absence of methylation has been explained by spontaneous deamination over time. Deamination of an unmethylated cytosine in a CpG dinucleotide produces uracil, which is recognised by DNA repair mechanisms within the cell and converted back to cytosine. Deamination of a methylated cytosine, however, produces thymine, which is not recognised by the same DNA repair mechanisms and consequently persists as a 5′TG3′ dinucleotide. This is illustrated by pseudogenes, sequences homologous to expressed genes (for example the α-globin pseudogene), which are methylated at their cytosines and over time lose a large proportion of their CpG dinucleotides while gaining 5′TG3′ dinucleotides.
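The CpG depletion described above can be illustrated with a toy simulation. It assumes, hypothetically, that every CpG is methylated on both strands, that each methylated CpG deaminates with a fixed per-generation probability on either strand with equal chance, and that the resulting thymine is never repaired. Deamination on the top strand yields 5′TG3′ directly; deamination on the bottom strand appears on the top strand as 5′CA3′ (the complement of 5′TG3′). Unmethylated cytosines are assumed always to be repaired from uracil, so an unmethylated sequence is returned unchanged. The rate, the strand choice and all names are illustrative only.

```python
import random


def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Count occurrences of a two-letter motif at every position."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def deaminate_methylated_cpgs(seq: str, methylated: bool, rate: float,
                              generations: int, seed: int = 0) -> str:
    """Toy model of CpG loss through unrepaired deamination of 5-methylcytosine."""
    s = list(seq.upper())
    if not methylated:
        # Assumption: cytosine -> uracil is always repaired, so nothing changes.
        return "".join(s)
    rng = random.Random(seed)  # seeded for reproducibility
    for _ in range(generations):
        # CpG sites cannot overlap, so mutating one never disturbs another in this pass.
        sites = [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]
        for i in sites:
            if rng.random() < rate:
                if rng.random() < 0.5:
                    s[i] = "T"      # top-strand 5mC -> T gives 5'TG3'
                else:
                    s[i + 1] = "A"  # bottom-strand 5mC -> T reads as 5'CA3' on top
    return "".join(s)
```

Because C→T and G→A changes at a CpG cannot create a new CpG, each event removes exactly one CpG and adds exactly one TG or CA. In the simulated sequence the number of CpGs lost therefore equals the number of TG and CA dinucleotides gained, mirroring the pattern seen in methylated pseudogenes.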
The mechanism by which CpG islands are maintained in an unmethylated state is poorly understood. It is thought that the maintenance methyltransferase DNMT1 may play a role in the propagation of these epigenetic marks. At least within the human genome, the expression of certain genes appears to be very tightly regulated by CpG island methylation.
Under normal circumstances, CpG island methylation occurs when genes are irrevocably shut down during development, as happens with certain genes on the inactive X chromosome, such as phosphoglycerate kinase 1, and with imprinted genes such as the insulin-like growth factor 2 receptor gene (Igf2r) and H19. Unscheduled CpG island methylation may also occur as a result of disease and has been extensively documented in cancer: for example, silencing of the RB gene causes retinoblastoma, and silencing of the MLH1 gene causes increased mutability that promotes several tumour types. Although genes of this kind are usually inactivated by mutation, it is apparent that they can also be silenced by DNA methylation. Another example of aberrant CpG island methylation is the silencing of the FMR1 gene in fragile X syndrome, the most common inherited form of intellectual disability in males. In addition to these documented cases, it has been suggested that other common conditions (schizophrenia, arthritis, autoimmune diseases) might have a similar basis.
A library of CpG island fragments could significantly facilitate research into the methylation patterns of CpG islands in diseases such as those described above.
Previous attempts to generate such libraries have used a peptide derived from the MeCP2 protein of Rattus rattus that binds methylated CpG dinucleotides (Cross et al., 1994). Because at least a proportion of the CpG dinucleotides in most CpG islands are not methylated, the use of these peptides was limited to purifying methylated CpG-containing fragments or CpG islands that had been artificially methylated before purification. These techniques lack sensitivity and accuracy and, above all, fail to generate a library of fragments representing substantially all of the CpG islands in, for example, the human genome.
It is among the objects of the present invention to obviate and/or mitigate at least one of the aforementioned disadvantages.
The present invention is based in part on the applicants' discovery that CpG island nucleic acid fragments may be isolated using a peptide which is capable of binding exclusively to non-methylated CpG dinucleotides.