The present invention relates to the field of analysis and manipulation of chromosomal DNA in situ. In particular, the invention provides a novel cytosine methyltransferase gene and encoded enzyme that recognizes the dinucleotide GpC, and its use in high resolution analysis and manipulation of protein-DNA interactions in chromatin.
Various publications or patents are referenced in this application to describe the state of the art to which the invention pertains. Each of these publications or patents is incorporated by reference herein.
In vivo methylation of DNA has been used successfully to study protein-DNA interactions in the chromatin of living cells. A high frequency of methyltransferase targets is critical for high resolution mapping of chromatin structure. Among currently available methyltransferase probes, the only de novo dinucleotide methyltransferase is M.SssI, which recognizes a CpG site (Renbaum, P., Abrahamove, D., Fainsod, A., Wilson, G., Rottem, S. and Razin, A. (1990) Nucleic Acids Res., 18, 1145-1152). Due to under-representation of the CpG dinucleotide in the genome, the resolution of chromatin structure maps using this enzyme is about 35 base pairs on average in S. cerevisiae (Dujon, B., Alexandraki, D., André, B., Ansorge, W., Baladron, V., Ballesta, J. P. G., Banrevi, A., Bolle, P. A., Bolotin-Fukuhara, M., Bossier, P. et al. (1994) Nature, 369, 371-378). With this moderate level of resolution, M.SssI can possibly serve to detect the presence of a positioned nucleosome, 146 bp in yeast, without the need for introduction of additional CpG sites into native DNA sequences. However, this resolution is insufficient for mapping the interactions of non-histone regulatory proteins, since the typical length of the target DNA sequence of most regulatory proteins is ~20-30 base pairs or less. For example, the yeast TATA box binding protein (TBP) recognizes and binds to an 8 bp sequence (Kim, Y., Geiger, J. H., Hahn, S. and Sigler, P. B. (1993) Nature, 365, 512-520), while the well-characterized transcriptional activator Gal4p binds to a 17 bp consensus sequence (Giniger, E., Varnum, S. M. and Ptashne, M. (1985) Cell, 40, 767-774). Furthermore, methylation of CpG islands has been implicated as an important controlling element for gene regulation in mammalian systems, which may limit the application of M.SssI in higher organisms (Tazi, J. and Bird, A. (1990) Cell, 60, 909-920).
To address both the limitation of resolution and the possible inability to utilize M.SssI in higher organisms, cloning and expression of cytosine-5-DNA methyltransferases (5-meC MTases) with different specificities but similarly small recognition sites are essential.
A family of double-stranded DNA viruses that infect certain unicellular, eukaryotic, Chlorella-like green algae are reported to be a rich source of restriction/modification systems (Nelson, M., Zhang, Y. and Van Etten, J. L. (1993) DNA Methylation: Molecular Biology and Biological Significance. Birkhauser-Verlag Press, Basel, Switzerland, pp. 186-211; Nelson, M., Burbank, D. E. and Van Etten, J. L. (1998) Biological Chem. 379, 423-428). Among the 37 viruses infecting Chlorella NC64A and the five viruses infecting Chlorella Pbi which have been partially characterized, 39 viral DNAs contain 5-methylcytosine, ranging in concentration from 0.1 to 47% of total cytosine (Nelson and Van Etten, 1993, supra; Nelson and Van Etten, 1998, supra).
One cytosine methyltransferase, M.CviJI, has been cloned from Chlorella virus IL-3A and shown to recognize the nucleotide sequence RGC(T/C/G) (Shields, S. L., Burbank, D. E., Grabherr, R. and Van Etten, J. L. (1990) Virology, 176, 16-24). As determined by the resistance/sensitivity of the viral DNAs to over 70 methylation-sensitive restriction endonucleases, at least five independent 5-meC modification systems are predicted to be encoded by some of the more highly modified viruses, including methyltransferases thought to recognize CpC and RpCpY (Nelson and Van Etten, 1993, supra; Nelson and Van Etten, 1998, supra). Taking the composition of the yeast genome as an example, one CpC site per 13.9 bp and one RpCpY site per 10.7 bp can be expected on average. Achieving this level of resolution would allow mapping the interactions of most non-histone regulatory proteins. The cloning of methyltransferases from Chlorella viruses could greatly extend the resolution of chromatin mapping as well as allow extension of in vivo chromatin mapping to higher organisms.
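The expected site spacings discussed above can be approximated from base composition alone. The following is a minimal sketch, not the calculation used to derive the quoted figures: it assumes that bases occur independently, that a site on either strand counts, and that the yeast genome has a G+C fraction of about 0.38. The function names and the G+C value are illustrative assumptions.

```python
# Sketch: expected average spacing between target sites, assuming bases
# occur independently (real genomes deviate, e.g. CpG is under-represented).

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R"}


def base_freqs(gc_fraction):
    """Single-base frequencies for a genome with the given G+C fraction."""
    at = (1.0 - gc_fraction) / 2.0
    gc = gc_fraction / 2.0
    return {"A": at, "T": at, "G": gc, "C": gc}


def motif_probability(motif, freqs):
    """Probability that a motif (IUPAC codes A, C, G, T, R, Y) starts at a position."""
    p = 1.0
    for symbol in motif:
        p *= sum(freqs[base] for base in IUPAC[symbol])
    return p


def reverse_complement(motif):
    return "".join(COMPLEMENT[s] for s in reversed(motif))


def expected_spacing(motif, gc_fraction):
    """Average bp per site, counting sites on both strands.

    A palindromic motif (e.g. GC or CG) is counted once. The sum below is
    only valid when the motif and its reverse complement cannot match the
    same sequence, which holds for the motifs used here.
    """
    freqs = base_freqs(gc_fraction)
    p = motif_probability(motif, freqs)
    rc = reverse_complement(motif)
    if rc != motif:
        p += motif_probability(rc, freqs)
    return 1.0 / p


if __name__ == "__main__":
    YEAST_GC = 0.38  # assumed approximate G+C fraction of S. cerevisiae
    for motif in ("CC", "RCY", "GC"):
        print(motif, round(expected_spacing(motif, YEAST_GC), 1))
```

Under these assumptions the sketch gives roughly 13.9 bp per site for CC and roughly 10.5 bp for RCY, close to the spacings quoted above. For a palindromic dinucleotide such as GC it gives roughly 27.7 bp. Observed spacings depend on actual dinucleotide frequencies and will differ from these estimates.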
The present invention provides a novel cytosine-5-DNA methyltransferase gene and its encoded enzyme, isolated from Chlorella virus NYs-1, that recognizes the sequence GpC. This methyltransferase, having a small recognition site that occurs with high frequency in eukaryotic genomes, is of particular utility for high resolution analysis of chromatin structure and protein-DNA interactions in living cells.
According to one aspect of the invention, an isolated nucleic acid molecule is provided, which encodes a cytosine-5 DNA methyltransferase that recognizes a GpC dinucleotide in DNA. Preferably, the nucleic acid molecule is isolated from a Chlorella virus, most preferably from Chlorella virus NYs-1. In a preferred embodiment, the encoded cytosine-5 DNA methyltransferase has an amino acid sequence substantially the same as SEQ ID NO:2 and the encoded methyltransferase is catalytically active and recognizes the GpC dinucleotide. Most preferably, the encoded cytosine-5 DNA methyltransferase has amino acid SEQ ID NO:2.
The following isolated nucleic acid sequences are provided in the present invention: (a) SEQ ID NO:1; (b) natural variants of SEQ ID NO:1; (c) sequences that hybridize with part or all of an antisense strand of SEQ ID NO:1 and encode part or all of a protein having a catalytic activity and sequence recognition specificity the same as the protein having SEQ ID NO:2; and (d) a sequence encoding part or all of SEQ ID NO:2.
According to another aspect of the invention, a recombinant DNA molecule comprising one of the aforementioned cytosine methyltransferase-encoding nucleic acid molecules inserted into a vector for transforming cells is provided. The recombinant DNA molecule is used to transform cells, which may be cultured cells or which may be cells of a living organism. Oligonucleotides of between about 10 and about 100 nucleotides in length which hybridize with portions of the methyltransferase-encoding nucleic acid molecule are also provided in accordance with the present invention, as are antibodies immunologically specific for part or all of the encoded polypeptide.
According to another aspect of the invention, an isolated cytosine-5 DNA methyltransferase that specifically recognizes a GpC dinucleotide sequence in DNA is provided. The methyltransferase preferably is isolated from a Chlorella virus, most preferably virus NYs-1. In a preferred embodiment, the cytosine-5 DNA methyltransferase has an amino acid sequence substantially the same as SEQ ID NO:2 and the methyltransferase is catalytically active and recognizes the GpC dinucleotide. In a particularly preferred embodiment, the enzyme has amino acid SEQ ID NO:2.
According to another aspect of the invention, methods of mapping DNA-protein interactions with the novel cytosine methyltransferase are provided. One method comprises the steps of: (a) providing a sample of cells transformed with a nucleic acid molecule that encodes the novel cytosine-5 methyltransferase; (b) growing a test culture of the transformed cells under conditions enabling production of the methyltransferase; (c) growing a control culture of equivalent cells that do not produce the methyltransferase; (d) isolating the DNA from the test culture and the control culture; (e) exposing the DNA from the control culture to the cytosine-5 methyltransferase; and (f) comparing the cytosine methylation of the DNA from the test culture with the cytosine methylation of the DNA from the control culture, a decrease in extent of methylation in the DNA of the test culture being proportional to the amount of DNA-protein interaction occurring in the DNA in the cell. The method may further include comparing a pattern of methylation in a selected region of the DNA from the test culture and the control culture, a change in the methylation pattern in the respective DNA being indicative of a location of a DNA-protein interaction in the DNA of the cell. In one embodiment, the aforementioned method is applied to analyzing interactions between nucleosome proteins and chromosomal DNA. In another embodiment, the method is applied to analyzing an interaction between a transcriptional regulatory protein and a transcriptional response element in the DNA. These methods are used to advantage in the high resolution mapping of sites of interest for in situ genetic manipulation, such as insertion of a foreign gene for gene therapy.
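The comparison in step (f) can be illustrated with a short sketch. It is one possible way to express the comparison, not a procedure specified by the invention. It assumes that the fraction of methylated molecules has already been measured at each target site for the test culture (methylated in vivo) and for the control DNA (methylated after isolation). The function names, the 0.5 threshold and the example values are hypothetical.

```python
# Sketch: per-site protection from methylation, comparing in vivo (test)
# methylation with methylation of isolated control DNA at the same sites.
# Inputs map a site position (bp) to the fraction of molecules methylated.


def protection(test, control):
    """Fractional protection per site: 1 - (test / control).

    Sites missing from the test data, or unmethylated in the control,
    carry no information and are skipped.
    """
    result = {}
    for pos, ctrl in control.items():
        if pos not in test or ctrl <= 0:
            continue
        result[pos] = max(0.0, 1.0 - test[pos] / ctrl)
    return result


def protected_sites(test, control, threshold=0.5):
    """Positions whose protection meets a (hypothetical) threshold."""
    scores = protection(test, control)
    return sorted(pos for pos, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Invented example values, for illustration only.
    control = {10: 0.80, 24: 0.90, 37: 0.85, 51: 0.80}
    test = {10: 0.76, 24: 0.18, 37: 0.17, 51: 0.72}
    print(protected_sites(test, control))
```

In this invented example the sites at positions 24 and 37 show reduced methylation in the test culture relative to the control, which would be consistent with a protein bound across that interval. The positional resolution of such a footprint is limited by the spacing of target sites.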