I. Field of the Invention
The present invention relates generally to the field of molecular biology. More particularly, it concerns methods for isolating centromere DNA.
II. Description of Related Art
It is well documented that centromere function is crucial for stable chromosomal inheritance in almost all eukaryotic organisms, including essentially all plants and animals (reviewed in Nicklas 1988). For example, broken chromosomes that lack a centromere (acentric chromosomes) are rapidly lost from cell lines, while fragments that retain a centromere are faithfully segregated. The centromere accomplishes this by attaching, via centromere-associated proteins, to the spindle fibers during mitosis and meiosis, thus ensuring proper gene segregation during cell division.
To date, the most extensive and reliable characterization of centromere sequences has come from studies of lower eukaryotes such as S. cerevisiae and S. pombe, where the ability to analyze centromere functions has provided a clear picture of the desired DNA sequences. None of the essential components identified in unicellular organisms, however, function in higher eukaryotic systems. This has seriously hampered efforts to produce artificial chromosomes in higher organisms.
Genetic characterization of centromeres has relied primarily on segregation analysis of chromosome fragments, and in particular on analysis of trisomic strains that carry a genetically marked, telocentric fragment (for example, see Koornneef 1983). This approach is imprecise, however, because only a limited set of fragments can be obtained and because normal centromere function is influenced by surrounding chromosomal sequences (for example, see Koornneef 1983).
A more precise method for mapping centromeres that can be used in intact chromosomes is tetrad analysis (Mortimer et al., 1981), which provides a functional definition of a centromere in its native chromosomal context. However, the technique is currently limited to a small number of organisms and is relatively labor intensive (Preuss 1994, Smyth 1994). To date, among higher plants, the technique has only been used successfully in Arabidopsis (Copenhaver, 1999).
Another avenue of investigation of centromeres has been study of the proteins that are associated with centromeres (Bloom 1993; Earnshaw 1991). Human autoantibodies that bind specifically in the vicinity of the centromere have facilitated the cloning of centromere-associated proteins (CENPs, Rattner 1991). Yeast centromere-associated proteins also have been identified, both through genetic and biochemical studies (Bloom 1993; Lechner et al., 1991).
Despite the aforementioned methods of analysis, the centromeres of most organisms remain poorly defined. Although repetitive DNA fragments that map both cytologically and genetically to centromeric regions in plants and other higher eukaryotes have been identified, little is known regarding the functionality of these sequences (see Richards et al., 1991; Alfenito et al., 1993; and Maluszynska et al., 1991). Many of these sequences are tandemly repeated satellite elements and dispersed repeated sequences, arranged in repeat arrays ranging from 300 kb to 5000 kb in length (Willard 1990). Whether the repeats themselves represent functional centromeres remains controversial, as other genomic DNA is required to confer inheritance upon a region of DNA (Willard, 1997).
One characteristic of centromeres which is not well understood is the methylation of cytosines at the carbon 5 position (Martinez-Zapater et al., 1986; Maluszynska and Heslop-Harrison, 1991; Vongs et al., 1993). Methylation is a characteristic feature of many eukaryotic genomes and has been shown to be correlated with heterochromatic regions including regions of repetitive DNA and centromeres (Martienssen and Richards, 1995; Ng and Bird, 1999).
The genomes of both animals and plants contain cytosine methylation, with overall levels of CpG modification often reaching 60 to 90% (Jones and Wolffe, 1999; Gruenbaum et al., 1981). In euchromatin, DNA methylation is concentrated in small regions such as CpG islands and provides epigenetic modifications that regulate genome imprinting, gene expression, and DNA repair (Robertson and Jones, 2000; Singer et al., 2001). In contrast, the role of the extensive DNA methylation found in repetitive, heterochromatic portions of the genome is less clear. In some cases, this methylation reduces recombination; in others, it may play a structural role (J. Bender, 1998; Vongs et al., 1993; Yoder et al., 1997).
One means that has been used to study the distribution of methylation in genomes is the use of methylation sensitive restriction endonucleases, either alone or in combination with isoschizomeric restriction endonucleases that lack sensitivity to methylation (Jeddeloh and Richards, 1996). An example of such an isoschizomeric pair is HpaII and MspI, which both cut the sequence 5′-C/CGG-3′ but differ in their sensitivity to cytosine methylation (Butkus et al., 1987; McClelland et al., 1994). Such analyses have often been directed to the sparsely methylated portion of genomes, which comprises the majority of coding sequences.
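The behavior of the HpaII/MspI pair can be illustrated with a toy digestion model. The following is a minimal Python sketch, not a laboratory protocol, and it rests on stated simplifications: only the internal (CpG) cytosine of each C/CGG site is modeled, methylation is assumed to be symmetric on both strands, HpaII is treated as blocked by methylation of that cytosine, and MspI is treated as cutting regardless (its sensitivity to methylation of the outer cytosine is ignored). The function names and the example sequence are hypothetical.

```python
# Toy model of HpaII vs MspI digestion at 5'-C/CGG-3' sites.
# Simplifications (assumptions): one strand is modeled, CpG methylation is
# assumed symmetric, and only the internal CpG cytosine is considered.

SITE = "CCGG"


def cut_positions(seq, methylated, methylation_sensitive):
    """Return the coordinates at which the enzyme cleaves seq.

    methylated: set of 0-based indices of cytosines carrying 5-methylcytosine.
    methylation_sensitive=True models HpaII (blocked when the internal CpG
    cytosine at index i + 1 is methylated); False models MspI (cuts anyway).
    """
    cuts = []
    start = seq.find(SITE)
    while start != -1:
        internal_c = start + 1
        if not (methylation_sensitive and internal_c in methylated):
            cuts.append(start + 1)  # cleave between C and CGG
        start = seq.find(SITE, start + 1)
    return cuts


def fragment_lengths(seq, cuts):
    """Convert cut coordinates into the lengths of the resulting fragments."""
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


seq = "AACCGGTTTTCCGGAA"
print(fragment_lengths(seq, cut_positions(seq, {3}, False)))  # MspI: [3, 8, 5]
print(fragment_lengths(seq, cut_positions(seq, {3}, True)))   # HpaII: [11, 5]
```

Methylation of the first site merges two MspI fragments into a single longer HpaII fragment; this length difference is what size-based selection of methylated segments relies on.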
While the above studies have been useful in helping to elucidate the structure and function of centromeres, they have failed to provide an efficient method for cloning centromere nucleic acid sequences. The development of such methods could allow the isolation of centromeres from a broad variety of organisms, potentially allowing the creation of artificial chromosome vectors tailored to numerous economically important species. Such a technique would avoid the need for costly methodologies described by the prior art and represent a significant advance in biotechnology research.
In one aspect of the invention, a method is provided for obtaining a centromere nucleic acid sequence from a selected species. The method may comprise the steps of: a) preparing a first sample of genomic DNA from a selected species; b) obtaining a plurality of methylated nucleic acid segments from the genomic DNA; and c) screening the methylated nucleic acid segments to identify a centromere nucleic acid sequence. In the method, obtaining may comprise any method of preparing a collection of methylated nucleic acid segments, including contacting genomic DNA with a methylation sensitive restriction endonuclease and selecting nucleic acid segments exhibiting resistance to cleavage with the methylation sensitive restriction endonuclease to obtain the plurality of methylated nucleic acid segments. Obtaining methylated DNA may also comprise use of an antibody specific to methylated DNA, for example, by immunoprecipitating methylated nucleic acid segments with an antibody capable of specifically binding methylated DNA or associated proteins.
In another aspect of the invention, the method for obtaining a centromere nucleic acid sequence from a selected species may be further defined as comprising labeling at least a first methylated nucleic acid segment from a plurality of methylated nucleic acid segments, hybridizing the first methylated nucleic acid segment to a clone comprising genomic DNA of a selected species, and detecting the label to obtain a clone comprising a centromere nucleic acid sequence. In the method for obtaining a centromere nucleic acid sequence from a selected species, screening may comprise using an array, for example, in a method comprising the steps of: (i) obtaining an array comprising cloned genomic DNA from the selected species; (ii) detecting a candidate centromere nucleic acid sequence from the cloned genomic DNA of the array, where the candidate centromere nucleic acid sequence comprises a nucleic acid sequence complementary to a nucleic acid sequence of at least a first member of the plurality of methylated nucleic acid segments; and (iii) identifying a centromere nucleic acid sequence from the candidate centromere sequence.
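The array screen of step (ii) can be pictured as a thresholding step over spot intensities. This Python sketch assumes that each arrayed clone yields a single hybridization signal from the labeled methylated-segment probe and that a vector-only spot provides the background; the clone identifiers and the threshold of three-fold over background are hypothetical choices, not values from this disclosure.

```python
def candidate_clones(signals, background, min_ratio=3.0):
    """Return IDs of arrayed clones whose probe signal is at least
    min_ratio times the background signal.

    signals: dict mapping clone ID -> hybridization signal (arbitrary units)
    obtained with the labeled methylated nucleic acid segments as probe.
    background: signal of a vector-only or empty spot (assumed control).
    min_ratio: hypothetical detection threshold.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    return sorted(clone for clone, signal in signals.items()
                  if signal / background >= min_ratio)


spots = {"BAC_A01": 950.0, "BAC_A02": 120.0, "BAC_B07": 400.0}  # hypothetical
print(candidate_clones(spots, background=100.0))  # ['BAC_A01', 'BAC_B07']
```

Clones passing such a threshold are only candidates; step (iii) still requires identifying a centromere sequence among them, for example by mapping or functional tests.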
In yet another aspect of the invention, the method for obtaining a centromere nucleic acid sequence from a selected species may comprise detecting a plurality of candidate centromere nucleic acid sequences from an array, where the candidate centromere nucleic acid sequences comprise nucleic acid sequences complementary to a nucleic acid sequence of at least a first member of the plurality of methylated nucleic acid segments. An array used with the invention may comprise potentially any target nucleic acid sequences, including cloned genomic DNA. The array may also comprise nucleic acids attached to a solid support. In one embodiment of the invention, the array may comprise cloned genomic DNA attached to a solid support in any selected pattern, including a grid. The cloned genomic DNA may be from any type of clone, including a bacterial artificial chromosome or yeast artificial chromosome clone. Potentially any suitable solid support may be used with the array, including a microscope slide or hybridization filter.
Detecting nucleic acids in accordance with the invention may comprise use of any suitable label. For example, in the method of obtaining a centromere nucleic acid sequence, the detecting may comprise fluorescently labeling a plurality of methylated nucleic acid segments and hybridizing the labeled plurality of methylated nucleic acid segments to an array. Alternatively, detecting may comprise labeling the plurality of methylated nucleic acid segments with an antigen, hybridizing the labeled plurality of methylated nucleic acid segments to an array and detecting the antigen with a molecule which binds the antigen. Labeling probes may comprise radioactively labeling a plurality of methylated nucleic acid segments and hybridizing the labeled plurality of methylated nucleic acid segments to an array. An array used with the invention may comprise a plurality of DNA pools, the pools comprising the nucleic acid sequences of at least a first and a second clone comprising genomic DNA from a selected species.
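One way an array of DNA pools could be read out is a row/column pooling design, in which each pool contains the DNA of every clone in one row or one column of a storage plate. The disclosure does not specify a pooling layout; the two-dimensional design and the function below are assumptions offered as a sketch.

```python
def deconvolve_pools(n_rows, n_cols, positive_rows, positive_cols):
    """Infer candidate clone coordinates from a row/column pooling design.

    Assumes clones are stored in an n_rows x n_cols plate, that each row pool
    and each column pool contains DNA from every clone in that row or column,
    and that a pool is "positive" when it hybridizes to the labeled
    methylated nucleic acid segments. A clone is a candidate only when both
    its row pool and its column pool are positive.
    """
    return [(row, col)
            for row in range(n_rows)
            for col in range(n_cols)
            if row in positive_rows and col in positive_cols]


print(deconvolve_pools(8, 12, {2}, {5, 9}))  # [(2, 5), (2, 9)]
```

When several rows and several columns are positive, the intersection includes false candidates, so individual clones would be retested; pooling reduces the number of hybridizations at the cost of this confirmation step.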
In still yet another aspect of the invention, methylated nucleic acid segments may be obtained by a method comprising (i) obtaining a second sample of genomic DNA from a selected species; (ii) contacting the second sample of genomic DNA with an isoschizomer of a methylation sensitive restriction endonuclease, wherein the isoschizomer is not methylation sensitive; (iii) resolving separately the first and second samples of genomic DNA following the contacting with the isoschizomer and the methylation sensitive restriction endonuclease; and (iv) selecting a plurality of methylated nucleic acid segments from at least a first nucleic acid fraction present in the first sample of genomic DNA and not present in the second sample of genomic DNA. The method may further comprise contacting the second sample of genomic DNA with a methylation sensitive restriction endonuclease. Any methylation sensitive restriction endonuclease may potentially be used with the invention, including, for example, AatII, AciI, AgeI, AhaII, AscI, AvaI, BsaAI, BsaHI, BsiEI, BsiWI, BspDI, BsrFI, BssHII, BstBI, BstUI, Cfr10I, ClaI, EagI, Eco47III, Esp3I, FseI, FspI, HaeII, HgaI, HhaI, HinP1I, HpaII, KasI, MluI, NaeI, NarI, NgoMIV, NotI, NruI, PmlI, Psp1406I, PvuI, RsrII, SacII, SalI, SmaI, SnaBI, TaiI, and XhoI. Alternatively, a non-methylation sensitive restriction endonuclease may be used with the invention, including, for example, BamHI, BanII, BbsI, BsaJI, BsaWI, BsmI, Bsp1286I, BspEI, BspMI, BsrBI, BstEII, BstYI, Csp6I, Eam1105I, EarI, EcoO109I, EcoRI, EcoRV, FokI, HaeIII, HgiAI, HphI, KpnI, MspI, PaeR7I, PmeI, SacI, SfaNI, SphI, TaqI, TfiI, Tth111I, and XmaI.
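Step (iv) compares the two digests fraction by fraction. The Python sketch below assumes fragment lengths are known in kilobases and that gel fractions can be approximated as fixed-width size bins; the bin width and the example lengths are hypothetical.

```python
def fractions_unique_to_first(first_lengths, second_lengths, bin_kb=1.0):
    """Return size fractions (kb ranges) populated in the first digest only.

    first_lengths: fragment lengths (kb) after digestion with the
    methylation sensitive restriction endonuclease.
    second_lengths: fragment lengths (kb) after digestion with its
    methylation-insensitive isoschizomer.
    Fractions are modeled as bins of width bin_kb (an assumption that
    stands in for excised gel slices).
    """
    def bins(lengths):
        return {int(length // bin_kb) for length in lengths}

    unique = bins(first_lengths) - bins(second_lengths)
    return sorted((b * bin_kb, (b + 1) * bin_kb) for b in unique)


first = [0.4, 1.2, 6.5, 12.0]   # e.g. HpaII digest (hypothetical values)
second = [0.3, 0.5, 1.1, 1.8]   # e.g. MspI digest (hypothetical values)
print(fractions_unique_to_first(first, second))  # [(6.0, 7.0), (12.0, 13.0)]
```

Segments recovered from such fractions are those protected from cleavage by methylation, since the insensitive isoschizomer cuts the same sites into smaller pieces.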
In still yet another aspect of the invention, methylated nucleic acid segments may be obtained by a method comprising determining the resistance of the methylated nucleic acid segments to restriction based on the length of the methylated nucleic acid segments following contacting with a methylation sensitive restriction endonuclease. In the method, the average length of the plurality of methylated nucleic acid segments may be at least 3 kb, 4 kb, 5 kb, 7 kb, 8 kb, or at least 10 kb, or another length determined to represent the fraction of methylated nucleic acid segments.
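The length criterion amounts to partitioning post-digestion fragments around a size cutoff and reporting the average length of each class. This Python sketch assumes fragment lengths in kb are available (for example, estimated from a gel); the 3 kb default simply mirrors the smallest value recited above, and the example data are hypothetical.

```python
from statistics import mean


def split_by_length(lengths_kb, cutoff_kb=3.0):
    """Partition fragment lengths around cutoff_kb.

    Fragments at or above the cutoff are treated as resistant to the
    methylation sensitive enzyme (putatively methylated); shorter fragments
    as cleaved (putatively unmethylated). Returns both classes and their
    mean lengths (None for an empty class).
    """
    resistant = [x for x in lengths_kb if x >= cutoff_kb]
    cleaved = [x for x in lengths_kb if x < cutoff_kb]
    return {
        "methylated": resistant,
        "unmethylated": cleaved,
        "mean_methylated": mean(resistant) if resistant else None,
        "mean_unmethylated": mean(cleaved) if cleaved else None,
    }


result = split_by_length([0.5, 1.0, 1.5, 8.0, 12.0])
print(result["mean_methylated"], result["mean_unmethylated"])  # 10.0 1.0
```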
In still yet another aspect, the method of obtaining a centromere nucleic acid sequence from a selected species may be further defined as comprising obtaining a plurality of unmethylated nucleic acid segments and comparing the plurality of unmethylated nucleic acid segments to a plurality of methylated nucleic acid segments to identify at least a first methylated nucleic acid segment present in the plurality of methylated nucleic acid segments and not present in the plurality of unmethylated nucleic acid segments. The method may be further defined as comprising hybridizing a plurality of unmethylated nucleic acid segments to one or both of a first methylated nucleic acid segment or a clone comprising genomic DNA of a selected species, wherein the plurality of unmethylated nucleic acid segments have not received labeling. In the method, obtaining a plurality of unmethylated nucleic acid segments may comprise identifying a plurality of nucleic acid sequences which are susceptible to restriction with a methylation sensitive restriction endonuclease. The method may be further defined as comprising measuring an average length of the plurality of unmethylated nucleic acid segments following restriction with the methylation sensitive restriction endonuclease. In certain embodiments of the invention, the average length of the plurality of unmethylated nucleic acid segments may be less than about 5 kb, 4 kb, 3 kb, 2 kb or about 1 kb or smaller following restriction with the methylation sensitive restriction endonuclease.
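At the sequence level, the comparison in this aspect is a subtraction: keep the methylated segments that have no counterpart among the unmethylated segments. The Python sketch below is a simplification that treats two segments as the same only when they are identical on either strand; a practical comparison would use sequence alignment or, as described above, hybridization in the presence of unlabeled unmethylated DNA. The example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(seq):
    """Strand-independent key: the smaller of seq and its reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def methylated_only(methylated_segments, unmethylated_segments):
    """Return methylated segments absent (on either strand) from the unmethylated set."""
    unmethylated_keys = {canonical(s) for s in unmethylated_segments}
    return [s for s in methylated_segments
            if canonical(s) not in unmethylated_keys]


print(methylated_only(["AACCGG", "TTTGCA"], ["CCGGTT"]))  # ['TTTGCA']
```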
In still yet another aspect of the invention, in the method for obtaining a centromere nucleic acid sequence from a selected species, the selected species may be further defined as a plant, including a dicotyledonous plant, or as a mammal, such as a human. Examples of dicotyledonous plants include tobacco, tomato, potato, sugar beet, pea, carrot, cauliflower, broccoli, soybean, canola, sunflower, alfalfa, cotton and Arabidopsis. In certain further embodiments, the dicotyledonous plant is not Arabidopsis. The plant may also be a monocotyledonous plant, including wheat, maize, rye, rice, turfgrass, oat, barley, sorghum, millet, and sugarcane.
In still yet another aspect of the invention, the method for obtaining a centromere nucleic acid sequence from a selected species may comprise screening to identify a candidate centromere sequence not comprising repetitive DNA.
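One crude way to screen out repetitive candidates is to measure how much of a clone consists of k-mers that recur within it. This Python sketch is an assumption rather than the method of the disclosure: it detects tandem (satellite-like) repeats within a single candidate only, and would need genome-wide k-mer counts to flag dispersed repeats. The value of k, the 20% cutoff, and the example sequences are hypothetical.

```python
from collections import Counter


def repeat_fraction(seq, k=12):
    """Fraction of k-mer positions in seq whose k-mer occurs more than once in seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] > 1) / len(kmers)


def non_repetitive(candidates, k=12, max_fraction=0.2):
    """Return names of candidate sequences whose repeat fraction is at most max_fraction."""
    return [name for name, seq in candidates.items()
            if repeat_fraction(seq, k) <= max_fraction]


candidates = {"satellite_like": "ACGT" * 10, "unique_like": "AAAACCCCGGGGTTTT"}
print(non_repetitive(candidates, k=4))  # ['unique_like']
```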
In still yet another aspect of the invention, the step of contacting in the method for obtaining a centromere nucleic acid sequence from a selected species may comprise: (i) incubating the genomic DNA with a methylation sensitive restriction endonuclease to digest unmethylated DNA; (ii) resolving digested genomic DNA from undigested genomic DNA by electrophoresis; and (iii) isolating a plurality of methylated nucleic acid segments from the undigested genomic DNA. In the method, the average length of the plurality of methylated nucleic acid segments may be at least about 3 kb, 4 kb, 5 kb, 7 kb, 8 kb, or at least 10 kb, or another length determined to represent the fraction of methylated nucleic acid segments.
In still yet another aspect of the invention, the method for obtaining a centromere nucleic acid sequence from a selected species may comprise fluorescent in situ hybridization of at least a first methylated nucleic acid segment from the plurality of methylated nucleic acid segments. The method may also comprise determining the nucleic acid sequence of at least a first methylated nucleic acid segment from the plurality of methylated nucleic acid segments. The method may still further comprise comparing the nucleic acid sequence of the first methylated nucleic acid segment to a known centromere sequence. In another embodiment of the invention, comparing may comprise immunoprecipitating a centromere nucleic acid sequence and comparing the sequence to the nucleic acid sequence of the first methylated nucleic acid segment. This may comprise immunoprecipitating the centromere nucleic acid sequences with an antibody capable of binding methylated DNA. Alternatively, this may comprise immunoprecipitating the centromere nucleic acid sequences with an antibody capable of binding a centromere-associated protein.
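Comparing a methylated segment with known centromere sequences can be sketched with a simple similarity score. The Python example below uses the standard-library difflib.SequenceMatcher as a stand-in for a proper sequence aligner (such as BLAST), which would be the usual choice; the reference names and sequences are hypothetical placeholders, not real centromere sequences.

```python
from difflib import SequenceMatcher


def best_match(query, known):
    """Return (name, score) for the known sequence most similar to query.

    known: dict mapping a reference name -> sequence.
    The score is SequenceMatcher.ratio(), i.e. 2 * matches / total length,
    between 0.0 and 1.0. autojunk=False is needed because, for sequences of
    200 or more characters, the default heuristic would treat the four DNA
    bases as "junk".
    """
    scores = {
        name: SequenceMatcher(None, query.upper(), ref.upper(),
                              autojunk=False).ratio()
        for name, ref in known.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


references = {"ref_repeat_1": "ACGTACGTTT", "ref_repeat_2": "GGGGCCCCAA"}
print(best_match("ACGTACGTTA", references))  # ('ref_repeat_1', 0.9)
```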
In still yet another aspect of the invention, the method for obtaining a centromere nucleic acid sequence from a selected species may comprise genetically mapping at least a first methylated nucleic acid segment from the plurality of methylated nucleic acid segments.
In still yet another aspect of the invention, the method for obtaining a centromere nucleic acid sequence from a selected species may comprise determining the extent of acetylation of at least a first histone bound to at least a first methylated nucleic acid segment from the plurality of methylated nucleic acid segments.
In still yet another aspect of the invention, the method for obtaining a centromere nucleic acid sequence from a selected species may comprise transforming a cell with at least a first methylated nucleic acid segment from the plurality of methylated nucleic acid segments. The cell may be integratively or non-integratively transformed with the methylated nucleic acid segment. The nucleic acid segment may or may not be methylated when it is introduced into the organism, and may further be defined as remethylated. Screening may comprise observing a phenotypic effect present in the integratively transformed cells, or in whole organisms comprising the cells, wherein the phenotypic effect is absent in a control cell not integratively transformed with the methylated nucleic acid segment, or in an organism comprising the control cell. The phenotypic effect may be selected from the group consisting of reduced viability, reduced efficiency of transformation, genetic instability in the integratively transformed nucleic acid, aberrant tissue sectors, increased ploidy, aneuploidy, and increased integrative transformation in distal or centromeric chromosome regions.
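Whether a phenotypic effect, such as reduced viability or aneuploidy, is more frequent among integrative transformants than among controls is a comparison of two proportions. This Python sketch uses a one-sided Fisher exact test as one reasonable choice of statistic; the disclosure does not prescribe a test, and the counts in the example are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value that group 1 shows the phenotype more often.

    2 x 2 table of counts:
        group 1 (transformed with the candidate segment): a with, b without
        group 2 (control):                                 c with, d without
    Returns P(X >= a), where X is hypergeometric with the table margins fixed.
    """
    n1, n2 = a + b, c + d
    k = a + c  # total number of individuals showing the phenotype
    total = comb(n1 + n2, k)
    # math.comb(n, r) returns 0 when r > n, so impossible tables contribute 0.
    return sum(comb(n1, x) * comb(n2, k - x)
               for x in range(a, min(n1, k) + 1)) / total


# Hypothetical counts: 12 of 40 transformants vs 2 of 40 controls aneuploid.
print(round(fisher_one_sided(12, 28, 2, 38), 4))
```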
In still yet another aspect of the invention, in the method for obtaining a centromere nucleic acid sequence from a selected species, a first methylated nucleic acid segment may be further defined as comprising a recombinant construct. The recombinant construct may comprise any additional selected elements, including an autonomous replicating sequence (ARS), a structural gene, and a selectable or screenable marker gene.
In still yet another aspect of the invention, a centromere nucleic acid sequence is provided which has been prepared by a method for obtaining a centromere nucleic acid sequence from a selected species in accordance with the invention. Further provided by the invention is an organism or cell transformed in accordance with the invention, as well as progeny of any generation of such an organism, the organism comprising the first methylated nucleic acid segment.
In still yet another aspect of the invention, a method of obtaining a centromere nucleic acid sequence from a selected organism is provided, the method comprising the steps of: a) preparing a first sample of genomic DNA from a selected organism; b) contacting said genomic DNA with a strand-specific methylation sensitive restriction endonuclease; c) nick-translating the genomic DNA; and d) detecting a centromere nucleic acid sequence that hybridizes to the nick-translated genomic DNA. In one embodiment of the invention, the strand-specific methylation sensitive restriction endonuclease is selected from the group consisting of HpaI, KpnI, MaeII, and Sau3A.
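The strand-specific approach targets hemimethylated sites, where a CpG is methylated on one strand but not on its partner. The Python sketch below only enumerates such sites in a modeled duplex; it does not model the enzymes named above, and the indexing convention (each bottom-strand cytosine recorded by the index of the top-strand G it pairs with) is an assumption made for illustration.

```python
def hemimethylated_cpgs(seq, top_methylated, bottom_methylated):
    """Return top-strand indices of CpG sites methylated on exactly one strand.

    seq: top-strand sequence. A CpG at top-strand index i pairs with a
    bottom-strand cytosine opposite the top-strand G at index i + 1.
    top_methylated: top-strand indices of methylated cytosines.
    bottom_methylated: for each methylated bottom-strand cytosine, the
    top-strand index it lies opposite (convention assumed for this sketch).
    """
    seq = seq.upper()
    cpg_sites = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
    return [i for i in cpg_sites
            if (i in top_methylated) != ((i + 1) in bottom_methylated)]


# CpGs at indices 1 and 5; only the site at 1 is methylated on both strands.
print(hemimethylated_cpgs("ACGTTCGA", {1, 5}, {2}))  # [5]
```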
Detecting may comprise screening an array. Use of such an array may comprise the steps of a) obtaining an array comprising cloned genomic DNA from said selected organism; and b) detecting a centromere nucleic acid sequence from said cloned genomic DNA of said array by hybridizing the nick-translated genomic DNA to said array. In one embodiment of the invention, a plurality of centromere nucleic acid sequences are detected from said array. The array may comprise the cloned genomic DNA attached to a solid support. The array may or may not comprise the cloned genomic DNA attached in a selected pattern, such as a grid. Any cloned genomic DNA could be used, such as DNA from a bacterial artificial chromosome or yeast artificial chromosome clone. Any solid support can be used, such as a microscope slide or hybridization filter. In one embodiment of the invention, the array comprises a plurality of DNA pools, the pools comprising the nucleic acid sequences of at least a first and a second clone comprising genomic DNA from a selected organism.
Contacting may, in certain embodiments of the invention, be further defined as comprising a) obtaining a second sample of genomic DNA from said selected organism; b) contacting said second sample of genomic DNA with an isoschizomer of said strand-specific methylation sensitive restriction endonuclease, wherein said isoschizomer is not a strand-specific methylation sensitive restriction endonuclease; c) resolving separately said first and said second samples of genomic DNA following said contacting; and d) selecting a plurality of hemimethylated nucleic acid segments from at least a first nucleic acid fraction present in said first sample of genomic DNA and not present in said second sample of genomic DNA. Any suitable labeling can be used with the nick-translating, including use of radioactive labeling, labeling the genomic DNA with an antigen and labeling the genomic DNA with a fluorophore.
In certain embodiments of the invention, the selected organism used with the method is a plant. The plant may be a dicotyledonous plant, including tobacco, tomato, potato, sugar beet, pea, carrot, cauliflower, broccoli, soybean, canola, sunflower, alfalfa, cotton and Arabidopsis. The plant can also be a monocotyledonous plant, including wheat, maize, rye, rice, turfgrass, oat, barley, sorghum, millet, and sugarcane. Alternatively, the selected organism is a mammal, including a human.
In certain embodiments of the invention, the method is further defined as comprising fluorescent in situ hybridization of the centromere nucleic acid sequence, and may also comprise determining the nucleic acid sequence of the centromere nucleic acid sequence. In further embodiments, the method comprises comparing the nucleic acid sequence of the centromere nucleic acid sequence to a known centromere sequence. In still further embodiments, the method comprises transforming a cell, either integratively or non-integratively, with the centromere nucleic acid sequence. The method may also comprise screening for a phenotypic effect present in the integratively transformed cells or an organism comprising the cells, wherein said phenotypic effect is absent in a control cell not integratively transformed with said centromere nucleic acid sequence or an organism comprising said control cell. Examples of phenotypic effects that could be screened include reduced viability, reduced efficiency of said transforming, genetic instability in the integratively transformed nucleic acid, aberrant tissue sectors, increased ploidy, aneuploidy, and increased integrative transformation in distal or centromeric chromosome regions.
The centromere nucleic acid sequence, or a fragment thereof, can be transformed alone or as part of a recombinant construct. The centromere nucleic acid sequence may also be further defined as comprising cloned DNA. The cloned DNA may or may not be methylated, for example, because methylation may be lost following cloning. The cloned DNA may also be remethylated prior to transforming, and may also be defined as hemimethylated. The recombinant construct may or may not include any other desired elements, including one or more telomeres, an autonomous replicating sequence (ARS), a structural gene, and a selectable or screenable marker gene.
In still yet another aspect, the invention provides a centromere nucleic acid sequence prepared by any of the foregoing methods. Also provided are a non-human organism prepared by such methods, as well as progeny of any generation of such an organism.