1. Field of the Invention
This invention relates generally to a protein, and more specifically to an endonuclease protein which has not previously been purified or characterized. This endonuclease is novel and extremely useful because it cleaves double-stranded DNA at specific, infrequent sites for which endonucleases were not previously available. The resulting fragments are of great value for human gene mapping because the cleavage site is a sequence ordinarily encountered in genomic DNA, and because cleavage by the endonuclease produces fragments that are larger than those characteristically produced by many previously available endonucleases. This invention also includes methods for purifying the endonuclease and for cleaving DNA by use of the endonuclease.
2. Description of Related Art
(A) Restriction Endonucleases
One of the essential tools molecular biologists use to delve deeper into the mysteries of life contained in the structure of DNA, the genetic material, is a molecular scissors called a restriction endonuclease. There are many such enzymes which are capable of cutting DNA at specific sites (see Lewin, 1987 for review).
Restriction enzymes (restriction endonucleases) recognize specific short sequences of DNA (usually unmethylated DNA) and cleave the duplex molecule, usually at the target recognition site, but sometimes elsewhere. In some instances, the recognition site is specific, but the cleavage site is located some distance away from the recognition site and does not appear to be at any specific sequence.
"Duplex" refers to the double-stranded composition of the DNA molecule. The cleavage induced by endonucleases is usually at specific sequences of approximately 4-6 base pairs. A base pair is the union of a purine on one strand with a pyrimidine on the other strand of the DNA duplex. There are four bases in DNA and they pair in specific unions: adenine with thymine (A-T), and guanine with cytosine (G-C).
Fragments generated by endonucleases are amenable to further analysis of their nucleotide composition. Variation among individuals in the sizes of the fragments obtained from the same chromosomal locations is referred to as restriction fragment length polymorphism (RFLP).
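The recognition of a short, specific sequence on duplex DNA can be illustrated with a minimal sketch. It assumes DNA is written as a plain string of the letters A, C, G and T, and it uses the hexamer GAATTC purely as an illustrative six-base-pair site; the function names and the example sequence are hypothetical and are not part of the invention.

```python
# Base pairing in the duplex: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of the duplex, read 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def find_sites(seq: str, site: str) -> list[int]:
    """Return the 0-based start positions of `site` on the given strand.

    Overlapping occurrences are reported as well.
    """
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


# Hypothetical 20-base sequence containing two copies of the example site.
dna = "TTGAATTCAGGCCGAATTCA"
print(find_sites(dna, "GAATTC"))      # [2, 13]
print(reverse_complement("GAATTC"))   # GAATTC (the site reads the same on both strands)
```

Because the example site is its own reverse complement, a search on one strand finds every occurrence on the duplex; for a site without that symmetry, the reverse complement would need to be searched as well.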
Restriction endonucleases are essential components of methods used to construct maps of the genetic material, although not all such endonucleases are useful. One problem limiting their use is that cleavage by a particular enzyme may be too frequent, producing pieces too small to be useful. Another problem is that the sites attacked may have nucleotide sequences so unusual that they are not likely to occur in vivo. Some enzymes cleave only artificially engineered sequences.
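Why a short recognition site leads to frequent cleavage can be shown with a rough calculation. The sketch below assumes, as a simplification not stated in this description, that the four bases occur with equal frequency and independently of one another; real genomic DNA departs from this assumption, so the figures are order-of-magnitude estimates only.

```python
def expected_site_spacing(site_length: int) -> int:
    """Average distance, in base pairs, between occurrences of one specific
    site of the given length, assuming equal and independent base frequencies."""
    if site_length < 1:
        raise ValueError("site_length must be at least 1")
    return 4 ** site_length


for n in (4, 6, 8):
    print(n, expected_site_spacing(n))
# 4 256
# 6 4096
# 8 65536
```

Under this assumption, each additional base pair in the recognition site makes the site four times rarer and the average fragment four times longer.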
Restriction endonucleases are named by using three- or four-letter abbreviations identifying their origin, coupled with a letter and/or number designation which distinguishes multiple enzymes of the same origin. An example of the nomenclature is EcoR1, one of the endonucleases derived from E. coli. Most of the endonucleases discovered initially were isolated from bacteria, in which they cleave DNA as part of the natural function of the cell. However, other organisms, for example yeast, can be used as a source of double-strand DNA cleaving endonucleases.
Many endonucleases were isolated because the bacteria from which they were derived are able to distinguish between their native DNA and any invading foreign DNA. One of the ways bacteria recognize foreign DNA is by the absence of methyl groups at appropriate base pair sites. A bacterium protects its own DNA from cleavage by its own endonucleases by methylating its DNA bases at the appropriate target sites. Successful attack on bacteria by foreign DNA, for example by phage, may be due either to the fact that the phage DNA has the same methylation pattern as the host DNA or, alternatively, to mutations that have caused defects in the ability of the bacteria to produce an endonuclease or to attack the foreign DNA. Endonucleases isolated from bacteria are of two types: one which is only able to cleave DNA, and another in which both restriction and methylation activities are combined. Some restriction endonucleases introduce staggered cuts with overhangs; others generate blunt ends.
(B) Restriction Mapping
Gene maps give the location of specific genes (specific DNA nucleotide sequences) that encode the primary sequences of protein gene products relative to each other and also localize the genes on specific chromosomes of higher organisms. A map of DNA obtained by using endonucleases to map breakpoints is called a restriction site map and consists of a linear sequence of cleavage sites. This physical map is obtained by extracting DNA from the chromosomes in cells, breaking the extracted DNA at various points with endonucleases, and determining the order of cleavage sites by analysis of the fragments.
Distances along the maps are measured directly in base pairs, or, if distances are long, in megabase pairs. By comparing the sequences of DNA between relatively short distances, a DNA map is constructed in a stepwise fashion. A major goal of current research is to construct a map of the entire human genome. (The Human Genome Project, American Society of Human Genetics Symposium, Baltimore, Nov. 15, 1989.) Success in mapping human and animal genomes will require a selection of endonucleases which cleave at a large variety of sites which occur in the DNA of living organisms, not just in artificial sequences.
DNA fragments produced by the action of endonucleases are separated on the basis of size by agarose or polyacrylamide gel electrophoresis. An electric current is passed through the gel, causing the fragments to move down it at a rate depending on length; the smaller fragments move more rapidly. The result of this migration in a gel is a series of bands, each corresponding to a fragment of a particular size. Many different endonucleases are used for gene mapping, and large numbers of overlapping fragments are analyzed. Sequential cleavage using different endonucleases breaks a series of larger fragments down into smaller fragments. A hierarchy is then constructed based on the fact that the lengths of the smaller fragments add up exactly to the length of the original starting fragment. For example, a fragment of 2,100 base pairs may be broken down into fragments of 200 and 1,900 base pairs (see Lewin, 1987 for review).
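The additivity of fragment lengths can be checked with a short sketch. It assumes a linear starting fragment and cut positions given in base pairs from one end; the function name `digest` is hypothetical, and the numbers are those of the 2,100 base pair example.

```python
def digest(length: int, cut_positions: list[int]) -> list[int]:
    """Return the fragment lengths produced by cutting a linear fragment of
    `length` base pairs at the given positions (measured from one end)."""
    for cut in cut_positions:
        if not 0 < cut < length:
            raise ValueError(f"cut position {cut} lies outside the fragment")
    bounds = [0] + sorted(cut_positions) + [length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


fragments = digest(2100, [200])
print(fragments)                 # [200, 1900]
print(sum(fragments) == 2100)    # True: the lengths are completely additive

# On a gel the smaller fragments move more rapidly, so sorting by length
# gives the order of bands from the fastest-moving to the slowest.
print(sorted(fragments))         # [200, 1900]
```

A sum that fails to match the starting length would indicate a missed band or an incorrectly assigned fragment, which is why additivity is useful as a consistency check when the hierarchy is built.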
Construction of an entire gene map for a species, for example construction of the human gene map, is a difficult and tedious task. The larger the number of endonucleases available for restriction mapping, the easier and more sophisticated the genetic map construction. In particular, many endonucleases are needed which cleave at a variety of specific sites and which produce fragments of different lengths. To appreciate the magnitude of the mapping problem, it should be noted that the human genome comprises an estimated 3 billion base pairs contained in 22 pairs of chromosomes called autosomes plus two sex chromosomes.
Restriction maps offer advantages over older methods of mapping, which identified a series of genetic sites through the occurrence of DNA changes (mutations), because restriction maps can be obtained for any sequence of DNA. Their construction is not dependent upon the location of mutations, and no knowledge of the function of a particular sequence of DNA is required. However, restriction maps are related to, and are colinear with, "genetic" maps.
Mutations which are deletions or insertions of base pairs may be detected in restriction maps by noting an alteration of the length of the restriction fragment in which the mutation lies. Mutations that change a base pair may be detected if their presence inactivates or creates a cleavage site for a particular endonuclease, thereby altering the length of the restriction fragments produced by cleavage in the area of DNA in which the mutation lies.
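The effect of a single base-pair change on a restriction pattern can be sketched as follows. The sequences are hypothetical, the site GAATTC and the cut position one base into the site are illustrative assumptions, and only one strand of a linear molecule is modeled.

```python
def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Cut a linear sequence `cut_offset` bases into every occurrence of
    `site` and return the resulting fragment lengths in order."""
    lengths = []
    previous_cut = 0
    start = seq.find(site)
    while start != -1:
        cut = start + cut_offset
        lengths.append(cut - previous_cut)
        previous_cut = cut
        start = seq.find(site, start + 1)
    lengths.append(len(seq) - previous_cut)
    return lengths


# Hypothetical flanking sequences; neither contains the site.
LEFT, RIGHT = "ACGTACGTAC", "ACGTACGTACGTAC"
wild_type = LEFT + "GAATTC" + RIGHT
mutant = LEFT + "GAGTTC" + RIGHT     # one base changed inside the site

print(fragment_lengths(wild_type, "GAATTC", 1))   # [11, 19]
print(fragment_lengths(mutant, "GAATTC", 1))      # [30]
```

The base change removes the site, so the two wild-type fragments are replaced by a single fragment whose length is their sum; a change that created a site would have the opposite effect.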
(C) Restriction Fragment Length Polymorphisms (RFLP)
Different alleles (conditions of a gene) may lead to the production of different proteins and subsequent variation in the phenotype (the detectable physical, biochemical, or physiologic makeup of the organism). Variation of DNA within populations is called genetic polymorphism. Even if the polymorphism does not lead to changes in the phenotype that are detectable by physical appearance or biochemical assays, genetic polymorphism may be detectable as variation in the lengths of DNA restriction fragments (RFLP). Polymorphic variation in the restriction map therefore is independent of gene function.
RFLPs have numerous applications, including use as markers for paternity testing or for determining the location of specific genes. For example, mutant genes responsible for inherited diseases such as Huntington's chorea, a progressive neurological degeneration, have been localized to specific chromosomes in humans by correlating inheritance of RFLPs in families with the inheritance of the particular clinical condition. RFLP patterns of family members who are normal are compared with patterns of family members who are affected with a particular genetic disease.
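The comparison of patterns between affected and normal family members can be illustrated with a deliberately simplified sketch. The band sizes below are invented for illustration and do not represent real data; an actual linkage study relies on statistical analysis of many individuals and markers rather than on a simple set comparison. The function name is hypothetical.

```python
def bands_tracking_condition(affected: list[set[int]],
                             unaffected: list[set[int]]) -> set[int]:
    """Return the fragment lengths present in every affected individual
    and absent from every unaffected individual."""
    if not affected:
        raise ValueError("at least one affected individual is required")
    shared_by_affected = set.intersection(*affected)
    seen_in_unaffected = set().union(*unaffected)
    return shared_by_affected - seen_in_unaffected


# Hypothetical fragment lengths (base pairs) observed for one probe.
affected = [{200, 1900, 2100}, {200, 1900}]
unaffected = [{2100}, {2100}]

print(sorted(bands_tracking_condition(affected, unaffected)))   # [200, 1900]
```

In this invented family the 200 and 1,900 base pair bands are inherited together with the condition, while the 2,100 base pair band is not.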
(D) Other Uses for Restriction Endonucleases
Another use of restriction endonucleases is to create and use cloning vectors for the transmission of DNA sequences. For this purpose, the gene of interest needs to be attached to the vector fragment. One way this may be accomplished is by generating complementary DNA sequences on the vector and on the gene of interest so that they can be united (recombined). Some restriction endonucleases make staggered cuts which generate short, complementary, single-stranded "sticky ends" on the DNA. An example of such an action is that effected by the EcoR1 endonuclease, which cleaves each of the two strands of duplex DNA at a different point. These cleavage sites lie on either side of a short sequence that is part of the site recognized by the endonuclease. When two different DNA molecules are cleaved with EcoR1, the same sticky ends are generated, which enables the molecules to combine with each other. The DNA fragment can later be retrieved by cleaving the recombined vector with EcoR1 to release the gene.
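The generation of matching sticky ends on two different molecules can be sketched as follows. The sketch assumes the commonly described EcoR1 cutting pattern, in which the site GAATTC is cut after the G on each strand; the cut offsets are therefore parameters rather than fixed facts of this description, and the vector and insert sequences are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def sticky_end(top_strand: str, site: str = "GAATTC",
               top_cut: int = 1, bottom_cut: int = 5) -> str:
    """Return the single-stranded stretch left between the two staggered
    cuts at the first occurrence of `site`, read on the top strand.

    `top_cut` and `bottom_cut` are offsets into the site, both expressed
    in top-strand coordinates (assumed values for illustration).
    """
    start = top_strand.find(site)
    if start == -1:
        raise ValueError("site not found")
    return top_strand[start + top_cut:start + bottom_cut]


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two overhangs can pair if one is the reverse complement of the other."""
    return overhang_a == reverse_complement(overhang_b)


vector_end = sticky_end("CCGGAATTCGG")    # hypothetical vector sequence
insert_end = sticky_end("TTAGAATTCAT")    # hypothetical gene fragment
print(vector_end, insert_end)             # AATT AATT
print(can_anneal(vector_end, insert_end)) # True
```

Because the four-base overhang is its own reverse complement, any two ends produced by the same enzyme can pair, regardless of the molecule from which each end came.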
(E) Exons and Introns
The restriction map of DNA may not correspond directly with the coding sequence of the messenger RNA produced from the DNA, because the DNA sequence of the total gene may consist of exons and introns. Exons are those parts of the DNA code that appear in the messenger RNA. Most, but not all, exons code for proteins. Introns are DNA sequences that are usually spliced out of the RNA product before the messenger RNA proceeds to be translated into protein. Splicing consists of a deletion of the intron from the primary RNA transcript and a joining or fusion of the ends of the remaining RNA on either side of the excised intron. The presence or absence of introns, the composition of introns, and the number of introns per gene may vary among strains of the same species and among species having the same basic functional gene. Although in most cases introns are assumed to be nonessential and benign, their categorization is not absolute. For example, an intron of one gene can represent an exon of another. A mosaic gene is defined as one which is expressed through the splicing together of exons carried by one molecule of RNA. In some cases, alternate or different patterns of splicing can generate different proteins from the same single stretch of DNA (Lewin, 1987).
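Splicing, and the way alternate splicing patterns yield different products from one transcript, can be sketched with simple string operations. The exon and intron sequences below are invented for illustration, and the intron positions are supplied directly rather than being recognized from sequence signals as they are in a cell.

```python
def splice(transcript: str, introns: list[tuple[int, int]]) -> str:
    """Delete each intron, given as a (start, end) pair with `end` exclusive,
    and join the remaining RNA on either side of it."""
    pieces = []
    position = 0
    for start, end in sorted(introns):
        pieces.append(transcript[position:start])
        position = end
    pieces.append(transcript[position:])
    return "".join(pieces)


# Hypothetical primary transcript: three exons separated by two introns.
EXON1, INTRON1, EXON2, INTRON2, EXON3 = (
    "AUGGCU", "GUAAGUCCAG", "CCAGGA", "GUGAGUUCAG", "GAAUAA")
primary = EXON1 + INTRON1 + EXON2 + INTRON2 + EXON3

intron1 = (len(EXON1), len(EXON1) + len(INTRON1))
intron2 = (intron1[1] + len(EXON2), intron1[1] + len(EXON2) + len(INTRON2))

# Usual pattern: both introns removed, all three exons joined.
print(splice(primary, [intron1, intron2]))       # AUGGCUCCAGGAGAAUAA

# Alternate pattern: the middle exon is removed together with both introns.
print(splice(primary, [(intron1[0], intron2[1])]))  # AUGGCUGAAUAA
```

The second call treats the stretch from the start of the first intron to the end of the second as a single excised segment, so a sequence that is an exon under one pattern behaves as part of an intron under the other.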
(F) Mitochondrial DNA: Yeast Mitochondria
The DNA contained in mitochondria (cell organelles which contain extranuclear DNA) provides an example of differences that arose during evolution in the composition of genes with regard to exon and intron sequences as well as non-coding sequences. For example, comparing the mitochondrial genes of yeast with those in mammalian systems indicates that identical mitochondrial proteins are produced despite the evolutionary distance between these species. The yeast mitochondrial genomes, however, are much larger than those occurring in vertebrates, due to the absence of introns in the latter and the presence of non-coding spacer DNA in the former.
Primary DNA sequence data are known for many yeast isolates (see deZamaroczy and Bernardi, 1986) in which interstrain differences are due to (i) a small number of large deletions/additions, mainly concerning introns; (ii) a large number of small (10-150 bp) deletions/additions located in the intergenic sequences; and (iii) 1-3 bp deletions/additions and point mutations. In Saccharomyces cerevisiae the size of mitochondrial DNA can range up to about 84 kilobases, approximately 2/3 of which is non-coding regions. There are more than 20 mitochondria per cell, with approximately 4 genomes per mitochondrion. In comparison, vertebrate mitochondrial DNA is approximately 17 kilobase pairs. In the individual mitochondrion there are usually several copies of a single molecule of DNA. Moreover, there are multiple mitochondria per cell. In plants and some unicellular eukaryotes, extranuclear DNA is found in chloroplasts and mitochondria.
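The approximate figures given above can be combined in a short calculation. The sketch takes the stated values at face value (an upper size of about 84 kilobases, about 2/3 non-coding, about 4 genomes per mitochondrion, about 17 kilobases for vertebrates) and treats "more than 20 mitochondria per cell" as a lower bound of 20; the results are therefore rough estimates only.

```python
from fractions import Fraction

YEAST_MTDNA_KB = 84                  # upper size stated for S. cerevisiae
NONCODING_FRACTION = Fraction(2, 3)  # approximate non-coding share
MITOCHONDRIA_PER_CELL = 20           # lower bound ("more than 20")
GENOMES_PER_MITOCHONDRION = 4        # approximate
VERTEBRATE_MTDNA_KB = 17             # approximate

noncoding_kb = YEAST_MTDNA_KB * NONCODING_FRACTION
coding_kb = YEAST_MTDNA_KB - noncoding_kb
genomes_per_cell = MITOCHONDRIA_PER_CELL * GENOMES_PER_MITOCHONDRION
size_ratio = Fraction(YEAST_MTDNA_KB, VERTEBRATE_MTDNA_KB)

print(noncoding_kb, coding_kb)       # 56 28
print(genomes_per_cell)              # 80
print(round(float(size_ratio), 1))   # 4.9
```

On these figures roughly 28 kilobases of the yeast mitochondrial genome are coding, a cell carries on the order of 80 or more mitochondrial genomes, and the largest yeast mitochondrial genome is about five times the size of the vertebrate one.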
The DNA within the mitochondria directs protein synthesis, just as nuclear DNA does. However, only a limited number of proteins are produced. There are general similarities in the machinery for gene expression in mitochondria of various species, rendering products and information derived in one genus applicable to many others. For some products, the cytochrome c oxidase in yeast, for example, protein synthesis combines factors encoded by nuclear DNA and produced in the cytoplasm with those synthesized directly in the mitochondria. Mutations identifying almost all the mitochondrial genes have been detected. Nuclear mutations that interact with, or abolish, the effects of these mutational complexes in the mitochondria have also been found. Genes coding for many of the same functions are present in both the yeast and the mammalian mitochondrial genome, making the yeast mitochondria a good model system for testing theories on gene expression in higher organisms. The mitochondrial genome of one yeast species, Saccharomyces cerevisiae, has provided both information on genetic expression and products which can be useful for analysis of higher systems (Lewin, 1987; Butow, 1985).
In the invention described herein, a new and unique restriction endonuclease has been isolated and purified. One obstacle to purification and characterization of this enzyme in the past has been the inability to accumulate sufficient amounts of the protein, a problem which has been solved by methods disclosed in this invention. A preferred source of the endonuclease described in this invention is yeast mitochondria from a special strain. The endonuclease has wide applications for in vivo or in vitro cleavage of double-stranded DNA from many genera.