A few decades ago, genes were thought to exist as uninterrupted DNA units transcribed into corresponding mRNA sequences which are then translated by ribosomes to give proteins which serve a specific function in a cell.
One of the essential tools molecular biologists use to delve deeper into the mysteries of life contained in the structure of DNA, the genetic material, is a class of molecular scissors called restriction endonucleases. Many such enzymes are capable of cutting DNA at specific sites.
Restriction enzymes (restriction endonucleases) recognize specific short sequences of DNA (usually unmethylated DNA) and cleave the duplex molecule, usually at the target recognition site, but sometimes elsewhere. In some instances, the recognition site is specific, but the cleavage site is located some distance away from the recognition site and does not appear to be at any specific sequence.
"Duplex" refers to the double-stranded composition of the DNA molecule. The cleavage induced by endonucleases is usually at specific sequences of approximately 4 to 6 base pairs. A base pair is a union of a purine and a pyrimidine in the DNA duplex. There are four such bases and they pair in specific unions: adenine with thymine (A-T) and guanine with cytosine (G-C).
Restriction endonucleases are named using three- or four-letter abbreviations identifying their origin, coupled with a letter and/or number designation that distinguishes multiple enzymes of the same origin. An example of the nomenclature is EcoRI, one of the endonucleases derived from E. coli. Most of the endonucleases discovered initially were isolated from bacteria, in which they cleave DNA as part of the natural function of the cell. However, other organisms, for example yeast, can be used as a source of double-strand DNA cleaving endonucleases.
Isolation of many endonucleases occurred because the bacteria from which the endonucleases were derived were able to distinguish between the DNA native to the bacteria and any invading foreign DNA. One of the ways bacteria recognize foreign DNA is by the absence of methyl groups at appropriate base pair sites. A bacterium protects its own DNA from cleavage by its own endonucleases by methylating its own DNA bases at the appropriate target sites. Successful attack on bacteria by foreign DNA, for example by bacteriophages, may be due either to the fact that the phage DNA has the same methylation pattern as that of the host DNA or, alternatively, to mutations that have caused defects in the ability of the bacteria to produce an endonuclease or to attack the foreign DNA. Endonucleases isolated from bacteria are of two types: one which is only able to cleave DNA, and another in which both restriction and methylation activities are combined. Some restriction endonucleases introduce staggered cuts with overhangs while others generate blunt ends.
Restriction endonucleases recognize unique sequences of generally 4 to 6 nucleotides in double-stranded DNA molecules and cleave only at or near these sites. Many of the known restriction enzymes recognize a palindromic sequence which bears a dyad (twofold) symmetry. Cleavage which occurs on both strands at the axis of symmetry will generate blunt-ended fragments. For example, the restriction endonuclease HpaI of the bacterium Haemophilus parainfluenzae recognizes a specific sequence and cleaves at the points designated by the arrows. ##STR1##
This cleavage generates blunt-ended fragments.
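The relationship between dyad symmetry and blunt ends can be illustrated with a minimal Python sketch. The function names are hypothetical, and the site strings (GTTAAC cut after the third base for HpaI, GAATTC cut after the first base for EcoRI) are the commonly documented specificities, supplied here as assumptions because the structure figures of the source are not reproduced in the text.

```python
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_dyad_symmetry(site: str) -> bool:
    """A site is palindromic when it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def cut_is_blunt(site: str, top_cut: int) -> bool:
    """True when a symmetric enzyme cuts both strands at the axis of symmetry.

    top_cut is the number of bases to the left of the cut on the top strand.
    For a palindromic site, the bottom strand is cut at the mirror position,
    so the ends are flush only when the cut lies at the midpoint of the site.
    """
    return has_dyad_symmetry(site) and 2 * top_cut == len(site)


# Assumed specificities (not taken from the source figures).
print(cut_is_blunt("GTTAAC", 3))  # HpaI: cut at the axis
print(cut_is_blunt("GAATTC", 1))  # EcoRI: cut away from the axis
```

In this model a cut away from the axis of a palindromic site necessarily leaves single-stranded extensions, which is the staggered case described in the following paragraph.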
Other restriction enzymes, such as Eco RI and Pst I, cleave both strands at similar positions on opposite sides of the axis and generate four-nucleotide extensions that end, respectively, with a 5' phosphate and a 3' hydroxyl group (Table 1).
TABLE 1
Specificity of some restriction endonucleases

Producing flush ends: ##STR2## ##STR3##
Producing staggered ends: ##STR4## ##STR5## ##STR6##

The dot indicates the axis of twofold rotational symmetry, and the arrows indicate the sites of cleavage. The asterisks show the methylation sites (where known) in the parent organism, which is Haemophilus influenzae for Hind II and Hind III, E. coli for Eco RI, Providencia stuartii for Pst I, and Haemophilus parainfluenzae for Hpa I. Pu = purine, Py = pyrimidine, and N = A or T.
Together with other recent developments, restriction endonucleases have made it possible to recombine genes from one organism into the genome of another. Another use of restriction endonucleases is to create and use cloning vectors for the transmission of DNA sequences. For this purpose, the gene of interest needs to be attached to the vector fragment. One way this may be accomplished is by generating complementary DNA sequences on the vector and on the gene of interest so that they can be united (recombined). Some restriction endonucleases make staggered cuts which generate short, complementary, single stranded "sticky ends" of the DNA. An example of such an action is that effected by the EcoRI endonuclease which cleaves each of the two strands of duplex DNA at a different point.
These cleavage sites lie on either side of a short sequence that is part of the site recognized by the endonuclease. When two different DNA molecules are cleaved with EcoRI the same sticky ends are generated which enables them to combine with each other. The DNA fragment can then be retrieved by cleaving the vector with EcoRI to release the gene.
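The cloning and retrieval steps described above can be sketched in Python on the top strand only. This is a simplified illustration: the sequences are invented, the function names are hypothetical, and the EcoRI specificity (GAATTC, cut after the first G) is assumed from general knowledge rather than from the source figures.

```python
SITE = "GAATTC"  # assumed EcoRI recognition sequence
TOP_CUT = 1      # assumed cut position: after the first base of the site


def overhang(site: str = SITE, top_cut: int = TOP_CUT) -> str:
    """Single-stranded extension left by a symmetric staggered cut."""
    return site[top_cut:len(site) - top_cut]


def digest(seq: str, site: str = SITE, top_cut: int = TOP_CUT) -> list:
    """Top-strand fragments of a linear molecule after complete digestion."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + top_cut])
        start = pos + top_cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Invented example molecules: a vector with one site, a donor with two.
vector = "CCCGAATTCGGG"
donor = "TTGAATTCACGTACGAATTCAA"

left_arm, right_arm = digest(vector)
gene = digest(donor)[1]  # internal fragment flanked by two sticky ends

# Identical overhangs allow the fragments to be joined (ligation not modeled).
recombinant = left_arm + gene + right_arm

# Cleaving the recombinant again with the same enzyme releases the insert.
retrieved = digest(recombinant)[1]
print(overhang(), gene, retrieved == gene)
```

Because the joined molecule regenerates the recognition site at both junctions, a second digestion with the same enzyme recovers the inserted fragment intact.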
Fragments generated by endonucleases are amenable to further analysis of their nucleotide composition. Variation among individuals in the fragment sizes obtained from the same chromosomal locations is referred to as restriction fragment length polymorphism (RFLP).
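As a hedged illustration of how an RFLP arises, the following Python sketch digests two invented alleles of the same locus that differ by a single base within one recognition site. The function name, the sequences, and the EcoRI-like specificity (GAATTC, cut after the first base) are assumptions made for the example.

```python
def fragment_lengths(seq: str, site: str, top_cut: int) -> list:
    """Sorted fragment lengths (bp) after complete digestion of a linear molecule."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + top_cut)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))


# Two invented alleles of the same 57 bp locus.
allele_a = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 15
# Allele B carries a single-base change (A to G) that destroys the second site.
allele_b = "A" * 10 + "GAATTC" + "C" * 20 + "GAGTTC" + "T" * 15

pattern_a = fragment_lengths(allele_a, "GAATTC", 1)
pattern_b = fragment_lengths(allele_b, "GAATTC", 1)
print(pattern_a, pattern_b)
```

The two alleles have the same total length but yield different sets of fragment sizes, which is the difference an RFLP analysis detects.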
Gene maps give the location of specific genes (specific DNA nucleotide sequences) that encode the primary sequences of protein gene products relative to each other and also localize the genes on specific chromosomes of higher organisms. A map of DNA obtained by using endonucleases to map breakpoints is called a restriction map and consists of a linear sequence of restriction sites. This physical map is obtained by extracting chromosomal DNA from the chromosomes in cells, breaking the extracted DNA at various points with endonucleases, and determining the order of restriction sites by analysis of the fragments.
Distances along the maps are measured directly in base pairs or, if distances are long, in megabase pairs. By comparing the sequences of DNA between relatively short distances, a DNA map is constructed in a stepwise fashion. A major goal of current research is to construct a map of the entire human genome. (The Human Genome Project, American Society of Human Genetics Symposium, Baltimore, Nov. 15, 1989.) Success in mapping human and animal genomes will require a selection of endonucleases which cleave at a large variety of sites which occur in the DNA of living organisms, not just in artificial sequences.
DNA fragments produced by the action of endonucleases are separated on the basis of size by agarose or polyacrylamide gel electrophoresis. An electric current is passed through the gel, causing the fragments to move down it at a rate depending on length; the smaller fragments move more rapidly. The result of this migration in a gel is a series of bands, each corresponding to a fragment of a particular size. Many different endonucleases are used for gene mapping, and large numbers of overlapping fragments are analyzed. Sequential cleavage using different endonucleases breaks a series of larger fragments down into smaller fragments. A hierarchy is then constructed based on the fact that the lengths of the smaller fragments add up exactly to the length of the original starting fragment. For example, a fragment of 2,100 base pairs may be broken down into fragments of 200 and 1,900 base pairs.
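The additivity rule used to build the hierarchy can be expressed as a simple check, sketched here in Python. The function names and the tolerance parameter are hypothetical; the tolerance is included only because measured band sizes are approximate, and its default of zero corresponds to the exact additivity stated above.

```python
def migration_order(lengths_bp: list) -> list:
    """Order fragments from fastest-migrating (smallest) to slowest (largest)."""
    return sorted(lengths_bp)


def is_additive(parent_bp: int, children_bp: list, tolerance_bp: int = 0) -> bool:
    """True when the child fragments account for the whole parent fragment."""
    return abs(parent_bp - sum(children_bp)) <= tolerance_bp


# Hierarchy from the example in the text: 2,100 bp cut into 200 bp and 1,900 bp.
hierarchy = {2100: [200, 1900]}

for parent, children in hierarchy.items():
    print(parent, migration_order(children), is_additive(parent, children))
```

A candidate assignment of small fragments to a parent fragment that fails this check cannot be correct, which is how the hierarchy constrains the order of sites.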
The establishment of restriction maps for genomes of several species revealed the existence of physically localized restriction fragment length polymorphisms (RFLP) that are used as physical markers to study recombination between genomes at the molecular level.
In recent years, studies focused on the inheritance of several genes have revealed that particular markers in these genes are inherited unidirectionally by the progeny of interspecific crosses. Comparative sequence analysis of both parents indicates that some of these markers are located within intervening sequences called introns, which are usually found in either coding or non-coding sequences of a gene. These introns are removed from the pre-mRNA transcripts by a process called "splicing". The sequences (exons) on each side of the intron are then brought together to form a mature mRNA transcript. These introns belong to the group I family and contain internal open reading frames (ORFs) which encode endonucleases. These endonucleases generate a double-strand cut at or near the site of intron insertion within the cognate allele and initiate a site-specific recombination event during which the intron is likely to be inserted by a gap repair mechanism. The net result is the elimination of intron-minus alleles and the propagation of intron-plus alleles into the progeny. This genetic phenomenon, by which introns can be transmitted to the entire progeny, is defined as an intron homing process.
The first homing intron to be discovered is the r1 intron in the mitochondrial large subunit rRNA (LSU rRNA) gene of S. cerevisiae. This intron contains an ORF of 235 codons that codes for an endonuclease, I-SceI, which recognizes a non-symmetric sequence of 18 bp in the vicinity of the intron homing site and generates a 4 bp staggered cut with 3'OH overhangs.
Other homing endonucleases have recently been identified in the cox1 gene of S. cerevisiae mitochondria (I-SceII), in the nuclear LSU rRNA gene of P. polycephalum (I-PpoI), and in the td (I-TevI) and sunY (I-TevII) genes of bacteriophage T4.
Although similar in nature to restriction endonucleases, homing endonucleases differ in their larger recognition sequence, which extends up to 18 bp compared with 4 to 6 bp for restriction endonucleases. In contrast to restriction endonucleases, which demonstrate a higher degree of sequence specificity, homing endonucleases exhibit recognition degeneracy towards their respective target sequences; that is, the cleavage efficiency at sites containing single-base mutations is the same as that at the wild-type site.
Although restriction enzymes are essential components of the methods used to construct restriction maps of smaller genomes, their use in mapping larger genomes is limited by their high frequency of cleavage.
Although the recognition specificity of homing endonucleases appears to be lower than that of restriction enzymes, their larger recognition sequence is expected to occur at a much lower frequency in large genomes.
Therefore, homing endonucleases, which generate larger DNA fragments, will greatly facilitate the analysis (chromosomal mapping) of large genomes.
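The effect of recognition sequence length on cleavage frequency can be estimated with a short Python sketch. It assumes a random sequence in which the four bases are equally frequent and independent, so that a given site of n base pairs occurs on average once every 4^n base pairs. The function names and the genome size of 3 x 10^9 bp are illustrative assumptions, and the model ignores both base composition and the recognition degeneracy of homing endonucleases noted above.

```python
def mean_spacing_bp(site_length: int) -> int:
    """Average distance between occurrences of one fully specified site.

    Assumes equal, independent base frequencies (probability 1/4 per position).
    """
    return 4 ** site_length


def expected_sites(genome_bp: float, site_length: int) -> float:
    """Expected number of occurrences of the site in a genome of genome_bp."""
    return genome_bp / mean_spacing_bp(site_length)


GENOME_BP = 3e9  # illustrative large-genome size, an assumption

for n in (4, 6, 18):
    print(n, mean_spacing_bp(n), expected_sites(GENOME_BP, n))
```

Under these assumptions a 6 bp site is expected hundreds of thousands of times in such a genome, whereas an 18 bp site is expected less than once, which is consistent with the larger fragments attributed to homing endonucleases.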
It would be highly desirable to provide a homing endonuclease which recognizes nucleotide sequences likely to occur at a much lower frequency within a DNA sequence. Such an enzyme would generate larger DNA fragments, which would facilitate, for instance, chromosomal mapping.