Restriction endonucleases are invaluable tools in modern molecular biology. These molecular scissors have numerous uses in areas including molecular cloning, restriction mapping, deletion mutagenesis, and others.
Restriction enzymes bind specifically to and cleave double-stranded DNA at specific sites within or adjacent to a particular sequence known as the recognition sequence. These enzymes have been classified into three groups. Because of the properties of the type I and type III enzymes, they have not been widely used in molecular biology applications, and will not be discussed further. Type II enzymes are part of a binary system known as a restriction modification system consisting of a restriction endonuclease that cleaves a specific sequence of nucleotides and a separate DNA modifying enzyme that modifies the same recognition sequence and thereby prevents cleavage by the cognate endonuclease. A total of about 2103 restriction enzymes are known, encompassing 179 different type II specificities (Roberts, et al., Nucl. Acids Res. 20:2167-2180 (1992)). Although there are more than 1200 type II restriction enzymes, many of them are members of groups which recognize the same sequence. Restriction enzymes which recognize the same sequence are said to be isoschizomers.
The vast majority of type II restriction enzymes recognize specific sequences which are four, five, or six nucleotides in length and which display twofold (palindromic) symmetry. A few enzymes recognize longer sequences or degenerate sequences.
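The twofold symmetry described above means that a recognition sequence reads the same on both strands in the 5' to 3' direction. A minimal sketch of that check follows; the function names are illustrative, and GAATTC (the well-known EcoR I site) is used as an example that does not come from this text.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site equals its own reverse complement (twofold symmetry)."""
    return site.upper() == reverse_complement(site)


if __name__ == "__main__":
    for site in ("GGCC", "GAATTC", "GATTC"):
        print(site, is_palindromic(site))
```

A five-base sequence such as GATTC fails the check because an odd-length sequence of unambiguous bases cannot be its own reverse complement.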
The location of cleavage sites within a palindrome differs from enzyme to enzyme. Some enzymes cleave both strands exactly at the axis of symmetry generating fragments of DNA that carry blunt ends, while others cleave each strand at similar sequences on opposite sides of the axis of symmetry, creating fragments of DNA that carry protruding, single-stranded termini.
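The relationship between cut position and end type can be sketched for a palindromic site as follows. This is an illustrative model only: `top_cut_offset` is a hypothetical parameter giving the number of bases of the site that lie 5' of the cut on the top strand, and by symmetry the bottom strand is assumed to be cut at the mirror position.

```python
def end_type(site_length: int, top_cut_offset: int) -> str:
    """Classify the termini left by a symmetric cut in a palindromic site.

    top_cut_offset: bases of the site lying 5' of the cut on the top strand.
    The bottom strand is cut at the mirror-image position, so the two cuts
    are separated by site_length - 2 * top_cut_offset bases.
    """
    if not 0 <= top_cut_offset <= site_length:
        raise ValueError("cut offset must lie within the site")
    overhang = site_length - 2 * top_cut_offset
    if overhang == 0:
        return "blunt"  # both strands cut exactly at the axis of symmetry
    if overhang > 0:
        return f"5' overhang of {overhang} bases"
    return f"3' overhang of {-overhang} bases"


if __name__ == "__main__":
    print(end_type(4, 2))  # GG^CC, cut at the axis of symmetry
    print(end_type(6, 1))  # G^AATTC (EcoR I, an example from outside this text)
    print(end_type(6, 5))  # CTGCA^G (Pst I, an example from outside this text)
```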
Restriction endonucleases with shorter recognition sequences cut DNA more frequently than those with longer recognition sequences. For example, assuming a 50% G-C content, a restriction endonuclease with a 4-base recognition sequence will cleave, on average, every 4⁴ (256) bases compared to every 4⁶ (4096) bases for a restriction endonuclease with a 6-base recognition sequence.
Under certain conditions some restriction endonucleases are capable of cleaving sequences which are similar but not identical to their defined recognition sequence. This altered specificity has been termed "star" (*) activity and is observed only under certain nonstandard reaction conditions. The manner in which an enzyme's specificity is altered depends on the particular enzyme and on the conditions employed to induce the star activity. Conditions that contribute to star activity include high glycerol concentration, high ratio of enzyme to DNA, low ionic strength, high pH, the presence of organic solvents, and the substitution of Mg²⁺ with other divalent cations. The most common types of star activity involve cutting at a recognition sequence having a single base substitution, cutting at sites having truncation of the outer bases of the recognition sequence, and single-strand nicking. The following restriction endonucleases show star activity: Ase I, BamH I, BssH II, BsuR I, CviJ I, EcoR I, EcoR V, Hind III, Hinf I, Kpn I, Pst I, Pvu II, Sal I, Sca I, Taq I, and Xmn I. Star activity is generally viewed as undesirable, and of little intrinsic value.
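The cut-frequency arithmetic given above (one site every 4 to the power n bases for an n-base recognition sequence) can be reproduced with a short sketch. It assumes, as the text does, that all four bases are equally likely and independent; the function names are illustrative.

```python
def expected_spacing(site_length: int) -> int:
    """Average number of bases between sites for an n-base recognition
    sequence, assuming equal and independent base frequencies."""
    return 4 ** site_length


def expected_sites(dna_length: int, site_length: int) -> float:
    """Expected number of sites in a DNA of the given length under the
    same equal-frequency assumption."""
    return dna_length / expected_spacing(site_length)


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(n, expected_spacing(n))
```

Real genomes deviate from equal base frequencies, so these values are averages under an idealized model rather than predictions for any particular DNA.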
Of the 179 unique type II restriction endonucleases, 31 have a 4-base recognition sequence, 11 have a 5-base recognition sequence, 127 have a 6-base recognition sequence, and 10 have recognition sequences of greater than 6 bases. In two cases, a restriction endonuclease has a recognition sequence of less than 4 bases.
The restriction enzyme CviJ I has a three-base or a two-base recognition sequence, depending on the reaction conditions. Under normal reaction conditions CviJ I recognizes the sequence PuGCPy (wherein Pu=purine and Py=pyrimidine) and cleaves between the G and C to leave blunt ends (Xia et al., 1987. Nucleic Acids Res. 15:6075-6090). Under "relaxed" or "star" conditions (in the presence of 1 mM ATP and 20 mM DTT) the specificity of CviJ I may be altered to cleave DNA more frequently. This activity is referred to as CviJ I*, for star or altered specificity. However, CviJ I* activity is not observed under conditions which favor star activity of other restriction endonucleases.
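The normal-condition specificity described above (PuGCPy, cut between the G and C) can be illustrated with a small in silico digest of one strand. This is a sketch only: the function names and the test sequence are invented for illustration, and the relaxed (CviJ I*) specificity is not modeled because the text does not define it at the sequence level.

```python
import re

# PuGCPy: purine (A or G), then G, C, then pyrimidine (C or T).
# A lookahead is used so that every site start is reported.
CVIJI_SITE = re.compile(r"(?=[AG]GC[CT])")


def cviji_cut_positions(seq: str) -> list[int]:
    """Indices at which the strand is cut (between the G and the C)."""
    return [m.start() + 2 for m in CVIJI_SITE.finditer(seq.upper())]


def digest(seq: str) -> list[str]:
    """Fragments of the top strand after a complete blunt-end digest."""
    seq = seq.upper()
    bounds = [0] + cviji_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(digest("TTAGCTAAGGCCTT"))
```

Because each of the two outer positions accepts two of the four bases, the site occurs on average as often as a fully specified three-base sequence under an equal-base-frequency assumption, which is consistent with the "three-base" description above.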
The restriction enzyme BsuR I normally recognizes the sequence GGCC and cleaves between the G and C to leave blunt ends (Heininger, et al., Gene 1:291-303 (1977)). Under relaxed conditions (high pH, low ionic strength, and high glycerol concentration) the specificity of BsuR I may be altered to cleave DNA more frequently. An isoschizomer of this enzyme, Hae III, does not display this star activity.
Among the most important techniques in molecular biology are those which permit the labeling of DNA or RNA with radioactive or non-radioactive labels. The most commonly used methods of labeling double-stranded DNA are the nick translation method (Maniatis, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1982)) and the random primer labeling (RPL) method (Feinberg, et al., Anal. Biochem. 132:6-13 (1984); Feinberg, et al., Anal. Biochem. 137:266-267 (1984)).
The nick translation method involves nicking of template DNA under carefully controlled conditions using DNAse I. DNA polymerase I is then added to the nicked DNA to facilitate the addition of nucleotides at the 3' end and removal of nucleotides at the 5' end of a nick. This process replaces pre-existing nucleotides with labeled nucleotides. The main disadvantage of this labeling system is the sensitive balance required between the concentrations of the nicking enzyme DNAse I and the synthesis enzyme DNA polymerase I; too little or too much of either enzyme significantly reduces the efficiency of the incorporation.
In the RPL method, synthetic oligonucleotide primers six to nine bases long (synthesized in all possible base combinations) are hybridized to denatured DNA. The hybridized primers serve to prime DNA synthesis by either the Klenow fragment of DNA polymerase I, T7 DNA polymerase, or other suitable DNA polymerases. Although typically yielding probes of relatively high specific activity, there are several disadvantages associated with RPL: the primers synthesized are random in sequence and are not specific for the template, hence large quantities of primer are needed for adequate template hybridization; the primers are 6 to 9 nucleotides long, which limits the temperature at which synthesis can occur and therefore the choice of the enzymes that may be used; and most RPL protocols use the Klenow fragment of DNA polymerase I, which is not a highly processive enzyme and therefore requires long incubation times in order to achieve maximum incorporation.
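The phrase "all possible base combinations" implies a primer pool whose size grows as 4 to the power of the primer length, which helps explain why large quantities of random primer are needed: only a small fraction of the pool matches any given template position. A minimal sketch, with illustrative function names and the assumption that every sequence is equally represented:

```python
from itertools import product
from typing import Iterator


def count_primers(length: int) -> int:
    """Number of distinct primers of the given length over A, C, G, T."""
    return 4 ** length


def all_primers(length: int) -> Iterator[str]:
    """Lazily enumerate every primer sequence of the given length."""
    for bases in product("ACGT", repeat=length):
        yield "".join(bases)


def fraction_matching_one_site(length: int) -> float:
    """Fraction of an equimolar random pool that is exactly complementary
    to one particular template position."""
    return 1 / count_primers(length)


if __name__ == "__main__":
    for n in (6, 9):
        print(n, count_primers(n), fraction_matching_one_site(n))
```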
RPL typically yields probes having higher specific activity than probes produced by nick translation, and, thus, RPL has become a preferred method for labeling DNA. For example, the nick translation method routinely yields probes having specific activities of about 10⁸ cpm/µg DNA while RPL routinely yields specific activities of about 10⁹ cpm/µg DNA.
Oligonucleotides are essential tools in many molecular biology applications, including sequencing, labeling and hybridization for detection, polymerase chain reaction (PCR) and other forms of nucleic acid amplification, mutagenesis, nucleic acid capture and enrichment, and cloning. The development of methods for controlling the chemical synthesis of oligonucleotides 2-200 bases in length has accelerated the evolution of modern molecular genetics.
The use of synthetic oligonucleotides for labeling and detection is an important tool in research and clinical labs. Conventional methods for labeling synthetic oligonucleotides generally employ one oligonucleotide containing one or a few labels. There are several methods for labeling oligonucleotides at the 5' or 3' ends using ³²P-dNTP (dNTP=deoxynucleoside triphosphate), biotin-11-dUTP, fluorescein-dUTP, DNP-dNTP (DNP=dinitrophenol), digoxigenin-dUTP, etc. as labels. One method, 5' end labeling, is achieved by a forward or exchange reaction using polynucleotide kinase. In the forward reaction, γ-³²P from [γ-³²P]-ATP is added to a dephosphorylated 5' end of the oligonucleotide, and in the exchange reaction an excess of ADP is used to cause an exchange of the terminal 5' phosphate from DNA to ADP, followed by transfer of the γ-³²P from [γ-³²P]-ATP to the 5' end of the DNA. Homopolymeric tailing is another method for labeling oligonucleotides and involves addition of polynucleotides at the 3' end of the oligonucleotide using labeled nucleotides in the presence of a divalent cation and terminal deoxynucleotidyl transferase. The use and disposal of hazardous radioisotopes for all three methods is a significant disadvantage in research and clinical settings. The use of non-radioactive labels is a safer alternative to isotopes, and in general the level of detection is sensitive enough for some applications. However, there are numerous applications which are limited by the detection sensitivity of singly-labeled oligonucleotides.
The polymerase chain reaction is an exponential DNA amplification procedure based on repeated cycles of denaturation, oligonucleotide primer annealing, and primer extension by a thermostable DNA polymerase, such as the enzyme isolated from Thermus aquaticus (Saiki et al., Science 230:1350-1354 (1985)). The nucleotide sequence of the ends of the DNA must be known in order to synthesize the two oligonucleotides required for this amplification method. PCR has also been used to generate homogeneously-labeled probes using modified deoxynucleotide triphosphates such as digoxigenin-11-dUTP or biotin-11-dUTP (Lion et al., Anal. Biochem. 188:355-337 (1990); Lo et al., Nucleic Acids Res. 16:8719 (1988)).
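The exponential character of the amplification can be made concrete with a small calculation. This is an idealized sketch: the `efficiency` parameter (the fraction of templates copied per cycle) is an assumption introduced for illustration and does not appear in the text, and plateau effects in late cycles are ignored.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized template count after a number of PCR cycles.

    With efficiency 1.0 every template is copied in every cycle, so the
    count doubles per cycle; lower values model incomplete extension.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n))
```

Under perfect doubling, 30 cycles give roughly a billion-fold amplification of a single starting molecule.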
Epitope mapping is another important technique in molecular biology. Epitope mapping is the precise identification of an epitope associated with a function or structure within a protein. Hence, a binding domain of a protein may be determined using an array of approaches.
One method of epitope mapping involves the digestion of a pure protein into smaller fragments using specific proteases for different time periods, separation of the fragments on SDS-PAGE (ordering the fragments), transfer onto membrane, binding to antibodies or radioactive ligands, and isolation of the smallest peptide either by affinity chromatography or extraction from gels or membrane for peptide sequencing (Glenney et al., J. Mol. Biol. 167:275-293 (1983)).
Another epitope mapping method involves cloning cDNA encoding the protein of interest into an expression vector. The cloned cDNA is truncated using a restriction endonuclease or Bal 31 nuclease for subsequent expression in an appropriate vector. A truncated protein may then be expressed in vitro by a cellular transcription and translation system followed by immunoprecipitation with an antibody or ligand to identify the smallest protein which binds to it. By identification of a segment of the cDNA corresponding to the expression of that protein, a clone is isolated and sequenced to yield information as to the epitope of interest (Lorenzo et al., Eur. J. Biochem. 176:53-60 (1988)).
Site-directed mutagenesis may also be used in epitope mapping. In this method, oligonucleotides are utilized to generate site specific alterations in cDNA encoding a protein of interest, and the mutant cDNA is introduced into cells which lack the protein. The cells may then express the altered protein which may be assayed for function, e.g., ligand binding (Kashles et al. Proc. Natl. Acad. Sci. U.S.A. 85:9576-9571 (1988)).
Epitope mapping may also be performed by restriction digestion of DNA into multiple fragments followed by insertion into an expression vector for the expression and analysis of the function of the resulting protein (Kamboj, et al. J. Cell Biol. 107:1835-1843 (1988)).
However, each of these methods has limitations and most of these methods require detection of a loss of function. A superior approach is to test for the presence of a function.
A limitation of the first approach to epitope mapping described above is that the protein must be purified to homogeneity and available in large amounts in order to isolate peptides which may be sequenced. This is a major problem because many functionally important proteins are present in low quantities, and the purification of these proteins to homogeneity requires several steps which may not ensure a desired quantity or purity of the protein. Even if the protein is pure, the peptides must be run on special gels to ensure that the ends of the peptides are not blocked for sequencing. Many labs have spent up to a year purifying such proteins and have failed to obtain a sequence, either due to contaminants or the end-blockage of the peptides.
The second approach involves deletions from the C-terminus followed by subcloning of DNA encoding proteins having these deletions in order to express them. A number of clones are picked and assayed separately for the presence or absence of the epitope. This is followed by identification of the extent of a deletion by comparison to the known sequence. This approach is tedious and requires careful control of Bal 31 digestion of the DNA.
In situations where restriction fragments are used for epitope mapping, each fragment is subcloned. This approach requires numerous manipulations to generate inframe start and stop codons for each fragment. Identification of precise domains may require yet another approach, such as synthesis and subcloning of oligonucleotides or site-directed mutagenesis of a target region.
Site-directed mutagenesis requires prior knowledge of the region to be targeted. This approach involves subcloning and sequencing of several subclones to ensure that the mutation has been introduced, and involves analyses of loss of function.
The cloning and sequencing of DNA is crucial to the understanding of genome organization and to nearly every other endeavor undertaken in molecular biology and molecular genetics. Clone banks of DNA are important to the nucleotide sequence analysis of organisms and their genes. Depending on the circumstances, a library of clones may be enriched for or unbiased against the particular genetic unit under analysis. A variety of biochemical and biophysical strategies have been utilized to construct such libraries (Sambrook et al. Molecular Cloning: A Laboratory Manual, Second edition. Cold Spring Harbor Laboratory Press; Cold Spring Harbor N.Y. (1989)). Most large scale DNA sequencing strategies depend on randomly fragmenting a target molecule into small pieces which may be subcloned into a bacteriophage such as M13 (Messing, Methods in Enzymol 101:20-78 (1983); Baer et al., Nature, 310:207-211 (1984); Bankier et al., Methods in Enzymol 155:51-93 (1987); Edwards et al., Genomics 6:593-608 (1990); Davison, J. DNA Seq. and Mapping 1:389-394 (1991)). These vectors produce template DNA in a single-stranded form, the optimal substrate for enzymatic sequence analysis (Sanger et al., Proc. Natl. Acad. Sci. U.S.A. 74:5463-5467 (1977)). The data obtained from such cloned subfragments are combined and overlapped until approximately 80-95% of both strands are covered, after which gap filling techniques are typically utilized to complete the sequence.
Four methods are generally used to fragment large DNAs into a size suitable for enzymatic sequence analysis: DNAse I treatment (Anderson, Nucl. Acids Res. 9:3015-3027 (1981)); low pressure shearing (Schriefer et al., Nucl. Acids Res. 18:7455 (1990)); sonication (Deininger, Anal Biochem 129:216-233 (1983)), and digestion with restriction enzymes. Sonication, low pressure shearing, and treatment with DNAse I all break DNA randomly and result in a collection of overlapping fragments. In addition, sonication and low pressure shearing tend to shear the middle of the targets, so that a preliminary pre-ligation is necessary to equalize the representation of the DNA ends in the final library. Another drawback to these methods is the inefficiency with which the resultant jagged ends may be ligated, necessitating an enzymatic end-repair step prior to cloning. Sonication, the most commonly used method, requires relatively large amounts of DNA, results in a low transformation efficiency and is technically difficult to automate. DNAse I requires recalibration with new batches and age, is sensitive to trace contaminants, and is somewhat variable in its digestion rate. Although fragmentation with restriction enzymes is attractive due to the relative abundance of sequence specificities available, a complete restriction digest results in non-overlapping fragments and partial digests often exhibit non-uniform restriction rates. Generally, as many as four separate libraries utilizing four different restriction digests must be prepared to supply overlaps between fragments.
The steps involved in constructing a random clone library (shotgun cloning) for DNA sequencing by current methods include: 1) isolating the DNA fragment, 2) ligating the DNA to itself, 3) randomly shearing the material by sonication, 4) repairing the ragged ends with a DNA polymerase or nuclease, 5) size fractionation by preparative agarose gel electrophoresis, 6) extraction with organic chemicals to re-purify the DNA, 7) ligating the product into a bacteriophage cloning vector, usually M13mp18 or 19, and 8) transforming special strains of competent E. coli (Bankier et al., Methods in Enzymol. 155:51-93 (1987)). These steps are inherently difficult to automate and require large amounts of DNA, because the sonication and/or fractionation steps result in low cloning efficiencies. In addition, the entire process is lengthy, typically requiring several days for a skilled researcher to complete.