1. Field of the Invention
The present invention relates generally to the fields of molecular biology and biochemistry. Specifically, it concerns means for the construction of DNA libraries facilitating amplifying and analyzing DNA. More specifically, the present invention concerns positional amplification of DNA by nick translation methods.
2. Description of Related Art
A. DNA Preparation Using in Vivo and in Vitro Amplification and Multiplexed Versions thereof
Because the amount of any specific DNA molecule that can be isolated from even a large number of cells is usually very small, the only practical methods to prepare enough DNA molecules for most applications involve amplification of specific DNA molecules in vivo or in vitro. There are basically six general methods important for manipulating DNA for analysis: 1) in vivo cloning of unique fragments of DNA; 2) in vitro amplification of unique fragments of DNA; 3) in vivo cloning of random libraries (mixtures) of DNA fragments; 4) in vitro preparation of random libraries of DNA fragments; 5) in vivo cloning of ordered libraries of DNA; and 6) in vitro preparation of ordered libraries of DNA. The beneficial effect of amplifying mixtures of DNA is that it facilitates analysis of large pieces of DNA (e.g., chromosomes) by creating libraries of molecules that are small enough to be analyzed by existing techniques. For example, the largest molecule that can be subjected to DNA sequencing methods is less than 2000 bases long, which is many orders of magnitude shorter than single chromosomes of organisms. Although short molecules can be analyzed, considerable effort is required to assemble the information from the analysis of the short molecules into a description of the larger piece of DNA.
1. In Vivo Cloning of Unique DNA
Unique-sequence source DNA molecules can be amplified by separating them from other molecules (e.g., by electrophoresis), ligating them into an autonomously replicating genetic element (e.g., a bacterial plasmid), transfecting a host cell with the recombinant genetic element, and growing a clone of a single transfected host cell to produce many copies of the genetic element having the insert with the same unique sequence as the source DNA (Sambrook, et al., 1989).
2. In Vitro Amplification of Unique DNA
There are many methods designed to amplify DNA in vitro. Usually these methods are used to prepare unique DNA molecules from a complex mixture, e.g., genomic DNA or an artificial chromosome. Alternatively, a restricted set of molecules can be prepared as a library that represents a subset of sequences in the complex mixture. These amplification methods include PCR, rolling circle amplification, and strand displacement (Walker, et al. 1996a; Walker, et al. 1996b; U.S. Pat. No. 5,648,213; U.S. Pat. No. 6,124,120).
The polymerase chain reaction (PCR) can be used to amplify specific regions of DNA between two known sequences (U.S. Pat. No. 4,683,195; U.S. Pat. No. 4,683,202; Frohman et al., 1995). PCR involves the repetition of a cycle consisting of denaturation of the source (template) DNA, hybridization of two oligonucleotide primers to known sequences flanking the region to be amplified, and primer extension using a DNA polymerase to synthesize strands complementary to the DNA region located between the two primer sites. Because the products of one cycle of amplification serve as source DNA for succeeding cycles, the amplification is exponential. PCR can synthesize large numbers of specific molecules quickly and inexpensively.
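The exponential character of PCR can be illustrated with a short calculation. The following is a minimal sketch under an idealized assumption that is not taken from the text above: every cycle multiplies the number of target molecules by (1 + efficiency), where an efficiency of 1.0 means perfect doubling. The function name and the efficiency parameter are hypothetical and serve only for illustration.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target molecules after a given number of PCR cycles.

    Assumption (illustrative only): each cycle multiplies the copy number by
    (1 + efficiency); efficiency = 1.0 corresponds to perfect doubling.
    """
    if initial_copies < 0 or cycles < 0:
        raise ValueError("initial_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare perfect doubling with a less efficient reaction over 30 cycles.
    for eff in (1.0, 0.8):
        print(f"efficiency {eff}: {pcr_copies(1, 30, eff):.3e} copies")
```

Real reactions plateau as primers and nucleotides are consumed, so this model describes only the early, exponential phase.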
The major disadvantages of the PCR method to amplify DNA are that 1) information about two flanking sequences must be known in order to specify the sequences of the primers; 2) synthesis of primers is expensive; 3) the level of amplification achieved depends strongly on the primer sequences, source DNA sequence, and the molecular weight of the amplified DNA; and 4) the length of amplified DNA is usually limited to less than 5 kb, although “long-distance” PCR (Cheng, 1994) allows molecules as long as 20 kb to be amplified.
“One-sided PCR” techniques are able to amplify unknown DNA adjacent to one known sequence. These techniques can be divided into four categories: a) ligation-mediated PCR, facilitated by addition of a universal adaptor sequence to a terminus usually created by digestion with a restriction endonuclease; b) universal primer-mediated PCR, facilitated by a primer extension reaction initiated at arbitrary sites; c) terminal transferase-mediated PCR, facilitated by addition of a homonucleotide “tail” to the 3′ end of DNA fragments; and d) inverse PCR, facilitated by circularization of the template molecules. These techniques can be used to amplify successive regions along a large DNA template in a process sometimes called “chromosome walking.”
Ligation-mediated PCR is practiced in many forms. Rosenthal et al. (1990) outlined the basic process of amplifying an unknown region of DNA immediately adjacent to a known sequence located near the end of a restriction fragment. Reiley et al. (1990) used primers that were not exactly complementary with the adaptors in order to suppress amplification of molecules that did not have a specific priming site. Jones (1993) and Siebert (1995; U.S. Pat. No. 5,565,340) used long universal primers that formed intrastrand “panhandle” structures that suppressed PCR of molecules having two universal adaptors. Arnold (1994) used “vectorette” primers having unpaired central regions to increase the specificity of one-sided PCR. Macrae and Brenner (1994) amplified short inserts from a Fugu genomic clone library using nested primers from a specific sequence and from vector sequences. Lin et al. (1995) ligated an adaptor to restriction fragment ends that had an overhanging 5′ end and employed hot-start PCR with a single universal anchor primer and nested specific-site primers to specifically amplify human sequences. Liao et al. (1997) used two specific-site primers and two universal adaptors, one of which had a blocked 3′ end to reduce non-specific background, to amplify zebrafish promoters. Devon et al. (1995) used “splinkerette-vectorette” adaptors with special secondary structure in order to decrease non-specific amplification of molecules with two universal sequences during ligation-mediated PCR. Padegimas and Reichert (1998) used phosphorothioate-blocked oligonucleotides and exoIII digestion to remove the unligated and partially ligated molecules from the reactions before performing PCR, in order to increase the specificity of amplification of maize sequences. Zhang and Gurr (2000) used ligation-mediated hot-start PCR of restriction fragments using nested primers in order to amplify up to 6 kb of a fungal genome. The large amplicons were subsequently directly sequenced using primer extension.
To increase the specificity of ligation-mediated PCR products, many methods have been used to “index” the amplification process by selection for specific sequences adjacent to one or both termini (e.g., Smith, 1992; Unrau, 1994; Guilfoyle, 1997; U.S. Pat. No. 5,508,169).
One-sided PCR can also be achieved by direct amplification using a combination of unique and non-unique primers. Harrison et al. (1997) performed one-sided PCR using a degenerate oligonucleotide primer that was complementary to an unknown sequence and three nested primers complementary to a known sequence in order to sequence transgenes in mouse cells. U.S. Pat. No. 5,994,058 specifies using a unique PCR primer and a second, partially degenerate PCR primer to achieve one-sided PCR. Weber et al. (1998) used direct PCR of genomic DNA with nested primers from a known sequence and 1-4 primers complementary to frequent restriction sites. This technique does not require restriction digestion and ligation of adaptors to the ends of restriction fragments.
Terminal transferase can also be used in one-sided PCR. Cormack and Somssich (1997) were able to amplify the termini of genomic DNA fragments using a method called RAGE (rapid amplification of genome ends) by a) restricting the genome with one or more restriction enzymes; b) denaturing the restricted DNA; c) providing a 3′ polythymidine tail using terminal transferase; and d) performing two rounds of PCR using nested primers complementary to a known sequence as well as the adaptor. Rudi et al. (1999) used terminal transferase to achieve chromosome walking in bacteria using a method of one-sided PCR that is independent of restriction digestion by a) denaturation of the template DNA; b) linear amplification using a primer complementary to a known sequence; c) addition of a poly-C “tail” to the 3′ end of the single-stranded products of linear amplification using a reaction catalyzed by terminal transferase; and d) PCR amplification of the products using a second primer within the known sequence and a poly-G primer complementary to the poly-C tail in the unknown region. The products amplified by Rudi (1999) have a very broad size distribution, probably caused by a broad distribution of lengths of the linearly-amplified DNA molecules.
RNA polymerase can also be used to achieve one-sided amplification of DNA. U.S. Pat. No. 6,027,913 shows how one-sided PCR can be combined with transcription with RNA polymerase to amplify and sequence regions of DNA with only one known sequence.
Inverse PCR (Ochman et al., 1988) is another method to amplify DNA based on knowledge of a single DNA sequence. The template for inverse PCR is a circular molecule of DNA created by a complete restriction digestion, which contains a small region of known sequence as well as adjacent regions of unknown sequence. The oligonucleotide primers are oriented such that during PCR they give rise to primer extension products that extend away from the known sequence. This “inside-out” PCR results in linear DNA products with known sequences at the termini.
The disadvantages of all “one-sided PCR” methods are that a) the length of the products is restricted by the limitations of PCR (normally about 2 kb, but with special reagents up to 50 kb); b) whenever the products are single DNA molecules longer than 1 kb they are too long to sequence directly; c) in ligation-mediated PCR the amplicon lengths are very unpredictable due to random distances between the universal priming site and the specific priming site(s), resulting in some products that are too short to walk a significant distance, some that are preferentially amplified due to small size, and some that are too long to amplify and analyze; and d) in methods that use terminal transferase to add a polynucleotide tail to the end of a primer extension product, there is great heterogeneity in the length of the amplicons due to sequence-dependent differences in the rate of primer extension.
Strand displacement amplification (Walker, et al. 1996a; Walker, et al. 1996b; U.S. Pat. No. 5,648,213; U.S. Pat. No. 6,124,120) is a method to amplify one or more termini of DNA fragments using an isothermal strand displacement reaction. The method is initiated at a nick near the terminus of a double-stranded DNA molecule, usually generated by a restriction enzyme, followed by a polymerization reaction by a DNA polymerase that is able to displace the strand complementary to the template strand. Linear amplification of the complementary strand is achieved by reusing the template multiple times by nicking each product strand as it is synthesized. The products are strands with 5′ ends at a unique site and 3′ ends that are various distances from the 5′ ends. The extent of the strand displacement reaction is not controlled and therefore the lengths of the product strands are not uniform. The polymerase used for strand displacement amplification does not have a 5′ exonuclease activity.
Rolling circle amplification (U.S. Pat. No. 5,648,245) is a method to increase the effectiveness of the strand displacement reaction by using a circular template. The polymerase, which does not have a 5′ exonuclease activity, makes multiple copies of the information on the circular template as it makes multiple continuous cycles around the template. The length of the product is very large, typically too large to be directly sequenced. Additional amplification is achieved if a second strand displacement primer is added to the reaction to use the first strand displacement product as a template.
3. In Vivo Cloning of DNA as Random Libraries
Libraries are collections of small DNA molecules that represent all parts of a larger DNA molecule or collection of DNA molecules (Primrose, 1998; Cantor and Smith, 1999). Libraries can be used for analytical and preparative purposes. Genomic clone libraries are the collection of bacterial clones containing fragments of genomic DNA. cDNA clone libraries are collections of clones derived from the mRNA molecules in a tissue.
Cloning of non-specific DNA is commonly used to separate and amplify DNA for analysis. DNA from an entire genome, one chromosome, a virus, or a bacterial plasmid is fragmented by a suitable method (e.g., hydrodynamic shearing or digestion with restriction enzymes), ligated into a special region of a bacterial plasmid or other cloning vector, transfected into competent cells, amplified as a part of a plasmid or chromosome during proliferation of the cells, and harvested from the cell culture. Critical to the specificity of this technique is the fact that the mixture of cells carrying different DNA inserts can be diluted and aliquoted such that some of the aliquots, whether on a surface or in a volume of solution, contain a single transfected cell containing a unique fragment of DNA. Proliferation of this single cell (in vivo cloning) amplifies this unique fragment of DNA so that it can be analyzed. This “shotgun” cloning method is used very frequently, because: 1) it is inexpensive; 2) it produces very pure sequences that are usually faithful copies of the source DNA; 3) it can be used in conjunction with clone screening techniques to create an unlimited amount of specific-sequence DNA; 4) it allows simultaneous amplification of many different sequences; 5) it can be used to amplify DNA as large as 1,000,000 bp long; and 6) the cloned DNA can be directly used for sequencing and other purposes.
a. Multiplex Cloning
Cloning is inexpensive, because many pieces of DNA can be simultaneously transfected into host cells. The general term for this process of mixing a number of different entities (e.g., electronic signals or molecules) is “multiplexing,” and is a common strategy for increasing the number of signals or molecules that can be processed simultaneously and subsequently separated to recover the information about the individual signals or molecules. In the case of conventional cloning the recovery process involves diluting the bacterial culture such that an aliquot contains a single bacterium carrying a single plasmid, allowing the bacterium to multiply to create many copies of the original plasmid, and isolating the cloned DNA for further analysis.
The principle of multiplexing different molecules in the same transfection experiment is critical to the economy of the cloning method. However, after the transfection each clone must be grown separately and the DNA isolated separately for analysis. These steps, especially the DNA isolation step, are costly and time consuming. Several attempts have been made to multiplex steps after cloning, whereby hundreds of clones can be combined during the steps of DNA isolation and analysis and the characteristics of the individual DNA molecules recovered later. In one version of multiplex cloning the DNA fragments are separated into a number of pools (e.g., one hundred pools). Each pool is ligated into a different vector, possessing a nucleic acid tag with a unique sequence, and transfected into the bacteria. One clone from each transfection pool is combined with one clone from each of the other transfection pools in order to create a mixture of bacteria having a mixture of inserted sequences, where each specific inserted sequence is tagged with a unique vector sequence, and therefore can be identified by hybridization to the nucleic acid tag. This mixture of cloned DNA molecules can be subsequently separated and subjected to any enzymatic, chemical, or physical processes for analysis such as treatment with polymerase or size separation by electrophoresis. The information about individual molecules can be recovered by detection of the nucleic acid tag sequences by hybridization, PCR amplification, or DNA sequencing. Church has shown methods and compositions to use multiplex cloning to sequence DNA molecules by pooling clones tagged with different labels during the steps of DNA isolation, sequencing reactions, and electrophoretic separation of denatured DNA strands (U.S. Pat. Nos. 4,942,124 and 5,149,625). The tags are added to the DNA as parts of the vector DNA sequences. 
The tags used can be detected using oligonucleotides labeled with radioactivity, fluorescent groups, or volatile mass labels (Cantor and Smith, 1999; U.S. Pat. Nos. 4,942,124; 5,149,625; and 5,112,736; Richterich and Church, 1993). A later patent was directed to a technique whereby the tag sequences are ligated to the DNA fragments before cloning using a universal vector (U.S. Pat. No. 5,714,318). Another patent application specifies a method whereby the tag sequences added before transfection are amplified using PCR after electrophoretic separation of the denatured DNA (PCT WO 98/15644).
b. Disadvantages
The disadvantage of preparing DNA by amplifying random fragments of DNA is that considerable effort is necessary to assemble the information within the short fragments into a description of the original, source DNA molecule. Nevertheless, amplified short DNA fragments are commonly used for many applications, including sequencing by the technique called “shotgun sequencing.” Shotgun sequencing involves sequencing one or both ends of small DNA fragments that have been cloned from randomly-fragmented large pieces of DNA. During the sequencing of many such random fragments of DNA, overlapping sequences are identified from those clones that by chance contain redundant sequence information. As more and more fragments are sequenced, more overlaps can be found from contiguous regions (contigs), and the regions that are not represented become smaller and less frequent. However, even after sequencing enough fragments that the average region has been sequenced 5-10 times, there will still be gaps between contigs due to statistical sampling effects and to systematic under-representation of some sequences during cloning or PCR amplification (ref). Thus the disadvantages of sequencing random fragments of DNA are that 1) a 5-10 fold excess of DNA must be isolated, subjected to sequencing reactions, and analyzed before having large contiguous sequenced regions; and 2) there are still numerous gaps in the sequence that must be filled by expensive and time-consuming steps.
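The statistical sampling effect mentioned above can be made concrete with a simple model. The following is a minimal sketch, assuming reads start at uniformly random positions (the idealized Poisson, or Lander-Waterman, model, which is not part of the text above) and ignoring both the minimum overlap needed to join reads and any cloning bias. Function names and example numbers are hypothetical.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases covered by no read at a given average coverage.

    Assumes read start positions follow a Poisson process (idealized model).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return math.exp(-coverage)


def expected_gaps(target_length_bp: int, read_length_bp: int, coverage: float) -> float:
    """Approximate expected number of gaps between contigs.

    Under the same idealized model, each of the N reads is the rightmost read
    of a contig with probability exp(-coverage), giving about N * exp(-coverage) gaps.
    """
    n_reads = coverage * target_length_bp / read_length_bp
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical example: a 100 Mb target sequenced with 500-base reads.
    for c in (1, 5, 10):
        print(c, uncovered_fraction(c), expected_gaps(100_000_000, 500, c))
```

The model accounts only for random sampling; systematic under-representation of some sequences would add gaps that no amount of additional random sequencing closes efficiently.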
4. In Vitro Preparation of DNA as Random Libraries
DNA libraries can be formed in vitro and subjected to various selection steps to recover information about specific sequences. In vitro libraries are rarely used in genomics, because the methods that exist for creating such libraries do not offer advantages over cloned libraries. In particular, the methods used to amplify the in vitro libraries are not able to amplify all the DNA in an unbiased manner, because of the size and sequence dependence of amplification efficiency. PCT WO 00/18960 describes how different methods of DNA amplification can be used to create a library of DNA molecules representing a specific subset of the sequences within the genome for purposes of detecting genetic polymorphisms. “Random-prime PCR” (U.S. Pat. No. 5,043,272; U.S. Pat. No. 5,487,985), “random-prime strand displacement” (U.S. Pat. No. 6,124,120), and “AFLP” (U.S. Pat. No. 6,045,994) are three examples of methods to create libraries that represent subsets of complex mixtures of DNA molecules.
Single-molecule PCR can be used to amplify individual randomly-fragmented DNA molecules (Lukyanov et al., 1996). In one method, the source DNA is first fragmented into molecules usually less than 10,000 bp in size, ligated to adaptor oligonucleotides, and extensively diluted and aliquoted into separate fractions such that the fractions often contain only a single molecule. PCR amplification of a fraction containing a single molecule creates a very large number of molecules identical to one of the original fragments. If the molecules are randomly fragmented, the amplified fractions represent DNA from random positions within the source DNA.
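The dilution step can be reasoned about quantitatively. The following is a minimal sketch, assuming that the number of molecules per aliquot follows a Poisson distribution (an assumption introduced here, not a statement from the cited work). It estimates how often a non-empty aliquot holds exactly one molecule; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that an aliquot contains exactly k molecules (Poisson model)."""
    if k < 0 or mean < 0:
        raise ValueError("k and mean must be non-negative")
    return mean ** k * math.exp(-mean) / math.factorial(k)


def single_molecule_fraction(mean: float) -> float:
    """Among aliquots that contain at least one molecule, the fraction that
    contain exactly one. Requires mean > 0."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    p_nonempty = 1.0 - math.exp(-mean)
    return poisson_pmf(1, mean) / p_nonempty


if __name__ == "__main__":
    # Lower mean occupancy gives purer single-molecule aliquots but more empty ones.
    for mean in (1.0, 0.3, 0.1):
        print(mean, poisson_pmf(0, mean), single_molecule_fraction(mean))
```

The trade-off is visible in the output: stronger dilution raises the share of productive aliquots that are truly clonal, at the cost of more aliquots that contain no template at all.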
WO0015779A2 describes how a specific sequence can be amplified from a library of circular molecules with random genomic inserts using rolling circle amplification.
5. Direct in Vivo Cloning of Ordered Libraries of DNA
Directed cloning is a procedure to clone DNA from different parts of a larger piece of DNA, usually for the purpose of sequencing DNA from different positions along the source DNA. Methods to clone DNA with “nested deletions” have been used to make “ordered libraries” of clones that have DNA starting at different regions along a long piece of source DNA. In one version, one end of the source DNA is digested with one or more exonuclease activities to delete part of the sequence (McCombie et al., 1991; U.S. Pat. No. 4,843,003). By controlling the extent of exonuclease digestion, the average amount of the deletion can be controlled. The DNA molecules are subsequently separated based on size and cloned. By cloning molecules with different molecular weights, many copies of identical DNA plasmids are produced that have inserts ending at controlled positions within the source DNA. Transposon insertion (Berg et al. 1994) is also used to clone different regions of source DNA by facilitating priming or cleavage at random positions in the plasmids. The size separation and recloning steps make both of these methods labor intensive and slow. They are generally limited to covering regions less than 10 kb in size and cannot be used directly on genomic DNA, but only on cloned DNA molecules. No in vivo methods are known to directly create ordered libraries of genomic DNA.
6. Direct In Vitro Preparation of Ordered Libraries of DNA
Ordered libraries have not been frequently created in vitro. Hagiwara (1996) used one-sided PCR to create an ordered library of PCR products that was used to sequence about 14 kb of a cosmid. The cosmids were first digested with multiple restriction enzymes, followed by ligation of vectorette adaptors to the products, PCR amplification of the products using primers complementary to a unique sequence in the cosmid and to the adaptor, size separation of the amplified DNA to establish the order of the restriction sites, and sequencing of the ordered PCR products. Because of the non-uniform spacing of the restriction sites, 2 kb of the 16 kb region were not sequenced. This method required substantial effort to produce and order the PCR products for the job of sequencing cloned DNA. No in vitro methods are known to directly create ordered genomic libraries of DNA.
B. DNA Physical Mapping to Assemble Ordered Clones
Because of the great difficulty in direct production of ordered DNA libraries, there is a need to reorganize libraries of randomly cloned DNA molecules into ordered libraries where the clones are arranged according to position in the genome (Primrose, 1998; Cantor and Smith, 1999). Some of the purposes for creating an ordered library are 1) to compare overlapping clones to detect defects (e.g., deletions) in some of the clones; 2) to decide which clones should be used to determine the underlying DNA sequence with the least redundancy in sequencing effort; 3) to localize genetic features within the genome; 4) to access different regions of the genome on the basis of their relationship to the genetic map or proximity to another region; and 5) to compare the structure of the genomes of different individuals and different species. There are four basic methods for creating ordered libraries of clones: 1) hybridization to determine sequence homology among different clones; 2) fluorescent in situ hybridization (FISH); 3) restriction analysis; and 4) STS mapping.
1. Mapping by Hybridization
The first method usually involves hybridization of one clone or other identifiable sequence to all other clones in a library. Those clones that hybridize contain overlapping sequences. This method is useful for locating clones that overlap a common site (e.g., a specific gene) in the genome, but is too laborious to create an ordered library of an entire genome. In addition many organisms have large amounts of repetitive DNA that can give false indications of overlap between two regions. The resolution of the hybridization techniques is only as good as the distance between known sequences of DNA.
2. Mapping by FISH
The FISH method allows a particular sequence or limited set of sequences to be localized along a chromosome by hybridization of a fluorescently-labeled probe with a spread of intact chromosomes, followed by light-microscopic localization of the fluorescence. This technique is also only of use to locate a specific sequence or small number of sequences, rather than to create a physical map of the entire genome or an ordered library representing the entire genome. The resolution of the light microscope limits the resolution of FISH to about 1,000,000 bp. To map a single-copy sequence, the FISH probe usually needs to be about 10,000 bp long.
3. Mapping by Restriction Digestion
Mapping by restriction digestion is frequently used to determine overlaps between clones, thereby allowing ordered libraries of clones to be constructed. It involves assembly of a number of large clones into a contiguous region (contig) by analyzing the overlaps in the restriction patterns of related clones. This method is insensitive to the presence of repetitive DNA. The products of a complete or partial restriction digestion of every clone are size separated by electrophoresis and the molecular weights of the fragments analyzed by computer to find correlated sequences in different clones. The information from the restriction patterns produced by five or more restriction enzymes is usually adequate to determine not only which clones overlap, but also the extent of overlap and whether some of the clones have deletions, additions, rearrangements, etc. Physical mapping of restriction sites is a very tedious process, because of the very large numbers of clones that have to be evaluated. For example, more than 300,000 BAC clones of 100,000 bp length need to be analyzed to map the human genome. Using conventional techniques, mapping two restriction sites would require at least 300,000 bacterial cultures and DNA isolations, as well as 600,000 restriction digestions and size separations.
4. Mapping by STS Amplification
Sequence tagged sites (STSs) are sequences, often from the 3′ untranslated portions of mRNA, that can be uniquely amplified in the genome. High-throughput methods employing sophisticated equipment have been devised to screen for the presence of tens of thousands of STSs in tens of thousands of clones. Two clones overlap to the extent that they share common STSs.
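The overlap criterion stated above lends itself to a set-based computation. The following is a minimal sketch, assuming each clone has already been scored for the STSs it contains; the clone and STS names are invented for illustration, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_sts(clone_sts: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return, for every pair of clones that share at least one STS,
    the set of STSs they have in common."""
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    # Iterate over clone names in sorted order so pair keys are deterministic.
    for clone_a, clone_b in combinations(sorted(clone_sts), 2):
        common = clone_sts[clone_a] & clone_sts[clone_b]
        if common:
            overlaps[(clone_a, clone_b)] = common
    return overlaps


if __name__ == "__main__":
    # Hypothetical screening results: clone name -> STSs detected in that clone.
    results = {
        "cloneA": {"sts1", "sts2"},
        "cloneB": {"sts2", "sts3"},
        "cloneC": {"sts4"},
    }
    for pair, common in shared_sts(results).items():
        print(pair, sorted(common))
```

A pairwise comparison like this grows quadratically with the number of clones, so a genome-scale analysis would more plausibly index clones by STS first; the sketch is meant only to show the overlap rule itself.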
C. DNA Sequencing Reactions
DNA sequencing is the most important analytical tool for understanding the genetic basis of living systems. The process involves determining the positions of each of the four major nucleotide bases, adenine (A), cytosine (C), guanine (G), and thymine (T) along the DNA molecule(s) of an organism. Short sequences of DNA are usually determined by creating a nested set of DNA fragments that begin at a unique site and terminate at a plurality of positions comprised of a specific base. The fragments terminated at each of the four natural nucleic acid bases (A, T, G and C) are then separated according to molecular size in order to determine the positions of each of the four bases relative to the unique site. The pattern of fragment lengths caused by strands that terminate at a specific base is called a “sequencing ladder.” The interpretation of base positions as the result of one experiment on a DNA molecule is called a “read.” There are different methods of creating and separating the nested sets of terminated DNA molecules (Adams et al., 1994; Primrose, 1998; Cantor and Smith, 1999).
1. Maxam-Gilbert Method
The Maxam-Gilbert method involves degrading DNA at a specific base using chemical reagents. The DNA strands terminating at a particular base are denatured and electrophoresed to determine the positions of the particular base. The Maxam-Gilbert method involves dangerous chemicals, and is time- and labor-intensive. It is no longer used for most applications.
2. Sanger Method
The Sanger sequencing method is currently the most popular format for sequencing. It employs single-stranded DNA (ssDNA) created using special viruses like M13 or by denaturing double-stranded DNA (dsDNA). An oligonucleotide sequencing primer is hybridized to a unique site of the ssDNA and a DNA polymerase is used to synthesize a new strand complementary to the original strand using all four deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) and small amounts of one or more dideoxyribonucleotide triphosphates (ddATP, ddCTP, ddGTP, and/or ddTTP), which cause termination of synthesis. The DNA is denatured and electrophoresed into a “ladder” of bands representing the distance of the termination site from the 5′ end of the primer. If only one ddNTP (e.g., ddGTP) is used, only those molecules that end with guanine will be detected in the ladder. By using ddNTPs with four different labels, all four ddNTPs can be incorporated in the same polymerization reaction and the molecules ending with each of the four bases can be separately detected after electrophoresis in order to read the base sequence.
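The logic of reading a sequence from a ladder can be sketched in a few lines. The following is an idealized illustration, assuming that a terminated fragment exists for every position of the newly synthesized strand and that fragment lengths are resolved perfectly; lengths are counted from the first base added after the primer. The function names are hypothetical.

```python
from typing import Dict, List

BASES = "ACGT"


def termination_ladder(new_strand: str) -> Dict[str, List[int]]:
    """For each base, list the lengths of fragments terminating at that base.

    new_strand is the synthesized strand written 5' to 3'; position 1 is the
    first base added after the primer (an illustrative convention).
    """
    ladder: Dict[str, List[int]] = {base: [] for base in BASES}
    for position, base in enumerate(new_strand.upper(), start=1):
        if base not in ladder:
            raise ValueError(f"unexpected base {base!r} at position {position}")
        ladder[base].append(position)
    return ladder


def read_ladder(ladder: Dict[str, List[int]]) -> str:
    """Reconstruct the sequence by ordering all terminated fragments by length."""
    base_at_length = {
        length: base for base, lengths in ladder.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    ladder = termination_ladder("GATTACA")
    print(ladder)               # fragment lengths per terminating base
    print(read_ladder(ladder))  # sequence recovered from the ladder
```

Each entry of the dictionary corresponds to one single-ddNTP reaction (or one label colour), and sorting by length plays the role of the electrophoretic separation.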
DNA that is flanked by vector or PCR primer DNA of known sequence can undergo Sanger termination reactions initiated from one end using a primer complementary to those known sequences. These sequencing primers are inexpensive, because the same primers can be used for DNA cloned into the same vector or PCR amplified using primers with common terminal sequences. Commonly-used electrophoretic techniques for separating the dideoxyribonucleotide-terminated DNA molecules are limited to resolving sequencing ladders shorter than 500-1000 bases. Therefore only the first 500-1000 nucleic acid bases can be “read” by this or any other method of sequencing the DNA. Sequencing DNA beyond the first 500-1000 bases requires special techniques.
3. Other Base-Specific Termination Methods
Other termination reactions have been proposed. One group of proposals involves substituting thiolated or boronated base analogs that resist exonuclease activity. After incorporation reactions very similar to Sanger reactions, a 3′ to 5′ exonuclease is used to resect the synthesized strand to the point of the last base analog. These methods have no substantial advantage over the Sanger method.
Methods have been proposed to reduce the number of electrophoretic separations required to sequence large amounts of DNA. These include multiplex sequencing of large numbers of different molecules on the same electrophoretic device, by attaching unique tags to different molecules so that they can be separately detected. Commonly, different fluorescent dyes are used to multiplex up to 4 different types of DNA molecules in a single electrophoretic lane or capillary (U.S. Pat. No. 4,942,124). Less commonly, the DNA is tagged with a large number of different nucleic acid sequences during cloning or PCR amplification, and detected by hybridization (U.S. Pat. No. 4,942,124) or by mass spectrometry (U.S. Pat. No. 4,942,124).
In principle, the sequence of a short fragment can be read by hybridizing different oligonucleotides with the unknown sequence and deciphering the information to reconstruct the sequence. This xe2x80x9csequencing by hybridizationxe2x80x9d is limited to fragments of DNA  less than 50 bp in length. It is difficult to amplify such short pieces of DNA for sequencing. However, even if sequencing many random 50 bp pieces were possible, assembling the short, sometimes overlapping sequences into the complete sequence of a large piece of DNA would be impossible. The use of sequencing by hybridization is currently limited to resequencing, that is testing the sequence of regions that have already been sequenced.
D. Preparing DNA for Determining Long Sequences
Because it is currently very difficult to separate DNA molecules longer than 1000 bases with single-base resolution, special methods have been devised to sequence DNA regions within larger DNA molecules. The “primer walking” method initiates the Sanger reaction at sequence-specific sites within long DNA. However, most emphasis is on methods to amplify DNA in such a way that one of the ends originates from a specific position within the long DNA molecule.
1. Primer Walking
Once part of a sequence has been determined (e.g., the terminal 500 bases), a custom sequencing primer can be made that is complementary to the known part of the sequence, and used to prime a Sanger dideoxyribonucleotide termination reaction that extends further into the unknown region of the DNA. This procedure is called “primer walking.” The requirement to synthesize a new oligonucleotide every 400-1000 bp makes this method expensive. The method is slow, because each step is done in series rather than in parallel. In addition, each new primer has a significant failure rate until optimum conditions are determined. Primer walking is primarily used to fill gaps in the sequence that have not been read after shotgun sequencing or to complete the sequencing of small DNA fragments less than 5,000 bp in length. However, WO 00/60121 addresses this problem using a single synthetic primer for PCR to genome walk to unknown sequences from a known sequence. The 5′-blocked primer anneals to the denatured template and is extended, followed by coupling to the extended product of a 3′-blocked oligonucleotide of known sequence, thereby creating a single stranded molecule having had only a single region of known target DNA sequence. By sequencing an amplified product from the extended product having the coupled 3′-blocked oligonucleotide, the process can be applied reiteratively to elucidate consecutive adjacent unknown sequences.
2. PCR Amplification
PCR can be used to amplify a specific region within a large DNA molecule. Because the PCR primers must be complementary to the DNA flanking the specific region, this method is usually used only to prepare DNA to “resequence” a region of DNA.
3. Nested Deletion and Transposon Insertion
As described above, cloning or PCR amplification of long DNA with nested deletions brought about by nuclease cleavage or transposon insertion enables ordered libraries of DNA to be created. When exonuclease is used to progressively digest one end of the DNA, there is some control over the position of one end of the molecule. However, the exonuclease activity cannot be controlled to give a narrow distribution in molecular weights, so typically the exonuclease-treated DNA is separated by electrophoresis to better select the position of the end of the DNA samples before cloning. Because transposon insertion is nearly random, clones containing inserted elements have to be screened before choosing which clones have the insertion at a specific internal site. The labor-intensive steps of clone screening make these methods impractical except for DNA less than about 10 kb long.
4. Junction-Fragment DNA Probes for Preparing Ordered DNA Clones
Collins and Weissman have proposed to use “junction-fragment DNA probes and probe clusters” (U.S. Pat. No. 4,710,465) to fractionate large regions of chromosomes into ordered libraries of clones. That patent proposes to size fractionate genomic DNA fragments after partial restriction digestion, circularize the fragments in each size-fraction to form junctions between sequences separated by different physical distances in the genome, and then clone the junctions in each size fraction. By screening all the clones derived from each size-fraction using a hybridization probe from a known sequence, ordered libraries of clones could be created having sequences located different distances from the known sequence. Although this method was designed to walk along megabase distances along chromosomes, it was never put into practical use because of the necessity to maintain and screen hundreds of thousands of clones from each size fraction. In addition, cross hybridization would be expected to yield a large fraction of false positive clones.
5. Shotgun Cloning
The only practical method for preparing DNA longer than 5-20 kb for sequencing is subcloning the source DNA as random fragments small enough to be sequenced. The large source DNA molecule is fragmented by sonication or hydrodynamic shearing, fractionated to select the optimum fragment size, and then subcloned into a bacterial plasmid or virus genome (Adams et al., 1994; Primrose, 1998; Cantor and Smith, 1999). The individual subclones can be subjected to Sanger or other sequencing reactions in order to determine sequences within the source DNA. If many overlapping subclones are sequenced, the entire sequence for the large source DNA can be determined. The advantages of shotgun cloning over the other techniques are: 1) the fragments are small and uniform in size so that they can be cloned with high efficiency independent of sequence; 2) the fragments can be short enough that both strands can be sequenced using the Sanger reaction; 3) transformation and growth of many clones is rapid and inexpensive; and 4) clones are very stable.
E. Genomic Sequencing
Current techniques to sequence genomes (as well as any DNA larger than about 5 kb) depend upon shotgun cloning of small random fragments from the entire DNA. Bacteria and other very small genomes can be directly shotgun cloned and sequenced. This is called “pure shotgun sequencing.” Larger genomes are usually first cloned as large pieces and each clone is shotgun sequenced. This is called “directed shotgun sequencing.”
1. Pure Shotgun Sequencing
Genomes up to several millions or billions of base pairs in length can be randomly fragmented and subcloned as small fragments (Adams et al., 1994; Primrose, 1998; Cantor and Smith, 1999). However, in the process of fragmentation all information about the relative positions of the fragment sequences in the native genome is lost. This information can be recovered by sequencing with 5-10-fold redundancy (i.e., the number of bases sequenced in different reactions adds up to 5 to 10 times as many bases as are in the genome) so as to generate sufficiently numerous overlaps between the sequences of different fragments that a computer program can assemble the sequences from the subclones into large contiguous sequences (contigs). However, due to some regions being more difficult to clone than others and due to incomplete statistical sampling, there will still be some regions within the genome that are not sequenced even after highly redundant sequencing. These unknown regions are called “gaps.” After assembly of the shotgun sequences into contigs, the sequencing is “finished” by filling in the gaps. Finishing must be done by additional sequencing of the subclones, by primer walking beginning at the edge of a contig, or by sequencing PCR products made using primers from the edges of adjacent contigs.
There are several disadvantages to the pure shotgun strategy: 1) as the size of the region to be sequenced increases, the effort of assembling a contiguous sequence from shotgun reads increases faster than N ln N, where N is the number of reads; 2) repetitive DNA and sequencing errors can cause ambiguities in sequence assembly; and 3) because subclones from the entire genome are sequenced at the same time and significant redundancy of sequencing is necessary to get contigs of moderate size, about 50% of the sequencing has to be finished before the sequence accuracy and the contig sizes are sufficient to get substantial information about the genome. Focusing the sequencing effort on one region is impossible.
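The relationship between sequencing redundancy and residual gaps can be illustrated with a simple idealized model. The sketch below assumes that reads land uniformly at random, so that the number of reads covering any base is Poisson-distributed with a mean equal to the redundancy. This assumption and the function names are not taken from the text above, and the model ignores cloning bias, which the text identifies as a further cause of gaps.

```python
import math


def expected_uncovered_fraction(redundancy):
    """Probability that a given base is covered by zero reads (Poisson model)."""
    if redundancy < 0:
        raise ValueError("redundancy must be non-negative")
    return math.exp(-redundancy)


def expected_uncovered_bases(genome_size_bp, redundancy):
    """Expected number of unsequenced bases for a genome of the given size."""
    return genome_size_bp * expected_uncovered_fraction(redundancy)


if __name__ == "__main__":
    genome_size_bp = 3_000_000  # hypothetical small genome, for illustration
    for redundancy in (1, 3, 5, 10):
        missed = expected_uncovered_bases(genome_size_bp, redundancy)
        print(f"{redundancy:>2}-fold redundancy: about {missed:,.0f} bases unsequenced")
```

Under this model the uncovered fraction falls exponentially with redundancy but never reaches zero, which is consistent with the need for a separate finishing phase even after highly redundant sequencing.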
2. Directed Shotgun Sequencing
The directed shotgun strategy, adopted by the Human Genome Project, reduces the difficulty of sequence assembly by limiting the analysis to one large clone at a time. This “clone-by-clone” approach requires four steps: 1) large-insert cloning, comprised of a) random fragmentation of the genome into segments 100,000-300,000 bp in size, b) cloning of the large segments, and c) isolation, selection and mapping of the clones; 2) random fragmentation and subcloning of each clone as thousands of short subclones; 3) sequencing random subclones and assembly of the overlapping sequences into contiguous regions; and 4) “finishing” the sequence by filling the gaps between contiguous regions and resolving inaccuracies. The positions of the sequences of the large clones within the genome are determined by the mapping steps, and the positions of the sequences of the subclones are determined by redundant sequencing of the subclones and computer assembly of the sequences of individual large clones. Substantial initial investment of resources and time are required for the first two steps before sequencing begins. This inhibits sequencing DNA from different species or individuals. Sequencing random subclones is highly inefficient, because significant gaps exist until the subclones have been sequenced to about 7× redundancy. Finishing requires “smart” workers and effort equivalent to an additional ~3× sequencing redundancy.
The directed shotgun sequencing method is more likely to finish a large genome than is pure shotgun sequencing. For the human genome, for example, the computer effort for directed shotgun sequencing is more than 20 times less than that required for pure shotgun sequencing.
There is an even greater need to simplify the sequencing and finishing steps of genomic sequencing. In principle, this can be done by creating ordered libraries of DNA, giving uniform (rather than random) coverage, which would allow accurate sequencing with only about 3-fold redundancy and eliminate the finishing phase of projects. Current methods to produce ordered libraries are impractical, because they can cover only short regions (~5,000 bp) and are labor-intensive.
F. Resequencing of DNA
The presence of a known DNA sequence or variation of a known sequence can be detected using a variety of techniques that are more rapid and less expensive than de novo sequencing. These “resequencing” techniques are important for health applications, where determination of which allele or alleles are present has prognostic and diagnostic value.
1. Microarray Detection of Specific DNA Sequences
The DNA from an individual human or animal is amplified, usually by PCR, labeled with a detectable tag, and hybridized to spots of DNA with known sequences bound to a surface (Primrose, 1998; Cantor and Smith, 1999). If the individual's DNA contains sequences that are complementary to those on one or more spots on the DNA array, the tagged molecules are physically detected. If the individual's amplified DNA is not complementary to the probe DNA in a spot, the tagged molecules are not detected. Microarrays of different design have different sensitivities to the amount of tested DNA and the exact amount of sequence complementarity that is required for a positive result. The advantage of the microarray resequencing technique is that many regions of an individual's DNA can be simultaneously amplified using multiplex PCR, and the mixture of amplified genetic elements hybridized simultaneously to a microarray having thousands of different probe spots, such that variations at many different sites can be simultaneously detected.
One disadvantage to using PCR to amplify the DNA is that only one genetic element can be amplified in each reaction, unless multiplex PCR is employed, in which case only as many as 10-50 loci can be simultaneously amplified. For certain applications, such as SNP (single nucleotide polymorphism) screening, it would be advantageous to simultaneously amplify 1,000-100,000 elements and detect the amplified sequences simultaneously. A second disadvantage to PCR is that only a limited number of DNA bases can be amplified from each element (usually less than 2000 bp). Many applications require resequencing entire genes, which can be up to 200,000 bp in length.
2. Other Methods of Resequencing
Other methods such as mass spectrometry, secondary structure conformation polymorphism, ligation amplification, primer extension, and target-dependent cleavage can be used to detect sequence polymorphisms. All these methods either require initial amplification of one or more specific genetic elements by PCR or incorporate other forms of amplification that have the same deficiencies of PCR, because they can amplify only a very limited region of the genome at one time.
WO 00/28084 is directed to isothermal amplification of a target nucleic acid sequence utilizing serial generation of double-stranded DNA engineered to contain terminal nicking sites, nicking at least one of those sites, and extending it by strand displacement with a polymerase that lacks 5′ to 3′ exonuclease activity. The nick is generated by restriction endonuclease digestion of a site formed by hybridization of amplification primers to a target nucleic acid, wherein the site is hemi-modified through polymerization in the presence of modified nucleotides.
WO 99/18241 concerns methods for amplification of nucleic acid sequences of interest utilizing multiple strand displacement amplifications with two sets of multiple primers situated to amplify the sequence of interest. Following hybridization of the primers distally to the sequence of interest, amplification proceeds by replication initiated at each primer and continuing through the nucleic acid sequence of interest. In the course of polymerization from the primers in a continuous isothermal reaction, the intervening primers are displaced. Once the nucleic acid strands elongated from the right set of primers reach the region of the nucleic acid molecule to which the left set of primers hybridizes, and vice versa, another round of priming and replication occurs, allowing multiple copies of a nested set of the target nucleic acid sequence to be synthesized quickly. In specific embodiments the methods concern amplification of whole genomes or concatenated DNA.
WO 00/60121 regards amplification methods of unknown sequences of interest using PCR genome walking with synthetic primers. Specifically, a sequence which is 3′ to a known sequence is amplified. A 5′ oligonucleotide blocked at its 5′ end is annealed to the known sequence in a denatured sample of DNA and extended by polymerization. The strands of the resulting dsDNA molecule are melted, and a 3′ oligonucleotide blocked at its 3′ end is coupled to the polymerized strand. A primer complementary in sequence to the 3′-blocked oligonucleotide is used to generate a double-stranded template for subsequent cycles of PCR.
WO 00/24929 is directed to linear amplification mediated PCR, whereby an unknown DNA or RNA sequence which is adjacent to a known DNA or RNA region is identified and/or sequenced. The region is first subjected to one or more linear PCR steps using one or more primers, and a ds DNA molecule is generated from the resultant ss DNA of the first step. The ds DNA is digested with restriction enzymes to generate blunt and/or cohesive ends, and an oligonucleotide of known sequence is added to the digested ends, and the ds DNA is then subjected to propagation and detection.
U.S. Pat. No. 6,063,604 is directed to amplification of a target nucleic acid sequence within a single- or double-stranded polynucleotide, wherein the method comprises providing a reaction mixture containing a 5′ primer and a 3′ primer each having a recognition sequence for a restriction endonuclease capable of nicking one strand of a double-stranded hemi-modified recognition site. The 5′ primer is first annealed to a single stranded target sequence and extended in the presence of deoxyribonucleoside triphosphates wherein at least one is modified. The resultant ds DNA product having one original target strand and a modified polynucleotide extension product is enzymatically separated, and a second amplification primer anneals to the modified polynucleotide extension product and is extended in the presence of deoxyribonucleoside triphosphates wherein at least one is modified to generate a double-stranded polynucleotide comprising the two resultant modified polynucleotide extension products. The resultant hemi-modified recognition sites are subjected to nicking of one strand, and the 3′ end produced by the nick is extended, preferably with a polymerase which displaces the strand.
U.S. Pat. No. 6,117,634, incorporated by reference herein in its entirety, regards sequencing whereby the nucleic acid molecule to be sequenced is double stranded and undenatured, which is an improvement for sequencing regions having intramolecular and/or intermolecular secondary structure. In one embodiment, the double strand is nicked and is followed by strand replacement. The nick is generated by, for example, restriction digestion wherein only one strand is hydrolyzed, random nicking by an enzyme such as DNase I, nicking by f1 gene product II or homologous enzymes from other filamentous bacteriophage, or chemical nicking of the template directed by triple-helix formation. Alternatively, the nick is generated by adapters having a gap or nick generated by, for example, restriction enzyme digestion. The polymerase preferably has 5′ to 3′ exonuclease activity. However, the resultant polymerized strand is the sequencing substrate, and no further modifications or manipulations to the polymerized strand occur.
Similarly, U.S. Pat. No. 6,197,557 and Makarov et al. (1997) regard methods to prepare a DNA molecule by ligating or hybridizing an adaptor to the end of a template double-stranded DNA molecule, thereby introducing a nick, following with nick translation using a DNA polymerase having 5′ to 3′ exonuclease activity. The reaction proceeds for a specific time and is then terminated. The resultant product may be amplified through linear amplification, such as by primer extension, or alternatively by PCR. However, these references fail to teach specific modifications or manipulations prior to the amplification of the nick translation-extended strand to facilitate the amplification.
The instant invention seeks to overcome the noted deficiencies in the art by providing methods and compositions for use in positionally amplifying a specific sequence within a polynucleotide molecule. Positional Amplification by Nick Translation (PANT) is designed to amplify internal regions of DNA molecules, including restriction fragments, cloned DNA, and intact chromosomes, as molecules of controllable length. Positional Amplification of sequences near the terminus of a DNA molecule involves three essential steps: 1) a Primer Extension/Nick Translation (PENT) reaction; 2) appending a second primer sequence to the 3′ end of the PENT product, forming a PENT amplifiable strand (PENTAmer); and 3) an amplification reaction using one or both priming sequences. In contrast to PCR, which amplifies DNA between two specific sequences, PANT can amplify DNA between two specific positions, or a specified position relative to a specific sequence. PENTAmers can be created to amplify very large regions of DNA (up to 500,000 bp) as random mixtures (unordered positional libraries) or as molecules sorted according to position (ordered positional libraries). PANT is fast and economical, because PENTAmer preparation can be multiplexed. A single PENTAmer preparation can include very complex mixtures of DNA such as hundreds of large-insert clones, complete genomes, or cDNA libraries. Subsequent PCR amplification of the preparation using a single specific primer can positionally amplify contiguous regions along a specific clone, along a specific genomic region, or along a specific expressed sequence. A schematic diagram of how locus specific amplification of DNA can be achieved using PCR, cloning, and three examples of positional amplification of nick-translate libraries are shown in FIG. 1.
Positional Amplification at large distances from the terminus of a DNA molecule also requires size separation and recombination of the template DNA. This disclosure describes the core technology for preparing PENTAmers, as well as specific implementations that produce PENTAmers suitable for amplifying short templates up to 10 kb long, and “recombinant” PENTAmers (formed by recombination between internal and terminal sites on templates) suitable for amplifying large-insert clones such as BACs and up to 500 kb regions of genomic DNA. In both cases the PENTAmers may be prepared in microwell plates, such that successive wells contain PENTAmers from a large number (e.g., 96) of successive positions within the template. Novel reagents and methods are disclosed for: 1) efficient initiation of PENT reactions at specific sites using novel oligonucleotides; 2) termination of PENT reactions at controllable distances from initiation; 3) novel nick-processing reactions to append priming sequences to the 3′ ends of PENTAmers; 4) novel recombination reactions; 5) novel ways to separate PENTAmers that are located different distances from a DNA terminus; 6) novel ways to prepare hundreds or thousands of PENTAmers simultaneously by multiplexing; 7) novel ways to make and use libraries of PENTAmers; and 8) novel ways to analyze the sequence information in genomes.
PANT allows the amplification of a specific position within a large clone or genome as a PENTAmer of constant length, between 10 and 5,000 bp. The most important applications of PANT involve: 1) creation of mixtures of PENTAmers covering a large region of DNA between 500 and 500,000 bp (an unordered positional library); 2) creation of ordered mixtures of PENTAmers that cover successive slightly overlapping regions along a large region of DNA between 500 and 500,000 bp (an ordered positional library); and 3) creation of mixtures of PENTAmers that cover multiple small regions of DNA dispersed throughout the genome (a sampled positional library). Unordered libraries can be used for purposes such as creating FISH probes and identifying cDNA clones complementary to specific regions of the genome, as well as shotgun sequencing of cDNA, large-insert clones and genomes. Ordered libraries can be used for directed sequencing of cDNA, large-insert clones and genomes, as well as for comparative genomics. Sampled libraries can be used to sequence or resequence informative sequences spread throughout the genome to identify point variations and rearrangements within one genome, or to identify the presence of specific genomes or genetic elements within a population of genomes. PANT can be commercialized as services (e.g., sequence ready ordered PENTAmers for directed sequencing of BACs in high-throughput sequencing centers), as kits (e.g., kits to allow large and small laboratories to create ordered positional libraries for sequence analysis of specific regions of the human genome), or as diagnostic products (e.g., PENTAmer arrays for hybridization analysis of patients' blood to determine chromosomal mutations).
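The layout of an ordered positional library, in which PENTAmers of constant length cover successive slightly overlapping regions, can be pictured with a small sketch. It only computes idealized coordinates; it does not model how the PENTAmers are prepared. The function name, the parameters, and the example values are hypothetical and chosen purely for illustration.

```python
def ordered_windows(region_length_bp, pentamer_length_bp, overlap_bp):
    """Return (start, end) coordinates of successive overlapping windows.

    Coordinates are 0-based and half-open; the last window is truncated
    at the end of the region.
    """
    step = pentamer_length_bp - overlap_bp
    if region_length_bp <= 0 or step <= 0:
        raise ValueError("need a positive region length and overlap < window length")
    windows = []
    start = 0
    while True:
        end = min(start + pentamer_length_bp, region_length_bp)
        windows.append((start, end))
        if end == region_length_bp:
            break
        start += step
    return windows


if __name__ == "__main__":
    # Hypothetical values: a 2,500 bp region tiled by 1,000 bp windows
    # that overlap their neighbours by 100 bp.
    for index, (start, end) in enumerate(ordered_windows(2500, 1000, 100), start=1):
        print(f"sublibrary {index}: positions {start}-{end}")
```

Each window would correspond to one sublibrary (for example, one well of a microwell plate), so the number of windows gives a rough idea of how many sublibraries a region of a given size would need.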
The following definitions are provided to assist in understanding the nature of the invention:
Up-stream (terminus-attaching) adaptor molecules: short artificial DNA molecules that are ligated to the ends of DNA fragments. Their design has a minimum of two domains: 1) a domain that facilitates ligation to the ends of template DNA molecules; and 2) a domain that facilitates initiation of a nick-translation reaction. In addition, up-stream adaptors may comprise additional domains that facilitate manipulation of the DNA strand, including, for example, recombination, amplification, detection, affinity capture, and inhibition of self-ligation.
Down-stream (nick-attaching) adaptor molecules: partially double-stranded or completely single-stranded DNA molecules that can be linked to 3′ or 5′ DNA termini at a nick within a double-stranded DNA molecule. Their design has a minimum of two domains: 1) a domain that facilitates ligation to the 3′ or 5′ DNA termini within the nick or a domain that facilitates priming of the polymerization reaction which results in the extension of the 3′ terminus near the nick; and 2) a domain that facilitates amplification. In addition, down-stream adaptors may comprise additional domains that facilitate manipulation of the DNA strand, including, for example, recombination, amplification, detection, affinity capture, and inhibition of self-ligation.
Internal adaptor molecules: Short artificial DNA molecules that are ligated to the ends of DNA fragments that have been exposed by a second cleavage event, usually restriction endonuclease cleavage of an internal site within the source DNA molecules. Their design has a minimum of two domains: 1) a domain that facilitates ligation to the ends of template DNA molecules, and 2) a domain that facilitates initiation of a nick-translation reaction. In addition, internal adaptors may comprise additional domains that facilitate manipulation of the DNA strand, including, for example, recombination, amplification, detection, affinity capture, and inhibition of self-ligation.
Nick translate molecules: DNA molecules produced by coordinated 5′→3′ DNA polymerase activity and 5′→3′ exonuclease activity. The two activities can be present within one enzyme molecule (as in the case of Taq DNA polymerase or DNA polymerase I) or two enzymes. The synthesis of nick translate molecules is usually initiated at a nick site within an up-stream adaptor at the ends of a DNA fragment or within a down-stream adaptor within a DNA fragment, or within an internal adaptor.
Adaptor attached nick translate molecules: nick translate molecules with up-stream and down-stream adaptor sequences at the 5′ and 3′ termini. Adaptor attached nick translate molecules are usually created by covalent attachment of the down-stream adaptor to the 3′ end of the nick translate molecule.
Nick translation initiation site: a free 3′-OH-containing terminus at a nick or a small gap within an adaptor molecule. Where the nick site is contained within an adaptor, the nick translation initiation site can be: 1) a part of the adaptor before attachment to DNA; 2) created by annealing a priming oligonucleotide to the distal primer binding region of the adaptor before or after the first nick translation reaction; or 3) created by recombination of two different adaptors.
DNA library: a collection of DNA molecules that represent all or a specified fraction of the sequences within a template DNA. DNA libraries can be formed from whole genome, cDNA, cloned, or PCR amplified templates, whereby the template DNA has been reduced in size, recombined, or otherwise processed to become more useful than the original template DNA. Individual members of the library, complementary to sequences within the template DNA, can be selected and/or amplified by in vivo cloning or in vitro amplification.
Unordered DNA library: a DNA library with a pooled collection of molecules comprised of sequences complementary to unknown positions within a region of the template DNA.
Ordered DNA library: a DNA library separated into sublibraries comprised of molecules complementary to specified positions within a region of the template DNA.
Sampled DNA library: a DNA library with a pooled collection of molecules comprised of sequences complementary to multiple non-contiguous specific regions of the template DNA.
Nick-translate DNA library: a DNA library comprised of adaptor attached DNA molecules that have been created by one or more nick translation reactions.
Unordered nick-translate DNA library: a pooled collection of all adaptor attached nick-translate molecules that are complementary to random positions within a region of the template DNA.
Sampled nick-translate DNA library: a DNA library with a pooled collection of adaptor attached nick-translate molecules that are complementary to multiple non-contiguous specific regions of the template DNA.
Ordered nick-translate DNA library: an adaptor attached nick-translate library separated into sublibraries of molecules that are complementary to specified positions within a region of the template DNA.
Adaptor mediated recombination: a biochemical process that involves transient or stable non-covalent association of two adaptor attached DNA regions followed by covalent stabilization using DNA ligase or DNA polymerase enzymes.
Nick site: a discontinuity in one of the strands within double stranded DNA. A nick site created enzymatically by the nick translation reaction is characterized by a free, phosphorylated 5′ end and a 3′ hydroxyl group.
Nick translation: a coupled polymerization/degradation process that is characterized by a coordinated 5′ to 3′ DNA polymerase activity and 5′ to 3′ exonuclease activity. The two activities are usually present within one enzyme molecule (as in the case of Taq DNA polymerase or DNA polymerase I); however, nick translation may also be achieved by the simultaneous activity of multiple enzymes exhibiting polymerase and exonuclease activity.
Partial cleavage: the cleavage by an endonuclease of a controlled fraction of the available sites within a DNA template. The extent of partial cleavage can be controlled by, for example, limiting the reaction time, the amount of enzyme, and/or reaction conditions.
Kernel: a known sequence of DNA that is used to select the amplified region within the template DNA.
The invention is a means of preparing a DNA molecule having an amplifiable region. In a preferred embodiment, DNA is prepared by a method comprising obtaining a DNA sample including DNA molecules and attaching upstream adaptor molecules to 5′ termini of DNA molecules of the sample to provide a nick translation initiation site. The DNA is subjected to nick translation using a DNA polymerase having 5′-3′ exonuclease activity. This reaction produces nick translate molecules. Downstream adaptor molecules are attached to the 3′ termini of the nick translate molecules to produce adaptor attached DNA molecules.
It is contemplated that a variety of starting materials may be employed in the context of the instant invention. Therefore, it is contemplated that the DNA will often need to be prepared prior to adaptor attachment. The 5′ termini of the DNA sample may be produced prior to the attachment of the upstream adaptor molecule. It is contemplated that the termini may be produced by restriction digestion by one or more restriction enzymes, by digestion with a nuclease, by mechanical shearing, or by any other means known by those of skill in the art to modify DNA such that an appropriate adaptor may be attached. Where a DNA molecule is restriction digested, a person of ordinary skill would be aware of a wide variety of restriction enzymes that could be employed in the context of the instant invention. Particularly, a person of ordinary skill would be aware that particular applications would necessitate the use of a frequently cutting restriction enzyme while other applications would necessitate the use of an infrequent cutter. It would further be clear to a person of ordinary skill, in the context of the contemplated application, what would distinguish a frequent from an infrequent cutter. It is further contemplated that the enzymes used to digest may be manipulated to perform either a partial or full digest. A person of ordinary skill would be aware of specific modifications to reaction conditions that would facilitate a partial digest. By way of example, salt conditions could be modified or the time of digest could be shortened. A person of ordinary skill would also be aware of methods of modifying chemical or mechanical cleaving processes to achieve a full or partial digest of a DNA sample.
Following attachment of the adaptors to the nick translate product, it is envisioned that the DNA may be denatured. For the purpose of the instant invention, denatured DNA is DNA in which the hydrogen bonds between base pairs in the double-stranded nucleic acid molecules are disrupted to produce single-stranded polynucleotides. Following denaturation, the DNA may be separated. Separation of the denatured DNA may facilitate the separation of a single stranded nick translation product from the DNA sample template strand.
In a preferred embodiment of the invention, DNA is subjected to nick translation for a specified period of time. As the number of bases polymerized by a given DNA polymerase in a specific time T may be definitively calculated, product length may be extrapolated from reaction time. Consequently, the products of a timed reaction will be of a predictable length.
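By way of illustration only, the relationship between reaction time and product length may be sketched as follows. The sketch assumes a constant polymerization rate, which is a simplification; the function names and the rate values are hypothetical and are not taken from the disclosure.

```python
def expected_product_length(rate_nt_per_min: float, time_min: float) -> float:
    """Estimate nick translate length (nucleotides) after a timed reaction.

    Assumes the polymerase advances at a constant rate (an idealization).
    """
    if rate_nt_per_min < 0 or time_min < 0:
        raise ValueError("rate and time must be non-negative")
    return rate_nt_per_min * time_min


def reaction_time_for_length(target_nt: float, rate_nt_per_min: float) -> float:
    """Invert the same relationship: time T needed to reach a target length."""
    if rate_nt_per_min <= 0:
        raise ValueError("rate must be positive")
    return target_nt / rate_nt_per_min


# Hypothetical rate chosen only to show the arithmetic.
length = expected_product_length(rate_nt_per_min=100.0, time_min=5.0)
time_needed = reaction_time_for_length(target_nt=1000.0, rate_nt_per_min=250.0)
```

In practice the rate would be measured for the particular polymerase and reaction conditions before the time T is chosen.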
In a further embodiment, upstream and downstream adaptors include functional sites. It is envisioned that the adaptors are specifically engineered to comprise sites that facilitate the further manipulation of the DNA molecule. In preferred embodiments, the upstream adaptors may be engineered to include at least one of the following: a nick translation initiation site, a primer binding region and/or further sites a person of ordinary skill would envision as useful in the modification of the DNA sample. Downstream adaptors may be similarly constructed to include a primer binding region, a nick translation initiation site and/or further sites a person of ordinary skill would envision as useful in the modification of the DNA sample in the context of the invention.
The invention facilitates the manipulation of both homogeneous and heterogeneous DNA samples. It is contemplated that to facilitate the differentiation of alternate DNA species, more than one adaptor construct may be attached to DNA molecules within a DNA sample. In an embodiment of the invention, the upstream adaptor attached to the DNA sample consists of a mixture of more than one upstream adaptor molecule construct. It is envisioned that the alternate constructs may have different primer binding regions. It is further envisioned that the downstream adaptor may comprise more than one downstream adaptor molecule construct. These constructs may also be distinguishable by the inclusion of different primer binding regions.
It is envisioned that, following adaptor attachment and nick translation, the modified DNA molecules may be amplified. Following amplification, the amplified DNA may be cloned, sequenced or separated.
In a preferred embodiment of the claimed invention, it is envisioned that the adaptor attached DNA, either prior to or subsequent to amplification may be used in the creation of a DNA library. It is envisioned that the DNA library may be either an unordered or an ordered DNA library.
The ordered DNA library may be created with steps involving DNA recombination or by performing nick translation for a specific period of time. The ordered library may further constitute an ordered genomic library. In a preferred embodiment, an ordered library is subjected to sequence scanning.
In a further embodiment of the invention, Applicants envision that amplification of the adaptor attached DNA may be carried out with primers complementary to the upstream adaptor molecule and the downstream adaptor molecule. In an alternate embodiment, the adaptor attached DNA may be amplified with a first primer specific to the upstream adaptor and a second primer specific to an internal sequence of the DNA molecule. In a further embodiment, the adaptor attached DNA may be amplified with a first primer specific to the downstream adaptor molecule and a second primer specific to an internal sequence of the DNA molecule.
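By way of illustration only, the selection of an adaptor-specific primer pair may be sketched as follows. The sketch assumes that both adaptor sequences are written 5′ to 3′ on the same strand of the adaptor attached molecule, with the upstream adaptor at the 5′ end and the downstream adaptor at the 3′ end; the function names, the primer length and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def adaptor_primer_pair(upstream_adaptor: str,
                        downstream_adaptor: str,
                        primer_len: int = 18) -> tuple[str, str]:
    """Pick one primer per adaptor so that the pair flanks the insert.

    The forward primer copies the 5' end of the upstream adaptor; the
    reverse primer is the reverse complement of the 3' end of the
    downstream adaptor, so the two primers point toward each other.
    """
    forward = upstream_adaptor[:primer_len]
    reverse = reverse_complement(downstream_adaptor[-primer_len:])
    return forward, reverse


# Hypothetical adaptor sequences, shortened for readability.
fwd, rev = adaptor_primer_pair("ACGTACGTAA", "GGGTTTCCCA", primer_len=4)
```

For the alternate embodiments, either member of the pair could be replaced by a primer derived in the same manner from a known internal sequence.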
It is envisioned that the primers used for amplification of the adaptor attached DNA may be labeled. In an additional embodiment of the invention, use of these labeled primers facilitates the creation of hybridization probes.
In a further embodiment of the claimed invention, the adaptor attached DNA molecules may be subjected to recombination. It is envisioned that the recombination may be carried out by: 1) joining an upstream adaptor molecule attached to a first adaptor attached DNA molecule and a downstream adaptor molecule attached to the same adaptor attached DNA molecule; 2) joining an upstream adaptor molecule attached to a first adaptor attached DNA molecule and an internal adaptor molecule attached at an internal site within the same adaptor attached DNA molecule; 3) joining a downstream adaptor molecule attached to a first adaptor attached DNA molecule and an internal adaptor molecule attached at an internal site within the same adaptor attached DNA molecule; 4) joining an upstream adaptor molecule attached to a first adaptor attached DNA molecule and an internal adaptor molecule attached at an internal site within the same adaptor attached DNA molecule and further joining a downstream adaptor molecule attached to a first adaptor attached DNA molecule and an internal adaptor molecule attached at an internal site within the same adaptor attached DNA molecule; or 5) joining an upstream adaptor molecule attached to a first adaptor attached DNA molecule and a downstream adaptor molecule attached to a second adaptor attached DNA molecule.
In another embodiment, it is envisioned that the sample DNA molecules may be between 0.5 and 500 kb in length. In a preferred embodiment, the DNA sample comprises short template molecules of 1-20 kb. It is further envisioned that the sample DNA is cDNA, genomic DNA, or cloned DNA. The cloned DNA may further be classified as originating from a BAC, a YAC, a cosmid, or a large insert clone.
Once the sample DNA is converted to adaptor attached DNA molecules, it is envisioned that the DNA may be separated. In a preferred embodiment, separation of the adaptor attached DNA is based upon size. Nevertheless, a person of ordinary skill would be aware of a variety of means of separating the DNA constructs of the instant invention.
In a further embodiment of the claimed invention, diagnostic mutation analysis is performed. In a preferred embodiment, diagnostic mutation analysis involves the steps of: preparing a DNA library in accordance with the disclosed methods and then screening the DNA library for single or multiple nucleotide polymorphisms. The disclosed DNA library facilitates the shotgun sequencing of the DNA by sequencing the library using primers specific for known loci to derive the sequence of adjacent unknown regions.
In an additional embodiment of the claimed invention, the adaptor attached DNA is recombined after adaptor attachment, size separated and then amplified. It is further envisioned that the size separated DNA is distributed into the wells of a multi-well plate. In a preferred embodiment, the amplified DNA is subsequently mapped, sequenced, resequenced, and/or cloned into a vector.
In a further embodiment of the claimed invention, the adaptor attached DNA is recombined after adaptor attachment, PCR amplified using locus specific primers and subsequently PCR amplified using one locus specific primer and one adaptor specific primer. This amplified DNA may be subsequently sequenced or cloned into a vector.
In a particular embodiment of the claimed invention, the adaptor attached DNA is recombined after adaptor attachment. In a preferred embodiment, the DNA is amplified after adaptor attachment, hybridized to a microarray and the hybridization patterns subsequently analyzed.
It is further envisioned that the DNA sample to be nick translated is modified. This modification is, for example, methylation. In another embodiment, modification of DNA occurs during the nick translation reaction. In this context, the nucleotides integrated by the reaction are modified. In a preferred embodiment, the modified nucleotides are exonuclease resistant. In this context, it is contemplated that the presence of exonuclease resistant nucleotides facilitates the differentiation or isolation of the nick translate product from the template strand.
It is specifically envisioned that the adaptor attached DNA molecules of the instant invention may be further modified or manipulated after the initial reaction. In a preferred embodiment of the claimed invention, the adaptor attached DNA molecules are modified by initiating a second nick translation reaction at the upstream adaptor with a DNA polymerase having 5′-3′ exonuclease activity. A second downstream adaptor molecule is then attached to the 5′ end of the molecules to produce adaptor attached nick translate molecules.
In a further embodiment, the adaptor attached DNA molecules are denatured to produce single stranded DNA. The denatured DNA is then replicated to form a double stranded product. This product is subjected to nick translation using a DNA polymerase having 5′-3′ exonuclease activity, to produce nick translate molecules. Downstream adaptor molecules are then attached to the nick translation initiation site of the nick translate molecules to produce adaptor attached nick translate molecules.
Modification of the DNA molecules of the instant invention may be to facilitate more efficient manipulation of the nick translate product. It is specifically envisioned that the DNA is modified to facilitate efficient isolation or separation of different DNA molecules. In a preferred embodiment, isolation or purification is facilitated by the attachment to the DNA of an affinity adaptor.
In preferred embodiments of the invention, DNA molecules are subjected to recombination. A person of ordinary skill would recognize that a variety of methods exist to carry out recombination of DNA molecules. In a preferred embodiment, recombination is carried out by attaching the upstream adaptor molecule to both the proximal and distal ends of a DNA molecule to create a circular product. Several alternate means of recombination are specifically contemplated within the scope of the instant invention. In a first embodiment, the adaptor attached, nick translate product is recombined by incubating the product with a linker oligonucleotide to form a nick site. The ends of the product are then ligated with a DNA ligase. While a person of ordinary skill would recognize that a broad range of oligonucleotide sizes and properties would function in the context of this embodiment, it is contemplated that the linker oligonucleotide is between 20 and 200 bases long and further that the linker oligonucleotide includes a region complementary to the upstream adaptor and a region complementary to the downstream adaptor.
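By way of illustration only, the design of a linker oligonucleotide of the first embodiment may be sketched as follows. The sketch assumes that the junction to be ligated reads, 5′ to 3′ on one strand, as the 3′ end of the downstream adaptor followed by the 5′ end of the upstream adaptor; this orientation, the function names and the example sequences are assumptions made for the example. Only the 20 to 200 base length range is taken from the text above.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_linker(upstream_adaptor: str,
                  downstream_adaptor: str,
                  arm_len: int = 15) -> str:
    """Build a linker that base-pairs across the adaptor-to-adaptor junction.

    One arm is complementary to the 3' end of the downstream adaptor and
    the other to the 5' end of the upstream adaptor, so hybridization
    brings the two ends together and leaves a ligatable nick.
    """
    if arm_len > min(len(upstream_adaptor), len(downstream_adaptor)):
        raise ValueError("arm_len exceeds adaptor length")
    junction = downstream_adaptor[-arm_len:] + upstream_adaptor[:arm_len]
    linker = reverse_complement(junction)
    if not 20 <= len(linker) <= 200:
        raise ValueError("linker length outside the contemplated 20-200 range")
    return linker


# Hypothetical 12-base adaptors; real adaptors would be application specific.
linker = design_linker("ACGTACGTACGG", "TTGCATGCAAGC", arm_len=10)
```

Equal arm lengths are used here only for simplicity; unequal arms would satisfy the same description.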
In a second embodiment, recombination is carried out by restricting the DNA molecules of the DNA sample with one or more restriction enzymes. Restriction generally is carried out with a frequent cutter, and in specific embodiments, it is contemplated that the digestion is only a partial digest. Further, each end of the DNA molecule may be created with a different restriction enzyme. Upstream adaptor molecules are then attached at both ends of the restricted DNA molecules and nick translation carried out from both upstream adaptors. Once this is done, the ends of the DNA molecules are recombined. Once recombination has been carried out, the recombined molecules may be separated according to size.
In a third embodiment, recombination is carried out by restricting the DNA molecules of the DNA sample with one or more infrequent cutting restriction enzymes. Upstream adaptor molecules are then attached at ends of the restricted DNA molecules and nick translation is carried out from the upstream adaptors. Following nick translation, the nick translate molecules are partially restricted with a frequent cutter and internal adaptor molecules attached at ends of the restricted DNA molecules. Another nick translation reaction is then carried out from the internal adaptors, with the ends of the DNA molecules subsequently being recombined.
Additional methods for recombination are included within various aspects of the claimed invention. In a preferred embodiment, recombination is carried out in a dilute solution and is characterized as: cleaving the DNA molecules with a first sequence-specific endonuclease, ligating an adaptor to the sequence-specific termini of the DNA molecule, cleaving the DNA molecules with a second sequence-specific endonuclease, incubating the DNA molecules at low concentration with an excess of T4 DNA ligase for 16-36 h, and then concentrating the DNA molecules. In an alternate embodiment, recombination is carried out in a dilute solution by methylating the DNA molecules, attaching a first and second adaptor with an activatable region to the ends of the DNA molecules, activating the adaptors by incubation with a restriction endonuclease, thereby removing the distal portion of the adaptors and creating sticky ends, incubating the DNA molecules at low concentration with an excess of T4 DNA ligase for 16-36 h, and then concentrating the DNA molecules.
In a further embodiment, recombination is carried out in a dilute solution by hybridizing the ends of adaptor attached template molecules in dilute solution, concentrating the molecules and ligating the ends of the molecules. In a still further embodiment, recombination is carried out in a dilute solution by hybridizing the ends of adaptor attached template molecules and subjecting the DNA molecule to a nick-translation reaction to form the covalent intramolecular junction.
Various alternate embodiments and modifications of the basic methods of producing adaptor attached nick translate molecules are specifically contemplated. In one embodiment, a DNA molecule having an amplifiable region is produced by obtaining a DNA sample comprising DNA molecules having regions to be amplified and attaching upstream adaptor molecules to the proximal end of DNA molecules to provide a nick translation initiation site. The DNA molecules are then subjected to a nick translation reaction comprising DNA polymerization and 5′-3′ exonuclease activity, for a specific time T. Downstream adaptor molecules are then attached to the 5′ end of the degraded template strand to produce adaptor attached nick translate molecules. The product of this method may then be amplified, sequenced, cloned or otherwise manipulated. In embodiments in which the DNA sample contains a plurality of alternate DNA molecules, the different DNA molecules may be reacted for different times T.
Once a circular product is achieved through recombination, the existence of a nick translation site facilitates the initiation of a nick translation reaction. The positioning of the nick site on the intramolecular junction facilitates nick translation through the region. Proper placement of the nick site allows nick translation to proceed either through the proximal or distal end of the recombined molecule. Coverage of the molecule can be increased by exposing different internal regions of the nick translate molecules as distal ends. It is further contemplated that the adaptors used in recombination comprise single stranded tails.
Where an adaptor is ligated to a DNA molecule in the context of the instant invention, it is specifically contemplated that the adaptor added to a DNA sample consists of a single adaptor construct or multiple adaptor constructs. Thus, embodiments of the invention comprise a DNA sample with a plurality of upstream adaptors in a single tube and a DNA sample with a plurality of downstream adaptors in a single tube.
The instant invention is of particular use in producing DNA to be sequenced or amplified with specific regions for which the sequence is not known. It is specifically contemplated that the instant invention will facilitate the determination of unknown sequences. In a preferred embodiment of the instant invention, the unknown sequence to be determined will abut a known sequence. In this and other contexts, it is specifically contemplated that the nick translation reaction proceed through a known sequence on the DNA molecule. Further, because the sequence of the region is known, sequencing and PCR primers may be constructed to hybridize to such regions within the context of the invention. In particular embodiments of the instant invention, PCR is carried out using a primer or primers specific for the known sequence and a primer or primers specific for the attached adaptors.
In an alternate embodiment of the basic method, an amplifiable region is prepared by obtaining a DNA sample comprising DNA molecules having regions to be amplified followed by attaching upstream adaptor molecules to the proximal end of the DNA molecules of the sample to provide a nick translation initiation site. The adaptor attached molecules are subjected to a first nick translation comprising DNA polymerization and 5′-3′ exonuclease activity, for a specific time T. A first downstream adaptor is then attached to the 3′ end of the nick translate product to produce adaptor attached nick translate molecules. The adaptor attached molecules are then subjected to a second nick translation initiated from the upstream adaptor for a specific time T and then a second downstream adaptor molecule is attached to the 5′ end of the degraded nick translate product. The product of this method may then be amplified, sequenced, cloned, separated or otherwise manipulated. In embodiments in which the DNA sample contains a plurality of alternate DNA molecules, the different DNA molecules may be reacted for a different time T for either of the nick translation reactions performed.
In a further embodiment of the basic method, an amplifiable region is prepared by obtaining a DNA sample comprising DNA molecules having regions to be amplified followed by attaching upstream adaptor molecules to the proximal end of the DNA molecules of the sample to provide a nick translation initiation site. The adaptor attached molecules are then subjected to a first nick translation comprising DNA polymerization and 5′-3′ exonuclease activity, for a specific time T. A first downstream adaptor molecule is then attached to the 3′ end of the nick translate product and the nick translate product separated from the template molecule. The nick translate product is then replicated by primer extension, with the product of this step then subjected to a second nick translation comprising DNA polymerization and 5′-3′ exonuclease activity, for a specific time T. Following this step, a second downstream adaptor molecule is attached to the 3′ end of the product. The product of this method may then be amplified, separated, sequenced, cloned or otherwise manipulated. In embodiments in which the DNA sample contains a plurality of alternate DNA molecules, the different DNA molecules may be reacted for different times T for either of the nick translation reactions performed.
In a still further embodiment of the basic method, an amplifiable region is prepared by obtaining a DNA sample comprising DNA molecules having regions to be amplified followed by attaching an affinity adaptor to the proximal ends of the DNA molecules. The affinity adaptor attached molecules are subjected to partial cleavage and then separated. Upstream adaptor molecules are attached to the ends of the affinity adaptor attached molecules to provide a nick translation initiation site and the molecules are then subjected to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity. Following this step, downstream adaptor molecules are then attached to the nick translate molecules to produce adaptor attached nick translate molecules. The product of this method may then be amplified, sequenced, separated, cloned or otherwise manipulated. In embodiments in which the DNA sample contains a plurality of alternate DNA molecules, the different DNA molecules may be reacted for different times T for either of the nick translation reactions performed. In an additional embodiment, polymerization may involve the incorporation of modified nucleotides, with specific embodiments making the nick translate molecule exonuclease resistant.
In a further modification of the basic nick translation method, an amplifiable region is prepared by obtaining a DNA sample comprising DNA molecules having regions to be amplified followed by attaching the first end of a recombination adaptor to one end of the DNA molecules and attaching the second end of the recombination adaptor to the opposite end of the DNA molecules. The circularized molecule is then subjected to nick translation involving DNA polymerization and 5′-3′ exonuclease activity. A downstream adaptor molecule is attached to the nick translate molecules to produce adaptor attached nick translate molecules. The product of this method may then be amplified, sequenced, separated, cloned or otherwise manipulated. In embodiments in which the DNA sample contains a plurality of alternate DNA molecules, the different DNA molecules may be reacted for different times T for either of the nick translation reactions performed.
In an additional modification of the basic nick translation method, an amplifiable region is prepared by obtaining a DNA sample comprising DNA molecules having regions to be amplified followed by attaching the first end of a recombination adaptor to the proximal end of said DNA molecules. Following adaptor attachment, the DNA is partially cleaved to produce cleavage products having a plurality of lengths. The second end of the recombination adaptor is then attached to the distal ends produced by the partial cleavage. These molecules are subjected to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity, followed by attaching downstream adaptor molecules to the nick translate molecules to produce adaptor attached nick translate molecules. These molecules may then be separated, for example, by size.
In a still further embodiment based upon the basic nick translation method, a first DNA template is obtained and a first upstream adaptor molecule attached to the template to provide a nick translation initiation site. A second DNA template is obtained and a second upstream adaptor molecule attached to the template to provide a nick translation initiation site. The templates are then mixed and subjected to nick translation initiated from the upstream adaptor for a specific time T. Subsequently, a downstream adaptor molecule is attached to the nick translate molecules to produce adaptor attached nick translate molecules. These molecules may be subsequently amplified and differentiated based upon the use of alternate primers specific for the alternate upstream adaptors.
The methods of the instant application are specifically applicable to the construction of a genomic library. In a preferred embodiment, a genomic library is constructed by obtaining genomic DNA and fragmenting it to a desired size. Upstream adaptor molecules are attached to ends of the fragmented genomic DNA molecules of the sample to provide a nick translation initiation site and the molecules subjected to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity. Following this reaction, downstream adaptor molecules are attached to the nick translate molecules to produce adaptor attached nick translate molecules. These products may be recombined, amplified, sequenced, separated, cloned, inserted into a vector or otherwise manipulated. Separation of the library into sublibraries of molecules of different size is contemplated to create an ordered DNA library. It is further contemplated that samples may be chosen based upon the presence of a known kernel sequence within the molecule. Where such a sequence is present, it is contemplated to be useful for the construction of primers for the amplification of the molecule. Amplification in this context will generally comprise sequences adjacent to the kernel sequence. It is contemplated that recombination may be facilitated through the presence of a 5′ phosphate group on the upstream adaptor or the use of a DNA ligase employing a linking oligonucleotide. This method may be further modified by incubating the linking oligonucleotide with the adaptor attached nick translate molecule to form a nick and then ligating the adaptor attached nick translate molecule with a DNA ligase. In a preferred embodiment, a thermostable ligase will be used. In a further embodiment, the sample will be diluted prior to recombination, and recombination performed at a low concentration.
In addition to the basic method set forth above, alternate methods of constructing genomic libraries are specifically contemplated in the context of the instant invention. In a preferred embodiment, the library is constructed by obtaining a genomic DNA and fragmenting it. Upstream adaptor molecules are then attached to the ends of the fragmented genomic DNA molecules of the sample to provide a nick translation initiation site. The sample is then subdivided into a plurality of reaction vessels and subjected to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity, for a specific time T. Following nick translation, downstream adaptor molecules are attached to the nick translate molecules to produce adaptor attached nick translate molecules. These products may be recombined, amplified, sequenced, separated, cloned, inserted into a vector or otherwise manipulated. It is further contemplated that samples may be chosen based upon the presence of a known kernel sequence within the molecule. Where such a sequence is present, it is contemplated to be useful for the construction of primers for the amplification of the molecule. Amplification in this context will generally comprise sequences adjacent to the kernel sequence. Where the molecule is recombined, it is contemplated that it may be carried out by ligating the upstream adaptor to the downstream adaptor. In a further embodiment, these molecules may be recombined employing a DNA ligase and a linking oligonucleotide. This method may be further modified by incubating the linking oligonucleotide with the adaptor attached nick translate molecule to form a nick and then ligating the adaptor attached nick translate molecule with a DNA ligase. In a preferred embodiment, a thermostable ligase will be used. In a further embodiment, the sample will be diluted prior to recombination, and recombination performed at a low concentration.
Because this method may be run in alternate reaction vessels, it is contemplated that various times T of reaction may be applied to the different reaction vessels.
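By way of illustration only, the assignment of different times T to different reaction vessels may be sketched as follows. The sketch assumes evenly spaced target lengths and a constant polymerization rate; both are simplifying assumptions, and the function name and numerical values are hypothetical.

```python
def vessel_schedule(max_length_nt: float,
                    n_vessels: int,
                    rate_nt_per_min: float) -> list[tuple[int, float, float]]:
    """Return (vessel number, target length in nt, time T in min) per vessel.

    Target lengths are spaced evenly up to max_length_nt, so that each
    vessel yields nick translate molecules ending at a different position.
    """
    if n_vessels < 1 or rate_nt_per_min <= 0:
        raise ValueError("need at least one vessel and a positive rate")
    step = max_length_nt / n_vessels
    return [(i, step * i, step * i / rate_nt_per_min)
            for i in range(1, n_vessels + 1)]


# Hypothetical example: four vessels covering up to 1000 nt.
schedule = vessel_schedule(max_length_nt=1000, n_vessels=4, rate_nt_per_min=250)
```

Other spacings, for example overlapping or logarithmically spaced lengths, could be substituted without changing the approach.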
DNA libraries produced in the context of the instant invention may be ordered or unordered. In a preferred embodiment, an unordered DNA library is produced by obtaining a DNA sample comprising DNA molecules, cleaving the DNA molecules and attaching adaptors to termini of the cleaved DNA molecules. The molecules are then subjected to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity, to produce nick translate molecules wherein the nick translation is initiated from both ends of the cleaved DNA molecules. The ends of this product are then recombined. These products may be amplified, sequenced, separated, cloned, inserted into a vector or otherwise manipulated. It is further contemplated that samples may be chosen based upon the presence of a known kernel sequence within the molecule. Where such a sequence is present, it is contemplated to be useful for the construction of primers for the amplification of the molecule. Amplification in this context will generally comprise sequences adjacent to the kernel sequence.
In a further embodiment, an ordered DNA library is produced by obtaining a DNA sample comprising DNA molecules, cleaving the DNA molecules and attaching adaptors to termini of the cleaved DNA molecules. The cleaved molecules are then partially cleaved and adaptors attached to the termini of the DNA molecules. These DNA molecules are subjected to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity, to produce nick translate molecules wherein said nick translation is initiated from both ends of the DNA molecules. These products may be recombined, amplified, sequenced, separated, cloned, inserted into a vector or otherwise manipulated. It is further contemplated that samples may be chosen based upon the presence of a known kernel sequence within the molecule. Where such a sequence is present, it is contemplated to be useful for the construction of primers for the amplification of the molecule. Amplification in this context will generally comprise sequences adjacent to the kernel sequence. In a further embodiment, nucleotide analogs are integrated during amplification. In an additional embodiment, the time of primer extension is limited. In the context of recombining the molecules, it is specifically contemplated that the sample will be diluted prior to recombination and that recombination results in a covalent bond. In a preferred embodiment, the sample may be diluted to a point where the sample comprises substantially a single DNA molecule. Where the product is sequenced, sequencing may be carried out by cycle sequencing. Where cycle sequencing is performed it is specifically contemplated that the cycle sequencing employs a primer complementary to an adaptor and at least one or two base pairs adjacent to the adaptor.
In an alternate aspect of the instant invention, the basic methods set forth herein are applied to the construction of a DNA library. In a preferred embodiment, the DNA library is constructed by obtaining a DNA sample comprising DNA molecules and cleaving the DNA molecules with an infrequently-cutting restriction enzyme. Upstream adaptor molecules are then attached to the ends of the cleaved DNA molecules of the sample to provide a nick translation initiation site. The DNA molecules are then subjected to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity and downstream adaptor molecules subsequently attached to the nick translate molecules to produce adaptor attached nick translate molecules. These molecules are then partially cleaved with a frequently cutting restriction enzyme, and upstream adaptor molecules are attached to the ends of the adaptor attached nick translate molecules produced by said partial digestion. The DNA molecules are then again subjected to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity and downstream adaptor molecules attached to the nick translate molecules to produce adaptor attached nick translate molecules. These products may be subsequently recombined, amplified or separated. Where the recombined molecule is amplified it is contemplated that a primer specific for an adaptor and/or a primer specific for a kernel sequence within the molecule may be used.
In an additional embodiment based upon the basic method, a DNA sample comprising DNA molecules having regions to be amplified is obtained. At least a first upstream adaptor and at least a second upstream adaptor are then attached to the DNA molecules which are then subjected to recombination at low DNA concentrations. The recombined molecules are subjected to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity and downstream adaptor molecules attached to the nick translate molecules to produce adaptor attached nick translate molecules. The products of this reaction may be subsequently amplified, sequenced, separated, cloned or otherwise manipulated.
In an alternate embodiment, the instant invention provides methods for sequencing large DNA molecules. In a preferred embodiment, a BAC clone is sequenced by cleaving the BAC clone at a cos site with lambda terminase and ligating an upstream adaptor to the 5′ overhangs. The DNA is partially cleaved with a frequently cutting enzyme and the ends of the fragments recombined. A nick-translation reaction is performed from both ends of the fragments. A poly-G tail is added to the 3′ end of the recombined nick-translate product with terminal transferase. An adaptor having a poly-C 3′ single-strand overhang and a unique double strand sequence is ligated at the end to the poly-G tail. The strands are then size separated and distributed into the wells of a microplate. The DNA is amplified with primers complementary to adaptor sequences such that products are formed which proceed in either a clockwise or counterclockwise direction around the recombined molecule. The molecules are then ligated into a cloning vector and subsequently sequenced.
It is further contemplated that the reagents necessary to carry out the invention may be combined in a kit. In a preferred embodiment, kits may include DNA for use in the context of the instant invention. Where DNA is included in a kit, it is specifically contemplated that the DNA may be genomic DNA. It is further contemplated that the DNA may be prokaryotic or eukaryotic; from a plant or an animal. Where the DNA is from a plant or animal, a person of ordinary skill would recognize a wide variety of species to which this method would be particularly applicable. Animal DNA of particular relevance may include human, feline, canine, bovine, equine, porcine, caprine, murine, lupine, ranine, piscine and simian. Plant species of interest include both monocots and dicots. Species of particular relevance include species of agricultural relevance, for example, tobacco, tomato, potato, sugar beet, pea, carrot, cauliflower, broccoli, soybean, canola, sunflower, alfalfa, cotton, Arabidopsis, wheat, maize, rye, rice, turfgrass, oat, barley, sorghum, millet, and sugarcane.
A variety of different adaptor constructs are important to the methods of the instant invention. Upstream adaptors, downstream adaptors and recombination adaptors all have specific functions in various embodiments of the invention. In a preferred embodiment of the invention, an upstream adaptor construct may be characterized as a first domain comprising nucleotides that facilitate ligation of the construct to a nucleic acid and a second domain proximal to the first domain, comprising a site which facilitates the initiation of a nick translation reaction and a site that facilitates recombination. When this adaptor is ligated to a polynucleotide molecule, it results in the only free 3′ OH group capable of initiating a nick translation reaction being within the second domain of the adaptor.
An alternate upstream adaptor construct useful in the context of the invention is characterized as comprising: a first oligonucleotide comprising a phosphate group at the 5′ end and a blocking nucleotide at the 3′ end; a second oligonucleotide comprising a blocked 3′ end, a non-phosphorylated 5′ end, and a nucleotide sequence complementary to the 5′ element of the first oligonucleotide; and a third oligonucleotide comprising a 3′ hydroxyl group, a non-phosphorylated 5′ end, and a nucleotide sequence complementary to the 3′ element of said first oligonucleotide. The oligonucleotides of this adaptor may be a variety of lengths; nevertheless, in preferred embodiments the first oligonucleotide is from 10 to 200 bases and the second and third oligonucleotides are from 5 to 195 bases. The first oligonucleotide may be further characterized as comprising an additional 3′ tail, a 3′ end protected from exonuclease activity, and/or one or more nuclease resistant nucleotide analogs. The third oligonucleotide may be further characterized as comprising a 3′ end capable of initiating a nick translation reaction.
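The constraints recited for this three-oligonucleotide construct can be expressed as a simple consistency check. The following is an illustrative sketch only. It assumes that every sequence is written 5′ to 3′, that "complementary to the 5′ element" means the second oligonucleotide is the exact reverse complement of a prefix of the first, and that "complementary to the 3′ element" means the third is the exact reverse complement of a suffix of the first. The `Oligo` class and `check_three_oligo_adaptor` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Oligo:
    sequence: str               # written 5'->3'
    five_prime_phosphate: bool  # True if the 5' end carries a phosphate
    three_prime_blocked: bool   # True if the 3' end carries a blocking nucleotide


def check_three_oligo_adaptor(first: Oligo, second: Oligo, third: Oligo) -> List[str]:
    """Return a list of violated constraints; an empty list means all checks pass."""
    problems = []
    if not 10 <= len(first.sequence) <= 200:
        problems.append("first oligonucleotide should be 10 to 200 bases")
    for name, oligo in (("second", second), ("third", third)):
        if not 5 <= len(oligo.sequence) <= 195:
            problems.append(f"{name} oligonucleotide should be 5 to 195 bases")
    if not (first.five_prime_phosphate and first.three_prime_blocked):
        problems.append("first oligonucleotide needs a 5' phosphate and a blocked 3' end")
    if second.five_prime_phosphate or not second.three_prime_blocked:
        problems.append("second oligonucleotide needs a blocked 3' end and no 5' phosphate")
    if third.five_prime_phosphate or third.three_prime_blocked:
        problems.append("third oligonucleotide needs a free 3' OH and no 5' phosphate")
    # Assumption: exact base pairing with the 5' (prefix) and 3' (suffix) elements.
    if not first.sequence.startswith(reverse_complement(second.sequence)):
        problems.append("second oligonucleotide does not pair with the 5' element of the first")
    if not first.sequence.endswith(reverse_complement(third.sequence)):
        problems.append("third oligonucleotide does not pair with the 3' element of the first")
    return problems
```

Under these assumptions, the only oligonucleotide that passes with a free 3′ hydroxyl is the third, which matches its stated role as the initiation point for nick translation.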
An additional upstream adaptor construct useful in the context of the invention is characterized as comprising: a first oligonucleotide including a 5′ phosphate and a 3′ nucleotide blocked to prevent ligation or extension by a polymerase; a second oligonucleotide comprising a domain which facilitates ligation to the template strand and a nucleotide sequence complementary to the 5′ element of the first oligonucleotide; a third oligonucleotide comprising an initiation site for nick-translation and a nucleotide sequence complementary to a region of the first oligonucleotide; and a fourth, fifth and sixth oligonucleotide which comprise a nucleotide sequence complementary to a region of said first oligonucleotide and may be readily removed to expose the 3′ terminus of the adaptor. In a particular embodiment of this construct, the removal of the fourth, fifth and sixth oligonucleotides creates a site that facilitates recombination.
Another adaptor construct envisioned to be useful in the context of the instant invention comprises a first domain comprising nucleotides that facilitate ligation of the construct to a nucleic acid, a second domain proximal to the first domain comprising a site which facilitates the initiation of a nick translation reaction, and a third domain proximal to the first domain, comprising a second site which facilitates the initiation of a nick translation reaction. This adaptor may be further characterized as comprising a site that facilitates recombination. When this adaptor is ligated to a polynucleotide molecule, it results in the only free 3′ OH groups capable of initiating a nick translation reaction being within said second and said third domains.
The adaptor construct may further comprise a variety of features that would facilitate the manipulation of the attached DNA molecule. The adaptors may be further characterized as including a primer binding site, a nucleotide overhang, a domain that inhibits self ligation, a single ligatable terminus, a single free 3′ OH group capable of initiating a nick translation reaction, one or more nuclease resistant analogs and/or at least one degradable base. Where the adaptor includes a degradable base, it may be used for the creation of a free 3′ OH and may be deoxyribouracil. The site for initiation of a nick translation reaction may be further characterized as a single stranded region in an otherwise essentially double stranded molecule.
An additional adaptor construct is characterized as a first oligonucleotide comprising a phosphate group at the 5′ end and a blocking nucleotide at the 3′ end. A second oligonucleotide comprises a blocked 3′ end, a non-phosphorylated 5′ end, and a nucleotide sequence complementary to the 5′ element of the first oligonucleotide. A third oligonucleotide comprises a 3′ hydroxyl group, a non-phosphorylated 5′ end, and a nucleotide sequence complementary to the 3′ element of the first oligonucleotide. And, a fourth oligonucleotide comprises a 3′ hydroxyl group, a non-phosphorylated 5′ end, and a nucleotide sequence complementary to the 3′ element of said first oligonucleotide. In additional embodiments, the length of the first oligonucleotide is from 10 to 200 bases while the second, third and fourth oligonucleotides may be from 5 to 195 bases. In alternate embodiments, the first oligonucleotide may be further characterized as comprising an additional 3′ tail, a 3′ end protected from exonuclease activity and/or one or more nuclease resistant nucleotide analogs. The third oligonucleotide may be further characterized as comprising a 3′ end capable of initiating a nick translation reaction.
A further adaptor construct is characterized as comprising a first oligonucleotide comprising a 5′ region comprising a 5′ phosphate group and a homopolymeric tract of 8-20 bases and a 3′ region comprising a 12-100 base primer binding domain, and a second oligonucleotide complementary to the 3′ region of the first oligonucleotide. In an additional embodiment, the adaptor construct may be further characterized as comprising a recombination site.
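The layout of this homopolymer-tailed construct can be sketched as a pair of strings. This is an illustration only, assuming that sequences are written 5′ to 3′, that the second oligonucleotide pairs exactly with the whole primer binding domain, and that the 5′ phosphate is a chemical modification not represented in the sequence. The function name `build_tailed_adaptor` and its parameters are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_tailed_adaptor(tract_base, tract_length, primer_site):
    """Return (first, second) oligonucleotide sequences, both written 5'->3'.

    first  = homopolymeric tract (5' region) + primer binding domain (3' region)
    second = reverse complement of the primer binding domain only, so the
             homopolymeric tract stays single stranded after annealing.
    """
    if len(tract_base) != 1 or tract_base not in "ACGT":
        raise ValueError("tract_base must be one of A, C, G or T")
    if not 8 <= tract_length <= 20:
        raise ValueError("homopolymeric tract should be 8 to 20 bases")
    if not 12 <= len(primer_site) <= 100:
        raise ValueError("primer binding domain should be 12 to 100 bases")
    first = tract_base * tract_length + primer_site
    second = reverse_complement(primer_site)
    return first, second
```

For example, `build_tailed_adaptor("C", 10, primer_site)` yields a first oligonucleotide carrying a ten-base poly-C tract; the choice of base and length here is arbitrary within the recited 8-20 base range.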
A further adaptor construct is characterized as comprising a first oligonucleotide of 12-100 bases, wherein the 5′ end of said oligonucleotide comprises a free phosphate group, and a second oligonucleotide comprising a homopolymeric tract of 8-20 bases and a 3′ blocking nucleotide, wherein the 5′ region of said second oligonucleotide is complementary to the first oligonucleotide. In an additional embodiment, the adaptor construct may be further characterized as comprising a recombination site.
A further adaptor construct is characterized as comprising a first oligonucleotide comprising a 5′ region comprising a 12-100 base primer binding domain and a 3′ region comprising a homopolymeric tract of 8-20 bases, and a second oligonucleotide comprising a blocked 3′ end and a 3′ region complementary to the 5′ region of the first oligonucleotide. In an additional embodiment, the adaptor construct may be further characterized as comprising a recombination site.
A further adaptor construct is characterized as comprising a first oligonucleotide comprising a 5′ region comprising a 12-100 base primer binding domain, and a second oligonucleotide comprising a homopolymeric tract of 4-12 bases at the 5′ end, a blocking nucleotide at the 3′ end, and a 3′ region complementary to said first oligonucleotide. In an additional embodiment, the adaptor construct may be further characterized as comprising a recombination site.
In a further embodiment of the instant invention, an amplifiable region may be prepared by obtaining a DNA sample comprising DNA molecules having regions to be amplified and attaching upstream adaptor molecules to the ends of the DNA molecules of the sample to provide a nick translation initiation site. The molecules are then subjected to nick translation comprising DNA polymerization, to produce nick translate molecules. Downstream adaptor molecules are then attached to the nick translate molecules to produce adaptor attached nick translate molecules. These products may be recombined, amplified, sequenced, separated, cloned, inserted into a vector or otherwise manipulated. In a preferred embodiment, the product may be organized as a DNA library.
A preferred embodiment of the instant invention consists of a kit with alternate adaptor constructs combined with components necessary to carry out a nick translation reaction, including, for example, a DNA polymerase and nucleotide triphosphates.
In a preferred embodiment of the instant invention, the adaptor attached nick translate molecules are assembled as a microarray or an ordered microarray that is capable of being probed for complementary sequences. In a preferred embodiment, the microarray is assembled on a DNA chip. In an embodiment involving the use of a DNA chip, the DNA chip may be used in a variety of applications, for example, the analysis of patients' blood to determine chromosomal mutations or to facilitate diagnostic mutation analysis.