The present invention relates generally to the fields of molecular biology and genomics. Particularly, it concerns utilization of DNA libraries for amplifying and analyzing DNA. More particularly, it concerns utilizing DNA libraries of nick translated products for chromosome walking.
A. DNA Preparation Using in Vivo and in Vitro Amplification and Multiplexed Versions Thereof
Because the amount of any specific DNA molecule that can be isolated from even a large number of cells is usually very small, the only practical methods to prepare enough DNA molecules for most applications involve amplification of specific DNA molecules in vivo or in vitro. There are basically six general methods important for manipulating DNA for analysis: 1) in vivo cloning of unique fragments of DNA, 2) in vitro amplification of unique fragments of DNA, 3) in vivo cloning of random libraries (mixtures) of DNA fragments, 4) in vitro preparation of random libraries of DNA fragments, 5) in vivo cloning of ordered libraries of DNA, and 6) in vitro preparation of ordered libraries of DNA. The beneficial effect of amplifying mixtures of DNA is that it facilitates analysis of large pieces of DNA (e.g., chromosomes) by creating libraries of molecules that are small enough to be analyzed by existing techniques. For example, the largest molecule that can be subjected to DNA sequencing methods is less than 2000 bases long, which is many orders of magnitude shorter than single chromosomes of organisms. Although short molecules can be analyzed, considerable effort is required to assemble the information from the analysis of the short molecules into a description of the larger piece of DNA.
1. In Vivo Cloning of Unique DNA
Unique-sequence source DNA molecules can be amplified by separating them from other molecules (e.g., by electrophoresis), ligating them into an autonomously replicating genetic element (e.g., a bacterial plasmid), transfecting a host cell with the recombinant genetic element, and growing a clone of a single transfected host cell to produce many copies of the genetic element having the insert with the same unique sequence as the source DNA (Sambrook, et al., 1989).
2. In Vitro Amplification of Unique DNA
There are many methods designed to amplify DNA in vitro. Usually these methods are used to prepare unique DNA molecules from a complex mixture, e.g., genomic DNA or an artificial chromosome. Alternatively, a restricted set of molecules can be prepared as a library that represents a subset of sequences in the complex mixture. These amplification methods include PCR, rolling circle amplification, and strand displacement (Walker, et al. 1996a; Walker, et al. 1996b; U.S. Pat. No. 5,648,213; U.S. Pat. No. 6,124,120).
The polymerase chain reaction (PCR) can be used to amplify specific regions of DNA between two known sequences (U.S. Pat. No. 4,683,195, U.S. Pat. No. 4,683,202; Frohman et al., 1995). PCR involves the repetition of a cycle consisting of denaturation of the source (template) DNA, hybridization of two oligonucleotide primers to known sequences flanking the region to be amplified, and primer extension using a DNA polymerase to synthesize strands complementary to the DNA region located between the two primer sites. Because the products of one cycle of amplification serve as source DNA for succeeding cycles, the amplification is exponential. PCR can synthesize large numbers of specific molecules quickly and inexpensively.
The major disadvantages of the PCR method to amplify DNA are that 1) information about two flanking sequences must be known in order to specify the sequences of the primers, 2) synthesis of primers is expensive, 3) the level of amplification achieved depends strongly on the primer sequences, source DNA sequence, and the molecular weight of the amplified DNA, and 4) the length of amplified DNA is usually limited to less than 5 kb, although "long-distance" PCR (Cheng, 1994) allows molecules as long as 20 kb to be amplified.
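The exponential character of PCR can be illustrated with a minimal sketch. It assumes an idealized reaction in which a fixed fraction of templates is copied in every cycle; the function name and the `efficiency` parameter are illustrative assumptions and are not taken from the cited references.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of template copies after a given number of PCR cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means perfect doubling); real reactions plateau and vary by sequence.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # Each cycle multiplies the template pool by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n), pcr_copies(1, n, efficiency=0.8))
```

Under these assumptions a modest drop in per-cycle efficiency compounds into a large difference in final yield, which is consistent with the strong dependence of amplification level on primer and template properties noted for PCR.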
"One-sided PCR" techniques are able to amplify unknown DNA adjacent to one known sequence. These techniques can be divided into four categories: a) ligation-mediated PCR, facilitated by addition of a universal adaptor sequence to a terminus usually created by digestion with a restriction endonuclease; b) universal primer-mediated PCR, facilitated by a primer extension reaction initiated at arbitrary sites; c) terminal transferase-mediated PCR, facilitated by addition of a homonucleotide "tail" to the 3' end of DNA fragments; and d) "inverse PCR," facilitated by circularization of the template molecules. These techniques can be used to amplify successive regions along a large DNA template in a process sometimes called "chromosome walking."
Ligation-mediated PCR is practiced in many forms. Rosenthal et al. (1990) outlined the basic process of amplifying an unknown region of DNA immediately adjacent to a known sequence located near the end of a restriction fragment. Reiley et al. (1990) used primers that were not exactly complementary with the adaptors in order to suppress amplification of molecules that did not have a specific priming site. Jones (1993) and Siebert (1995; U.S. Pat. No. 5,565,340) used long universal primers that formed intrastrand "panhandle" structures that suppressed PCR of molecules having two universal adaptors. Arnold (1994) used "vectorette" primers having unpaired central regions to increase the specificity of one-sided PCR. Macrae and Brenner (1994) amplified short inserts from a Fugu genomic clone library using nested primers from a specific sequence and from vector sequences. Lin et al. (1995) ligated an adaptor to restriction fragment ends that had an overhanging 5' end and employed hot-start PCR with a single universal anchor primer and nested specific-site primers to specifically amplify human sequences. Liao et al. (1997) used two specific site primers and 2 universal adaptors, one of which had a blocked 3' end to reduce non-specific background, to amplify zebrafish promoters. Devon et al. (1995) used "splinkerette-vectorette" adaptors with special secondary structure in order to decrease non-specific amplification of molecules with two universal sequences during ligation-mediated PCR. Padegimas and Reichert (1998) used phosphorothioate-blocked oligonucleotides and exo III digestion to remove the unligated and partially ligated molecules from the reactions before performing PCR, in order to increase the specificity of amplification of maize sequences. Zhang and Gurr (2000) used ligation-mediated hot-start PCR of restriction fragments using nested primers in order to amplify up to 6 kb of a fungal genome. The large amplicons were subsequently directly sequenced using primer extension.
To increase the specificity of ligation-mediated PCR products, many methods have been used to "index" the amplification process by selection for specific sequences adjacent to one or both termini (e.g., Smith, 1992; Unrau, 1994; Guilfoyle, 1997; U.S. Pat. No. 5,508,169).
One-sided PCR can also be achieved by direct amplification using a combination of unique and non-unique primers. Harrison et al. (1997) performed one-sided PCR using a degenerate oligonucleotide primer that was complementary to an unknown sequence and three nested primers complementary to a known sequence in order to sequence transgenes in mouse cells. U.S. Pat. No. 5,994,058 specifies using a unique PCR primer and a second, partially degenerate PCR primer to achieve one-sided PCR. Weber et al. (1998) used direct PCR of genomic DNA with nested primers from a known sequence and 1-4 primers complementary to frequent restriction sites. This technique does not require restriction digestion and ligation of adaptors to the ends of restriction fragments.
Terminal transferase can also be used in one-sided PCR. Cormack and Somssich (1997) were able to amplify the termini of genomic DNA fragments using a method called RAGE (rapid amplification of genome ends) by a) restricting the genome with one or more restriction enzymes, b) denaturing the restricted DNA, c) providing a 3' polythymidine tail using terminal transferase, and d) performing two rounds of PCR using nested primers complementary to a known sequence as well as the adaptor. Rudi et al. (1999) used terminal transferase to achieve chromosome walking in bacteria using a method of one-sided PCR that is independent of restriction digestion by a) denaturation of the template DNA, b) linear amplification using a primer complementary to a known sequence, c) addition of a poly-C "tail" to the 3' end of the single-stranded products of linear amplification using a reaction catalyzed by terminal transferase, and d) PCR amplification of the products using a second primer within the known sequence and a poly-G primer complementary to the poly-C tail in the unknown region. The products amplified by Rudi (1999) have a very broad size distribution, probably caused by a broad distribution of lengths of the linearly-amplified DNA molecules.
RNA polymerase can also be used to achieve one-sided amplification of DNA. U.S. Pat. No. 6,027,913 shows how one-sided PCR can be combined with transcription with RNA polymerase to amplify and sequence regions of DNA with only one known sequence.
Inverse PCR (Ochman et al., 1988) is another method to amplify DNA based on knowledge of a single DNA sequence. The template for inverse PCR is a circular molecule of DNA created by a complete restriction digestion, which contains a small region of known sequence as well as adjacent regions of unknown sequence. The oligonucleotide primers are oriented such that during PCR they give rise to primer extension products that extend away from the known sequence. This "inside-out" PCR results in linear DNA products with known sequences at the termini.
The disadvantages of all "one-sided PCR" methods are that a) the length of the products is restricted by the limitations of PCR (normally about 2 kb, but with special reagents up to 50 kb); b) whenever the products are single DNA molecules longer than 1 kb they are too long to sequence directly; c) in ligation-mediated PCR the amplicon lengths are very unpredictable due to random distances between the universal priming site and the specific priming site(s), resulting in some products that are too short to walk a significant distance, some that are preferentially amplified due to small size, and some that are too long to amplify and analyze; and d) in methods that use terminal transferase to add a polynucleotide tail to the end of a primer extension product, there is great heterogeneity in the length of the amplicons due to sequence-dependent differences in the rate of primer extension.
Strand displacement amplification (Walker, et al. 1996a; Walker, et al. 1996b; U.S. Pat. No. 5,648,213; U.S. Pat. No. 6,124,120) is a method to amplify one or more termini of DNA fragments using an isothermal strand displacement reaction. The method is initiated at a nick near the terminus of a double-stranded DNA molecule, usually generated by a restriction enzyme, followed by a polymerization reaction by a DNA polymerase that is able to displace the strand complementary to the template strand. Linear amplification of the complementary strand is achieved by reusing the template multiple times by nicking each product strand as it is synthesized. The products are strands with 5' ends at a unique site and 3' ends that are various distances from the 5' ends. The extent of the strand displacement reaction is not controlled and therefore the lengths of the product strands are not uniform. The polymerase used for strand displacement amplification does not have a 5' exonuclease activity.
Rolling circle amplification (U.S. Pat. No. 5,648,245) is a method to increase the effectiveness of the strand displacement reaction by using a circular template. The polymerase, which does not have a 5' exonuclease activity, makes multiple copies of the information on the circular template as it makes multiple continuous cycles around the template. The length of the product is very large, typically too large to be directly sequenced. Additional amplification is achieved if a second strand displacement primer is added to the reaction to use the first strand displacement product as a template.
3. In Vivo Cloning of Random Libraries of DNA
Libraries are collections of small DNA molecules that represent all parts of a larger DNA molecule or collection of DNA molecules (Primrose, 1998; Cantor and Smith, 1999). Libraries can be used for analytical and preparative purposes. Genomic clone libraries are the collection of bacterial clones containing fragments of genomic DNA. cDNA clone libraries are collections of clones derived from the mRNA molecules in a tissue.
Cloning of non-specific DNA is commonly used to separate and amplify DNA for analysis. DNA from an entire genome, one chromosome, a virus, or a bacterial plasmid is fragmented by a suitable method (e.g., hydrodynamic shearing or digestion with restriction enzymes), ligated into a special region of a bacterial plasmid or other cloning vector, transfected into competent cells, amplified as a part of a plasmid or chromosome during proliferation of the cells, and harvested from the cell culture. Critical to the specificity of this technique is the fact that the mixture of cells carrying different DNA inserts can be diluted and aliquoted such that some of the aliquots, whether on a surface or in a volume of solution, contain a single transfected cell containing a unique fragment of DNA. Proliferation of this single cell (in vivo cloning) amplifies this unique fragment of DNA so that it can be analyzed. This "shotgun" cloning method is used very frequently, because: 1) it is inexpensive, 2) it produces very pure sequences that are usually faithful copies of the source DNA, 3) it can be used in conjunction with clone screening techniques to create an unlimited amount of specific-sequence DNA, 4) it allows simultaneous amplification of many different sequences, 5) it can be used to amplify DNA as large as 1,000,000 bp long, and 6) the cloned DNA can be directly used for sequencing and other purposes.
a. Multiplex Cloning
Cloning is inexpensive, because many pieces of DNA can be simultaneously transfected into host cells. The general term for this process of mixing a number of different entities (e.g., electronic signals or molecules) is "multiplexing," and it is a common strategy for increasing the number of signals or molecules that can be processed simultaneously and subsequently separated to recover the information about the individual signals or molecules. In the case of conventional cloning the recovery process involves diluting the bacterial culture such that an aliquot contains a single bacterium carrying a single plasmid, allowing the bacterium to multiply to create many copies of the original plasmid, and isolating the cloned DNA for further analysis.
The principle of multiplexing different molecules in the same transfection experiment is critical to the economy of the cloning method. However, after the transfection each clone must be grown separately and the DNA isolated separately for analysis. These steps, especially the DNA isolation step, are costly and time consuming. Several attempts have been made to multiplex steps after cloning, whereby hundreds of clones can be combined during the steps of DNA isolation and analysis and the characteristics of the individual DNA molecules recovered later. In one version of multiplex cloning the DNA fragments are separated into a number of pools (e.g., one hundred pools). Each pool is ligated into a different vector, possessing a nucleic acid tag with a unique sequence, and transfected into the bacteria. One clone from each transfection pool is combined with one clone from each of the other transfection pools in order to create a mixture of bacteria having a mixture of inserted sequences, where each specific inserted sequence is tagged with a unique vector sequence, and therefore can be identified by hybridization to the nucleic acid tag. This mixture of cloned DNA molecules can be subsequently separated and subjected to any enzymatic, chemical, or physical processes for analysis such as treatment with polymerase or size separation by electrophoresis. The information about individual molecules can be recovered by detection of the nucleic acid tag sequences by hybridization, PCR amplification, or DNA sequencing. Church has shown methods and compositions to use multiplex cloning to sequence DNA molecules by pooling clones tagged with different labels during the steps of DNA isolation, sequencing reactions, and electrophoretic separation of denatured DNA strands (U.S. Pat. Nos. 4,942,124; 5,149,625). The tags are added to the DNA as parts of the vector DNA sequences. 
The tags used can be detected using oligonucleotides labeled with radioactivity, fluorescent groups, or volatile mass labels (Cantor and Smith, 1999; U.S. Pat. Nos. 4,942,124; 5,149,625; 5,112,736; Richterich and Church, 1993). U.S. Pat. No. 5,714,318 is directed to a technique whereby the tag sequences are ligated to the DNA fragments before cloning using a universal vector. Furthermore, PCT WO 98/15644 specifies a method whereby the tag sequences added before transfection are amplified using PCR after electrophoretic separation of the denatured DNA.
b. Disadvantages
The disadvantage of preparing DNA by amplifying random fragments of DNA is that considerable effort is necessary to assemble the information within the short fragments into a description of the original, source DNA molecule. Nevertheless, amplified short DNA fragments are commonly used for many applications, including sequencing by the technique called "shotgun sequencing." Shotgun sequencing involves sequencing one or both ends of small DNA fragments that have been cloned from randomly-fragmented large pieces of DNA. During the sequencing of many such random fragments of DNA, overlapping sequences are identified from those clones that by chance contain redundant sequence information. As more and more fragments are sequenced, more overlaps can be found from contiguous regions (contigs), and the regions that are not represented become smaller and less frequent. However, even after sequencing enough fragments that the average region has been sequenced 5-10 times, there will still be gaps between contigs due to statistical sampling effects and to systematic under-representation of some sequences during cloning or PCR amplification (ref). Thus the disadvantages of sequencing random fragments of DNA are that 1) a 5-10 fold excess of DNA must be isolated, subjected to sequencing reactions, and analyzed before having large contiguous sequenced regions, and 2) there are still numerous gaps in the sequence that must be filled by expensive and time-consuming steps.
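The statistical sampling effect described above can be sketched with a simple Poisson model of random read placement. This is an illustrative approximation only: it assumes uniformly random fragments, ignores the minimum overlap needed to join two reads, and ignores cloning or amplification bias. The function names and the example region size are assumptions, not values from the cited literature.

```python
import math


def uncovered_fraction(coverage):
    """Expected fraction of bases never sequenced at a given average coverage."""
    return math.exp(-coverage)


def expected_gaps(source_length, read_length, coverage):
    """Approximate number of gaps between contigs under the Poisson model."""
    n_reads = coverage * source_length / read_length
    # A read ends a contig when no other read starts within its length.
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical 100,000 bp source DNA sequenced with 500-base reads.
    for c in (1, 5, 10):
        print(c, uncovered_fraction(c), expected_gaps(100_000, 500, c))
```

Even in this idealized model the number of gaps declines only exponentially with coverage, so closing the last gaps by random sampling alone requires a disproportionate amount of additional sequencing.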
4. In Vitro Preparation of DNA as Random Libraries
DNA libraries can be formed in vitro and subjected to various selection steps to recover information about specific sequences. In vitro libraries are rarely used in genomics, because the methods that exist for creating such libraries do not offer advantages over cloned libraries. In particular, the methods used to amplify the in vitro libraries are not able to amplify all of the DNA in an unbiased manner, because of the size and sequence dependence of amplification efficiency. WO 00/18960 describes how different methods of DNA amplification can be used to create a library of DNA molecules representing a specific subset of the sequences within the genome for purposes of detecting genetic polymorphisms. "Random-prime PCR" (U.S. Pat. Nos. 5,043,272; 5,487,985), "random-prime strand displacement" (U.S. Pat. No. 6,124,120), and "AFLP" (U.S. Pat. No. 6,045,994) are three examples of methods to create libraries that represent subsets of complex mixtures of DNA molecules.
Single-molecule PCR can be used to amplify individual randomly-fragmented DNA molecules (Lukyanov et al., 1996). In one method, the source DNA is first fragmented into molecules usually less than 10,000 bp in size, ligated to adaptor oligonucleotides, and extensively diluted and aliquoted into separate fractions such that the fractions often contain only a single molecule. PCR amplification of a fraction containing a single molecule creates a very large number of molecules identical to one of the original fragments. If the molecules are randomly fragmented, the amplified fractions represent DNA from random positions within the source DNA.
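The dilution step can be reasoned about with a Poisson model, sketched below. The model assumes that molecules are distributed independently among aliquots; the function names and the example mean values are illustrative assumptions rather than parameters reported by Lukyanov et al.

```python
import math


def poisson_probability(k, mean_per_aliquot):
    """Probability that an aliquot receives exactly k molecules."""
    return mean_per_aliquot ** k * math.exp(-mean_per_aliquot) / math.factorial(k)


def single_molecule_fraction(mean_per_aliquot):
    """Among non-empty aliquots, the fraction holding exactly one molecule."""
    if mean_per_aliquot <= 0:
        raise ValueError("mean_per_aliquot must be positive")
    p_one = poisson_probability(1, mean_per_aliquot)
    p_nonempty = 1.0 - poisson_probability(0, mean_per_aliquot)
    return p_one / p_nonempty


if __name__ == "__main__":
    for mean in (0.1, 0.5, 1.0):
        print(mean, poisson_probability(0, mean), single_molecule_fraction(mean))
```

In this model, stronger dilution makes the amplified fractions purer (more of the non-empty aliquots hold a single molecule) at the cost of more empty aliquots.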
WO 00/15779A2 describes how a specific sequence can be amplified from a library of circular molecules with random genomic inserts using rolling circle amplification.
5. In Vivo Cloning of Ordered Libraries of DNA
Directed cloning is a procedure to clone DNA from different parts of a larger piece of DNA, usually for the purpose of sequencing DNA from different positions along the source DNA. Methods to clone DNA with "nested deletions" have been used to make "ordered libraries" of clones that have DNA starting at different regions along a long piece of source DNA. In one version, one end of the source DNA is digested with one or more exonuclease activities to delete part of the sequence (McCombie et al., 1991; U.S. Pat. No. 4,843,003). By controlling the extent of exonuclease digestion, the average amount of the deletion can be controlled. The DNA molecules are subsequently separated based on size and cloned. By cloning molecules with different molecular weights, many copies of identical DNA plasmids are produced that have inserts ending at controlled positions within the source DNA. Transposon insertion (Berg et al., 1994) is also used to clone different regions of source DNA by facilitating priming or cleavage at random positions in the plasmids. The size separation and recloning steps make both of these methods labor intensive and slow. They are generally limited to covering regions less than 10 kb in size and cannot be used directly on genomic DNA, only on cloned DNA molecules.
6. In Vitro Preparation of Ordered Libraries of DNA
Ordered libraries have not been frequently created in vitro. Hagiwara (1996) used vectorette adaptors and exonuclease digestions to create a nested set of one-sided PCR products that could be used to walk across a cosmid after size separation. No methods are known to create ordered libraries of DNA molecules directly from genomic DNA.
B. DNA Physical Mapping to Create Ordered Clones
There is often a need to organize a library of randomly cloned DNA molecules into an ordered library where the clones are arranged according to position in the genome (Primrose, 1998; Cantor and Smith, 1999). Some of the purposes for creating an ordered library are 1) to compare overlapping clones to detect defects (e.g., deletions) in some of the clones, 2) to decide which clones should be used to determine the underlying DNA sequence with the least redundancy in sequencing effort, 3) to localize genetic features within the genome, 4) to access different regions of the genome on the basis of their relationship to the genetic map or proximity to another region, and 5) to compare the structure of the genomes of different individuals and different species. There are four basic methods for creating ordered libraries of clones: 1) hybridization to determine sequence homology among different clones, 2) fluorescent in situ hybridization (FISH), 3) restriction analysis, and 4) STS mapping.
1. Mapping by Hybridization
The first method usually involves hybridization of one clone or other identifiable sequence to all other clones in a library. Those clones that hybridize contain overlapping sequences. This method is useful for locating clones that overlap a common site (e.g., a specific gene) in the genome, but is too laborious to create an ordered library of an entire genome. In addition many organisms have large amounts of repetitive DNA that can give false indications of overlap between two regions. The resolution of the hybridization techniques is only as good as the distance between known sequences of DNA.
2. Mapping by FISH
The FISH method allows a particular sequence or limited set of sequences to be localized along a chromosome by hybridization of a fluorescently-labeled probe with a spread of intact chromosomes, followed by light-microscopic localization of the fluorescence. This technique is also only of use to locate a specific sequence or small number of sequences, rather than to create a physical map of the entire genome or an ordered library representing the entire genome. The resolution of the light microscope limits the resolution of FISH to about 1,000,000 bp. To map a single-copy sequence, the FISH probe usually needs to be about 10,000 bp long.
3. Mapping by Restriction Digestion
Mapping by restriction digestion is frequently used to determine overlaps between clones, thereby allowing ordered libraries of clones to be constructed. It involves assembly of a number of large clones into a contiguous region (contig) by analyzing the overlaps in the restriction patterns of related clones. This method is insensitive to the presence of repetitive DNA. The products of a complete or partial restriction digestion of every clone are size separated by electrophoresis and the molecular weights of the fragments analyzed by computer to find correlated sequences in different clones. The information from the restriction patterns produced by five or more restriction enzymes is usually adequate to determine not only which clones overlap, but also the extent of overlap and whether some of the clones have deletions, additions, rearrangements, etc. Physical mapping of restriction sites is a very tedious process, because of the very large numbers of clones that have to be evaluated. For example, more than 300,000 BAC clones of 100,000 bp length need to be analyzed to map the human genome. Using conventional techniques, mapping two restriction sites would require at least 300,000 bacterial cultures and DNA isolations, as well as 600,000 restriction digestions and size separations.
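The workload figures above follow from simple arithmetic, sketched here. The clone count, clone length, and number of enzymes come from the text; the human genome size of roughly 3,000,000,000 bp is an assumption introduced for illustration, as are the function names.

```python
def clone_redundancy(n_clones, clone_length_bp, genome_size_bp):
    """Average number of clones covering each position of the genome."""
    return n_clones * clone_length_bp / genome_size_bp


def mapping_workload(n_clones, n_enzymes):
    """Count the separate operations needed when each clone is handled individually."""
    return {
        "cultures": n_clones,
        "dna_isolations": n_clones,
        "digestions": n_clones * n_enzymes,
        "size_separations": n_clones * n_enzymes,
    }


if __name__ == "__main__":
    # Assumed genome size of about 3e9 bp (not stated in the text).
    print(clone_redundancy(300_000, 100_000, 3_000_000_000))
    print(mapping_workload(300_000, 2))
```

Under the assumed genome size, 300,000 clones of 100,000 bp correspond to roughly tenfold redundancy, and every additional enzyme adds another 300,000 digestions and size separations.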
4. Mapping by STS Amplification
Sequence tagged sites (STSs) are sequences, often from the 3' untranslated portions of mRNA, that can be uniquely amplified in the genome. High-throughput methods employing sophisticated equipment have been devised to screen for the presence of tens of thousands of STSs in tens of thousands of clones. Two clones overlap to the extent that they share common STSs.
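The overlap criterion can be expressed as a set intersection, as in the following minimal sketch. The clone names and STS identifiers are hypothetical, and the code illustrates only the principle that shared STSs imply overlap; it is not a description of any particular high-throughput screening system.

```python
from itertools import combinations


def shared_sts(clones):
    """Return the STSs shared by each pair of clones that have any in common.

    clones maps a clone name to the set of STS identifiers detected in it.
    """
    overlaps = {}
    for (name_a, sts_a), (name_b, sts_b) in combinations(sorted(clones.items()), 2):
        common = sts_a & sts_b
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


if __name__ == "__main__":
    # Hypothetical screening results.
    example = {
        "cloneA": {"sts1", "sts2"},
        "cloneB": {"sts2", "sts3"},
        "cloneC": {"sts4"},
    }
    print(shared_sts(example))
```

Pairs with more STSs in common would be inferred to overlap more extensively, which is how such data can be used to order clones into contigs.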
C. DNA Sequencing Reactions
DNA sequencing is the most important analytical tool for understanding the genetic basis of living systems. The process involves determining the positions of each of the four major nucleotide bases, adenine (A), cytosine (C), guanine (G), and thymine (T), along the DNA molecule(s) of an organism. Short sequences of DNA are usually determined by creating a nested set of DNA fragments that begin at a unique site and terminate at a plurality of positions comprised of a specific base. The fragments terminated at each of the four natural nucleic acid bases (A, T, G and C) are then separated according to molecular size in order to determine the positions of each of the four bases relative to the unique site. The pattern of fragment lengths caused by strands that terminate at a specific base is called a "sequencing ladder." The interpretation of base positions as the result of one experiment on a DNA molecule is called a "read." There are different methods of creating and separating the nested sets of terminated DNA molecules.
1. Maxam-Gilbert Method
The Maxam-Gilbert method involves degrading DNA at a specific base using chemical reagents. The DNA strands terminating at a particular base are denatured and electrophoresed to determine the positions of the particular base. The Maxam-Gilbert method involves dangerous chemicals, and is time- and labor-intensive. It is no longer used for most applications.
2. Sanger Method
The Sanger sequencing method is currently the most popular format for sequencing. It employs single-stranded DNA (ssDNA) created using special viruses like M13 or by denaturing double-stranded DNA (dsDNA). An oligonucleotide sequencing primer is hybridized to a unique site of the ssDNA and a DNA polymerase is used to synthesize a new strand complementary to the original strand using all four deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) and small amounts of one or more dideoxyribonucleotide triphosphates (ddATP, ddCTP, ddGTP, and/or ddTTP), which cause termination of synthesis. The DNA is denatured and electrophoresed into a "ladder" of bands representing the distance of the termination site from the 5' end of the primer. If only one ddNTP (e.g., ddGTP) is used, only those molecules that end with guanine will be detected in the ladder. By using ddNTPs with four different labels, all four ddNTPs can be incorporated in the same polymerization reaction and the molecules ending with each of the four bases can be separately detected after electrophoresis in order to read the base sequence.
DNA that is flanked by vector or PCR primer DNA of known sequence can undergo Sanger termination reactions initiated from one end using a primer complementary to those known sequences. These sequencing primers are inexpensive, because the same primers can be used for DNA cloned into the same vector or PCR amplified using primers with common terminal sequences. Commonly-used electrophoretic techniques for separating the dideoxyribonucleotide-terminated DNA molecules are limited to resolving sequencing ladders shorter than 500-1000 bases. Therefore only the first 500-1000 nucleic acid bases can be "read" by this or any other method of sequencing the DNA. Sequencing DNA beyond the first 500-1000 bases requires special techniques.
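The logic of reading a sequencing ladder can be illustrated with a toy model. It assumes error-free termination at every position and perfect single-base size resolution, and it measures fragment length as the number of bases added to the primer; the function names and the example sequence are illustrative assumptions.

```python
def termination_lengths(new_strand):
    """Group fragment lengths by the dideoxy base that terminated them.

    new_strand is the sequence synthesized from the primer, 5' to 3'.
    Each lane lists the lengths (bases added to the primer) of fragments
    ending in that base, as one ddNTP reaction would produce.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_sequence(lanes):
    """Reconstruct the sequence by ordering all fragments from short to long."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    lanes = termination_lengths("GATTACA")
    print(lanes)
    print(read_sequence(lanes))
```

The sketch shows why the read length is bounded by the resolution of the size separation: the sequence is recovered only if fragments differing by a single base can be ordered correctly.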
3. Other Base-Specific Termination Methods
Other termination reactions have been proposed. One group of proposals involves substituting thiolated or boronated base analogs that resist exonuclease activity. After incorporation reactions very similar to Sanger reactions, a 3' to 5' exonuclease is used to resect the synthesized strand to the point of the last base analog. These methods have no substantial advantage over the Sanger method.
Methods have been proposed to reduce the number of electrophoretic separations required to sequence large amounts of DNA. These include multiplex sequencing of large numbers of different molecules on the same electrophoretic device, by attaching unique tags to different molecules so that they can be separately detected. Commonly, different fluorescent dyes are used to multiplex up to 4 different types of DNA molecules in a single electrophoretic lane or capillary (U.S. Pat. No. 4,942,124). Less commonly, the DNA is tagged with a large number of different nucleic acid sequences during cloning or PCR amplification, and detected by hybridization (U.S. Pat. No. 4,942,124) or by mass spectrometry (U.S. Pat. No. 4,942,124).
In principle, the sequence of a short fragment can be read by hybridizing different oligonucleotides with the unknown sequence, followed by deciphering the information to reconstruct the sequence. This "sequencing by hybridization" is limited to fragments of DNA less than 50 bp in length. It is difficult to amplify such short pieces of DNA for sequencing. However, even if sequencing many random 50 bp pieces were possible, assembling the short, sometimes overlapping sequences into the complete sequence of a large piece of DNA would be impossible. The use of sequencing by hybridization is currently limited to resequencing, that is, testing the sequence of regions that have already been sequenced.
D. Preparing DNA for Determining Long Sequences
Because it is currently very difficult to separate DNA molecules longer than 1000 bases with single-base resolution, special methods have been devised to sequence DNA regions within larger DNA molecules. The “primer walking” method initiates the Sanger reaction at sequence-specific sites within long DNA. However, most emphasis is on methods to amplify DNA in such a way that one of the ends originates from a specific position within the long DNA molecule.
1. Primer Walking
Once part of a sequence has been determined (e.g., the terminal 500 bases), a custom sequencing primer can be made that is complementary to the known part of the sequence, and used to prime a Sanger dideoxyribonucleotide termination reaction that extends further into the unknown region of the DNA. This procedure is called “primer walking.” The requirement to synthesize a new oligonucleotide every 400-1000 bp makes this method expensive. The method is slow, because each step is done in series rather than in parallel. In addition each new primer has a significant failure rate until optimum conditions are determined. Primer walking is primarily used to fill gaps in the sequence that have not been read after shotgun sequencing or to complete the sequencing of small DNA fragments less than 5,000 bp in length. However, WO 00/60121 addresses using a single synthetic primer for PCR to genome walk to unknown sequences from a known sequence. The 5′-blocked primer anneals to the template and is extended, followed by coupling to the extended product of a 3′-blocked oligonucleotide of known sequence, thereby creating a single stranded molecule having had only a single region of known target DNA sequence. By sequencing an amplified product from the extended product having the coupled 3′-blocked oligonucleotide, the process can be applied reiteratively to elucidate consecutive adjacent unknown sequences.
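The cost argument for conventional primer walking can be illustrated with a short calculation. The following Python sketch is only an illustration under stated assumptions: the function name primer_walk_steps, the 500-base usable read length, and the 50-base overlap between consecutive reads are hypothetical values chosen for the example and are not taken from any protocol described here.

```python
import math


def primer_walk_steps(target_len, read_len=500, overlap=50):
    """Estimate how many serial primer-walking rounds cover target_len bases.

    Illustrative assumptions: every read yields read_len usable bases, and
    each new custom primer is placed so that the next read re-reads the last
    `overlap` bases of the previous read.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")
    if target_len <= 0:
        return 0
    if target_len <= read_len:
        return 1
    # Each round after the first adds (read_len - overlap) new bases.
    new_bases_per_round = read_len - overlap
    return 1 + math.ceil((target_len - read_len) / new_bases_per_round)


if __name__ == "__main__":
    # Each round requires a newly synthesized primer and must wait for the
    # previous read, so the round count approximates the serial cost.
    for length in (1_000, 5_000, 50_000):
        print(length, primer_walk_steps(length))
```

Because each round depends on the sequence obtained in the previous one, the number of rounds grows linearly with the length of the target and cannot be parallelized, which is the limitation noted above.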
2. PCR Amplification
PCR can be used to amplify a specific region within a large DNA molecule. Because the PCR primers must be complementary to the DNA flanking the specific region, this method is usually used only to prepare DNA to “resequence” a region of DNA.
3. Nested Deletion and Transposon Insertion
As described above, cloning or PCR amplification of long DNA with nested deletions brought about by nuclease cleavage or transposon insertion enables ordered libraries of DNA to be created. When exonuclease is used to progressively digest one end of the DNA there is some control over the position of one end of the molecule. However the exonuclease activity cannot be controlled to give a narrow distribution in molecular weights, so typically the exonuclease-treated DNA is separated by electrophoresis to better select the position of the end of the DNA samples before cloning. Because transposon insertion is nearly random, clones containing inserted elements have to be screened before choosing which clones have the insertion at a specific internal site. The labor-intensive steps of clone screening make these methods impractical except for DNA less than about 10 kb long.
4. Junction-Fragment DNA Probes for Preparing Ordered DNA Clones
Collins and Weissman have proposed to use “junction-fragment DNA probes and probe clusters” (U.S. Pat. No. 4,710,465) to fractionate large regions of chromosomes into ordered libraries of clones. That patent proposes to size fractionate genomic DNA fragments after partial restriction digestion, circularize the fragments in each size-fraction to form junctions between sequences separated by different physical distances in the genome, and then clone the junctions in each size fraction. By screening all the clones derived from each size-fraction using a hybridization probe from a known sequence, ordered libraries of clones could be created having sequences located different distances from the known sequence. Although this method was designed to walk along megabase distances along chromosomes, it was never put into practical use because of the necessity to maintain and screen hundreds of thousands of clones from each size fraction. In addition cross hybridization would be expected to yield a large fraction of false positive clones.
5. Shotgun Cloning
The only practical method for preparing DNA longer than 5 kb for sequencing is subcloning the source DNA as random fragments small enough to be sequenced. The large source DNA molecule is fragmented by sonication or hydrodynamic shearing, fractionated to select the optimum fragment size, and then subcloned into a bacterial plasmid or virus genome. The individual subclones can be subjected to Sanger or other sequencing reactions in order to determine sequences within the source DNA. If many overlapping subclones are sequenced, the entire sequence for the large source DNA can be determined. The advantages of shotgun cloning over the other techniques are: 1) the fragments are small and uniform in size so that they can be cloned with high efficiency independent of sequence; 2) the fragments can be short enough that both strands can be sequenced using the Sanger reaction; 3) transformation and growth of many clones is rapid and inexpensive; and 4) clones are very stable.
E. Genomic Sequencing
Current techniques to sequence genomes (as well as any DNA larger than about 5 kb) depend upon shotgun cloning of small random fragments from the entire DNA. Bacteria and other very small genomes can be directly shotgun cloned and sequenced. This is called “pure shotgun sequencing.” Larger genomes are usually first cloned as large pieces and each clone is shotgun sequenced. This is called “directed shotgun sequencing.”
1. Pure Shotgun Sequencing
Genomes up to several millions or billions of base pairs in length can be randomly fragmented and subcloned as small fragments. However, in the process of fragmentation all information about the relative positions of the fragment sequences in the native genome is lost. This information can be recovered by sequencing with 5-10-fold redundancy (i.e., the number of bases sequenced in different reactions adds up to 5 to 10 times as many bases as are in the genome) so as to generate sufficiently numerous overlaps between the sequences of different fragments that a computer program can assemble the sequences from the subclones into large contiguous sequences (contigs). Nevertheless, due to some regions being more difficult to clone than others and due to incomplete statistical sampling, there will be some regions within the genome that are not sequenced even after highly redundant sequencing. These unknown regions are called “gaps.” After assembly of the shotgun sequences into contigs, the sequencing is “finished” by filling in the gaps. Finishing must be done by additional sequencing of the subclones, by primer walking beginning at the edge of a contig, or by sequencing PCR products made using primers from the edges of adjacent contigs.
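The persistence of gaps at high redundancy can be illustrated with a simple statistical sketch. The Python example below assumes an idealized Poisson model of random read placement (in the style of the Lander-Waterman analysis), which is an assumption introduced here for illustration and is not part of the methods described above. It ignores cloning bias, repeats, and the minimum overlap needed for assembly, so real projects would be expected to do worse. The function names are hypothetical.

```python
import math


def uncovered_fraction(redundancy):
    """Expected fraction of bases not covered by any read.

    Assumes reads start at uniformly random positions, so the number of
    reads covering a given base is approximately Poisson with mean equal
    to the sequencing redundancy.
    """
    return math.exp(-redundancy)


def expected_gaps(genome_len, read_len, redundancy):
    """Approximate expected number of gaps under the same idealized model.

    A gap follows a read when no other read starts within that read,
    which happens with probability exp(-redundancy).
    """
    n_reads = redundancy * genome_len / read_len
    return n_reads * math.exp(-redundancy)


if __name__ == "__main__":
    # Hypothetical 1 Mb target sequenced with 500-base reads.
    for c in (1, 3, 5, 10):
        print(c, uncovered_fraction(c), expected_gaps(1_000_000, 500, c))
```

Under these assumptions the uncovered fraction falls off exponentially with redundancy but never reaches zero, which is consistent with the need for a separate finishing step.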
There are several disadvantages to the pure shotgun strategy: 1) As the size of the region to be sequenced increases, the effort of assembling a contiguous sequence from shotgun reads increases faster than N ln N, where N is the number of reads; 2) Repetitive DNA and sequencing errors can cause ambiguities in sequence assembly; and 3) Because subclones from the entire genome are sequenced at the same time and significant redundancy of sequencing is necessary to get contigs of moderate size, about 50% of the sequencing has to be finished before the sequence accuracy and the contig sizes are sufficient to get substantial information about the genome. Focusing the sequencing effort on one region is impossible.
2. Directed Shotgun Sequencing
The directed shotgun strategy, adopted by the Human Genome Project, reduces the difficulty of sequence assembly by limiting the analysis to one large clone at a time. This “clone-by-clone” approach requires four steps: 1) large-insert cloning, comprised of a) random fragmentation of the genome into segments 100,000-300,000 bp in size, b) cloning of the large segments, and c) isolation, selection and mapping of the clones; 2) random fragmentation and subcloning of each clone as thousands of short subclones; 3) sequencing random subclones and assembly of the overlapping sequences into contiguous regions; and 4) “finishing” the sequence by filling the gaps between contiguous regions and resolving inaccuracies. The positions of the sequences of the large clones within the genome are determined by the mapping steps, and the positions of the sequences of the subclones are determined by redundant sequencing of the subclones and computer assembly of the sequences of individual large clones. Substantial initial investment of resources and time are required for the first two steps before sequencing begins. This inhibits sequencing DNA from different species or individuals. Sequencing random subclones is highly inefficient, because significant gaps exist until the subclones have been sequenced to about 7× redundancy. Finishing requires “smart” workers and effort equivalent to an additional ~3× sequencing redundancy.
The directed shotgun sequencing method is more likely to finish a large genome than is pure shotgun sequencing. For the human genome, for example, the computer effort for directed shotgun sequencing is more than 20 times less than that required for pure shotgun sequencing.
There is an even greater need to simplify the sequencing and finishing steps of genomic sequencing. In principle this can be done by creating ordered libraries of DNA, giving uniform (rather than random) coverage, which would allow accurate sequencing with only about 3-fold redundancy and eliminate the finishing phase of projects. Current methods to produce ordered libraries are impractical, because they can cover only short regions (~5,000 bp) and are labor-intensive.
F. Resequencing of DNA
The presence of a known DNA sequence or variation of a known sequence can be detected using a variety of techniques that are more rapid and less expensive than de novo sequencing. These xe2x80x9cresequencingxe2x80x9d techniques are important for health applications, where determination of which allele or alleles are present has prognostic and diagnostic value.
1. Microarray Detection of Specific DNA Sequences
The DNA from an individual human or animal is amplified, usually by PCR, labeled with a detectable tag, and hybridized to spots of DNA with known sequences bound to a surface. If the individual's DNA contains sequences that are complementary to those on one or more spots on the DNA array, the tagged molecules are physically detected. If the individual's amplified DNA is not complementary to the probe DNA in a spot, the tagged molecules are not detected. Microarrays of different design have different sensitivities to the amount of tested DNA and the exact amount of sequence complementarity that is required for a positive result. The advantage of the microarray resequencing technique is that many regions of an individual's DNA can be simultaneously amplified using multiplex PCR, and the mixture of amplified genetic elements hybridized simultaneously to a microarray having thousands of different probe spots, such that variations at many different sites can be simultaneously detected.
One disadvantage to using PCR to amplify the DNA is that only one genetic element can be amplified in each reaction, unless multiplex PCR is employed, in which case only as many as 50-100 loci can be simultaneously amplified. For certain applications, such as SNP (single nucleotide polymorphism) screening, it would be advantageous to simultaneously amplify 1,000-100,000 elements and detect the amplified sequences simultaneously. A second disadvantage to PCR is that only a limited number of DNA bases can be amplified from each element (usually less than 2000 bp). Many applications require resequencing entire genes, which can be up to 200,000 bp in length.
2. Other Methods of Resequencing
Other methods such as mass spectrometry, secondary structure conformation polymorphism, ligation amplification, primer extension, and target-dependent cleavage can be used to detect sequence polymorphisms. All of these methods either require initial amplification of one or more specific genetic elements by PCR or incorporate other forms of amplification that have the same deficiencies of PCR, because they can amplify only a very limited region of the genome at one time.
A skilled artisan recognizes, based on the teachings provided herein, that deficiencies of existing methods for amplification of unknown DNA adjacent to known sequence can be solved by using nick translate molecule libraries. More particularly, the present invention teaches generating a library of nick translate molecules to amplify and sequence for the purpose of obtaining successive overlapping sequences from a plurality of nick translate molecules.
In an object of the present invention, the primary PENTAmer library, in a specific embodiment, is prepared in vitro from a bacterial or human genome using the teachings provided herein.
In another object of the present invention, the primary PENTAmer library generated in vitro from a genome, such as from a bacterium or human, is amplified more than about 1000 times without any significant change in representation of the specific PENTAmer amplicons.
In an additional object of the present invention, a primary PENTAmer library (directly or after amplification), such as from a bacterium or human, is used to amplify a specific PENTAmer or a PENTAmer sub-pool preferably using only one sequence-specific primer, which generates templates that reproducibly produce high quality sequencing data. Typically, the methods described herein allow systematic generation of from about 550 to 750 bases of new sequence located downstream of the primer.
In another object of the present invention, a primary eukaryotic (human) PENTAmer library (directly or after amplification) is used to amplify a specific PENTAmer or a PENTAmer sub-pool using two (or more) nested sequence-specific primers.
In an additional object of the present invention, a circularized eukaryotic (human) PENTAmer library is used to amplify a specific PENTAmer or a PENTAmer sub-pool using inverse PCR and two (or more) sequence-specific primers.
The present invention utilizes a library of nick translate molecules as a means to walk along a chromosome. A skilled artisan recognizes that the terms “walk,” “walking,” “chromosome walking,” or “genome walking” are directed to the generation of unknown sequence from a sample nucleic acid, such as a genome, in a sequential manner by starting from a known sequence, in specific embodiments termed herein as a “kernel,” sequencing by a first sequencing reaction (called a “read”), and generating a second sequencing read from a region of sequence obtained in the first read. Thus, the two reads will overlap to some extent, and a consecutive series of such reactions results in the preferred walking embodiment of the invention.
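The logic of walking from a kernel through a series of overlapping reads can be illustrated with a toy model in which the genome is a character string and each read is a substring. The following Python sketch is illustrative only: the function name walk, the 600-base read length, and the 50-base overlap are hypothetical, and the string slicing stands in for the preparation, amplification, and sequencing of each nick translate molecule.

```python
def walk(genome, kernel, read_len=600, overlap=50):
    """Simulate walking along `genome` starting from a known `kernel`.

    Returns a list of (start_position, read_sequence) tuples. Each read is
    primed inside already-known sequence, `overlap` bases before its end,
    so that consecutive reads overlap and can be joined unambiguously.
    """
    if not 0 < overlap < read_len:
        raise ValueError("overlap must be positive and smaller than read_len")
    start = genome.find(kernel)
    if start < 0:
        raise ValueError("kernel not found in genome")

    known_end = start + len(kernel)  # first base that is still unknown
    reads = []
    while known_end < len(genome):
        begin = max(known_end - overlap, 0)      # prime within known sequence
        read = genome[begin:begin + read_len]    # stand-in for one read
        reads.append((begin, read))
        known_end = begin + len(read)            # known region now ends here
    return reads


if __name__ == "__main__":
    # Hypothetical 2,000-base sequence; the first 100 bases act as the kernel.
    genome = "".join("ACGT"[(i * i + i // 7) % 4] for i in range(2000))
    for begin, read in walk(genome, genome[:100]):
        print(begin, len(read))
```

Each pass through the loop corresponds to one walking step: the region of sequence obtained in the previous read supplies the starting point for the next read.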
A skilled artisan is cognizant that any method to make an amplifiable nick translate molecule for chromosome walking is within the scope of the present invention. A skilled artisan also recognizes that, in a preferred method, the amplifiable nick translate molecule is generated by methods comprising at least fragmenting a DNA sample; attaching an adaptor to one end of the fragmented molecules, such as by covalent attachment, wherein the adaptor comprises a nick; nick translating with a DNA polymerase having 5′→3′ polymerase activity and 5′→3′ exonuclease activity; and attaching a second adaptor to the other end of the nick translated product. The nick translate molecule may be amplified by primer sequences for the adaptors. Although the nick is preferably generated by an adaptor comprising more than one oligonucleotide, wherein the oligonucleotide assembly has a nick between them, a skilled artisan recognizes that the nick may be generated by any standard means in the art.
The following definitions are provided to assist in understanding the nature of the invention.
The term “nick translate molecule” as used herein refers to nucleic acid molecules produced by coordinated 5′→3′ polymerase activity, such as DNA polymerase, and 5′→3′ exonuclease activity. The two activities can be present within one enzyme molecule (such as DNA polymerase I or Taq DNA polymerase). In a preferred embodiment, they have adaptor sequences at their 5′ and 3′ termini.
The term “nick translation” as used herein refers to a coupled polymerization/degradation process that is characterized by a coordinated 5′→3′ DNA polymerase activity and a 5′→3′ exonuclease activity.
The term “partial cleavage” as used herein refers to the cleavage by an endonuclease of a controlled fraction of the available sites within a DNA template. The extent of partial cleavage can be controlled by, for example, limiting the reaction time, the amount of enzyme, and/or reaction conditions.
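The notion of cleaving only a controlled fraction of the available sites can be illustrated with a small simulation. The Python sketch below is a simplified illustration under stated assumptions: the function names find_sites and partial_digest are hypothetical, each site is cut independently with probability cut_fraction, and the cut is placed at the first base of the recognition sequence rather than at the true cleavage position of any particular enzyme.

```python
import random


def find_sites(dna, site):
    """Return the start positions of every occurrence of `site` in `dna`."""
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions


def partial_digest(dna, site, cut_fraction, rng=None):
    """Cut `dna` at a random subset of recognition sites.

    cut_fraction models the extent of digestion (for example, as set by
    reaction time or enzyme amount): 0.0 leaves the molecule intact and
    1.0 gives a complete digest.
    """
    rng = rng or random.Random()
    cuts = [p for p in find_sites(dna, site)
            if p > 0 and rng.random() < cut_fraction]
    fragments, prev = [], 0
    for p in cuts:
        fragments.append(dna[prev:p])
        prev = p
    fragments.append(dna[prev:])
    return fragments


if __name__ == "__main__":
    # Hypothetical template with three sites for a four-base recognition sequence.
    template = "AAGATCTTGATCCCGATCAA"
    print(partial_digest(template, "GATC", 0.5, random.Random(1)))
```

Repeating the simulation over many copies of the same template yields a population of molecules whose ends fall at different recognition sites, which is the property that partial cleavage is intended to provide.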
In an object of the present invention, there is a method of producing a consecutive overlapping series of nucleic acid sequences from a DNA sample, comprising the steps of generating a first amplifiable nick translation product, wherein said nick translation of said first amplifiable nick translation product initiates from a known nucleic acid sequence in the DNA sample; determining at least a partial sequence from said first nick translation product; and generating at least a second amplifiable nick translation product, wherein said nick translation of said second amplifiable nick translation product initiates from the partial sequence of said first nick translation product.
In another object of the present invention there is a method of producing a library of consecutive overlapping series of nucleic acid sequences from a DNA sample comprising DNA molecules having a region comprising a known nucleic acid sequence, the method comprising the steps of digesting DNA molecules of the DNA sample with a first sequence-specific endonuclease to generate a plurality of DNA fragments; generating a first amplifiable nick translation product, wherein said nick translation of said first amplifiable nick translation product initiates from the known nucleic acid sequence; determining at least a partial sequence from said first nick translation product; and generating one or more additional amplifiable nick translation products, wherein said nick translation of said one or more amplifiable nick translation products initiates from the partial sequence of a previous nick translation product. In a specific embodiment, the method further comprises the step of digesting DNA molecules with at least a second sequence-specific endonuclease, wherein the preceding overlapping nick translation product is generated from a DNA fragment from digestion with the first sequence-specific endonuclease or from digestion with the second sequence-specific endonuclease.
In an additional embodiment of the present invention, there is a method of producing a library of consecutive overlapping series of nucleic acid sequences, comprising the steps of obtaining a DNA sample comprising DNA molecules having a region comprising a known nucleic acid sequence; partially cleaving the DNA molecules with a sequence-specific endonuclease to generate a plurality of DNA ends; separating the cleaved DNA molecules; generating a first amplifiable nick translation product, wherein said nick translation of said first amplifiable nick translation product initiates from a known nucleic acid sequence; determining at least a partial sequence from said first nick translation product; and generating one or more amplifiable nick translation products, wherein said nick translation of said one or more amplifiable nick translation products initiates from the partial sequence of a previous nick translation product. In a specific embodiment, the separation of the cleaved DNA molecules is according to size. In another specific embodiment, the size separation is by gel size fractionation. In an additional specific embodiment, the nick translation products are amplified.
In another specific embodiment, the amplification of the nick translation product comprises polymerase chain reaction utilizing a first primer specific to a known sequence in the nick translation product and a second primer specific to an adaptor sequence of the nick translation product. In an additional specific embodiment, at least one of the nick translation products is selectively amplified from the plurality of nick translation products. In a further specific embodiment, the nick translation product is single stranded. In an additional specific embodiment, the partial cleavage of the DNA molecules comprises cleaving for a selected time with a frequently cutting sequence-specific endonuclease, wherein the sequence-specificity of the endonuclease is to three or four nucleotide bases.
In another specific embodiment, the partial cleavage of the DNA molecules comprises subjecting the DNA molecules to a methylase prior to subjection to a methylation-sensitive sequence-specific endonuclease. In a further specific embodiment, the selective amplification comprises introducing to said plurality of nick translation products a plurality of primers, wherein the primers comprise nucleotide base sequence complementary to an adaptor sequence in the nick translation product; an additional variable 3′ terminal nucleotide; and a label; hybridizing the primers to their complementary nucleic acid sequences in the adaptor to form a mixture of primer/nick translate molecule hybrids; and extending from a primer having the 3′ terminal nucleotide complementary to the nucleotide in the nick translate molecule immediately adjacent to the adaptor sequence, wherein the hybridizing and extending steps form a mixture of unextended primer/nick translate molecule hybrids and extended primer molecule/nick translate molecule hybrids.
In a specific embodiment, the method further comprises binding of the mixture by the label to a support; washing the support-bound mixture to remove the nick translate molecules; removing the support-bound extended molecule from the support. In an additional specific embodiment, the primer further comprises two or more variable 3′ terminal nucleotides. In another specific embodiment, the method further comprises separating the nick translate molecules by size. In an additional specific embodiment, the size separation is by gel fractionation. In another specific embodiment, the method further comprises a step of subjecting the size-separated nick translate molecules to an additional amplification step. In a specific embodiment, the selective amplification step is by suppression PCR. In an additional specific embodiment, the suppression PCR utilizes a primer comprising a nucleic acid sequence for a primer specific for an adaptor sequence of the nick translate molecule; and nucleic acid sequence complementary to a region in a plurality of nick translate molecules, whereby the nucleic acid sequence is 5′ to the sequence for a primer specific for an adaptor sequence of the nick translate molecule.
In an object of the present invention, in the method the at least one nick translate molecule is amplified by primer extension/ligation reactions. In a further specific embodiment, the method further comprises immobilization of the nick translation molecules onto a solid support. In a specific embodiment, the solid support is a magnetic bead. In another specific embodiment, the primer extension/ligation reactions comprise initiating and extending the primer extension reaction with a first primer which is complementary to sequence in a subset of the plurality of nick translate molecules, wherein the complementary sequence of the nick translate molecule is adjacent to a first adaptor end of the nick translate molecule; and ligating an oligonucleotide to the 5′ end of the extension product, wherein the oligonucleotide comprises sequence complementary to the first adaptor of the nick translate molecule and also comprises a sequence for binding by a second primer, wherein the second primer binding sequence in the oligonucleotide is 5′ to the first adaptor complementary sequence in the oligonucleotide. In a further specific embodiment, the method further comprises amplifying the primer extended molecule. In another specific embodiment, the method further comprises separating the primer extended molecule from the plurality of nick translate molecules.
In an additional specific embodiment, the nick translate molecules were generated in the presence of dU nucleotides, the primer extended molecule contains no dU nucleotides, and wherein the separating step comprises degradation of the plurality of nick translate molecules by dU-glycosylase. In another specific embodiment, the amplification step comprises polymerase chain reaction using the second primer and a primer complementary to a second adaptor of the nick translate molecule. In a further specific embodiment, the ligation/primer extension reactions comprise ligating in a head-to-tail orientation a plurality of oligonucleotides to form an oligonucleotide assembly, wherein the oligonucleotides are complementary to nick translate molecule sequence adjacent to a first adaptor end of the nick translate molecule and wherein the nick translate molecule sequence is present in a subset of the plurality of nick translate molecules, wherein the nick translation molecule has the first adaptor on one terminal end and a second adaptor on the other terminal end; initiating and extending the primer extension reaction with the 3′ end of the oligonucleotide assembly; and ligating an oligonucleotide to the 5′ end of the extension product, wherein the oligonucleotide comprises sequence complementary to the first adaptor of the nick translate molecule and also comprises sequence for binding by a first primer, wherein the first primer binding sequence is 5′ to the first adaptor complementary sequence in the oligonucleotide.
In another specific embodiment, the method further comprises the steps of separating the primer extended molecule from the plurality of nick translate molecules; and amplifying the primer extended molecule. In an additional specific embodiment, the nick translate molecules were generated in the presence of dU nucleotides, the primer extended molecule contains no dU nucleotides, and wherein the separating step comprises degradation of the plurality of nick translate molecules by dU-glycosylase. In another specific embodiment, the amplification step comprises polymerase chain reaction using the first primer and a second primer complementary to the second adaptor of the nick translate molecule. In an additional specific embodiment, the primer extension/ligation reaction comprises initiating and extending the primer extension reaction with a first primer which is complementary to sequence in a subset of the plurality of nick translate molecules, wherein the nick translate molecule sequence is adjacent to a first adaptor end of the nick translate molecule; and ligating an oligonucleotide to the 5′ end of the extension product, wherein the oligonucleotide comprises sequence complementary to the first adaptor of the nick translate molecule; sequence for binding by a second primer, wherein the second primer binding sequence is 5′ to the sequence in (1); and a label at the 5′ end.
In an additional specific embodiment, the method further comprises the steps of separating the primer extended molecule from the plurality of nick translate molecules by the label of the oligonucleotide; and amplifying the primer extended molecule.
In a specific embodiment, the label is biotin. In another specific embodiment, the separation further comprises streptavidin-coated magnetic beads. In a further specific embodiment, the amplification step comprises polymerase chain reaction using the second primer and a third primer complementary to a second adaptor of the nick translate molecule.
In an additional object of the present invention there is a method of sequencing nucleic acid, comprising the steps of obtaining a DNA sample comprising DNA molecules having a region comprising a known nucleic acid sequence; partially cleaving the DNA molecules with a sequence-specific endonuclease to generate a plurality of DNA ends; separating the cleaved DNA molecules; generating a first amplifiable nick translation product, wherein the first amplifiable nick translation product comprises an adaptor at each end, wherein the nick translation of said first amplifiable nick translation product initiates from a known nucleic acid sequence; determining at least a partial sequence from said first nick translation product; and generating one or more additional amplifiable nick translation products, wherein said nick translation of said one or more additional amplifiable nick translation products initiates from the partial sequence of a previous nick translation product; and sequencing the nick translation products, wherein the amplified nick translation product is not subjected to cloning prior to the sequencing reaction. In a specific embodiment, the DNA sample is a genome. In another specific embodiment, there is a limited amount of DNA sample. In an additional specific embodiment, the amplification is by polymerase chain reaction, and one of the primers for the polymerase chain reaction is used as a primer for the sequencing reaction. In a further specific embodiment, at least a portion of the adaptor sequence is removed from the amplified nick translation molecule. In another specific embodiment, the removal step comprises subjecting the amplified nick translation molecule to a 5′ exonuclease. In an additional specific embodiment, a region of the adaptor sequence of the nick translate molecule comprises a dU nucleotide and the removal comprises degradation by dU-glycosylase.
In a further specific embodiment, a region of the adaptor sequence comprises a ribonucleotide and the removal comprises degradation by alkaline hydrolysis. In another specific embodiment, the region of the second adaptor sequence is in a 3′ region of the second adaptor sequence.
In an additional object of the present invention, there is a method of providing sequence for a gap in a genome sequence, comprising the steps of obtaining a DNA sample of the genome comprising DNA molecules having a region comprising a known nucleic acid sequence adjacent to the gap; digesting the DNA molecules with a plurality of sequence-specific endonucleases to generate a plurality of DNA ends; generating a first amplifiable nick translation product, wherein said nick translation of said first amplifiable nick translation product initiates from the known nucleic acid sequence; determining at least a partial sequence from said first nick translation product; and generating one or more additional amplifiable nick translation products, wherein said nick translation of said one or more amplifiable nick translation products initiates from the partial sequence of a previous nick translation product, wherein at least one of the amplifiable nick translation products comprises sequence of the gap. In a specific embodiment, the genome is a bacterial genome. In a specific embodiment, the genome is a plant genome. In a specific embodiment, the genome is an animal genome. In a specific embodiment, the animal genome is a human genome. In an additional specific embodiment, the bacteria are unculturable. In an additional specific embodiment, the bacterium is present in a plurality of bacteria.
In an additional object of the present invention, there is a method of producing a library of consecutive overlapping series of nucleic acid sequences from a DNA sample, comprising the steps of obtaining the DNA sample comprising a DNA molecule; digesting the DNA molecule with a first sequence-specific endonuclease to generate a plurality of DNA fragments, wherein at least one DNA fragment has a region comprising a known nucleic acid sequence; attaching a first adaptor molecule to ends of the DNA fragments to provide a nick translation initiation site, wherein the first adaptor comprises a label; subjecting the first adaptor-bound DNA fragment to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity, wherein the nick translation initiates from the known nucleic acid sequence, to generate a first nick translation product; isolating the nick translation product by the label; attaching a second adaptor molecule to the first nick translation product; determining at least a partial sequence from the first nick translation product; and generating one or more additional amplifiable nick translation products, wherein said nick translation of said one or more additional amplifiable nick translation products initiates from the partial sequence of a previous nick translation product. In a specific embodiment, the label is biotin and the isolation step is binding to streptavidin-coated magnetic beads.
In another object of the present invention, there is a method of producing a library of consecutive overlapping series of nucleic acid sequences, comprising the steps of obtaining a DNA sample comprising DNA molecules having a region comprising a known nucleic acid sequence; partially cleaving the DNA molecules with a sequence-specific endonuclease to generate a plurality of DNA fragments, wherein at least one DNA fragment has a region comprising a known nucleic acid sequence; separating the cleaved DNA fragments; attaching a first adaptor molecule to ends of the DNA fragments to provide a nick translation initiation site, wherein the first adaptor comprises a label; subjecting the first adaptor-bound DNA fragment to nick translation comprising DNA polymerization and 5′-3′ exonuclease activity, wherein the nick translation initiates from the known nucleic acid sequence, to generate a first nick translation product; isolating the nick translation product by the label; attaching a second adaptor molecule to the first nick translation product; determining at least a partial sequence from said first nick translation product; and generating one or more additional amplifiable nick translation products, wherein said nick translation of said one or more additional amplifiable nick translation products initiates from the partial sequence of said first nick translation product. In a specific embodiment, the separation of the DNA fragments is by size. In another specific embodiment, the size separation is by electrophoresis.
In another object of the present invention, there is a library of consecutive overlapping series of nucleic acid sequences from a DNA sample, wherein the library is generated by the methods described herein.