In the modern field of recombinant DNA technology, DNA strands are conventionally manipulated and reproduced by being inserted into larger pieces of DNA referred to as vectors. Such vectors serve as carrying vehicles both to transport the desired DNA from host to host, and to facilitate making multiple copies of DNA, i.e. clones, for various biological or analytical procedures.
One of the common objectives of modern biology is to determine the sequence of nucleotides in large DNA segments or genes. DNA sequencing technology has evolved to the point where efforts are being considered, and undertaken, to determine the genetic code, or DNA sequence, of entire organisms. Such efforts are large in scale even for relatively simple organisms, and massive in scale for complex organisms such as vertebrates. The scale of such efforts mandates consideration of every possibility for increasing the efficiency of DNA sequencing techniques.
Several techniques have therefore been developed to perform the actual base-by-base DNA sequencing. One commonly used method of DNA sequencing is referred to as dideoxynucleotide sequencing. In accordance with this technique, a single strand of DNA is sequenced through the use of a polymerase which elongates a complementary DNA strand from a radio-labelled primer along the single strand. Non-natural nucleotide analogs, i.e. dideoxynucleotides, are included in the medium in which the polymerase is working, and the analogs are selected so that, when incorporated into the growing complementary strand, they terminate extension of that strand. Once the double-stranded DNA molecules are denatured, the experimental vessel contains a series of complementary fragments, each terminated by a dideoxynucleotide. By then separating the fragments by length, usually by means of gel electrophoresis, the lengths of the various fragments can be determined. If four experimental samples are processed at the same time, each mixture including a nucleotide analog for a different one of the four normal nucleotide bases, the four mixtures will each contain a set of fragments of differing lengths, and the position of each nucleotide within the original DNA fragment can be determined by comparative analysis of the lengths of the fragments from the four mixtures.
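The comparative analysis of fragment lengths described above can be illustrated with a short sketch. It is a toy model rather than a laboratory protocol: the function names, the example template, and the assumption that fragment lengths are counted from the end of the primer are illustrative choices not taken from the text.

```python
# Toy model of reading a dideoxy (chain-termination) sequencing ladder.
# Assumption: the template is written 3'->5', so the newly synthesized
# strand is produced 5'->3' as we walk the template from left to right.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def lane_fragment_lengths(template, dideoxy_base):
    """Lengths of the extension products that end in the given dideoxy base.

    Lengths are counted from the end of the primer (an illustrative
    simplification), so the first incorporated base has length 1.
    """
    new_strand = [COMPLEMENT[base] for base in template]
    return [i + 1 for i, base in enumerate(new_strand) if base == dideoxy_base]


def read_ladder(lanes):
    """Rebuild the synthesized strand by ordering all fragments by length.

    `lanes` maps each terminating base to the fragment lengths observed in
    the mixture that contained that dideoxynucleotide.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


template = "TACGGATC"  # hypothetical example template
lanes = {base: lane_fragment_lengths(template, base) for base in "ACGT"}
synthesized = read_ladder(lanes)
```

Each of the four mixtures contributes only the lengths at which its own terminator occurs; sorting all lengths together recovers the order of bases in the synthesized strand, which is the complement of the template.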
Thus, to use the dideoxynucleotide sequencing methodology, it is most appropriate to generate single-stranded DNA. However, in most bacterial hosts, DNA normally exists in a double-stranded form, in circular fragments of DNA known as plasmids. Plasmids are double-stranded circles of DNA which can, if provided with a suitable origin of replication, replicate themselves when hosted in the cytoplasm of a suitable bacterial host. In many recombinant DNA procedures, the common intestinal bacterium E. coli is used as the host.
A variation on the dideoxynucleotide method of DNA sequencing makes use of a vector derived from the bacteriophage M13. The M13 phage vector is a single-stranded vector often used for sequencing. DNA of unknown sequence can be inserted into M13 in its double-stranded form, i.e. the replicating form, which replicates in host E. coli cells. The M13 phage genes then cause the E. coli cell to package single-stranded copies of the replicated DNA (the "+" strand) into phage particles, which are secreted by the E. coli cells as the cells continue to grow. The single-stranded DNA can readily be separated from the phage to isolate the large amounts of single-stranded DNA necessary for dideoxynucleotide sequencing. By using as primers oligonucleotide fragments which hybridize to sites on the M13 vector adjacent to the point at which the unknown DNA has been inserted, and by elongating the primers with polymerase, the unknown DNA can be efficiently sequenced through the use of the M13 phage vector. With this process, the single-stranded DNA will always be directionally uniform; that is to say, the reading direction of the nucleotide sequence will always run from the hybridization site of the oligonucleotide primer through the unknown DNA in the same direction.
In efforts to sequence large segments of DNA, such as fragments of chromosomes or large genes, the orderly sequencing of small DNA fragments from one end of a large segment to the other has, in the opinion of some, not proven to be the most efficient methodology. Some scientists believe that it is procedurally more practical to cut the DNA randomly into fragments, and then to sequence large numbers of the random cuttings. The various sequence fragments thus determined can then be assembled by computer matching of the overlapping portions of the fragments. To ensure that all, or at least a very large portion, of the DNA in the large segment is accurately sequenced using this random fragment approach, a significant redundancy has to be built into the procedure. Suppose the accuracy goal of a DNA sequencing process is such that, if methodical end-to-end sequencing were done, one would want to do it at least four times to achieve the desired accuracy. A purely random approach would then require 22-fold over-sequencing of random DNA cuttings to provide a statistical likelihood of sequencing each individual base pair at least four times. Over-sequencing to this degree would obviously be an inefficient use of resources, if avoidable. It is therefore appropriate to search for techniques which allow the convenience of random sequencing of small fragments, and computer assembly of the resulting fragments, while minimizing the amount of over-sequencing necessary to fill in all the gaps and to make sure that every single base pair is sequenced at least a certain number of times. To do this it would be extremely helpful if certain fragments could be sequenced in each direction.
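The redundancy argument above can be explored numerically. The sketch below assumes, purely for illustration, that the number of times a given base is read under random fragment sequencing follows a Poisson distribution whose mean is the fold over-sequencing. The text does not state the statistical model behind its 22-fold figure, so this is not an attempt to rederive that number; the function names are hypothetical.

```python
import math


def prob_fewer_than(k, mean_coverage):
    """P(a base is read fewer than k times), assuming Poisson read depth.

    Assumption: the read depth at any one base is Poisson-distributed with
    mean equal to the fold over-sequencing.
    """
    return sum(
        math.exp(-mean_coverage) * mean_coverage ** i / math.factorial(i)
        for i in range(k)
    )


def expected_undercovered(segment_length, k, mean_coverage):
    """Expected number of bases read fewer than k times in a segment."""
    return segment_length * prob_fewer_than(k, mean_coverage)


if __name__ == "__main__":
    # Compare several levels of over-sequencing for a four-read target.
    for fold in (4, 8, 12, 22):
        print(fold, prob_fewer_than(4, fold))
```

Under this model, the fraction of bases that fall short of four reads drops steeply as the fold over-sequencing grows, which is why a demanding per-base target on a long segment pushes a purely random strategy toward high redundancy.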
While it is possible to take each DNA fragment and make random inserts into the M13 phage vector, and then to isolate pairs of M13 vectors which carry the same insert in each of the two opposite directions, this procedure is relatively time-consuming and laborious for use in mass DNA sequencing operations. Accordingly, a procedure for automatically and easily changing the direction or orientation of DNA for DNA sequencing procedures would be helpful.
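To illustrate why the orientation of an insert matters, the following sketch models a read that always starts at the same primer site on the vector. Flipping the insert makes the read begin at the opposite end of the insert and report the other strand. The helper names, the example sequence, and the fixed read length are hypothetical, and the model ignores vector sequence entirely.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def read_from_primer(insert, orientation, read_length):
    """Bases read from the fixed primer site into the insert.

    orientation "+" means the insert is cloned as written; "-" means it is
    cloned flipped, so the read enters from the other end on the other strand.
    """
    strand = insert if orientation == "+" else reverse_complement(insert)
    return strand[:read_length]


insert = "ATGGCCTTAGGA"  # hypothetical insert sequence
forward_read = read_from_primer(insert, "+", 5)
reverse_read = read_from_primer(insert, "-", 5)

# The reverse read covers the far end of the insert once it is converted
# back to the forward strand.
far_end = reverse_complement(reverse_read)
```

With both orientations available, short reads from a single primer site cover both ends of the same fragment, which is the property that makes sequencing in each direction useful for filling gaps.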