The rate at which the sequence of the four nucleotides in DNA samples can be determined is a major technical obstacle to further advancement of molecular biology, medicine, and biotechnology. Nucleic acid sequencing methods which involve separation of DNA molecules in a gel have been in use since 1978. The only other proven method for sequencing nucleic acids is sequencing by hybridization (SBH).
The array-based approach of SBH does not require single-base resolution in separation, degradation, synthesis or imaging of a DNA molecule. In the most commonly discussed variation of this method, mismatch-discriminative hybridization of short oligonucleotides K bases in length is used to determine the list of constituent K-mer oligonucleotides for a target DNA. The sequence may then be assembled by uniquely overlapping the scored oligonucleotides.
In SBH sequence assembly, K-1 oligonucleotides which occur repeatedly in analyzed DNA fragments, due to chance or biological reasons, may be subject to special consideration. If there is no additional information, relatively small fragments of DNA may be fully assembled inasmuch as every base pair (bp) is read several times. In assembly of relatively longer fragments, ambiguities may arise due to repeated occurrence of a K-1 oligonucleotide. This problem does not arise when mutated or similar sequences are to be determined, because knowledge of one sequence may be used as a template to correctly assemble a similar one.
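The assembly step and the ambiguity caused by a repeated K-1 oligonucleotide can be illustrated with a minimal sketch. It assumes idealized, error-free hybridization scores and a known starting K-mer. The function names `kmer_spectrum` and `assemble` are hypothetical and are used here only for illustration; they are not part of the method as described.

```python
from collections import defaultdict


def kmer_spectrum(target, k):
    """Return the set of K-mers in target (assumes error-free scoring)."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def assemble(spectrum, start):
    """Extend from `start` through unique (K-1)-base overlaps.

    Returns (sequence, ambiguous). `ambiguous` is True when extension
    stopped at a branch point, i.e. more than one unused K-mer shares
    the required K-1 overlap.
    """
    # Index every scored K-mer by its first K-1 bases.
    by_prefix = defaultdict(list)
    for kmer in spectrum:
        by_prefix[kmer[:-1]].append(kmer)

    sequence = start
    used = {start}
    current = start
    while True:
        # Candidates are unused K-mers whose prefix equals the current suffix.
        candidates = [m for m in by_prefix[current[1:]] if m not in used]
        if len(candidates) != 1:
            return sequence, len(candidates) > 1
        current = candidates[0]
        used.add(current)
        sequence += current[-1]


# No repeated 3-mer: the fragment is fully assembled.
print(assemble(kmer_spectrum("ATGCGTAC", 4), "ATGC"))

# The 3-mer CGT occurs twice: assembly halts at the branch point.
print(assemble(kmer_spectrum("ACGTTCGTA", 4), "ACGT"))
```

In the second call, both CGTT and CGTA can follow ACGT, so the sketch reports an ambiguity instead of guessing. A known similar sequence used as a template would be one way to choose between the branches.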
There are several approaches for sequencing by hybridization. In SBH Format 1, DNA samples are arrayed and labelled probes are hybridized with the samples. Replica membranes with the same sets of sample DNAs may be used for parallel scoring of several probes, and/or probes may be multiplexed. Arraying and hybridization of DNA samples on nylon membranes are well developed. Each array may be reused many times. Format 1 is especially efficient for batch processing of large numbers of samples.
In SBH Format 2, probes are arrayed and a labelled DNA sample fragment is hybridized to the arrayed probes. In this case, the complete sequence of one fragment may be determined from simultaneous hybridization reactions with the arrayed probes. For sequencing other DNA fragments, the same oligonucleotide array may be reused. The arrays may be produced by spotting or in situ synthesis. Specific hybridization has been demonstrated. In a variant of Format 2, DNA anchors are arrayed and ligation is used to determine oligosequences present at the end of target DNA.
In Format 3, two sets of probes are used. One set may be in the form of arrays, and another, labelled set is stored in multiwell plates. In this case, target DNA need not be labelled. Target DNA and one labelled probe are added to the arrayed set of probes. If one attached probe and one labelled probe both hybridize contiguously on the target DNA, they are covalently ligated, producing a sequence twice as long to be scored. The process allows for sequencing long DNA fragments, e.g. a complete bacterial genome, without subcloning the DNA into smaller pieces.
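The Format 3 scoring rule can be sketched as follows. This is an illustrative simplification, not a description of the actual chemistry: each probe is written as the target-strand sequence it pairs with, and the attached probe is assumed to lie immediately 5' of the labelled probe. The function name `ligation_positives` is hypothetical.

```python
def ligation_positives(target, arrayed_probes, labelled_probe):
    """Return arrayed probes that would be ligated to labelled_probe.

    Assumptions (for illustration only): probes are written as the
    target-strand sequence they pair with, hybridization is error-free,
    and the arrayed probe sits immediately 5' of the labelled probe.
    A positive result corresponds to a scored sequence of combined length.
    """
    return sorted(p for p in arrayed_probes if p + labelled_probe in target)


target = "ACGTTGCA"
arrayed = {"ACG", "CGT", "TTG"}

# ACG and TTG are contiguous on the target (ACGTTG), so only ACG scores.
print(ligation_positives(target, arrayed, "TTG"))

# CGT and TGC are contiguous on the target (CGTTGC), so only CGT scores.
print(ligation_positives(target, arrayed, "TGC"))
```

Each positive spot reports a 2K-base sequence from two K-base probes, which is why the combination of an arrayed set and a labelled set can score longer oligonucleotides than either set alone.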
In the present invention, SBH is applied to the efficient identification and sequencing of one or more DNA samples in a short period of time. The procedure has many applications in DNA diagnostics, forensics, and gene mapping. It may also be used to identify mutations responsible for genetic disorders and other traits, to assess biodiversity, and to produce many other types of data dependent on DNA sequence.