Over the last three decades there has been an enormous increase in the efficiency, and a corresponding decrease in the cost, of nucleic acid sequencing techniques. Traditional techniques for sequencing DNA are the dideoxy termination method of Sanger (Sanger et al., PNAS USA, 74: 5463 (1977)) and the Maxam-Gilbert chemical degradation method (Maxam and Gilbert, PNAS USA, 74: 560 (1977)). Both methods deliver four samples, with each sample containing a family of DNA strands in which all strands terminate in the same nucleotide. Ultrathin slab gel electrophoresis or, more recently, capillary array electrophoresis is used to resolve the strands of different lengths and to determine the nucleotide sequence, either by differentially tagging the strands of each sample before electrophoresis to indicate the terminal nucleotide, or by running the samples in different lanes of the gel or in different capillaries.
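The way a sequence is read from four families of terminated strands can be illustrated with a minimal sketch. It assumes idealized data in which each strand length is known exactly and appears in only one of the four samples; the function name `read_sequence` and the input layout are hypothetical and are not taken from the cited references.

```python
def read_sequence(fragment_lengths):
    """Reconstruct a sequence from four families of terminated strands.

    fragment_lengths maps each terminal nucleotide ("A", "C", "G", "T")
    to the lengths of the strands that end in that nucleotide.
    Assumption: lengths are exact integers, as if perfectly resolved
    by electrophoresis.
    """
    position_to_base = {}
    for base, lengths in fragment_lengths.items():
        for length in lengths:
            if length in position_to_base:
                # The same length in two samples would be an ambiguous call.
                raise ValueError(f"ambiguous call at position {length}")
            position_to_base[length] = base
    # Ordering strands from shortest to longest yields the sequence.
    return "".join(position_to_base[pos] for pos in sorted(position_to_base))


# Hypothetical example: strand lengths observed in each of the four samples.
samples = {"A": [1, 5], "C": [2], "G": [3, 6], "T": [4]}
print(read_sequence(samples))  # ACGTAG
```

Real electrophoretic data are noisy, so an actual base caller must also handle overlapping or missing bands; the sketch only shows the ordering principle.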
The concept of sequencing DNA by synthesis without using electrophoresis was first described by Hyman, Analytical Biochemistry, 174: 423 (1988), and involves detecting the identity of each nucleotide as it is incorporated into the growing strand of DNA in a polymerase reaction. Such a scheme, coupled with a chip format and laser-induced fluorescent detection, markedly increases the throughput of DNA sequencing projects.
More recently, several different formats of so-called next-generation and third-generation sequencing methods have been described that can sequence millions of target templates in parallel. Such methods are particularly useful when the target nucleic acid is a heterogeneous mixture of variants, as is often the case in a sample from a patient infected with a virus such as HIV. Among the many advantages, sequencing variants in parallel provides a profile of the drug-resistance mutations in the sample, including mutations present in relatively minor proportions within the sample.
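How parallel reads yield a profile of minor variants can be sketched as follows. This is a simplified illustration, assuming the reads are already aligned to a reference without gaps and share its coordinates; the function name `variant_profile`, the `min_fraction` threshold, and the toy data are hypothetical.

```python
from collections import Counter


def variant_profile(reads, reference, min_fraction=0.01):
    """Return {(position, base): fraction} for non-reference bases.

    Assumptions: reads are pre-aligned, ungapped, and start at
    reference position 0. Only variants observed in at least
    min_fraction of the reads covering a position are reported.
    """
    profile = {}
    for pos, ref_base in enumerate(reference):
        # Count the bases observed at this position across all reads.
        counts = Counter(read[pos] for read in reads if pos < len(read))
        depth = sum(counts.values())
        if depth == 0:
            continue
        for base, n in counts.items():
            fraction = n / depth
            if base != ref_base and fraction >= min_fraction:
                profile[(pos, base)] = fraction
    return profile


# Hypothetical mixture: 97 reads match the reference, 3 carry a G-to-A change.
reference = "ACGT"
reads = ["ACGT"] * 97 + ["ACAT"] * 3
print(variant_profile(reads, reference))  # {(2, 'A'): 0.03}
```

In practice the threshold has to be set above the sequencing error rate, since otherwise errors are indistinguishable from genuine low-frequency variants.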
Although next-generation and third-generation sequencing methods are much more efficient than Sanger or Maxam-Gilbert sequencing methods in the amount of sequence generated per unit of time or cost, they are also dependent on having high-quality nucleic acids to sequence. The presence of impurities can not only cause problems with sequencing reactions but, in the case of contamination by non-target nucleic acids, can also introduce misinformation into the system that complicates, or even makes impossible, a proper interpretation of the resulting data. The consequences of such misinformation include false positive signals, loss of robustness and sensitivity in the assay, and ambiguous results.