Nucleic acid sequence information is an important starting point for medical and academic research endeavors. Sequence information facilitates medical studies of active disease and genetic disease predispositions, and assists in the rational design of drugs that target specific diseases. Sequence information is also the basis for genomic and evolutionary studies and for many genetic engineering applications. Reliable sequence information is also critical for paternity tests, criminal investigations, and forensic studies.
Nucleic acid sequence information has typically been obtained using chain termination and size separation procedures, such as those described by Sanger, et al. (1977 Proc. Nat. Acad. Sci. USA 74:5463-5467). Prior to gel separation, the nucleic acid template molecules of interest are cloned, amplified, and isolated. The sequencing reactions are then conducted in four separate reaction vessels, one for each nucleotide: A, G, C and T. These sequencing methods are adequate for read lengths of 500-1,000 nucleotides. However, they require template molecule amplification steps, which are known to be error-prone and can jeopardize the acquisition of reliable sequence information. Furthermore, these methods suffer from sequence-dependent artifacts, including band compression of repetitive sequences and homo-polymeric regions during gel separation.
Technological advances in chemistry, automated sequencing machines, fluorescently-labeled nucleotides, and detection systems have improved read lengths and permit massively parallel sequencing runs for high-throughput methods which do not require gel separation.
Other recently developed techniques include single molecule sequencing methods, which typically employ optical detection and resolution of signals from fluorescently-labeled nucleotides during polymerase-catalyzed nucleotide incorporation onto the extending nucleic acid strand. However, these procedures suffer from inaccurate reads of homo-polymeric regions and of regions containing highly repetitive sequences. Furthermore, read errors can be caused by the incorporation of non-detectable nucleotides (e.g., nucleotides that are unlabeled or attached to a non-fluorescent dye molecule).
The compositions, systems, and methods provided herein overcome many problems associated with current nucleotide incorporation procedures. The methods provided herein rely on transient binding of the nucleotide to the polymerase, instead of nucleotide incorporation. For example, in one step, a labeled nucleotide transiently binds the polymerase in a template-dependent manner, but does not incorporate. The transiently-bound nucleotide emits a signal which can be used to identify the nucleotide. The methods provided herein can provide accurate sequence information of repetitive and homo-polymeric regions. The methods provided herein are readily adaptable for use in existing single molecule and bulk sequencing platforms, making it feasible to deliver sequence information as part of a healthcare or forensic program.