Sequencing-by-synthesis (SBS) generally refers to methods for determining the identity or sequence composition of one or more nucleotides in a nucleic acid sample, wherein the methods comprise the stepwise synthesis of a single strand of polynucleotide molecule complementary to a template nucleic acid molecule whose nucleotide sequence composition is to be determined. For example, SBS techniques typically operate by adding a single nucleic acid (also referred to as a nucleotide) species to a nascent polynucleotide molecule complementary to a nucleic acid species of a template molecule at a corresponding sequence position. The addition of the nucleic acid species to the nascent molecule is generally detected using a variety of methods known in the art that include, but are not limited to, what is referred to as pyrosequencing, which may include enzymatic or electronic (i.e. pH detection with ISFET or other related technology) detection strategies, or fluorescent detection methods that in some embodiments may employ reversible terminators. Typically, the process is iterative until a complete (i.e. all sequence positions are represented) or desired sequence length complementary to the template is synthesized. Some examples of SBS techniques are described in U.S. Pat. Nos. 6,274,320; 7,211,390; 7,244,559; 7,264,929; and 7,335,762, each of which is hereby incorporated by reference herein in its entirety for all purposes.
In some embodiments of SBS, an oligonucleotide primer is designed to anneal to a predetermined, complementary position of the sample template molecule. The primer/template complex is presented with a nucleotide species in the presence of a nucleic acid polymerase enzyme. If the nucleotide species is complementary to the nucleic acid species corresponding to a sequence position on the sample template molecule that is directly adjacent to the 3′ end of the oligonucleotide primer, then the polymerase will extend the primer with the nucleotide species. Alternatively, in some embodiments the primer/template complex is presented with a plurality of nucleotide species of interest (typically A, G, C, and T) at once, and the nucleotide species that is complementary at the corresponding sequence position on the sample template molecule directly adjacent to the 3′ end of the oligonucleotide primer is incorporated. As described above, incorporation of the nucleotide species can be detected by a variety of methods known in the art, e.g. by detecting the release of pyrophosphate (PPi) or hydrogen ions (H+) enzymatically or electronically (examples described in U.S. Pat. Nos. 6,210,891; 6,258,568; and 6,828,100, each of which is hereby incorporated by reference herein in its entirety for all purposes), or via detectable labels bound to the nucleotides. In typical embodiments, unincorporated nucleotides are removed, for example by washing. In the embodiments where detectable labels are used, they will typically have to be inactivated (e.g. by chemical cleavage or photobleaching) prior to the following cycle of synthesis. The next sequence position in the template/polymerase complex can then be queried with another nucleotide species, or a plurality of nucleotide species of interest, as described above. Repeated cycles of nucleotide addition, primer extension, signal acquisition, and washing result in a determination of the nucleotide sequence of the template strand.
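The cycle of nucleotide addition, primer extension, and signal acquisition described above can be illustrated with a minimal Python sketch. It is an idealized, error-free model of a single molecule, not an implementation taken from the cited references. The function name `ideal_flowgram` and its parameters are hypothetical, and the sketch assumes an embodiment without reversible terminators, so that a run of identical template bases is extended within a single flow.

```python
# Watson-Crick pairing used to decide whether a flowed nucleotide is complementary.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ideal_flowgram(template, flow_order, n_flows):
    """Return the number of bases incorporated at each flow (illustrative only).

    template   -- template bases in the order the polymerase reads them,
                  starting at the position adjacent to the primer's 3' end
    flow_order -- nucleotide species presented one at a time, repeated cyclically
    n_flows    -- total number of flows to simulate
    """
    pos = 0          # next template position to be queried
    signals = []
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        count = 0
        # Extend while the flowed nucleotide is complementary to the template.
        while pos < len(template) and COMPLEMENT[template[pos]] == nuc:
            count += 1
            pos += 1
        signals.append(count)   # stands in for the acquired signal
        # Washing away unincorporated nucleotide is implicit between flows.
    return signals


print(ideal_flowgram("ATTGC", "TACG", 8))  # [1, 2, 1, 1, 0, 0, 0, 0]
```

Reading the non-zero entries against the flow order recovers the synthesized strand (T, AA, C, G), which is the complement of the template.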
In typical embodiments of SBS, a large number or “clonal” population of substantially identical template molecules (e.g. 10³, 10⁴, 10⁵, 10⁶ or 10⁷ molecules) is analyzed simultaneously in any one sequencing reaction, in order to achieve a signal which is strong enough for reliable detection. What is referred to as “homogeneous extension” of nascent molecules associated with substantially all template molecules in a population of a given reaction is required for high signal-to-noise ratios. The term “homogeneous extension”, as used herein, generally refers to the relationship or phase of the extension reaction where each member of a population of substantially identical template molecules described above is homogeneously performing the same step in the reaction. For example, each extension reaction associated with the population of template molecules may be described as being in phase (also sometimes referred to as phasic synchrony or phasic synchronism) with each other when they are performing the same reaction step at the same sequence position for each of the associated template molecules.
However, those of ordinary skill in the related art will appreciate that a small fraction of template molecules in each population loses or falls out of phasic synchronism with the rest of the template molecules in the population; that is, the reactions associated with the fraction of template molecules either get ahead of, or fall behind, the other template molecules in the sequencing reaction on the population (some examples are described in Ronaghi, M. Pyrosequencing sheds light on DNA sequencing. Genome Res. 11, 3-11 (2001), which is hereby incorporated by reference herein in its entirety for all purposes). For example, the failure of the reaction to properly incorporate one or more nucleotide species into one or more nascent molecules for extension of the sequence by one position results in each subsequent reaction being at a sequence position that is behind and out of phase with the sequence position of the rest of the population. This effect is referred to herein as “incomplete extension” (IE). Alternatively, the improper extension of a nascent molecule by incorporation of one or more nucleotide species in a sequence position that is ahead and out of phase with the sequence position of the rest of the population is referred to herein as “carry forward” (CF). The combined effects of CF and IE are referred to herein as CAFIE.
Those of ordinary skill will appreciate that both IE and CF errors have the potential to occur at each sequence position during an extension reaction and thus may have cumulative effects evident in the resulting sequence data. For example, the effects may become especially noticeable towards the end of a “sequence read”.
Further, IE and CF effects may impose an upper limit to the length of a template molecule that may be reliably sequenced (sometimes referred to as the “read length”) using SBS approaches, because the quality of the sequence data decreases as the read length increases.
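The cumulative loss of phasic synchronism described above can be illustrated with a toy population model in Python. It is a simplified sketch under stated assumptions, not the CAFIE model of any cited reference. The names `simulate_phasing` and `_extend` are hypothetical, as are the parameters `ie` (fraction of eligible molecules that fail to extend in a flow) and `cf` (fraction of molecules that extend with residual nucleotide left over from the previous flow). The sequence passed in is the strand to be synthesized, i.e. the complement of the template.

```python
def _extend(pop, seq, nuc, efficiency):
    """Advance the fraction `efficiency` of each sub-population whose next base is `nuc`."""
    new_pop = [0.0] * len(pop)
    signal = 0.0
    for p, frac in enumerate(pop):
        q = p
        while q < len(seq) and seq[q] == nuc:   # extend through a run of `nuc`
            q += 1
        moved = frac * efficiency if q > p else 0.0
        new_pop[q] += moved
        new_pop[p] += frac - moved              # molecules left behind
        signal += moved * (q - p)
    return new_pop, signal


def simulate_phasing(seq, flow_order, n_flows, ie=0.0, cf=0.0):
    """Return (signals, in_phase): per-flow signal and fraction of molecules in phase."""
    pop = [1.0] + [0.0] * len(seq)   # pop[p] = fraction that has synthesized p bases
    ideal_pos, prev = 0, None        # position of an error-free molecule; previous flow
    signals, in_phase = [], []
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        # Main incorporation: a fraction `ie` of eligible molecules fails to extend.
        pop, signal = _extend(pop, seq, nuc, 1.0 - ie)
        # Carry forward (assumed mechanism): residual nucleotide from the previous
        # flow lets a fraction `cf` of molecules run ahead by that species.
        if prev is not None and prev != nuc:
            pop, extra = _extend(pop, seq, prev, cf)
            signal += extra
        while ideal_pos < len(seq) and seq[ideal_pos] == nuc:
            ideal_pos += 1
        signals.append(signal)
        in_phase.append(pop[ideal_pos])
        prev = nuc
    return signals, in_phase


signals, in_phase = simulate_phasing("TAACG", "TACG", 4, ie=0.1)
print([round(x, 4) for x in in_phase])
```

With `ie=0.1` and no carry forward, the in-phase fraction in this toy model shrinks by a factor of 0.9 at every productive flow, which is the kind of accumulation that degrades data quality toward the end of a read.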
Some embodiments of SBS have successfully applied numerical modeling and simulation methods to sequence data from SBS sequencing strategies to bioinformatically correct the CAFIE error in the sequence data and thereby extend the usable read length from a sequencing run. However, such methods only compensate for the accumulated CAFIE error that is found in sequence reads from SBS sequencing strategies, and do not provide a mechanism for reducing the accumulation of CAFIE error during the sequencing run.
Embodiments of SBS as described herein serially introduce each nucleotide species individually into the sequencing reaction environment according to a predetermined order (also referred to as “flow order”, “flow pattern”, or “nucleotide dispensation order”). For example, an embodiment of SBS may employ a repeating cycle of a predetermined order of 4 nucleotide species, such as a TACG order of nucleotide species flows per cycle. In some embodiments the flow order may be repeated 200 to 400 times depending on application. However, in practice a flow order does not need to be a 4 nucleotide species cyclic repeat, such as TACG described above. In fact, some SBS applications have utilized customized flow orders which are tailored to the nucleotide sequence of an amplicon whose sequence is known a priori, in order to maximize the number of incorporated bases that are extended by a minimum number of nucleotide species flows (i.e. have a very high extension rate by design). In the described amplicon-type flow order embodiments the flow order may be interpreted as a single flow order (i.e. non-cyclic) defined by the sequence composition of the amplicon sequence.
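The difference between a cyclic flow order and an amplicon-tailored flow order can be sketched in Python as follows. This is a minimal illustration only. The function names `flows_needed` and `amplicon_flow_order` and the example sequence are hypothetical, the sequence passed in is the strand to be synthesized, and the tailored order is derived here simply by collapsing each run of identical bases into a single flow, which is one plausible reading of a flow order "defined by the sequence composition of the amplicon".

```python
from itertools import groupby


def flows_needed(seq, flow_order, max_flows=10000):
    """Count the flows required to fully extend `seq`, repeating `flow_order` cyclically."""
    pos = 0
    flows = 0
    while pos < len(seq):
        if flows >= max_flows:
            raise ValueError("sequence not completed; flow order may lack a needed base")
        nuc = flow_order[flows % len(flow_order)]
        while pos < len(seq) and seq[pos] == nuc:   # a run extends in one flow
            pos += 1
        flows += 1
    return flows


def amplicon_flow_order(seq):
    """Derive a non-cyclic flow order from a known sequence: one flow per run of bases."""
    return "".join(base for base, _ in groupby(seq))


amplicon = "TTGACCA"                      # hypothetical known amplicon sequence
custom = amplicon_flow_order(amplicon)
print(custom)                             # TGACA
print(flows_needed(amplicon, "TACG"))     # 10
print(flows_needed(amplicon, custom))     # 5
```

In this example every flow of the tailored order incorporates at least one base, whereas half of the cyclic TACG flows are unproductive.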
It is therefore desirable to extend the concepts of numerical CAFIE correction and customized flow order design, and to implement one or more flow orders that reduce the accumulation of CAFIE-type error or can correct for some CAFIE error during a sequencing run. In other words, as opposed to applying the CAFIE correction methods to the sequencing data, the algorithms and modeling can be used to predict improved flow orders that reduce the accumulation of CAFIE error and/or correct some CAFIE error during the sequencing run.
A number of references are cited herein, the entire disclosures of which are incorporated herein, in their entirety, by reference for all purposes. Further, none of these references, regardless of how characterized above, is admitted as prior art to the invention of the subject matter claimed herein.