DNA (deoxyribonucleic acid) is the universal carrier of genetic information. DNA is an intertwined helix of two polymeric strands, each built up of nucleotide units whose bases are attached to a backbone of deoxyribose sugars and phosphate groups joined by phosphodiester bonds. The two strands run in opposite, anti-parallel directions. Each DNA strand is built up of four nucleotides, A, C, G and T, in an order specific to that DNA molecule. It is the sequence of these four bases along the backbone that encodes genetic information.
The two DNA strands have a complementary nature. An A nucleotide forms a base pair with a T nucleotide in the opposite strand, and vice versa; a G nucleotide forms a base pair with a C nucleotide in the opposite strand, and vice versa.
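The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that both strands are written in the conventional 5′ to 3′ direction, so the partner of a strand is its reverse complement.

```python
# Base pairing as described above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written 5'->3' (assumed convention).

    The strands are anti-parallel, so the input is read backwards
    while each base is replaced by its pairing partner.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


def are_complementary(strand_a: str, strand_b: str) -> bool:
    """True if the two strands (both written 5'->3') can form a full duplex."""
    return reverse_complement(strand_a) == strand_b.upper()


print(reverse_complement("ATGC"))         # GCAT
print(are_complementary("ATGC", "GCAT"))  # True
```

Because the mapping is symmetric, applying `reverse_complement` twice returns the original strand, which mirrors the "and vice versa" in the pairing rule.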
In eukaryotic cells, DNA is transcribed to RNA (ribonucleic acid). RNA molecules are rather similar to DNA molecules: they are single chains of nucleotides attached to a backbone of ribose sugars and phosphate groups. There are different types of RNA molecules, depending on their function. mRNA molecules are used by cellular organisms to carry the genetic information encoded in DNA and to direct the synthesis of proteins. In some viruses, RNA rather than DNA even serves as the genetic material.
DNA can be replicated by DNA polymerases. A DNA polymerase can only extend an existing DNA strand that is paired with a template strand. It cannot begin the synthesis of a new strand on its own. To begin synthesis, a short DNA or RNA oligonucleotide, called a primer, must be created and paired with the template DNA strand.
DNA polymerase synthesizes a new strand of DNA by extending the 3′ end of an existing nucleotide chain, adding new nucleotides complementary to the template strand one at a time through the creation of phosphodiester bonds. The incoming building blocks are the deoxynucleoside triphosphates (dNTPs: dATP, dCTP, dGTP, dTTP). The oxygen of the 3′-hydroxyl end of the growing DNA strand makes a nucleophilic attack on the alpha phosphate (the one closest to the sugar) of the dNTP. As a result, the dNMP (deoxyribonucleoside monophosphate, or nucleotide) becomes covalently bound to the 3′ carbon of the sugar at the end of the DNA strand, thus lengthening the strand by one nucleotide. In addition, pyrophosphate and a proton are released (FIG. 1). The process then repeats.
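The primer-dependent extension described above can be sketched as a toy model in Python. The function name `extend_primer` is hypothetical, both sequences are assumed to be written 5′ to 3′, and the chemistry (dNTP attack, release of pyrophosphate and a proton) is reduced to appending one complementary base per step.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template: str, primer: str) -> str:
    """Return the new strand (5'->3') grown from `primer` along `template`.

    Toy model: each loop iteration stands for one incorporation event
    at the 3' end of the growing strand.
    """
    # Read the template 3'->5' so the new strand grows 5'->3' alongside it.
    template_3to5 = template[::-1]
    # Site on the template that the primer pairs with, base by base.
    site = "".join(COMPLEMENT[base] for base in primer)
    start = template_3to5.find(site)
    if start < 0:
        # Without an annealed primer there is no 3' end to extend.
        raise ValueError("primer does not anneal to the template")

    strand = primer
    for base in template_3to5[start + len(primer):]:
        strand += COMPLEMENT[base]  # one nucleotide added per step
    return strand


print(extend_primer("ATGCGTTAGC", "GCT"))  # GCTAACGCAT
```

The `ValueError` branch reflects the point made in the text that a polymerase cannot start a strand without a primer paired to the template.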
There is a huge interest in determining genetic DNA information, such as the nucleotide found at a given position, the sequence found at a given locus or loci in the genome, or even the complete genome. Even RNA can be sequenced when it is first converted to cDNA. Genetic information is determined by sequencing technologies, such as Maxam-Gilbert sequencing, Sanger sequencing and derivatives thereof, parallel pyrosequencing (Roche 454 Life Sciences), reversible terminator-based sequencing by synthesis (Illumina), Sequencing by Oligonucleotide Ligation and Detection (SOLiD) (Life Technologies), Ion Semiconductor Sequencing (Ion Torrent, Life Technologies), Single Molecule Real Time sequencing (SMRT) based on the properties of zero-mode waveguides (Pacific Biosciences), nanopore sensing (Oxford Nanopore Technologies), etc. Because many sequencing technologies have limited sensitivity, pools (clones) of identical DNA molecules are sequenced in parallel. Parallel sequencing of single DNA strands is only possible with Single Molecule Real Time sequencing and nanopore sensing. In most sequencing technologies, a double-stranded DNA molecule is denatured, and one of the resulting single-stranded DNA molecules is then sequenced. In essence, this single DNA strand is used as a template in a sequencing reaction for the synthesis of a second, complementary DNA strand. A new DNA strand can be synthesized when a small DNA fragment, an oligonucleotide, binds to the DNA template. This oligonucleotide serves as a primer for the extension of a new growing DNA strand by incorporation of nucleotides according to the complementarity principle. Such oligonucleotides are typically about 10-25 nucleotides long and can be easily synthesized. By monitoring the synthesis of this new DNA strand, i.e. the order in which nucleotides are incorporated into it, the DNA sequence of that strand can be determined.
Given the complementary nature of the two DNA strands, the sequence of the other original DNA strand is then also known.
Despite the progress in sequencing techniques, the number of nucleotides in homonucleotide stretches cannot be accurately determined with certain newer generation sequencing technologies. For example, pyrosequencing was invented in the early nineties, highly parallel sequencing was introduced in 2005, and Ion Semiconductor sequencing was only introduced in 2009. The pitfall of inaccurate calling of homonucleotide stretches has thus been known for almost two decades.
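One way to see why homonucleotide stretches are hard to call is a simplified flow-based model, sketched below in Python. It is an illustration under stated assumptions, not a description of any vendor's base caller: in flow-based methods such as pyrosequencing and Ion Semiconductor sequencing, one nucleotide type is supplied per flow and a whole stretch of identical bases is incorporated in that single flow, so the stretch length has to be inferred from the signal magnitude. The flow order `TACG`, the function names and the signal values are all hypothetical.

```python
def to_flowgram(read: str, flow_order: str = "TACG") -> list[int]:
    """Ideal (noise-free) number of bases incorporated in each flow."""
    if set(read) - set(flow_order):
        raise ValueError("read contains a base that is never flowed")
    flows, pos, i = [], 0, 0
    while pos < len(read):
        base = flow_order[i % len(flow_order)]
        run = 0
        # A homonucleotide stretch is consumed entirely within one flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        flows.append(run)
        i += 1
    return flows


def call_bases(signals: list[float], flow_order: str = "TACG") -> str:
    """Naive base calling: round each signal to the nearest run length."""
    return "".join(
        flow_order[i % len(flow_order)] * round(signal)
        for i, signal in enumerate(signals)
    )


print(to_flowgram("TTTACGG"))            # [3, 1, 1, 2]
# The same absolute error is harmless for a single base but changes
# the call for a longer stretch (made-up signal values):
print(call_bases([1.4, 1.0, 1.0, 2.0]))  # TACGG
print(call_bases([7.6, 1.0, 1.0, 2.0]))  # TTTTTTTTACGG (8 T's, not 7)
```

In this toy model an error of a fraction of a unit never hides a single base, but for a long stretch it is enough to add or drop a base, which is the kind of miscall the paragraph above refers to.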