Genetic analysis is a key tool in biological research and is fast becoming indispensable in pharmacology and even medical diagnostics. A wide variety of technologies, both old and new, have been applied to such genetic analysis, and particularly to the analysis of nucleotide sequences of larger fragments of genetic material.
However, as critical as raw genetic sequence data is to the overall analysis, it is, by and large, analogous to the string of letters used in a written novel. While the order of the letters is critical, it is their context within words, sentences, paragraphs and chapters that conveys the lion's share of the useful information. Similarly, while pure nucleotide sequence information is critically important in genetic analyses, it is the context of that sequence information in codons, genes, gene clusters, chromosomes and whole genomes that conveys even greater amounts of information.
Apart from the question of sequence context, most common sequencing techniques are based upon analysis of populations of nucleic acids, and therefore derive a consensus sequence from the bulk analysis of mixtures of nucleic acids. While this approach is effective at obtaining an overall consensus sequence, it overlooks the variations from molecule to molecule that may be particularly important for a variety of different applications. Single molecule sequencing methods, in contrast, can capture such variations but may suffer from inaccuracies that are not apparent in bulk consensus methods.
The present invention is generally directed to processes and systems that provide redundant sequence information on individual nucleic acid molecules. That redundant information can be used both to enhance the accuracy of sequence determinations and to determine sequence context information in sequencing processes. These and other aspects of the invention are set forth in greater detail below.
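By way of illustration only, the following Python sketch shows one simple way in which redundant sequence information from a single molecule might be combined to improve accuracy: a per-position majority vote across repeated reads. The function name `consensus`, the example reads, and the assumption that the reads are already aligned and of equal length are hypothetical and are not taken from the processes described herein.

```python
from collections import Counter


def consensus(reads):
    """Per-position majority vote over equal-length, pre-aligned reads.

    Returns the consensus string and, for each position, the fraction
    of reads that agree with the consensus call.
    Assumption: alignment has already been done elsewhere.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be pre-aligned to the same length")

    calls = []
    support = []
    # zip(*reads) yields one column of bases per sequence position.
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        calls.append(base)
        support.append(count / len(reads))
    return "".join(calls), support


# Five hypothetical reads of the same molecule, each with at most one error.
reads = ["ACGTTAGC", "ACGTTAGC", "ACCTTAGC", "ACGTTAGA", "ACGTAAGC"]
seq, support = consensus(reads)
print(seq)
print([round(s, 2) for s in support])
```

In this sketch, an error that appears in only one of the redundant reads is outvoted at its position, and the per-position support values indicate where the reads disagree.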