Advances in biomolecule sequence determination, in particular with respect to nucleic acid and protein samples, has revolutionized the fields of cellular and molecular biology. Facilitated by the development of automated sequencing systems, it is now possible to sequence whole genomes. However, the quality of the sequence information must be carefully monitored, and may be compromised by many factors related to the biomolecule itself or the sequencing system used, including the composition of the biomolecule (e.g., base composition of a nucleic acid molecule), experimental and systematic noise, variations in observed signal strength, and differences in reaction efficiencies. As such, processes must be implemented to analyze and improve the quality of the data from such sequencing technologies.
One strategy for improving sequence data is to use redundant sequence data, e.g., as a means to improve accuracy by allowing random errors to be identified and resolved in the final sequence determination. However, the methods currently used are typically restrictive with regards to the types of polymorphisms that may be analyzed and the types of error models that can be used. As such, there is a need for a more powerful and flexible method for consensus sequence determination from redundant biomolecule sequence data.
Various embodiments and components of the present invention employ signal and data analysis techniques that are familiar in a number of technical fields. For clarity of description, details of known analysis techniques are not provided herein. These techniques are discussed in a number of available reference works, such as: R. B. Ash. Real Analysis and Probability. Academic Press, New York, 1972; D. T. Bertsekas and J. N. Tsitsiklis. Introduction to Probability. 2002; K. L. Chung. Markov Chains with Stationary Transition Probabilities, 1967; W. B. Davenport and W. L Root. An Introduction to the Theory of Random Signals and Noise. McGraw-Hill, New York, 1958; S. M. Kay, Fundamentals of Statistical Processing, Vols. 1-2, (Hardcover—1998); Monsoon H. Hayes, Statistical Digital Signal Processing and Modeling, 1996; Introduction to Statistical Signal Processing by R. M. Gray and L. D. Davisson; Modern Spectral Estimation: Theory and Application/Book and Disk (Prentice-Hall Signal Processing Series) by Steven M. Kay (Hardcover—January 1988); Modern Spectral Estimation: Theory and Application by Steven M. Kay (Paperback—March 1999); Spectral Analysis and Filter Theory in Applied Geophysics by Burkhard Buttkus (Hardcover—May 11, 2000); Spectral Analysis for Physical Applications by Donald B. Percival and Andrew T. Walden (Paperback—Jun. 25, 1993); Astronomical Image and Data Analysis (Astronomy and Astrophysics Library) by J. L. Starck and F. Murtagh (Hardcover—Sep. 25, 2006); Spectral Techniques In Proteomics by Daniel S. Sem (Hardcover—Mar. 30, 2007); Exploration and Analysis of DNA Microarray and Protein Array Data (Wiley Series in Probability and Statistics) by Dhammika Amaratunga and Javier Cabrera (Hardcover—Oct. 21, 2003).