Today, various instruments, apparatus, and systems generate large volumes of data that require processing and analysis. Because a goal of many analytical techniques is high throughput and rapid analysis, not only should the analytical instrument operate efficiently to generate data, but the subsequent processing and analysis of the data also need to be handled efficiently.
With respect to nucleic acid sequencing analysis, many known techniques rely on the use of exogenous labels and dyes to identify or recognize the incorporation of a nucleotide into a nucleic acid (polymer) or other chemical entity. However, such techniques can suffer from inaccuracies, for example, where incorporation of a labeled nucleotide is sterically hindered and is therefore incomplete or inefficient. Consequently, techniques have been developed that can detect the natural by-products of transforming chemical reactions, such as the incorporation of a natural nucleotide, which produces a hydrogen ion. To that end, a sequencing instrument has been developed that is capable of electronically detecting nucleotide incorporation resulting from extension of a nucleic acid strand and that can generate and output signals and data reflective of the relative hydrogen ion concentration associated with the nucleotide incorporation. See, e.g., U.S. patent application Ser. Nos. 12/002,291, 12/474,897, and 12/492,844, each of which is hereby incorporated by reference in its entirety for all purposes.
Accordingly, there is a need for further data analysis methods and systems that can efficiently process and analyze large volumes of data relating to nucleic acid sequence analysis and, more particularly, that can align or map nucleic acid fragments or sequences of various lengths. Further, there is a need for new data analysis methods and systems that can efficiently process data and signals indicative of electronically detected chemical reactions, for example, nucleotide incorporation events, and transform these signals into other data and information, for example, base calls and nucleic acid sequence information and reads, which can then be aligned, for example, against a reference genome.