The present invention relates generally to the field of genetic sequencing. More particularly, the invention relates to improved techniques for permitting automated sequencing of genetic materials by use of arrays of genetic fragments.
Genetic sequencing has become an increasingly important area of genetic research, promising future uses in diagnostic and other applications. In general, genetic sequencing consists of determining the order of nucleotides for a nucleic acid such as a fragment of RNA or DNA. Relatively short sequences are typically analyzed, and the resulting sequence information may be used in various bioinformatics methods to logically fit fragments together so as to reliably determine the sequence of much more extensive lengths of genetic material from which the fragments were derived. Automated, computer-based examination of characteristic fragments has been developed, and has been used more recently in genome mapping, identification of genes and their function, and so forth. However, existing techniques are highly time-intensive, and resulting genomic information is accordingly extremely costly.
A number of alternative sequencing techniques are presently under investigation and development. These include the use of microarrays of genetic material that can be manipulated so as to permit parallel detection of the ordering of nucleotides in a multitude of fragments of genetic material. The arrays typically include many sites formed or disposed on a substrate. Additional materials, typically single nucleotides or strands of nucleotides (oligonucleotides), are introduced and permitted or encouraged to bind to the template of genetic material to be sequenced. Sequence information may then be gathered by imaging the sites. In certain current techniques, for example, each nucleotide type is tagged with a fluorescent tag or dye that permits the nucleotide attached at a particular site to be determined by analysis of image data.
Although such techniques show promise for significantly improving throughput and reducing the cost of sequencing, further progress in speed, reliability and efficiency of data handling is needed.
For example, in certain sequencing approaches that use image data to evaluate individual sites, large volumes of image data may be produced during sequential cycles of sequencing. In systems relying upon sequencing by synthesis (SBS), for example, dozens of cycles may be employed for sequentially attaching nucleotides to individual sites. Images formed at each step result in a vast quantity of digital data representative of pixels in high-resolution images. These images are analyzed to determine what nucleotides have been added to each site at each cycle of the process. Other images may be employed to verify de-blocking and similar steps in the operations.
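By way of illustration only, the scale of the image data described above can be approximated with simple arithmetic. The following sketch is not part of any described system; the function name and every numeric value (number of cycles, images per cycle, image dimensions, and bytes per pixel) are hypothetical assumptions chosen solely to show how the volume grows with each factor.

```python
def estimate_image_data_bytes(cycles, images_per_cycle, width_px, height_px, bytes_per_pixel):
    """Return the raw (uncompressed) image data volume, in bytes, for one imaged field.

    All parameters are illustrative assumptions, not values taken from any
    particular sequencing instrument.
    """
    pixels_per_image = width_px * height_px
    return cycles * images_per_cycle * pixels_per_image * bytes_per_pixel


# Hypothetical example: 50 cycles, 4 images per cycle (assuming one image per
# tagged nucleotide type), 2048 x 2048 pixel images, 2 bytes per pixel.
total_bytes = estimate_image_data_bytes(50, 4, 2048, 2048, 2)
print(f"{total_bytes / 1e9:.2f} GB per imaged field")
```

Under these assumed values a single imaged field yields roughly 1.7 gigabytes of raw pixel data, and the total scales linearly with the number of fields imaged on the array and with the number of cycles performed.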
The image data is important for determining the proper sequence data for each individual site. However, the quantity of image data will become unwieldy as systems become capable of more rapid and large-scale sequencing. There is a need, therefore, for improved techniques in the management of such data during and after the sequencing process.