Historically, the analysis of genetic information has been carried out using chemical laboratory methods. While such chemical methods can provide adequate information for a limited number of gene sequences, computer-based research tools are increasingly being used for a variety of purposes related to genetic research. These computer-based tools include both hardware and software that run high-speed algorithms and other processes on gene sequence information stored as computer data.
These computerized tools typically include database programs of various types. In the conventional genomic database system, a monolithic master program directs a series of algorithms to be performed. Such algorithms may include simple comparisons, such as determining whether a particular gene sequence has been previously identified, or more sophisticated mathematical analyses, such as those used to determine whether a gene sequence is likely to exhibit certain characteristics. The algorithms may be run either on a database of existing full-length sequences or on sequence segments prior to their entry into the database. Furthermore, researchers may wish to record annotations concerning the sequences as records in the database.
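As an illustration of the simplest kind of comparison mentioned above, the following minimal sketch checks whether a sequence has been previously identified by looking it up in an in-memory index. The `SequenceStore` class, the `normalize` helper, and the identifier `seq-001` are hypothetical names chosen for this example; an exact-match lookup is assumed, and a real system may well use similarity search instead.

```python
from typing import Dict, Optional


def normalize(seq: str) -> str:
    """Strip whitespace and upper-case the bases so equal sequences compare equal."""
    return "".join(seq.split()).upper()


class SequenceStore:
    """Hypothetical in-memory index of known sequences (exact match only)."""

    def __init__(self) -> None:
        self._by_sequence: Dict[str, str] = {}

    def add(self, seq_id: str, seq: str) -> None:
        # Key the index on the normalized sequence text.
        self._by_sequence[normalize(seq)] = seq_id

    def lookup(self, seq: str) -> Optional[str]:
        # Return the stored identifier, or None if the sequence is new.
        return self._by_sequence.get(normalize(seq))


store = SequenceStore()
store.add("seq-001", "ATGGCGTAA")

known = store.lookup("atg gcg taa")   # same bases, different formatting
unknown = store.lookup("ATGCCCTAA")   # not in the store
```

Here `known` holds the identifier of the stored sequence, while `unknown` is `None`, indicating a sequence that has not been seen before.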
This database-oriented approach is to some degree the result of the need to unify the efforts of many different individuals working under many different circumstances. For example, genetic analysis algorithms may be provided by academic, government, or commercial sources. Gene sequence data may originate from in-house laboratory tests, from external commercially available databases, or from privately developed sources.
These hardware and software devices have become powerful tools for the researcher in the field of genetics. In particular, once a comprehensive database of gene sequence information is available, many different algorithms may be run at high speed against all of the known gene sequences. The computer-based methods thus provide results much more rapidly than if such analysis were carried out as chemical or laboratory experiments.
However, a number of problems occur when changes must be made to the sequence database or to the algorithms. This is especially the case in a real-time commercial environment, where new genetic sequence information and new algorithms are under continuous research and development.
One such problem occurs when a particular genetic sequence algorithm is replaced by a new version. When this happens, the old result records must be updated by running the new version of the algorithm against each of the genetic sequences. Not only may the process be time-consuming, but the database must also typically be kept offline while it is being updated, in order to avoid losing track of which sequence records have yet to be updated.
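The bookkeeping burden described above can be pictured with a small sketch. It assumes, purely for illustration, that each result record carries the version of the algorithm that produced it. The `ResultRecord` type, the two `gc_content` versions, and the `rerun` helper are all hypothetical and are not taken from any particular conventional system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ResultRecord:
    seq_id: str
    algorithm: str
    version: int
    value: float


def gc_content_v1(seq: str) -> float:
    """Hypothetical version 1: G+C fraction over every character."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_content_v2(seq: str) -> float:
    """Hypothetical version 2: ignores ambiguous bases such as 'N'."""
    called = [b for b in seq if b in "ACGT"]
    if not called:
        return 0.0
    return sum(b in "GC" for b in called) / len(called)


def stale_records(results: List[ResultRecord], algorithm: str,
                  current_version: int) -> List[ResultRecord]:
    """Records produced by an older version of the named algorithm."""
    return [r for r in results
            if r.algorithm == algorithm and r.version < current_version]


def rerun(sequences: Dict[str, str], results: List[ResultRecord],
          algorithm: str, version: int, fn: Callable[[str], float]) -> int:
    """Re-run the new version against every stale record; return the count."""
    updated = 0
    for record in stale_records(results, algorithm, version):
        record.value = fn(sequences[record.seq_id])
        record.version = version
        updated += 1
    return updated


sequences = {"s1": "GCNN", "s2": "ATGC"}
results = [ResultRecord(sid, "gc_content", 1, gc_content_v1(seq))
           for sid, seq in sequences.items()]

updated = rerun(sequences, results, "gc_content", 2, gc_content_v2)
remaining = stale_records(results, "gc_content", 2)
```

Even in this toy form, the update touches one record per stored sequence, so its cost grows with the size of the database. If sequences were added or modified while `rerun` was in progress, the list of stale records would no longer be reliable, which is one way to understand why the database is typically taken offline.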
Other problems exist if the database is not static. In particular, the addition of new gene sequence information to the database must be routinely accommodated in commercial environments. New sequence data may become available on a daily or even hourly basis as gene sequences are continuously produced by automated sequencing equipment. When new data is added, each of the existing algorithms must be rerun with the new data.
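The arrival of new data can be sketched in the same spirit. The example below assumes a hypothetical registry of algorithms, `ALGORITHMS`, and a hypothetical `ingest` function that runs every registered algorithm against each newly arrived sequence; the two toy algorithms stand in for real analyses.

```python
from typing import Callable, Dict

# Hypothetical registry: algorithm name -> function applied to a sequence.
ALGORITHMS: Dict[str, Callable[[str], int]] = {
    "length": len,
    "gc_count": lambda s: s.count("G") + s.count("C"),
}


def ingest(database: Dict[str, dict], seq_id: str, seq: str) -> dict:
    """Store a new sequence and run every registered algorithm against it."""
    record = {
        "sequence": seq,
        "results": {name: fn(seq) for name, fn in ALGORITHMS.items()},
    }
    database[seq_id] = record
    return record


db: Dict[str, dict] = {}
rec = ingest(db, "s1", "ATGGCC")
```

Each call to `ingest` performs one run per registered algorithm, so the total work scales with both the rate at which sequences arrive and the number of algorithms in use.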
The problems associated with conventional systems include difficulties other than keeping the database of gene sequences synchronized with the latest versions of the analytical software. If, as in many cases, the software source code is organized to pre-annotate the data before it enters the database, as a first step in a pipeline of analytical algorithms or processes, it can be very difficult to add new tools or update existing ones. For example, if a change is made to the database while gene sequence data are already in the pipeline, then all new result records will reflect the update, but the old result records already in the database will still need to be updated to reflect the changes in the processing.
In the past, the solution to this problem has typically been thought to require halting the processing pipeline long enough to make the required changes, and then writing new software to update the database of old result records to reflect the new process changes. Completing these procedures is a complex task, and is made even more complex by the interruptions in the availability of the database for other purposes. These problems are exacerbated as the number of records increases.