Genetic information, together with the corresponding cellular and physiological information, is an extremely useful tool for many purposes. Comparative analysis of genetic information has been widely used in basic scientific studies, such as research into the molecular changes associated with disease, the genetic differences underlying molecular evolution, and the identification of individuals using forensic techniques. For instance, genetic information has been critical in determining the underlying molecular basis for a number of both heritable and sporadic cancers. Studies using genetic information have enabled important advances in the medical field, providing means for prenatal diagnosis, for identifying the presence or progression of disorders, and for obtaining prognostic information on the aggressiveness of disease.
The ability to access genetic information quickly and efficiently is critical to the success of many of these scientific and medical uses. Currently, genetic and cellular information is generally analyzed using molecular biology or biochemistry techniques in a laboratory setting. Although some of this research is computer-aided, most analysis of such information is done by hand. The use of genetic and cellular information for scientific and medical purposes is therefore limited in practice by the amount of labor and time such analysis requires.
The state of computer technology for organizing and using genetic data has further limited the ways in which much scientific and medical analysis can be performed. Computerized tools for analyzing biological information are designed primarily to perform direct comparisons between sequences. Such techniques are very powerful for determining how certain gene products are related to other gene products, and may suggest putative functions for novel gene products. Databases such as GenBank, for example, are widely used for such purposes. Databases such as GenBank are not, however, designed to efficiently perform more complex analyses, such as abundance analysis between tissue types, subtractive analysis between samples of normal tissue and tissue in a disease state, or similar comparative procedures. To date, these tools have therefore played a limited role in diagnostics, prognostics, and the optimization of patient treatment strategies.
Moreover, most of the databases used in biological and medical research are depository databases, i.e., the same sequence may be entered multiple times from different sources. Depository databases are not edited for accuracy; mistakes present when the sequences are entered remain in the database files until the source of the sequence takes proactive steps to remove or correct the information. For example, GenBank, a widely used public gene-sequence database maintained by the National Center for Biotechnology Information, is a depository database. Sequences may be entered into GenBank by different researchers, and the information remains in the database until actively removed. An initial search that appears to show significant homology with a variety of sequences in GenBank may in fact be identifying multiple versions of the same gene sequence, with each version merely having a different source and name. Where such sequences differ slightly from one another, depository databases provide no means of identifying the correct sequence.
There is a need in the field for a computer-based system for efficiently analyzing and comparing genetic sequences and the corresponding cellular and physiological data. Such a system would greatly enhance the use of genetic information in medicine and biology, and would be especially beneficial for patient care and treatment.