Scientists and researchers working in the genomics and proteomics fields have traditionally utilized databases containing DNA sequence information. These scientists and researchers increasingly work with data associated with all points along the lifecycle of the cellular transcription process. While many insights into that lifecycle can be gained from gene expression data and other sequence information, understanding the overall state of a cell also requires information about protein expression, post-translational protein modifications such as protein folding and glycosylation, and molecular interactions such as protein subunit arrangements.
As scientists increasingly perform experiments to correlate gene data and protein data, there is a growing need to compare protein characterization and measurement data with corresponding genetic data. Both gene and protein data are widely available to scientists, but no convenient method is available for correlating or mapping the different types of data. Association of the different types of data has until now been carried out laboriously by hand.
With the increasing use of high-throughput technologies in molecular biology and related fields, it is becoming increasingly difficult for scientists and researchers to track and correlate gene and protein data. It is also increasingly difficult to track and make reference to the growing number of molecular identifiers used in association with different databases. This is due both to the quantity of data being handled and to the number and inconsistency of available identifier systems such as, for example, GenBank accession numbers, UniGene cluster identifiers, clone identifiers, RefSeq accession numbers, and other identification information. Effective association or correlation of important data by hand is no longer feasible.
One approach to providing biotechnology data has been the use of searchable web portals that allow manual location of information usable for tracing associations between data. Single HTML-based queries are used to interface curated sequence information with descriptive information. The available correlating and mapping systems, however, do not provide any standards-based services that scientists and researchers can use to easily build informatics architectures of correlated data from available databases.
There is accordingly a need for systems and methods that provide a high-performance, ubiquitous infrastructure or architecture for accessing and correlating gene information and related protein information. The present invention satisfies these needs, as well as others, and generally overcomes the deficiencies found in the background art.