The use of computers for retrieval and analysis of data has become standard in information-intensive industries such as finance and the sciences. As bodies of information have grown and become distributed across databases, a new set of disciplines (informatics) aimed at studying the context of this data has been created.
Pharmaceutical and biotechnology companies place a high value on deoxyribonucleic acid (DNA) and protein sequence information. For example, in 1996 a major pharmaceutical company earned revenues of $38 million from subscription fees for the use of its sequence databases. Many pharmaceutical companies have large contracts with and/or investments in gene discovery companies. However, since an unanalyzed DNA sequence has limited value, the outcome of gene discovery often hinges on bioinformatics, the application of computer technology to the analysis and management of sequence data.
Computer technology is essential to analyzing data such as DNA sequences, but today users of informatics-related software find themselves in a dilemma. On one hand, the complexity of the information makes the presentation of results crucial to understanding, so the best informatics programs make use of interactive, graphical presentations, and the best environment for understanding complex data relationships is a desktop computer running a graphical interface. On the other hand, the computational demands of DNA sequence analysis require powerful workstations or supercomputers, and such (often very expensive) computers do not support interactive graphical representations of analyses. Moreover, at present there is no single program that performs all of the functions necessary for successfully analyzing DNA and protein sequences.
Although Web (i.e., network) technology allows users at a desktop computer to access programs and databases on remote computers, such programs lack a unifying standard. In particular, each program has its own unique interface, that is, a program-specific format for input and output. Ease of use is sacrificed, since users must learn to operate many different programs and must overcome formidable technical hurdles to exchange data between them. As this often involves laborious and tedious manipulation of data files, as well as detailed knowledge of the operation of each program and the quirks of each operating system, the chances of error are significant. Currently, scientists either spend unnecessary hours accomplishing tasks with these tools or simply choose not to try, potentially missing important observations.
In industrial fields, there are additional information management issues. Oftentimes, several researchers working in different offices in different states or countries need to share data, test results, and findings to maximize efficiency. Data management and analysis software has, to date, failed to fulfill this important need, leaving users to communicate their findings via post, E-mail, or informal verbal communication.
These situations exist particularly in such fields as bioinformatics and cheminformatics, where users have a strong need for sophisticated manipulation of data, with interactive and accessible output. Additionally, these users have an identifiable need for real-time sharing of pertinent information across multi-functional teams.
One answer to this dilemma is the use of a client/server system (i.e., software on a personal computer or workstation, running a graphical user interface (GUI), acting as a client of server software running on larger, faster machines). Data is stored on central machines, allowing easy access for everyone on a project team. However, for such a system to function smoothly, the clients and the servers must share communication protocols, so either the software developer must control both the client and server software or a common standard must be adopted. While client/server solutions have become increasingly popular, traditional client/server systems are deficient in several ways that have made them unsuitable as effective software support for a rapidly changing field like drug discovery.
Conventional client/server systems tend to suffer from inherent inflexibility, due to the tight coupling of the client and server. To operate properly, the client software must "know" on what particular computer the server software runs, and the protocol with which to "talk" to the server. If the server machine is busy or down, the client software is unable to work, even if other machines are available that could process its request. Such software is not very "soft", as too many decisions are hardwired into it. If well designed, such systems can handle existing needs, but they often need to be scrapped and totally rewritten if business needs change. In a rapidly changing field like bioinformatics, for example, the useful life of such software might be measured in months. Conventional client/server systems are, in addition, often very difficult to maintain and upgrade, since any change made to the server requires complementary changes to the client. This situation is known as the "fat client" problem. For example, in a system that may have hundreds or even thousands of clients, even the slightest improvement in the server may create an enormous task for the system administrator, who must propagate the improvement to every client.
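The tight coupling described above can be illustrated with a minimal sketch. Everything in it is hypothetical and introduced only for illustration: the simulated servers (`busy_server`, `idle_server`), the `ServerUnavailable` exception, and the two client classes are assumptions, not part of any system named in this document. The first client is hardwired to a single server and fails whenever that server is busy or down; the second, shown purely for contrast, consults a list of candidate servers and uses the first one that responds.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an analysis server: maps a request to a result.
Server = Callable[[str], str]


class ServerUnavailable(Exception):
    """Raised by a simulated server that is busy or down."""


def busy_server(request: str) -> str:
    # Simulates the one machine a hardwired client depends on being unavailable.
    raise ServerUnavailable("server is busy or down")


def idle_server(request: str) -> str:
    # Simulates another machine that could have processed the request.
    return f"analyzed:{request}"


class HardwiredClient:
    """Tightly coupled client: knows exactly one server."""

    def __init__(self, server: Server) -> None:
        self.server = server

    def submit(self, request: str) -> str:
        # No alternative is tried; any server failure reaches the user.
        return self.server(request)


class RegistryClient:
    """Contrasting client: tries each registered server in turn."""

    def __init__(self, registry: Dict[str, Server]) -> None:
        self.registry = registry

    def submit(self, request: str) -> str:
        errors: List[str] = []
        for name, server in self.registry.items():
            try:
                return server(request)
            except ServerUnavailable as exc:
                errors.append(f"{name}: {exc}")
        # Every candidate failed; report all of the failures together.
        raise ServerUnavailable("; ".join(errors))
```

In this sketch, a `HardwiredClient` built on `busy_server` raises `ServerUnavailable` for every request, whereas a `RegistryClient` given both servers still returns a result from `idle_server`.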
Furthermore, researchers in industry face significant security issues. Sequence data (which may have cost millions of dollars to collect) cannot be sent over the extremely public Internet, where anyone might be listening. Consequently, many useful tools for sequence analysis (e.g., those provided over the Internet by the National Center for Biotechnology Information (NCBI), such as BLAST or Entrez) may be unsuitable for researchers in industry due to the lack of security.
A closely parallel situation exists in drug discovery with respect to chemical data. As in bioinformatics, no system currently available in the area of cheminformatics facilitates drug discovery without many of the aforementioned deficiencies of conventional systems.
Thus, in light of the above problems associated with client/server systems and their applicability to bioinformatics, cheminformatics, and other data-intensive industries, there is a strong need in the art for a system that overcomes these problems. In particular, there is a strong need for a system that integrates and organizes biological and/or chemical data in order to facilitate drug discovery and design. Moreover, there is a strong need for a system that provides a secure research environment that can be used by researchers in industry.