This invention relates to databases, and more particularly to integrated multidimensional databases for managing scientific research data.
Researchers have performed experiments on and made observations of biological tissue samples suggesting that a molecular basis for cancer and other diseases might be discovered through careful molecular analysis of such tissues. Such an understanding could permit improvement in the diagnosis, screening, and treatment of disease, and could permit disease treatment to be tailored to the specific molecular defects found in an individual patient. Many different researchers and laboratories study the molecular basis of disease, and a large amount of data and information is produced from such studies. Optimizing the handling and integration of research results, data, and other information produced and used by the various laboratories devoted to studying the molecular basis of cancer and other tissue-based diseases is advantageous for realizing improvements in the understanding and treatment of disease.
Even though many large genomic warehouse databases currently exist, and even though scientific laboratories are connected to the Internet, the data produced by a lab are not necessarily well handled, integrated, validated, searchable, and useable either by the lab producing the data or by another lab that might be interested in using the data. Generally, when data from biological tissue studies are published, only a limited set of the actual primary data (and sometimes none of it) are available for review and reanalysis. Moreover, common language and reference points are often not used for reporting the data. Even in the lab that did the original work, there is often no efficient or robust way to integrate data from a study with previous or subsequent studies. Furthermore, because of space limitations and the difficulty of tracking complex research methods, many published descriptions of laboratory methods do not provide adequate information for another scientist to accurately reproduce an experiment, even though this is a central tenet of scientific publication. The end result is that when taken as a group, the many similar or related studies, while individually illuminating, are isolated and autonomous from each other, and do not achieve potential synergies.
Poor data handling may result in major problems that slow or possibly prevent real progress in finding better treatment and diagnostic methods for major diseases. In particular, current methods of disseminating information from molecular studies of cancer and other diseases do not allow results from one study to be easily integrated with results from other studies. There is no standard way to link the results of DNA, RNA, and protein-based studies to cellular function or phenotype expression. Current methods of disseminating the results of molecular studies also do not preserve a substantial portion of the original data supporting such studies, making it difficult for researchers to verify the conclusions of a research study or otherwise reinterpret the data.
In one aspect, generally, a method of distributing research data from a common database to a user of the common database is provided. Data concerning research results and data upon which the research results are based are stored in a local database, with the research results linked to the data upon which the research results are based. Data concerning research results and data upon which the research results are based are selectively extracted from the local database to the common database. Research data are then selected by a user of the common database from the extracted data concerning research results and from the data upon which the extracted data are based, and the selected research data are distributed to the user.
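As an illustration only, the store, link, extract, and distribute steps of this aspect might be sketched as follows. The sketch assumes a relational layout and uses Python's standard sqlite3 module; the table names (result, primary_data), the shareable flag used to make extraction selective, and all function names are hypothetical and are not taken from the description.

```python
import sqlite3

# Hypothetical schema: each result row is linked to the primary data rows
# that support it through primary_data.result_id.
SCHEMA = """
CREATE TABLE result (
    id INTEGER PRIMARY KEY,
    summary TEXT NOT NULL,
    shareable INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE primary_data (
    id INTEGER PRIMARY KEY,
    result_id INTEGER NOT NULL REFERENCES result(id),
    payload TEXT NOT NULL
);
"""

def new_database():
    """Create an empty in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def store_result(db, summary, payloads, shareable):
    """Store a research result and link its supporting data to it."""
    cur = db.execute(
        "INSERT INTO result (summary, shareable) VALUES (?, ?)",
        (summary, int(shareable)),
    )
    result_id = cur.lastrowid
    db.executemany(
        "INSERT INTO primary_data (result_id, payload) VALUES (?, ?)",
        [(result_id, p) for p in payloads],
    )
    return result_id

def extract_to_common(local, common):
    """Copy only the results marked shareable, with their linked data."""
    rows = local.execute(
        "SELECT id, summary FROM result WHERE shareable = 1"
    ).fetchall()
    for result_id, summary in rows:
        common.execute(
            "INSERT INTO result (id, summary, shareable) VALUES (?, ?, 1)",
            (result_id, summary),
        )
        data = local.execute(
            "SELECT id, result_id, payload FROM primary_data WHERE result_id = ?",
            (result_id,),
        ).fetchall()
        common.executemany(
            "INSERT INTO primary_data (id, result_id, payload) VALUES (?, ?, ?)",
            data,
        )
    return len(rows)

def distribute(common, result_id):
    """Return the result a user selected together with its supporting data."""
    row = common.execute(
        "SELECT summary FROM result WHERE id = ?", (result_id,)
    ).fetchone()
    if row is None:
        return None
    payloads = [
        p for (p,) in common.execute(
            "SELECT payload FROM primary_data WHERE result_id = ? ORDER BY id",
            (result_id,),
        )
    ]
    return {"summary": row[0], "primary_data": payloads}
```

In this sketch a result that is not marked shareable never reaches the common database, so a user of the common database cannot select it. Reusing local row identifiers in the common database is a simplification; a common database fed by several laboratories would need its own identifiers.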
Implementations may include one or more of the following features. For example, when the research data are distributed, the data concerning research results and the data upon which the research results are based are distributed in a defined database table structure. The distribution of research data may include giving a reviewer electronic access to the data concerning research results and to the data upon which the research results are based. The approval of the reviewer may be required before the research data are publicly distributed.
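The reviewer feature can be pictured with a small sketch, given here under stated assumptions: a reviewer is granted electronic access to both the results and the underlying data, and public distribution is blocked until a reviewer approves. The Submission class and the function names below are hypothetical and serve only to illustrate the gating logic.

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    """Research results plus the data they are based on (hypothetical type)."""
    results: str
    primary_data: list
    reviewers: set = field(default_factory=set)
    approved: bool = False

def grant_review_access(sub, reviewer):
    """Give a reviewer electronic access before public distribution."""
    sub.reviewers.add(reviewer)

def approve(sub, reviewer):
    """Record approval; only a reviewer with access may approve."""
    if reviewer not in sub.reviewers:
        raise PermissionError(f"{reviewer} is not a reviewer of this submission")
    sub.approved = True

def view(sub, user):
    """Return results and underlying data if the user is allowed to see them."""
    if sub.approved or user in sub.reviewers:
        return sub.results, sub.primary_data
    raise PermissionError("submission has not been approved for public distribution")
```

Here an ordinary user is refused until approval is recorded, while a reviewer can inspect the same results and primary data beforehand.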
The data upon which the research results are based may include phenotype data and genotype data. The data upon which the research results are based can include information concerning equipment and supplies used in generating the research results, or information concerning biomaterials used in generating the research results.
Information concerning protocols used in generating the research results may be stored in the local database. The information concerning protocols used in generating the research results may be linked to the data concerning research results and to the data upon which the research results are based. Information concerning protocols used in generating the research results may be selectively extracted from the local database to the common database. Research data selected by a user of the common database from the information concerning protocols used in generating the research results may be distributed to the user.
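One possible way to link protocol information to research results is a join table, sketched below with Python's standard sqlite3 module. The table names (protocol, result, result_protocol) and the helper functions are assumptions made for illustration. The sketch links protocols to results only; a link from protocols to the underlying data could follow the same pattern.

```python
import sqlite3

# Hypothetical schema: result_protocol links each result to the protocols
# used in generating it (many-to-many).
PROTOCOL_SCHEMA = """
CREATE TABLE protocol (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    steps TEXT NOT NULL
);
CREATE TABLE result (
    id INTEGER PRIMARY KEY,
    summary TEXT NOT NULL
);
CREATE TABLE result_protocol (
    result_id INTEGER NOT NULL REFERENCES result(id),
    protocol_id INTEGER NOT NULL REFERENCES protocol(id),
    PRIMARY KEY (result_id, protocol_id)
);
"""

def new_protocol_database():
    """Create an empty in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(PROTOCOL_SCHEMA)
    return db

def add_protocol(db, title, steps):
    """Store a protocol and return its identifier."""
    cur = db.execute(
        "INSERT INTO protocol (title, steps) VALUES (?, ?)", (title, steps)
    )
    return cur.lastrowid

def add_result(db, summary, protocol_ids):
    """Store a result and link it to the protocols used to generate it."""
    cur = db.execute("INSERT INTO result (summary) VALUES (?)", (summary,))
    result_id = cur.lastrowid
    db.executemany(
        "INSERT INTO result_protocol (result_id, protocol_id) VALUES (?, ?)",
        [(result_id, pid) for pid in protocol_ids],
    )
    return result_id

def protocols_for_result(db, result_id):
    """Return (title, steps) for every protocol linked to a result."""
    return db.execute(
        "SELECT p.title, p.steps FROM protocol p "
        "JOIN result_protocol rp ON rp.protocol_id = p.id "
        "WHERE rp.result_id = ? ORDER BY p.id",
        (result_id,),
    ).fetchall()
```

With such links in place, a user who selects a result can also be given the protocols behind it, which addresses the reproducibility concern raised in the background.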
The data upon which the research results are based may include information concerning equipment and supplies used in generating the research results, and may include information concerning biomaterial used in generating the research results.
In another general aspect, a system for distributing research data may include a processor, an output device for viewing the research data, and memory for storing instructions performed by the processor. The memory includes instructions for storing data concerning research results in a local database, storing data upon which the research results are based in the local database, and linking the data concerning research results to the data upon which the research results are based. The memory also includes instructions for selectively extracting data concerning research results and data upon which the research results are based from the local database to the common database and for distributing to a user of the common database research data selected by the user from the extracted data concerning research results and the data upon which the extracted data are based.
The memory of the system can also include instructions for storing information concerning protocols used in generating the research results in the local database, and for linking the information concerning protocols used in generating the research results to the data concerning research results and to the data upon which the research results are based. The memory can include instructions for selectively extracting information concerning protocols used in generating the research results from the local database to the common database, and for distributing to a user of the common database research data selected by the user from the information concerning protocols used in generating the research results.
In another general aspect, a computer program, residing on a computer-readable medium, for distributing research data includes instructions for causing a computer to store data concerning research results in a local database, store data upon which the research results are based in the local database, and link the data concerning research results to the data upon which the research results are based. The program includes instructions for selective extraction of data concerning research results and data upon which the research results are based from the local database to the common database, and for distributing to a user of the common database research data selected by the user from the extracted data concerning research results and the data upon which the extracted data are based.