The advent of powerful servers, large-scale data storage and other information infrastructure has spurred the development of advanced data warehousing and data mining applications. Structured query language (SQL) engines, on-line analytical processing (OLAP) databases and inexpensive large disk arrays have, for instance, been harnessed in financial, scientific, medical and other fields to capture and analyze vast streams of transactional, experimental and other data. The mining of that data can reveal sales trends, weather patterns, disease epidemiology and other patterns not evident from more limited or smaller-scale analysis.
In the case of medical data management, the task of receiving, conditioning and analyzing large quantities of clinical information is particularly challenging. The sources of medical data, for instance, may include various independent hospitals, laboratories, research or other facilities, each of which may generate data records at different times and in widely varying formats. Those various data records may be pre-sorted or pre-processed to include different relationships between different fields of that data, based upon different assumptions or database requirements. When received in a large-scale data warehouse, the aggregation of all such differing data points may be difficult to store in a physically or logically consistent structure. Data records may for instance contain different numbers or types of fields, which may have to be conformed to a standard format for warehousing and searching.
Even when conditioned and stored, that aggregation of data may prove difficult to analyze or mine for the most clinically relevant or other significant data, such as data indicating a disease outbreak or adverse reactions to drugs or other treatments. This is in part because the data ultimately stored or accessed for reports may only contain or permit relationships between various parts of the data that are defined at either the beginning or the end of the data management process. In other words, the data may reflect only those relationships between different fields or other portions of the data which are defined and embedded by the original data source, or which an end user requests in a query for purposes of generating a report. Relying on source-grouped data is a rigid approach which may omit desired relationships, while relying on back-end queries may tax the OLAP or other query engine being used. Other challenges in receiving, storing and analyzing large-scale medical and other data exist.