The advent of powerful servers, large-scale data storage and other information infrastructure has spurred the development of advanced data warehousing and data mining applications. Structured query language (SQL) engines, on-line analytical processing (OLAP) databases and inexpensive large disk arrays have for instance been harnessed in financial, scientific, medical and other fields to capture and analyze vast streams of transactional, experimental and other data. The mining of that data can reveal sales trends, weather patterns, disease epidemiology and other relationships not evident from more limited or smaller-scale analysis.
In the case of medical data management, the task of receiving, conditioning and analyzing large quantities of clinical information is particularly challenging. The sources of medical data, for instance, may include various independent hospitals, laboratories, research or other facilities, each of which may generate data records at different times and in widely varying formats. Those data records may be pre-sorted or pre-processed to include different relationships between fields of that data, based upon different assumptions or database requirements. When received in a large-scale data warehouse, the aggregation of all such differing data points may be difficult to store in a physically or logically consistent structure. Data records may for instance contain different numbers or types of fields, which may have to be conformed to a standard format for warehousing and searching.
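By way of illustration only, the conforming of records having different numbers or types of fields to a standard format might be sketched as follows. The field names, the alias table and the function conform_record are hypothetical and are assumed solely for purposes of this example; they are not drawn from any particular data source or warehouse.

```python
from typing import Any, Dict

# Hypothetical standard format: every warehoused record carries exactly
# these fields, with None where the source supplied no value.
STANDARD_FIELDS: Dict[str, Any] = {
    "patient_id": None,
    "facility": None,
    "test_code": None,
    "result_value": None,
}

# Hypothetical aliases mapping source-specific field names to standard names.
FIELD_ALIASES: Dict[str, str] = {
    "PatientID": "patient_id",
    "pid": "patient_id",
    "site": "facility",
    "code": "test_code",
    "value": "result_value",
}


def conform_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new record containing exactly the standard fields."""
    conformed = dict(STANDARD_FIELDS)
    for name, value in raw.items():
        standard_name = FIELD_ALIASES.get(name, name)
        # Fields with no counterpart in the standard format are dropped.
        if standard_name in conformed:
            conformed[standard_name] = value
    return conformed


# Two illustrative records with different numbers and names of fields.
hospital_record = {"PatientID": "A-17", "site": "Hospital 1", "value": 5.4}
lab_record = {"pid": "B-02", "code": "T-100", "value": 7.1, "technician": "n/a"}

conformed = [conform_record(r) for r in (hospital_record, lab_record)]
```

In this sketch both records emerge with the same four fields, so that they may be stored and searched in a consistent structure; any relationship not captured by the assumed standard fields is lost in the process.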
Even when conditioned and stored, that aggregation of clinical data may prove difficult to analyze or mine for the most clinically relevant or other data, such as those indicating a disease outbreak or adverse reactions to drugs or other treatments. That is in part because the data ultimately stored or accessed for reports may only contain or permit relationships between various parts of the data which are defined at either the beginning or end of the data management process. That is, the data may reflect only those relationships between different fields or other portions of the data which are defined and embedded by or in the original data source, or which an end user explicitly requests in a query for purposes of generating a report.
Relying on source-grouped data is a rigid approach which may neglect or omit desired relationships, while relying on manual or back-end queries may tax the OLAP or other query engine being used, and in any event may not change the fundamental limitations of the data store and searchable structures. Moreover, in the case of moderate- to large-scale clinical data marts, thousands of variables, dimensions, attributes and other objects may obscure potentially useful clinical relationships in the complex of data. If those relationships are not defined by original data structures or probed by individual queries or reports, the information encoded in those correlations may never come to light. Those types of undiscovered relationships may represent lost clinical or operational opportunities, such as detection of promising treatment regimens or potential cost savings in a given service line. Other challenges in receiving, storing and analyzing large-scale medical and other data exist.