1. Field of the Invention
The present invention is generally directed to a method, system and article of manufacture for importing various clinical genomic data directly into a central database to enable the data to be accessed on-demand by queries.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system (RDBMS) is a computer database management system that uses relational techniques for storing and retrieving data. In a relational database, data is organized in tables (formally denominated “relations”), which are typically stored on disk drives or similar mass data stores. A “table” includes a set of rows (formally denominated “tuples” or “records”) spanning several columns (formally denominated “attributes”).
An RDBMS is structured to accept commands to store, retrieve and delete data using, for example, high-level query languages such as the Structured Query Language (SQL). The term “query” denominates a set of commands for retrieving data from a stored database. These queries may come from users, application programs, or remote systems (clients or peers). A query specifies the particular data set to be returned, but it does not specify the method by which the RDBMS executes the query. That method is typically called a query execution plan, an execution plan, an access plan, or simply a “plan”. There are typically many different execution plans for any particular query, each of which returns the required data set. For large databases, the execution plan selected by the RDBMS must provide the required data at a reasonable cost in time and hardware resources.
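The separation between what a query asks for and how the RDBMS executes it can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `lab_result` table, its columns, and the `idx_patient` index are hypothetical names chosen only for illustration. The same SQL query returns the same data set before and after an index is added, while `EXPLAIN QUERY PLAN` reports the plan SQLite chose in each case.

```python
import sqlite3


def run_demo():
    # In-memory database; "lab_result" and its columns are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lab_result (patient_id INTEGER, test TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO lab_result VALUES (?, ?, ?)",
        [(1, "glucose", 5.4), (2, "glucose", 7.9), (1, "hba1c", 6.1)],
    )

    # The query states WHAT to return; it says nothing about HOW.
    query = "SELECT test, value FROM lab_result WHERE patient_id = ? ORDER BY test"

    plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()
    rows_before = conn.execute(query, (1,)).fetchall()

    # An index gives the planner another way to execute the same query.
    conn.execute("CREATE INDEX idx_patient ON lab_result (patient_id)")
    plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()
    rows_after = conn.execute(query, (1,)).fetchall()

    conn.close()
    return plan_before, plan_after, rows_before, rows_after


before, after, rows1, rows2 = run_demo()
# The last column of each plan row is a human-readable description;
# its exact wording differs between SQLite versions.
print([row[-1] for row in before])
print([row[-1] for row in after])
print(rows1 == rows2)  # the returned data set is the same either way
```

The plan descriptions are printed rather than asserted because SQLite's wording and planner choices can vary by version; only the returned rows are guaranteed to match.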
For capturing and processing complex data from a plurality of different data sources, it is common to set up a staging data store and an operational database. The staging data store buffers related data from the different data sources until a condition is satisfied, at which point the related data is processed and migrated from the staging data store to the operational database via a set of data transformations.
In a clinical genomics application, medical information about a given patient from a variety of data sources is stored in a staging data store (which may be referred to as the “Medical Information Gateway” or “MIG”). A series of related data items, called “events”, are grouped together into an “episode”. For example, an event in the MIG might contain lab work data, disease presentation data, or other crucial patient information. Once all events of a given episode are complete, the system processes the data and imports it into the operational database (the “Medical Information Repository” or “MIR”). Thus, the condition that triggers migration of the event data from the MIG to the MIR is the completion of the corresponding episode.
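The staging arrangement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual MIG or MIR: `StagingStore`, `Episode`, and `Event` are hypothetical names, a plain list stands in for the operational database, and an episode is assumed to be complete once one event of every expected kind has arrived.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    episode_id: str
    kind: str      # e.g. "lab_work" or "disease_presentation"
    payload: dict


@dataclass
class Episode:
    episode_id: str
    expected_kinds: set  # hypothetical completion rule: every expected kind has arrived
    events: list = field(default_factory=list)

    def is_complete(self) -> bool:
        received = {e.kind for e in self.events}
        return self.expected_kinds <= received


class StagingStore:
    """Hypothetical stand-in for the MIG: buffers events per episode."""

    def __init__(self, operational_db: list):
        self.episodes = {}                    # episode_id -> Episode
        self.operational_db = operational_db  # stand-in for the MIR

    def open_episode(self, episode_id: str, expected_kinds: set) -> None:
        self.episodes[episode_id] = Episode(episode_id, set(expected_kinds))

    def add_event(self, event: Event) -> None:
        episode = self.episodes[event.episode_id]
        episode.events.append(event)
        # Migration trigger: only a complete episode is moved to the MIR.
        if episode.is_complete():
            self._migrate(episode)

    def _migrate(self, episode: Episode) -> None:
        # Placeholder transformation: flatten each event into one row.
        for e in episode.events:
            self.operational_db.append(
                {"episode_id": episode.episode_id, "kind": e.kind, **e.payload}
            )
        del self.episodes[episode.episode_id]


mir = []
mig = StagingStore(mir)
mig.open_episode("ep-1", {"lab_work", "disease_presentation"})
mig.add_event(Event("ep-1", "lab_work", {"glucose": 13.2}))
print(len(mir))  # 0: the episode is still open, so a MIR query sees nothing
mig.add_event(Event("ep-1", "disease_presentation", {"finding": "fever"}))
print(len(mir))  # 2: both events migrated together once the episode completed
```

The usage lines show the consequence of this trigger: until the last expected event arrives, the lab work for `ep-1` exists only in the staging store and is invisible to any query against the operational database.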
A problem arises with this arrangement when queries that require real-time data are run against the operational database. Because the data for a particular episode is not imported into the operational database until all associated events or steps are complete, data that could be critical to patient well-being may be unavailable to queries. In other words, crucial patient data cannot be queried because not all events in the episode are complete, so the data has not yet been moved from the MIG into the MIR.
One existing solution to this problem uses “sniffers” to analyze data within the MIG data store for specific conditions. Sniffers are computerized information analysis and retrieval applications. Typically, a sniffer is created to locate data in a particular database or data store according to a very specific set of analysis rules, and is stored on disk drives or similar mass data stores. If its conditions are met, the sniffer fires actions according to its rule set. Using sniffers to locate data in the staging data store is complicated by the fact that the staging data store contains different data types, not all of which are accessible to a single sniffer. As a result, a separate sniffer is needed for each type of data stored in the staging data store, or MIG.
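The one-sniffer-per-data-type limitation can be illustrated with a minimal Python sketch. The `Sniffer` class, the record layout, the `"type"` field, and the glucose threshold are all hypothetical and chosen only for illustration; each sniffer pairs a rule with an action and recognizes records of exactly one data type, so covering two data types in the staging store takes two sniffers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sniffer:
    """Hypothetical sniffer: watches one data type and fires an action
    when its rule matches a record of that type."""
    data_type: str
    rule: Callable[[dict], bool]
    action: Callable[[dict], None]

    def scan(self, records: list) -> int:
        fired = 0
        for record in records:
            # A sniffer only understands records of its own data type.
            if record.get("type") != self.data_type:
                continue
            if self.rule(record):
                self.action(record)
                fired += 1
        return fired


alerts = []

# Hypothetical contents of the staging store (MIG).
staging_records = [
    {"type": "lab_work", "episode_id": "ep-1", "glucose": 13.2},
    {"type": "disease_presentation", "episode_id": "ep-1", "finding": "fever"},
    {"type": "lab_work", "episode_id": "ep-2", "glucose": 5.1},
]

# The lab-work sniffer never sees the disease-presentation record,
# so a second sniffer is required for that data type.
lab_sniffer = Sniffer(
    "lab_work",
    rule=lambda r: r["glucose"] > 11.0,  # illustrative threshold only
    action=lambda r: alerts.append(f"high glucose in {r['episode_id']}"),
)
presentation_sniffer = Sniffer(
    "disease_presentation",
    rule=lambda r: r["finding"] == "fever",
    action=lambda r: alerts.append(f"fever reported in {r['episode_id']}"),
)

for sniffer in (lab_sniffer, presentation_sniffer):
    sniffer.scan(staging_records)

print(alerts)  # ['high glucose in ep-1', 'fever reported in ep-1']
```

Every new data type added to the staging store would need another `Sniffer` with its own rule and action, which is the maintenance burden the paragraph above describes.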
Accordingly, there is a need for a staged data environment in which related data pertaining to ongoing episodes can be accounted for in a query result in real time.