1. Field of the Invention
The invention described herein relates to extracting data from a federated database system, that is, from a meta-database management system which transparently integrates multiple autonomous database systems into a single virtual database, that is, a federated database. The constituent database systems remain autonomous, separate, and distinct. The method, system, and program product described herein are directed to searching for data stored in a federated, distributed computer system in the presence of imperfections (in the data, data accessibility, user visibility, and consistency), and to the management of the distributed database, including access to and retrieval of database data and files from a federated database, and exception handling.
2. Background Art
A federated database system is a type of meta-database management system (DBMS) which transparently integrates separate, distinct, multiple autonomous database systems into a single federated database. The constituent databases are interconnected via computer networks, the internet, local area networks, and virtual networks and may be geographically decentralized. Since the constituent database systems remain autonomous, a federated database system is an alternative to the non-trivial task of merging together several disparate databases.
Through data abstraction, wrapper functions, and container functions, federated database systems can provide a uniform front-end user interface, enabling users to store and retrieve data in multiple databases with a single query, even if the constituent databases are heterogeneous. To this end, a federated database system must be able to deconstruct the query into subqueries for submission to the relevant constituent DBMSs, after which the system must consolidate or aggregate the result sets of the subqueries.
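By way of illustration only, the deconstruction and consolidation steps described above can be sketched as follows. The sketch assumes in-memory SQLite databases as stand-ins for the constituent DBMSs; the names `make_source`, `federated_query`, `clinic_a`, and `clinic_b`, and the `patients` table, are hypothetical and are not drawn from the disclosure.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for one constituent DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO patients VALUES (?, ?)", rows)
    conn.commit()
    return conn


def federated_query(sources, subquery, params=()):
    """Submit one subquery to every source and consolidate the result sets.

    Each returned row is tagged with the name of the source it came from,
    and the merged rows are ordered by the first selected column.
    """
    merged = []
    for name, conn in sources.items():
        for row in conn.execute(subquery, params):
            merged.append((name,) + tuple(row))
    return sorted(merged, key=lambda r: r[1])


# Two hypothetical constituent databases sharing one table layout.
sources = {
    "clinic_a": make_source([(1, "Ada"), (3, "Cy")]),
    "clinic_b": make_source([(2, "Bo")]),
}

results = federated_query(
    sources, "SELECT id, name FROM patients WHERE id >= ?", (1,)
)
for row in results:
    print(row)
```

In this simplified case every source receives the same subquery. With heterogeneous constituent databases, each source would typically receive its own translated subquery.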
Because various database management systems employ different query languages, and may be characterized by different schemas, metadata, locking processes and protocols during database operations, and user visibility and access tools, federated database systems must frequently apply wrappers to the subqueries to translate them into the appropriate query languages.
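As a minimal sketch of such a wrapper, the example below renders one logical subquery into three row-limiting syntaxes: `LIMIT n`, `SELECT TOP n`, and `FETCH FIRST n ROWS ONLY`. The `Subquery` type, the `WRAPPERS` registry, and the dialect keys are assumptions made for illustration; an actual wrapper would also have to reconcile schemas, data types, and locking behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subquery:
    """Dialect-neutral description of a simple row-limited selection."""
    table: str
    columns: tuple
    limit: int


def limit_style(q):
    # LIMIT clause, as accepted by SQLite, PostgreSQL, and MySQL.
    return f"SELECT {', '.join(q.columns)} FROM {q.table} LIMIT {q.limit}"


def top_style(q):
    # TOP clause, as accepted by Microsoft SQL Server.
    return f"SELECT TOP {q.limit} {', '.join(q.columns)} FROM {q.table}"


def fetch_first_style(q):
    # FETCH FIRST clause from the SQL standard.
    cols = ", ".join(q.columns)
    return f"SELECT {cols} FROM {q.table} FETCH FIRST {q.limit} ROWS ONLY"


# Hypothetical registry mapping a dialect key to its wrapper function.
WRAPPERS = {
    "limit": limit_style,
    "top": top_style,
    "fetch_first": fetch_first_style,
}


def translate(q, dialect):
    """Translate a logical subquery into the text expected by one dialect."""
    try:
        wrapper = WRAPPERS[dialect]
    except KeyError:
        raise ValueError(f"no wrapper registered for dialect {dialect!r}")
    return wrapper(q)


query = Subquery(table="patients", columns=("id", "name"), limit=5)
for dialect in WRAPPERS:
    print(dialect, "->", translate(query, dialect))
```

Table and column names are interpolated directly here for brevity, so they are assumed to come from trusted metadata rather than from user input.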
Federated databases have heretofore been variously predicted to be capable of solving a myriad of problems at the conceptual level. However, for real world, practical problems, federated databases have not lived up to the predictions. One particularly vexing set of challenges is obtaining optimal solutions for multi-dimensional physical and “people” challenges. This is especially true within the context of diagnoses, protocols, treatments, and morbidity and mortality outcomes, along with almost infinite sets of drug interactions, immunological responses, and susceptibilities.
Especially in the case of federated, distributed hybrid queries, federated databases are an effective way of extracting meaningful data from pluralities of databases that have been unified by ETL or other means. However, queries are often problematic. For example, when one of the queried databases is down, locked, or otherwise nonresponsive, the queries can fail ungracefully, especially after consuming significant resources.
One problem with federated databases is that issues presented in databases per se are magnified by federation. For example, there is a clear need to verify the data that is accessed, that is, to verify that the accessed data is “for real.” This means verifying that the metadata is consistent and that the metadata constraints have retained their validity. This is because the system itself may not be static; the system can change, giving a “dirty read” instead of a “clean read.” Thus, a need exists to make sure that the accessed data is in the system and is available and accessible.
Other aspects of assuring the validity of data in a federated database system include accessing the data and determining the level of locking. Still other aspects of federated databases include access to the data and visibility of the data to the user, frequently down to the levels of individual files, schemas, namespaces, columns, and rows. Limitations and constraints on access and visibility may well give rise to false or misleading data.
In this context, exception handling and pre-polling processes determine how SQL queries are generated for everything from system level queries to granular row, column, and cell level queries.
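The following is a minimal sketch of pre-polling combined with exception handling. It assumes SQLite connections as stand-ins for the constituent databases and simulates an unavailable database with a closed connection. The function names `pre_poll` and `guarded_query` and the probe statement `SELECT 1` are illustrative assumptions, not the method of the disclosure.

```python
import sqlite3


def pre_poll(sources):
    """Probe each source cheaply before any expensive subquery is issued."""
    available, unavailable = {}, {}
    for name, conn in sources.items():
        try:
            conn.execute("SELECT 1").fetchone()
            available[name] = conn
        except sqlite3.Error as exc:
            # Record why the source was skipped instead of failing the query.
            unavailable[name] = str(exc)
    return available, unavailable


def guarded_query(sources, sql):
    """Run sql on responsive sources only; report the rest as skipped."""
    available, unavailable = pre_poll(sources)
    rows = []
    for name, conn in available.items():
        try:
            rows.extend((name,) + tuple(r) for r in conn.execute(sql))
        except sqlite3.Error as exc:
            # A source may still fail after the poll, so handle that as well.
            unavailable[name] = str(exc)
    return rows, unavailable


# One responsive source and one simulated outage (a closed connection).
up = sqlite3.connect(":memory:")
up.execute("CREATE TABLE t (x INTEGER)")
up.execute("INSERT INTO t VALUES (42)")
down = sqlite3.connect(":memory:")
down.close()

rows, skipped = guarded_query({"up": up, "down": down}, "SELECT x FROM t")
print(rows)
print(sorted(skipped))
```

Polling first keeps a nonresponsive source from consuming resources on a query that cannot complete, and the returned `skipped` mapping lets the caller decide whether a partial result is acceptable.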
Thus, a clear need exists to look at a problem beyond the database metadata level and the machine level, and to explore the solution space and associated soft constraints. By soft constraints we mean legal and institutional constraints, such as confidentiality and ethics, availability of people, performance requirements, and the like.