The present invention relates to computer systems, and more particularly to the access of databases in the computer system.
Databases are an important tool for the storage and management of information for businesses. Both relational and non-relational databases exist for this purpose. Examples of relational databases include ORACLE, DB2, AND INFORMIX. Examples of non-relational databases include custom databases created with the 4690 operating system, developed by INTERNATIONAL BUSINESS MACHINES CORPORATION. The 4690 operating system allows programmers to created custom databases which support keyed, sequential, and binary file types. Database Management Systems (DBMS) provide users the capabilities of controlling read/write access, specifying report generation, and analyzing usage.
Some businesses, particularly large organizations, maintain multiple database types, both relational and non-relational. The simultaneous access to these databases then becomes an issue. For example, a company may wish to combine and compare customer preference information in the marketing department""s ORACLE database with the current customer issues tracked in the customer service department""s DB2 database. One conventional way of accessing these databases is through the DB2 DATAJOINER (xe2x80x9cDataJoinerxe2x80x9d) products, developed by INTERNATIONAL BUSINESS MACHINES CORPORATION. DataJoiner is a multi-database server which uses the Structured Query Language (SQL) and Open Database Connectivity (ODBC), to provide client access to diverse data sources that reside on different platforms.
FIG. 1 illustrates a computer system which uses DataJoiner. The computer system 100 comprises clients 102 in various platforms, a DataJoiner server 104, and data sources 106 of various types. The Data Joiner server 104 allows the clients 102 to transparently access data from the different data sources 106.
FIG. 2 is a block diagram illustrating the basic architecture of DataJoiner. DataJoiner 202 is a DBMS which includes ODBC software and drivers for a variety of data sources. DataJoiner 202 allows for transparent access to multiple data sources in a heterogeneous environment. The Generic Access Applications Programming Interface (API) 204 allows for customized access to home-grown or other data sources 208 not already supported by DataJoiner. Generic Data Access Modules 206 containing Custom ODBC compliant drivers for these data sources may be written and installed on the same system as DataJoiner 202 and used by DataJoiner 202 to access the data sources 208. The Generic Access API 204 defines the calls which must be provided by the custom ODBC drivers in order for DataJoiner 202 to access the custom data source 208.
However, DataJoiner is not able to provide certain features for access to the non-relational databases. In particular, DataJoiner 202 does not provide (1) a two-phase commit, (2) performance of SQL operations from within a single application across heterogeneous file systems, (3) referential integrity, and (4) indices for queries across heterogeneous file system which include non-relational data sources, such as 4690 keyed, sequential, and binary file systems.
Two-Phase Commit
DataJoiner can perform data changes across multiple databases sources within a single operation, while guaranteeing that changes to one source can be rolled back if changes to another source within the same transaction fail. The two-phase commit is the mechanism for implementing this. For example, someone may be trying to wire some money electronically to another location. A debit would need to be written against one account maintained by a first DBMS, and a credit would have to be reflected in another account maintained by a second DBMS. The two-phase commit protocol ensures that either all of the DBMS involved commit the changes required by the transaction or none do. However, keyed, sequential, and binary file systems cannot participate in this function.
SQL Operations from within a Single Application
Conventional relational DBMS""s employ two underlying data storage concepts in order to facilitate relational operations on that data: (1) data is stored in tables with each table having a fixed number of columns; and (2) system catalogs, or data dictionaries, are generated for each table. These data dictionaries (metadata) describe the column layout, index structure, and other data access information. These concepts allow one application to create data, and other applications totally unrelated to the first, to analyze and update the data.
However, with keyed, sequential, and binary file systems, there are several inhibitors that prevent these file systems from having the above flexibility. These include: (1) data is not held in a fixed table format; (2) data access is via a buffer that passes the entire record, with no public or obvious field delineation available; and (3) the structure, or field layout, of keyed, sequential, and binary records is not stored in a publicly available catalog. With these inhibitors, relational operations cannot be performed on the keyed, sequential, and binary file systems, making SQL operations from within a single application across heterogeneous file systems which include these file systems difficult or impossible.
Referential Integrity
Referential integrity is a technique where the consistency of data within a DBMS is ensured by referring to one piece of data during an operation on another piece of data. An example would be to refer to a department table to ensure that a valid department number is being used when a new employee is being added to the database. Referential integrity is incorporated in many of the conventional relational DBMS and is being enforced at the database level to ensure that all applications adhere to any constraints and thus ensure the integrity of the data.
However, the concept of referential integrity, and a file system mechanism to enforce it, does not exist for keyed, sequential, and binary file systems. Without this function, heterogeneous DBMS can not implement referential integrity between the relational and non-relational data. Using the above example, if the valid department list is in a keyed file system and the new employee data was to be stored in a relational database, there would be no way that the underlying data stores could enforce the referential integrity during the employee add operation.
Indices for Queries
A need often arises to access keyed and sequential data based on data selection criteria different from the key or record number traditionally used to access the data. Queries based on this type of selection criteria are referred to as ad-hoc queries. For instance, instead of retrieving an employee record by an employee number there may be a need to retrieve all of the employees who are employed in department xe2x80x9cXxe2x80x9d. The employee records are keyed by an employee identification number yet the selection criteria is based on the department number in which the employee is employed.
For relational databases, ad-hoc queries may be efficiently satisfied since indices of the data may be created for data elements other than the key or record number. Once created; these indices may be used as an alternative form of data access which provides a more efficient processing of the query.
However, to satisfy queries using conventional file access mechanisms available for hash-keyed or sequential files, every record in the file must be read and compared to the selection criteria since indices are not available for these files. This causes performance problems both in accessing the file and in creating the network traffic when the querying application resides on a machine other than the source data files.
Accordingly, there exists a need for a method and system for improved access to non-relational databases in a computer system. The method and system should increase the flexibility and efficiency of access to the databases. The present invention addresses such a need.
The method and system of the present invention provides improved access to databases in a computer system. The present invention includes a plurality of data sources, the data sources including at least one non-relational data source; a server coupled to the plurality of data sources, the server capable of providing client access to the plurality of data sources; and a store place data access layer (SPDAL) coupled to the server and the at least one non-relational data source, the SPDAL providing relational features to the at least one non-relational data source. In the preferred embodiment, the relational features provided include a two-phase commit process; a process for performing Structured Query Language (SQL) operations from within a single application; a referential integrity process; and a process for providing indices for non-relational data source files. These features allow the computer system to perform data changes across multiple database sources within a single operation; perform SQL operations from a single application across heterogeneous file systems; provide referential integrity of data in both relational and non-relational databases; and perform ad-hoc queries of data and other queries efficiently in non-relational databases. By adding these features, significant flexibility is added to the computer system.