1.0 Field of the Invention
This invention relates to database management systems; and in particular, this invention relates to a common interface to access catalog information from heterogeneous databases.
2.0 Description of the Related Art
Database management systems allow large volumes of data to be stored and accessed efficiently and conveniently in a computer system. In a database management system, data is typically stored in database tables which organize the data into rows and columns. In a relational database management system, tables may be associated with each other. For example, the data in one column of a table may be used to refer to data in another table.
IBM® (Registered trademark of International Business Machines Corporation) Websphere® (Registered trademark of International Business Machines Corporation) Information Integrator, Standard edition, integrates diverse and distributed data sources in real time, regardless of where the data source resides. IBM Websphere Information Integrator, Standard edition is a federated relational database management system that allows data in relational and non-relational databases to be accessed using the Structured Query Language (SQL), and Open Database Connectivity (ODBC) and Java® (Registered trademark of Sun Microsystems, Inc.) Database Connectivity (JDBC) interfaces.
Database management systems typically store metadata which describes the data contained in the database in system catalog tables. This metadata comprises information such as schema names, table names, column names, column types and index definitions. Client applications, in addition to the database management system itself, may access this metadata. For example, data modeling tools, information asset management systems, data warehousing tools and reporting tools typically use a programming interface to retrieve a description of the data accessible from the database. One reporting tool programmatically interrogates the database to retrieve a list of tables that are available, as well as the column name and type information, to assist a user in order to dynamically build a SQL query. Each database vendor provides their own implementation of system catalog tables, typically with a proprietary naming scheme, format and available information.
Standards introduced by the American National Standards Institute (ANSI) Database Standards Committee and application programming interfaces (APIs) such as Open Database Connectivity and Java Database Connectivity provide well-defined interfaces to retrieve metadata information from the system catalogs. Application developers benefit from these standards and programming interfaces in order to build vendor-independent, portable database applications.
Enterprises today typically have multiple relational database products installed; and have multiple instances of a particular database product in their production environment. It is not unusual for larger information technology (IT) shops to have tens of thousands of databases from multiple vendors in use by production applications. For such complex environments, it is even more important to provide access to the metadata for each database. As the number and types of databases increases, it becomes increasingly difficult for the applications to provide consistent access to the metadata of all the database management systems. Retrieving metadata information across relational database management systems via a JDBC or ODBC interface is often a complex undertaking for an application because the application must manage a connection to each different source of data.
FIG. 1 depicts a block diagram of a conventional client application 10 which extracts and combines information from multiple source database management systems 12, 14 and 16, respectively. Each source database management system 12, 14 and 16, has a database, source database A 18, source database B 20 and source database C 22, respectively. Each source database 18, 20 and 22 also has associated source metadata, metadata A 24, metadata B 26 and metadata C 28, respectively. In this example, the metadata for each database has a different format. Each source database management system 12, 14 and 16 also has a different source metadata application programming interface (API) to access the source metadata, source metadata API A, 30, source metadata API B 32 and source metadata API C 34, respectively. For example, to access multiple database management systems 12, 14 and 16, the client application 10 loads a different driver for each of the database management systems 12, 14 and 16, maintains separate connections 36, 38 and 40 to each of the database management systems 12, 14 and 16, invokes the appropriate source metadata APIs 30, 32 and 34 to retrieve metadata information from each database management system, and interprets and combines the results 42, 44 and 46 from those source metadata APIs 30, 32 and 34 within the client application 10 itself. The source metadata APIs 30, 32 and 34 and the format of the result 42, 44 and 46 may differ between source database management systems, 12, 14 and 16, respectively.
FIG. 2 depicts a flowchart of a conventional technique which illustrates the complexity of retrieving metadata from different database management systems. Suppose the client application 10 wants to find all tables which contain a column whose name contains the pattern ‘%ID%’ across the three heterogeneous source database management systems 12, 14 and 16 of FIG. 1, where “%” is a wildcard character which is considered to match any character(s). The first source database management system 12 supports JDBC as its source metadata API; the second source database management system 14 supports ODBC as its source metadata API; and in the third source database management system 16, SQL is used to directly query its system catalog tables as its source metadata API. As shown in the flowchart of FIG. 2, the client application 10 must manage various connection formats and results. The client application 10 must connect to each source database management system 12, 14 and 16 (FIG. 1), understand the source metadata API for that source database management system, and then consolidate separate data results from the three source database management systems.
In step 50, the client application 10 connects to source database management system A 12 (FIG. 1) which contains source database A 18 (FIG. 1). In step 52, the client application 10 invokes source metadata API A 30 (FIG. 1) to find the column names containing the desired pattern. In response, source database management system A 12 (FIG. 1) returns result A 42 which has the format of source database management system A 12 (FIG. 1). In step 54, the client application 10 connects to source database management system B 14 (FIG. 1) which contains source database B 20 (FIG. 1). In step 56, the client application 10 invokes source metadata API B 32 (FIG. 1) to find the column names containing the desired pattern. In response, source database management system B 14 (FIG. 1) returns result B 44 which has the format of source database management system B 14 (FIG. 1). In step 58, the client application 10 connects to source database management system C 16 (FIG. 1) which contains source database C 22 (FIG. 1). In step 60, the client application 10 invokes source metadata API C 34 (FIG. 1) to find the column names containing the desired pattern. In response, source database management system C 16 (FIG. 1) returns result C 46 which has the format of source database management system C 16 (FIG. 1). In step 62, the client application 10 unifies the results 42, 44 and 46.
The source metadata APIs are bound to a single database connection and require the client application 10 to connect to each database management system and to invoke the specific source metadata API call for each database management system, that is, source metadata API A, source metadata API B and source metadata API C.
Some enterprise metadata management tools require access to metadata across the databases of an enterprise. These tools typically rely on custom-built adapters to retrieve metadata from different data sources, including relational database management systems, and use those adapters to populate a separate metadata catalog store on another server. Such solutions have some disadvantages. First, developing and maintaining such custom adapters for each source supported by these products is costly. New adapters must be provided for each new data source, and adapters must be updated each time the underlying interface or format changes. Second, maintaining a separate metadata catalog store introduces significant overhead to store, index, and synchronize with the original data sources.
Therefore there is a need for an improved technique to access metadata in a heterogeneous database environment. This technique should provide a common interface for the client applications to access metadata from multiple database management systems. This technique should also avoid maintaining a separate metadata catalog store.