The patent application of Donald P. Levine and David A. Egolf, entitled: A Method and Apparatus for Improving the Performance of a Database Management System Through a Central Cache Mechanism, bearing Ser. No. 08/999,248 filed on Dec. 29, 1997, assigned to the same assignee as named herein which issued on Jun. 6, 2000 as U.S. Pat. No. 6,073,129.
1. Field of Use
The present invention relates to data management systems and more particularly to searching mechanisms utilized by such systems.
2. Prior Art
As known in the art, to conserve space in a database, it becomes desirable that the stored data values not be unnecessarily duplicated. Therefore, in a relational database, instead of having one very large table to hold duplicate data values, the user generally creates several smaller tables that contain unique data values that are related to each other through common attributes. A user can retrieve data for a given set of criteria by xe2x80x9cjoiningxe2x80x9d the smaller tables to represent the large table. Data is extracted from the system using a query command that locates all rows in a table or tables that meet specified criteria. In such systems, one of the most expensive operations to conduct in terms of time and resources is the xe2x80x9cjoin operationxe2x80x9d which joins together two or more entire relations that can be fairly large. When multiple joins are present in a user generated query, as is the typical case, the cost of processing the query increases dramatically. Thus, the time expended in developing, optimizing and processing complex queries can be exceedingly costly in terms of time and resources. This is particularly true as more and more complex data types are required to be processed.
In the prior art, a number of systems utilize database query statements (in the form of SQL) that can be specified within programs written in a commercially available computer language, such as COBOL 85. The language runtime routines call a relational database management system (RDBMS) that processes the SQL statement, accesses the database to retrieve (or store) the desired data in the database and passes the data back to (or from) the requesting program. To improve system performance in the repeated use of SQL statements, the system generates a sequence of machine instructions that executes the logic of a particular SQL statement. Such sequence of machine instructions is referred to as xe2x80x9cgenerated codexe2x80x9d. The generated code is stored in a cache referred to as a xe2x80x9cSQL cachexe2x80x9d where it can be retrieved and used again on subsequent executions of the same SQL statement. This xe2x80x9ccachingxe2x80x9d eliminates the need to extensively process particular SQL statements more than once. In this system, when a SQL statement is encountered, a cache lookup operation is performed comparing the properties or attributes of the SQL statement and the datatypes involved with those stored in the cache. These attributes include the text of the SQL statement, the name of the cursor, if any, the number, types, and lengths of host variables referenced by the statement, the model name and the owner name. When a match is found in the cache, the associated generated code is executed to perform the actions prescribed by the SQL statement.
This approach has allowed the system in certain cases to bypass the steps of query optimization and code generation when the system encounters an identical SQL statement during the execution of a particular process. Since programs that access databases often repeatedly execute identical SQL statements, the use of an SQL cache with each process was been found to significantly improve performance. An example of a system that makes use of an SQL cache is the INTEREL product developed and marketed by Bull HN Information Systems Inc. For information concerning this product, and the SQL cache in particular, reference may be made to the publication entitled xe2x80x9cDatabase Products INTEREL Reference Manual INTEREL Performance Guidelinesxe2x80x9d, Copyright, 1996 by Bull HN Information Systems Inc., order no. LZ93 Rev01B and later revision Rev04 that is available at URL www.enterprise.bull.com/cd_doc.
While the above approach has increased performance, it still takes considerable time to process complex statements in those instances where the particular process being run has previously processed the same identical statement. That is, the attributes of the SQL statement are used to search the SQL cache for code that has already been generated for the particular statement. Therefore, in the case of complex SQL statements, substantial time still must be expended in searching for the applicable generated code.
Accordingly, it is a primary object of the present invention to provide a more efficient searching process and mechanism for improving system performance.
It is a further object of the present invention to provide a more extensive and flexible approach in processing complex statements.
The above objects are achieved in a preferred embodiment of the present invention through the use of a key memory structure used to locate applicable generated code within xe2x80x9ccachexe2x80x9d. The cache corresponds to the previously discussed SQL cache that stores generated code to execute the SQL statements as well as the structures and program logic used for maintaining the cache. The key memory structure is located in an area of memory associated with and utilized by the SQL runtime routines that are bound with the application program. This area of memory is located in the address space of the process executing the program that contains the SQL statements. This area is not directly accessible by the program. The runtime routines provide an interface between the program that contains the SQL statements and a relational database management system (RDBMS). The RDBMS contains a cache manager and the SQL cache. The cache manager contains the logic functions that are used to maintain and search the SQL cache.
In the preferred embodiment, the area of the memory in which a single key resides is referred to as a xe2x80x9ckey areaxe2x80x9d and the content of such key area when a key is stored therein is referred to a xe2x80x9ckey valuexe2x80x9d. An initialized key area is provided for each SQL statement embedded in the application program. The key area is in the area of memory associated with so-called xe2x80x9cstatic variablexe2x80x9d storage as is known in current language environments, that is, the content of such memory is retained between executions of the runtime routines and is automatically initialized at program invocation.
On the first execution of each SQL statement, the code generated for executing the particular statement is stored in the SQL cache. A xe2x80x9cnodexe2x80x9d or xe2x80x9cnode entryxe2x80x9d of a node structure having an array of entries, maintained by the cache manager, is populated with information associated with the statement as well as a pointer to the generated code segment. The cache manager then stores a key value associated with the generated code segment in the allocated key area.
During operation, the runtime routines pass a reference (e.g. address value) to the key area associated with the SQL statement across the runtime interface to the RDBMS software which, in turn, passes the reference to the cache manager along with such statement. The cache manager performs a search of the SQL cache in accordance with the present invention by using a first value within the key to access the node structure associated with the SQL cache to locate the node entry associated with a particular code segment. Then, the cache manager performs successive key validation tests with other values included in the key relative to values included in the allocated node. If the key is validated, that is, if this check confirms that the code is associated with the key, a cache hit is signaled and the search is complete.
More specifically, the validation check operates as follows. A xe2x80x9cnode idxe2x80x9d value in the key is used to access the particular node entry within the SQL cache structure. In the preferred embodiment, this is merely an array index, but other means of addressing could be used. Then a xe2x80x9ccode tokenxe2x80x9d value within the key is compared to the xe2x80x9ccode tokenxe2x80x9d value in the node entry. In the preferred embodiment, the xe2x80x9ccode tokenxe2x80x9d is the pointer to the generated code. In addition, a xe2x80x9ctimestampxe2x80x9d value in the key is compared to a xe2x80x9ctime insertedxe2x80x9d value within the node entry. In order for the key to be valid, that is, be a valid locator of the code associated with the node, the code token and time values from the key must each match their respective values in the node entry. If either set of values does not match, the key is deemed invalid and the cache search is not deemed to result in a xe2x80x9chitxe2x80x9d. A more extensive cache search must then be performed to determine whether the requested item is in the cache.
The reason for using and checking these values is that merely storing a pointer to the generated code in the storage associated with a program or its runtime routines is not sufficient for reliably locating code in the SQL cache. This is because SQL entries are subject to deletion and replacement and may be used by multiple programs.
The ability to pass a key to the cache manager on subsequent executions of a particular statement to access the potential associated entry in the SQL cache enables bypassing of much of the cache search mechanism resulting in faster cache xe2x80x9chitsxe2x80x9d. This in turn results in improved cache system performance.
The above objects and advantages of the present invention will be better understood from the following description when taken in conjunction with the accompanying drawings.