The invention concerns caching data in a cache serving a multi-system data sharing complex. In particular, the invention concerns the caching of a data page by one database system into the shared cache in view of the possibility that another system could be trying to cache a later copy of the same page. This invention concerns the technique to detect such a condition and to bar entry of the non-updated page into the shared cache.
In a database system wherein a plurality of independently-operating computer systems share data, global locking is required to maintain coherency of data in the different systems. A. J. van de Goor, in COMPUTER ARCHITECTURE AND DESIGN, Addison-Wesley, 1989, discusses the data coherency problem as one in which sharing data among a proliferation of processors raises the possibility that multiple, inconsistent copies of data may exist because of multiple paths to the data and because of opportunities to locally modify the data.
Solutions to the data coherency problem have been proposed. All are based essentially on the existence of a global lock on data retrieved from a central location. Assuming pagination of data, one computer system of a multi-computer system which shares data stored on a disk acquires a global lock on a page of data and obtains and updates the page. The lock signifies to the other computer systems that the page has been acquired for updating. Prior to releasing the lock on the page, the computer system holding the lock writes the page to the disk, after which it generates and sends a message to the other computer systems to invalidate any copies of the page which may be held in their local cache. The lock on the page is not released until acknowledgement is received from every other computer system having access to the page. This solution is described in detail in U.S. Pat. No. 4,399,504, which is assigned to the assignee of this patent application, and which is incorporated herein by reference. A commercial product available from the assignee of this application and which incorporates this solution is the IMS/VS (information management system/virtual storage) system with the data sharing feature.
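The sequence of steps in the prior art protocol just described can be illustrated with a minimal, single-process sketch. It is offered only as an aid to understanding and is not taken from the referenced patent or from the IMS/VS product; every class and function name in it (Disk, System, GlobalLockManager, update_page) is hypothetical, and message exchange between systems is modeled as an ordinary method call.

```python
# Illustrative sketch of a global-lock update protocol.
# All names are hypothetical; messaging is modeled as direct method calls.

class Disk:
    """Stands in for the shared disk holding the pages."""
    def __init__(self):
        self.pages = {}


class System:
    """One computer system with its own local cache of pages."""
    def __init__(self, name):
        self.name = name
        self.local_cache = {}

    def invalidate(self, page_id):
        # Drop any local copy of the page and acknowledge the message.
        self.local_cache.pop(page_id, None)
        return True


class GlobalLockManager:
    """Grants at most one owner per page."""
    def __init__(self):
        self.owners = {}

    def acquire(self, page_id, system):
        if page_id in self.owners:
            raise RuntimeError(f"{page_id} is already locked")
        self.owners[page_id] = system.name

    def release(self, page_id, system):
        if self.owners.get(page_id) != system.name:
            raise RuntimeError(f"{system.name} does not hold {page_id}")
        del self.owners[page_id]


def update_page(updater, others, locks, disk, page_id, new_value):
    locks.acquire(page_id, updater)            # 1. acquire the global lock
    updater.local_cache[page_id] = new_value   # 2. update the page locally
    disk.pages[page_id] = new_value            # 3. write the page to disk
    acks = [peer.invalidate(page_id) for peer in others]  # 4. invalidate copies
    if not all(acks):
        # The lock stays held until every acknowledgement has arrived.
        raise RuntimeError("missing acknowledgement")
    locks.release(page_id, updater)            # 5. release the lock


disk = Disk()
disk.pages["P1"] = "v0"
system_a, system_b = System("A"), System("B")
system_b.local_cache["P1"] = "v0"   # B holds a copy that will become stale
locks = GlobalLockManager()
update_page(system_a, [system_b], locks, disk, "P1", "v1")
```

In this sketch the write to disk (step 3) and the invalidation exchange (step 4) both occur before the lock is released, so every update incurs both costs.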
The prior art global locking system provides great advantage in maintaining data coherency. However, the overhead penalties inherent in it include the requirement for performing an I/O (input/output) procedure when a page is updated and undertaking message exchange after the I/O procedure in order to notify the other systems and release the lock.
When used in a non-data-shared single system case, the prior art IBM (International Business Machines) IMS/VS product still incurs extra overhead in maintaining data coherency (consistency) between transactions by implementing a commit policy requiring each transaction which updates data to write the modified data, together with log records, to storage before the transaction is fully committed. This requires one I/O procedure per page for each modifying transaction, which increases overhead costs.
In contrast, the IBM DB2 in the single system, non-data-sharing case follows a policy which does not require an I/O process to write an updated page back to storage in order to commit a transaction. If the protocol described above is used in the IBM DB2 product in a data-sharing situation where a plurality of computer systems access one or more data storage sites, the performance could degrade significantly because of the required write back to storage and message delay. In this regard, see C. J. Date's discussion of concurrency at pages 593-595 in Vol. I of AN INTRODUCTION TO DATABASE SYSTEMS, Addison-Wesley, 1986.
In a multi-computer, data-sharing system which includes multiple levels of storage, it is contemplated that a secondary level of storage would consist of one or more direct access storage devices (DASD's) which are shared by independently-operating computer systems. Typical nomenclature for hierarchically-arranged storage systems classifies DASD and other such storage facilities as "secondary" storage. In this regard, secondary storage includes all facilities from which data must be moved to "primary" storage before it can be directly referenced by a central processing unit (CPU). See Deitel, OPERATING SYSTEMS, Second Edition, 1990, by Addison-Wesley, page 30. It is further contemplated that caching techniques would be useful to provide a high-speed, frequently-accessed storage for shared data. For various reasons, data would be entered into the shared cache by the database systems after acquisition from DASD's. In this regard, a shared cache would be included in a primary level of storage for a multi-computer, data-sharing system.
In such a structure, a potential hazard would exist if one computer system obtained a block of data from DASD for the purpose of caching it after the same block of data had been obtained, modified, and cached by another computer system, but not yet returned to DASD. In this situation, the outdated block obtained from DASD is referred to as a "down-level" version of the updated block in cache. The challenge is to prevent the overwriting of the updated block by the down-level version without incurring the expense of locking the DASD version.
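The hazard can be made concrete with a short sketch, given here purely for illustration. It assumes a shared cache and a DASD that are each modeled as a simple dictionary, and a hypothetical naive_insert routine that stores a page unconditionally. It shows the problem only and does not depict any mechanism for preventing it.

```python
# Illustrative sketch of the down-level overwrite hazard.
# The dictionaries and naive_insert are hypothetical stand-ins.

dasd = {"P1": "v0"}    # secondary storage: holds the original page
shared_cache = {}      # shared cache in primary storage


def naive_insert(cache, page_id, value):
    # Unconditional store: no check for a newer copy already in the cache.
    cache[page_id] = value


# System B obtains the page from DASD, modifies it, and caches the result.
# The updated page has not yet been written back to DASD.
b_copy = dasd["P1"]
b_updated = "v1"
naive_insert(shared_cache, "P1", b_updated)

# System A now obtains the same page from DASD for the purpose of caching it.
# DASD still holds "v0", so A's copy is down-level relative to the cache.
a_copy = dasd["P1"]
naive_insert(shared_cache, "P1", a_copy)

# The updated page in the shared cache has been overwritten.
lost_update = shared_cache["P1"] != b_updated
```

With an unconditional insert, the later arrival of the down-level copy replaces the updated page, and the modification made by the second system is lost from the shared cache.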
Typically, global locking protocols are used by a database system to serialize access to the record of interest in the data sharing case. The inventors contemplate that they would still be used. Also typically, there would be global locking on the page to serialize updates to the page from different database systems. The serialization which this invention avoids is that which would otherwise be required when different database systems insert a page from secondary storage into the shared cache and the inserted page may be down-level.