A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
This invention relates to systems for managing distributed information held in a database and in caches, and provides techniques for improving the efficiency with which records that may be replicated are used. In particular, an exemplary embodiment provides a method for synchronizing caches.
As background, reference is made to U.S. Pat. No. 5,615,362 issued Mar. 25, 1997 and related U.S. Pat. No. 5,706,506 issued Jan. 6, 1998 for: METHOD AND APPARATUS FOR MANAGING RELATIONAL DATA IN AN OBJECT CACHE, the content of which is incorporated by reference and made a part hereof.
In any database management system there are three substantive operations that affect the content of the database: creating a record, updating (modifying) a record, and deleting a record. Proper execution of these operations helps to ensure that the records in a database are current and correct. It is important that every copy of a record either holds correct information or will access the correct information when a transaction executes. Control of content is therefore essential for data integrity wherever multiple copies of a data record exist, such as in a computing network having a mass storage/central database and small, readily accessible caches. A cache stores data that is frequently used or is about to be used. Using a cache reduces the number of times mass storage must be accessed and often provides faster access than a central database. Hence, cache use can speed up access to database records.
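The read path described above, in which a cache answers repeated requests itself and falls back to mass storage only when it lacks a record, can be sketched as follows. This is a minimal illustration assuming a simple key-value record model; the class names `CentralDatabase` and `Cache` and the `reads` counter are hypothetical and are not part of the disclosure.

```python
from typing import Dict, Optional


class CentralDatabase:
    """Hypothetical stand-in for the mass-storage central database."""

    def __init__(self, records: Dict[str, dict]) -> None:
        self._records = dict(records)
        self.reads = 0  # counts how often mass storage is accessed

    def read(self, key: str) -> Optional[dict]:
        self.reads += 1
        rec = self._records.get(key)
        return dict(rec) if rec is not None else None


class Cache:
    """Hypothetical local cache that consults the central database only on a miss."""

    def __init__(self, db: CentralDatabase) -> None:
        self._db = db
        self._entries: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        if key in self._entries:
            return self._entries[key]  # hit: no mass-storage access
        rec = self._db.read(key)       # miss: go to the central database
        if rec is not None:
            self._entries[key] = rec   # keep a local copy for later requests
        return rec


db = CentralDatabase({"emp-1": {"name": "Mary Jones", "status": "junior engineer"}})
cache = Cache(db)
first = cache.get("emp-1")   # miss: read from the central database
second = cache.get("emp-1")  # hit: served from the cache
```

After the two lookups the central database has been read only once; this saving is what makes local caches attractive, and it is also why every cached copy must be kept consistent with the database.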
As computing networks have become larger and more complex, so have the database management systems that control the flow of data across them. Examples of such increasingly complex computing networks include local area networks, wide area networks, and the Internet. Deploying multiple caches across such networks is now prevalent and is essential to ensuring fast and accurate access to data. For example, an Internet business may place servers worldwide, each with its own cache. Multiple servers, each having a cache, provide faster and lower-cost access to data for local Internet users, since a central database may be a world away. To let users exploit the advantages of local servers and local caches, database management systems should rapidly synchronize shared and changed data across such networks, in which multiple caches are coupled to a central database.
Moreover, maintaining data integrity across a distribution of caches and a central database is desirable, since each cache may need to be synchronized with every other cache as well as with the central database. Maintaining data integrity is complicated by the fact that each cache may independently receive synchronization requests from multiple other caches. To this end, prioritizing and coordinating independently received commands among the various caches and the central database is essential.
In known systems, cached data, such as data objects, retained integrity by requiring that the state of an object be frozen or locked while the data object was in use. Each transaction with a data object had to be cleared through a gatekeeper for the central database, and all updates required that the complete object be defined before a lock was removed. Thus only a limited number of users, often only one, could access a data object that was in use.
As alternatives to locking, systems are known that manipulate portions of data objects and communicate only the changed portions to the central database. For example, in a data object having employee name: Mary Jones, department ID: software development, and employee status: junior engineer, if the employee status is changed from junior engineer to senior engineer, only the change to employee status is communicated. Further, because only portions of the data objects were changed, updates to the central database had to be applied in the order in which the changes occurred rather than in the order in which reports of the changes were received. Since the order of receipt could not be controlled, delay resulted, and complex processing was required to guarantee delivery and proper ordering. Moreover, updates to the various caches in a network of caches also had to be applied in the order in which the changes occurred rather than in the order in which reports of the changes were received. Because of this requirement, known systems are intolerant of the loss of objects. Objects received out of order were therefore accumulated until they represented the changes in their order of occurrence, and only then were they committed to cache. Being intolerant of lost synchronization messages (i.e., synchronization requests), known systems further required that the accumulated objects be kept in relatively slow hard memory (such as a magnetic storage device) rather than in faster soft memory (such as a memory chip). Such known systems thus carried a large overhead for storing data objects received out of order, at the cost of relatively slow synchronization between caches.
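The ordering burden described above can be illustrated with a minimal sketch of a receiver for field-level changes in the style of the known systems. It assumes, hypothetically, that each change carries a sequence number reflecting its order of occurrence; the class `OrderedDeltaApplier` and its fields are illustrative names only. For simplicity the buffer is an in-memory dictionary, whereas the known systems described above kept it in slower hard storage.

```python
from typing import Dict, Tuple


class OrderedDeltaApplier:
    """Hypothetical prior-art style receiver: partial (field-level) changes
    must be applied in order of occurrence, so early arrivals are buffered."""

    def __init__(self, obj: dict) -> None:
        self.obj = dict(obj)
        self.next_seq = 1  # sequence number of the next change that may be applied
        self.pending: Dict[int, Tuple[str, object]] = {}  # out-of-order changes

    def receive(self, seq: int, field: str, value: object) -> None:
        if seq < self.next_seq:
            return  # duplicate of a change that was already applied
        self.pending[seq] = (field, value)
        # Apply every buffered change that is now contiguous with the object state.
        while self.next_seq in self.pending:
            f, v = self.pending.pop(self.next_seq)
            self.obj[f] = v
            self.next_seq += 1


emp = OrderedDeltaApplier(
    {"name": "Mary Jones", "department": "software development", "status": "junior engineer"}
)
emp.receive(2, "department", "research")     # arrives early: buffered, not applied
emp.receive(1, "status", "senior engineer")  # fills the gap: changes 1 and 2 applied
emp.receive(4, "status", "staff engineer")   # change 3 was lost: buffered indefinitely
```

Once change 3 is lost, change 4 remains in the buffer and the object never advances past change 2, which is the intolerance to loss noted above.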
What is needed is an information management system that does not lock or freeze objects and can apply updates regardless of the order in which reports are received. Moreover, an information management system is needed that synchronizes the full object state of data objects and thus is not tied to relatively slow hard storage to guarantee synchronization of portions of data objects. The system also needs to be faster, more efficient, and tolerant of data loss.
According to the invention, an information management system operates on a plurality of networked computer systems having a central database, which stores the latest confirmed version of data objects, and a plurality of caches, which can communicate with multiple clients. The central database need not be locked, i.e., it allows concurrent access and changes. The information management system stores and provides state information for each data object in the central database, and a distributed cache management system controls individual cache objects (objects that have been retrieved from or inserted into the central database and locally cached) so that they are selectively changed if messages are received at another cache in an expected order, and selectively invalidated if messages are received at another cache with certain states indicative of error, i.e., if the objects are recognized as other than the confirmed latest version based on an "optimistic" control attribute and data in a cache synchronization decision table, thus causing reference to be made to the central database. In specific embodiments of the invention, each change to an object in the central database is assigned a unique version number with an inherent ordering, known as its optimistic control attribute, to serialize all changes, and the optimistic control attribute is used as a key to determine whether messages have been lost or otherwise received at a cache out of order. The optimistic control attribute thus serves as the local object clock under which changes in objects are reported to the central database, so that a local cache can decide how to process synchronization messages without unnecessary reference to the central database. In a further specific embodiment, full object state information is communicated among caches without need for verification through the central database.
Thus, if messages are lost or received out of order, the state can be applied to the targeted objects in the local cache, assuring full synchronization.
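The version-based decision described above can be sketched as follows. The cache synchronization decision table itself is not reproduced in this section, so the rules below are assumptions chosen for illustration: a message carrying a newer optimistic control attribute (version number) overwrites the cached object with its full state, an older or duplicate message is discarded, and a message with the same version but different content is treated as an error state that invalidates the cached object so that the next read goes to the central database. All names (`SyncMessage`, `CachedObject`, `decide`, `SyncedCache`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    APPLY = "apply"            # overwrite cached state with the message's full state
    DISCARD = "discard"        # message is stale, a duplicate, or for an uncached object
    INVALIDATE = "invalidate"  # evict; the next read goes to the central database


@dataclass
class SyncMessage:
    key: str
    version: int  # optimistic control attribute assigned by the central database
    state: dict   # full object state, not a field-level delta


@dataclass
class CachedObject:
    version: int
    state: dict


def decide(cached: Optional[CachedObject], msg: SyncMessage) -> Action:
    """Hypothetical decision table keyed on the optimistic control attribute."""
    if cached is None:
        return Action.DISCARD  # object not cached here; nothing to synchronize
    if msg.version > cached.version:
        # Expected next version, or a newer one after lost/reordered messages:
        # because the message carries full state, the gap does no harm.
        return Action.APPLY
    if msg.version == cached.version and msg.state != cached.state:
        # Same version, different content: the cached copy cannot be trusted.
        return Action.INVALIDATE
    return Action.DISCARD  # older or duplicate message


class SyncedCache:
    def __init__(self) -> None:
        self.objects: Dict[str, CachedObject] = {}

    def on_message(self, msg: SyncMessage) -> Action:
        action = decide(self.objects.get(msg.key), msg)
        if action is Action.APPLY:
            self.objects[msg.key] = CachedObject(msg.version, dict(msg.state))
        elif action is Action.INVALIDATE:
            del self.objects[msg.key]
        return action


cache = SyncedCache()
cache.objects["emp-1"] = CachedObject(1, {"status": "junior engineer"})
a = cache.on_message(SyncMessage("emp-1", 3, {"status": "staff engineer"}))   # version 2 was lost
b = cache.on_message(SyncMessage("emp-1", 2, {"status": "senior engineer"}))  # arrives late
```

Because each message carries full object state rather than a delta, the lost version 2 does not block synchronization: version 3 is applied directly and the late version 2 is simply discarded, with nothing buffered. Whether a version gap should instead trigger invalidation is a policy choice this sketch does not settle.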
As a result of the invention, individual caches can be informed of changes without having to constantly query the central database.
The invention will be better understood by reference to the following detailed description in connection with the accompanying drawings.