This invention concerns a multi-system, data sharing complex, and particularly concerns maintenance of the ability to detect errors occurring when data is written to secondary storage in a shared cache system.
In the prior art, detection of write errors occurring during storage of logical units of a database, such as pages, is provided by check bits located at the beginning and end of a logical unit. For example, consider FIG. 1, wherein a logical page 10 of data includes first and last bytes 12 and 14, respectively. The first bit 13 of the first byte 12 and the first two bits 15 and 16 of the last byte 14 are designated as "check bits" whose role is to support the detection of errors occurring when the page is written to secondary storage. In this regard, the technique initializes the first bit 13 to a particular value and the first two bits 15 and 16 to a corresponding value. Since the bit 13 can take either of two values, the bits 15 and 16 are set to one of two different patterns corresponding to the 0 and 1 possibilities of the first bit 13. For example, assume the following correspondence: when the bit 13 is set to 0, the bits 15 and 16 are set to 10, and when the bit 13 is set to 1, the bits 15 and 16 are set to 11. Assume that, when the DBMS wrote page 10 to disk for the first time, it set bit 13 to a particular value and bits 15 and 16 to the pattern which corresponds to that value of bit 13. Next, page 10 is read from secondary storage and entered into the buffer of, for example, a database management processor. If the page must be written back to secondary storage because it was changed by the processor, the bits 13, 15 and 16 are "flipped" in that the bit 13 is set to its complementary value and the bits 15 and 16 are set to the pattern associated with that value. The check bits are flipped before the write operation, and the bytes of the page 10 are written to secondary storage in first-byte to last-byte order. Subsequently, when the page is read from secondary storage, the reading system tests the relationship between the first and last bytes of the page in, for example, a check circuit 18.
If the relationship is the expected one described above, the page passes the test and it is assumed that the write to storage was error free. If it is not, the system infers that the last secondary storage write of the page was a partial one and, hence, that there has been a data loss. In such a situation, the system recovers the page using a backup copy and log information. This technique is described in the article by Crus et al. entitled "Partial Data Page Write Detection", in the Technical Disclosure Bulletin, April 1983, pp. 5589.
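The check-bit technique described above can be illustrated with a minimal sketch. It assumes that the page is held as a byte array, that the "first bit" of a byte is its most significant bit, and that a partial write leaves the trailing bytes of the stored page unchanged. The function names are hypothetical and are not taken from the cited article.

```python
# Pattern of bits 15 and 16 for each value of bit 13 (from the example above).
PATTERN = {0: 0b10, 1: 0b11}

def first_bit(page):
    """Return check bit 13: the first (most significant) bit of the first byte."""
    return (page[0] >> 7) & 1

def last_bits(page):
    """Return check bits 15 and 16: the first two bits of the last byte."""
    return (page[-1] >> 6) & 0b11

def set_check_bits(page, bit):
    """Set bit 13 to `bit` and bits 15 and 16 to the corresponding pattern."""
    page[0] = (page[0] & 0x7F) | (bit << 7)
    page[-1] = (page[-1] & 0x3F) | (PATTERN[bit] << 6)

def flip_check_bits(page):
    """Complement bit 13 and set bits 15 and 16 to the associated pattern."""
    set_check_bits(page, first_bit(page) ^ 1)

def passes_check(page):
    """Test the expected relationship between the first and last bytes."""
    return last_bits(page) == PATTERN[first_bit(page)]

def write_to_storage(stored, page, bytes_written=None):
    """Write in first-byte to last-byte order; a short count models a partial write."""
    n = len(page) if bytes_written is None else bytes_written
    stored[:n] = page[:n]

# Usage: a 16-byte page first written with bit 13 set to 0.
stored = bytearray(16)
set_check_bits(stored, 0)

buffer = bytearray(stored)      # page read into the buffer
buffer[5] = 0xAB                # page changed by the processor
flip_check_bits(buffer)         # check bits flipped before the write

write_to_storage(stored, buffer, bytes_written=8)   # partial write
partial_ok = passes_check(stored)                   # False: mismatch detected

write_to_storage(stored, buffer)                    # complete write
complete_ok = passes_check(stored)                  # True
```

Because the first byte is written before the last byte, an interrupted write leaves the new value of bit 13 alongside the old pattern in bits 15 and 16, which is the mismatch the check detects.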
The write error detection procedure is practiced in systems limited to a single database management system (DBMS) which reads a page from a secondary storage device into a buffer on demand. A transaction updates the page in the buffer, and the DBMS writes the page back to storage sometime later. In this environment, the page state goes from "clean" to "dirty" with respect to the secondary storage upon the first update. Relatedly, when the page state goes from clean to dirty, the check bits are flipped; no update to the page is allowed while it is being written back to secondary storage; and, after the storage write, the page is marked as clean so that the check bits can be altered on a subsequent update.
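The single-DBMS rules just stated can be sketched as a small state model. This is an illustration only: the class and method names are hypothetical, and the three check bits are reduced to the single value of bit 13, since bits 15 and 16 are determined by it.

```python
class BufferedPage:
    """Buffer-resident page in a single-DBMS system (illustrative model)."""

    def __init__(self, check_bit):
        self.check_bit = check_bit      # value of bit 13 as read from storage
        self.dirty = False              # clean with respect to secondary storage
        self.write_in_progress = False

    def update(self):
        """Apply an update; flip the check bit only on the clean-to-dirty transition."""
        if self.write_in_progress:
            raise RuntimeError("no update is allowed during the write to storage")
        if not self.dirty:
            self.check_bit ^= 1
            self.dirty = True

    def begin_write(self):
        """Start writing the page back to secondary storage."""
        self.write_in_progress = True

    def end_write(self):
        """Finish the write; the page is clean again and may be flipped later."""
        self.write_in_progress = False
        self.dirty = False
```

Under this model, any number of updates between two storage writes flips the check bit exactly once, so the value written always differs from the value already in secondary storage.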
In a multi-system data sharing environment such as is described in the cross-referenced patent applications, a shared electronic store, hereinafter referred to as the "store", is a high-speed hardware assist for maintaining coherency of data among a plurality of DBMS's. The store is a "store-in" cache in that an updated page is written to the store first, without immediately being written back to secondary storage. A DBMS can write an updated page to the store quickly and the page can be quickly refreshed in the store by other DBMS's.
The multi-DBMS architecture does not accommodate the write error detection technique as practiced in the prior art. A DBMS which reads a page from the store rather than from secondary storage may obtain a page which is already dirty with respect to secondary storage because it was updated by another system. To maintain the correct value of the check bits, the updating system must alter them only when the page changes state from clean to dirty with respect to secondary storage. Otherwise, an even number of updates made to the page by different systems would return the check bits to the value already recorded in secondary storage, which is an incorrect value because a partial write of the page could then pass the test undetected.
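The effect of an even number of flips can be shown with a short sketch. It again reduces the check bits to the value of bit 13 and assumes a partial write that replaces the first byte of the stored page but not the last. The function names are hypothetical.

```python
# Pattern of bits 15 and 16 for each value of bit 13.
PATTERN = {0: 0b10, 1: 0b11}

def bit_after_updates(on_disk_bit, n_updates, flip_on_every_update):
    """Return bit 13 after n_updates made by successive systems sharing the store."""
    bit = on_disk_bit
    for i in range(n_updates):
        # Correct rule: flip only on the first update (clean-to-dirty transition).
        if flip_on_every_update or i == 0:
            bit ^= 1
    return bit

def partial_write_goes_undetected(new_bit, on_disk_bit):
    """After a partial write, the first byte is new and the last byte is old."""
    return PATTERN[on_disk_bit] == PATTERN[new_bit]

# Two systems update the page in turn before it is returned to storage.
naive_bit = bit_after_updates(0, 2, flip_on_every_update=True)     # 0
correct_bit = bit_after_updates(0, 2, flip_on_every_update=False)  # 1

naive_missed = partial_write_goes_undetected(naive_bit, 0)      # True
correct_missed = partial_write_goes_undetected(correct_bit, 0)  # False
```

When each updating system flips the bits, the second flip cancels the first and the page carries the same check bits as the copy in secondary storage, so the test cannot distinguish a partial write from a complete one.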
Further, in the multi-DBMS architecture, the system which returns the page to secondary storage may be different from the one which dirtied the page. The prior art write error detection technique is based on the assumption that the updating system is the one which returns the page to secondary storage. If this rule were followed in the multi-DBMS architecture, unacceptable overhead would result. The system returning the page to secondary storage would have to acquire a global lock on the page and would have to inform the other systems which have the page cached that the page state has changed from dirty to clean with respect to secondary storage. However, the return of a page to secondary storage in the multi-DBMS architecture of referenced U.S. application Ser. No. 07/627,315 contemplates a non-blocking serialization for removing a page from the store.
Accordingly, in any multi-DBMS architecture in which the pages are returned to secondary storage from a shared store-in cache, there is a need to provide for the correct processing of check bits when a page may be returned to secondary storage by a system other than a system which updated the page.