1. Field of the Invention
The present invention relates generally to data storage systems, and more particularly, but without limitation, to recovering meta-data in a cache memory (hereinafter cache data) after a corruption event.
2. Description of the Prior Art
Computer systems may include different resources that may be coupled to and used by one or more host processors. Resources and host processors may be interconnected by one or more communication connections. These resources may include, for example, data storage systems that provide storage services to each host processor. An example data storage system may include one or more data storage devices that are connected together and may be used to provide common data storage for one or more host processors in a computer system.
Data storage systems may also have cache memory connected to the data storage devices for storing frequently accessed data for rapid access. Typically, it is time-consuming to fetch or compute data stored in the data storage devices. However, once data are stored in the cache memory, future requests can be served from the cached copy rather than by re-fetching or re-computing the original data, so that the average access time to data is reduced.
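By way of illustration only, the read path described above can be sketched as follows. This is a minimal sketch, not a description of any particular storage system: the names `ReadCache`, `slow_fetch`, and the hit and miss counters are hypothetical, and the backing store is simulated by a plain function.

```python
class ReadCache:
    """Hypothetical read cache placed in front of a slow backing store."""

    def __init__(self, fetch):
        self._fetch = fetch   # assumed callable that reads from the slow device
        self._slots = {}      # cache-slots keyed by data unit identifier
        self.hits = 0
        self.misses = 0

    def read(self, unit_id):
        # Serve from the cached copy when one exists.
        if unit_id in self._slots:
            self.hits += 1
            return self._slots[unit_id]
        # Otherwise fetch from the backing store and keep a copy for later reads.
        self.misses += 1
        data = self._fetch(unit_id)
        self._slots[unit_id] = data
        return data


def slow_fetch(unit_id):
    # Stand-in for a time-consuming device read (illustrative only).
    return f"data-for-{unit_id}"


cache = ReadCache(slow_fetch)
first = cache.read(7)    # miss: goes to the backing store
second = cache.read(7)   # hit: served from the cache-slot
```

Only the first read of a given data unit pays the cost of the backing store in this sketch; eviction and write handling are deliberately left out.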
A cache memory may include a data area and a meta-data area. The data area 325 is an area of cache memory 320 containing cache-slots for relatively temporary in-cache storage of data units. The data area provides relatively quick access to data units as compared to the operation of data storage devices 350, 355, and 360. The meta-data area stores meta-data, or information about data units stored in data storage devices. The meta-data are associated with data units that are stored in the data area or in other data storage devices, including logical volumes. When corruption occurs in the meta-data but not in the data area associated therewith, attempts are typically made to correct the corrupted meta-data. Upon occurrence of a corruption event, corruption may occur in all of or only portions of the meta-data area. Depending upon the extent of the damage, it may be necessary to bring the system off-line to make the corrections. Whether the data storage system remains on-line or is taken off-line, the meta-data recovery process starts by scrutinizing the meta-data area for indications of corruption. If only a small amount of meta-data is identified as having been corrupted, the meta-data can be corrected in a conventional manner, for example by recreating the meta-data. If larger amounts of meta-data are corrupted, correcting the meta-data in a conventional way can result in the system being off-line for unacceptable amounts of time.
Many approaches have been developed for protecting critical data stored in a data storage system against loss resulting from power failures or transients, equipment malfunctions and other causes. In one approach, all of or selected portions of the stored data can be transferred to tape or other backup media, thereby backing up the cache memory system by providing a “snapshot” of the cache memory system at the time of the backup. In the event of a data loss, the backup copy can then be used to restore the data to the operational digital data system. However, the time to complete such a backup may be extensive. It may also take a significant time to restore the information, particularly if a storage system, such as a disk drive, fails completely.
In data processing systems that require essentially full-time availability and that incorporate large memory systems, data restoration may involve providing backup power, such as batteries, to the data system so that, upon power loss, data stored in more volatile memory systems can be written onto less volatile storage devices such as disks. Once power is restored, the memory tables can be rebuilt. However, when the batteries are defective or have failed and the system has insufficient time or power to store the data onto storage devices such as disks, it may be necessary to recover the meta-data on an entry-by-entry basis.
In the past, the recovery process for meta-data following significant corruption events involved taking the system off-line to rebuild the meta-data for all of the table entries irrespective of whether the data units associated with them were “in-cache” or “out-of-cache”. The time that a system was off-line could be extensive while meta-data associated with data units that were not likely to be required by a user were being repaired. Co-pending U.S. patent application Ser. No. 11/563,450, entitled METHODS AND SYSTEMS FOR MANAGING CORRUPTED META-DATA IN A COMPUTER SYSTEM OR NETWORK, filed on even date herewith, discloses managing data repair by deferring validation and repair of corrupted meta-data until the first time an attempt is made to access the table entries with which the meta-data are associated. Using the invention therein, a computer system may return to being on-line more quickly than it would have been previously after a potential corruption event. There may be delays after a corruption event while critical meta-data are being repaired. However, over time the delay will be reduced until normal operating access is eventually restored.
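The general idea of deferring repair until first access can be illustrated with the following sketch. It is not taken from the co-pending application and is not asserted to be its method; `TableEntry`, `MetadataTable`, the `suspect` flag, and the `rebuild` callback are hypothetical names chosen for illustration, and the sketch simply rebuilds every suspect entry rather than validating it first.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    metadata: dict
    suspect: bool = False   # set after a potential corruption event


class MetadataTable:
    """Hypothetical table whose entries are repaired lazily, on first access."""

    def __init__(self, entries, rebuild):
        self._entries = entries
        self._rebuild = rebuild   # assumed callable that recreates one entry's meta-data
        self.repairs = 0

    def mark_all_suspect(self):
        # After a corruption event, nothing is repaired up front;
        # every entry is only flagged for later checking.
        for entry in self._entries.values():
            entry.suspect = True

    def access(self, key):
        entry = self._entries[key]
        if entry.suspect:
            # First access since the event: repair now, then clear the flag.
            entry.metadata = self._rebuild(key)
            entry.suspect = False
            self.repairs += 1
        return entry.metadata


table = MetadataTable(
    {k: TableEntry({"unit": k}) for k in range(3)},
    rebuild=lambda k: {"unit": k, "rebuilt": True},
)
table.mark_all_suspect()
first = table.access(1)    # triggers a repair
second = table.access(1)   # already repaired, no further work
```

In this sketch the cost of repair is paid only for entries that are actually accessed, which is why the per-access delay shrinks over time as more entries have been repaired.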
Even with the use of the invention described in the co-pending U.S. patent application identified above, it is advantageous to provide for repair and validation of as much meta-data as possible as quickly as possible in order to avoid interruptions in normal operation of the data storage system.