1. Field of the Invention
The present invention relates to data storage systems that utilize tape or other base storage along with high speed cache. More particularly, the invention concerns a data storage system that stores data objects with encapsulated metadata tokens in cache and/or base storage to protect against recalling stale data from base storage in the event of a cache failure.
2. Description of the Related Art
Many data processing systems require a large amount of data storage, for use in efficiently accessing, modifying, and re-storing data. Data storage is typically separated into several different levels, each level exhibiting a different data access time or data storage cost. A first, or highest level of data storage involves electronic memory, usually dynamic or static random access memory (DRAM or SRAM). Electronic memories take the form of semiconductor integrated circuits where millions of bytes of data can be stored on each circuit, with access to such bytes of data measured in nanoseconds. The electronic memory provides the fastest access to data since access is entirely electronic.
A second level of data storage usually involves direct access storage devices (DASD). DASD storage, for example, includes magnetic and/or optical disks. Data bits are stored as micrometer-sized magnetically or optically altered spots on a disk surface, representing the "ones" and "zeros" that comprise the binary value of the data bits. Magnetic DASD includes one or more disks that are coated with remnant magnetic material. The disks are rotatably mounted within a protected environment. Each disk is divided into many concentric tracks, or closely spaced circles. The data is stored serially, bit by bit, along each track. An access mechanism, known as a head disk assembly (HDA), typically includes one or more read/write heads, and is provided in each DASD for moving across the tracks to transfer the data to and from the surface of the disks as the disks are rotated past the read/write heads. DASDs can store gigabytes of data, and the access to such data is typically measured in milliseconds (orders of magnitude slower than electronic memory). Access to data stored on DASD is slower than electronic memory due to the need to physically position the disk and HDA to the desired data storage location.
A third or lower level of data storage includes tapes, tape libraries, and optical disk libraries. Access to library data is much slower than electronic or DASD storage because a robot or human is necessary to select and load the needed data storage medium. An advantage of these storage systems is the reduced cost for very large data storage capabilities, on the order of terabytes of data. Tape storage is often used for backup purposes. That is, data stored at the higher levels of the data storage hierarchy is reproduced for safekeeping on magnetic tape. Access to data stored on tape and/or in a library is presently on the order of seconds.
Data storage, then, can be conducted using different types of storage, where each type exhibits a different data access time or data storage cost. Rather than using one storage type to the exclusion of others, many data storage systems include several different types of storage together, and enjoy the diverse benefits of the various storage types. For example, one popular arrangement employs an inexpensive medium such as tape to store the bulk of data, while using a fast-access storage such as DASD to cache the most frequently or recently used data.
During normal operations, synchronization between cache and tape is not critical. If a data object is used frequently, it is stored in cache and that copy is used exclusively to satisfy host read requests, regardless of whether the data also resides on tape. Synchronization can be problematic, however, if the cache and tape copies of a data object diverge over time and the data storage system suffers a disaster. In this case, the cache and tape contain different versions of the data object, with one version being current and the other being outdated, and it may be unclear which version is current. At worst, a stale or "down-level" version of a data object may be mistaken for (and subsequently used as) the current version. Thus, in the event of cache failure, data integrity may be questionable, and there is some risk of the data storage system incorrectly executing future host read requests by recalling a stale version of the data.
Broadly, the present invention concerns a cache-equipped data storage system that stores data objects with encapsulated metadata tokens to protect against recalling stale data from base storage in the event of a cache failure. The storage system includes a controller coupled to a cache, base storage, and token database. The controller may be coupled to a hierarchically superior director or host.
When a data object is received for storage, the controller assigns a version code for the data object if the data object is new to the system; if the data object already exists, the controller advances the data object's version code. A "token," made up of various items of metadata including the version code, is encapsulated for storage with its corresponding data object. The controller then stores the encapsulated token along with its data object and updates the token database to cross-reference the data object with its token. Thus, the token database always lists the most recent version code for each data object in the system.
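The store sequence described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Token`, `Controller`, and `store`, the use of an integer counter as the version code, and the in-memory dictionaries standing in for the cache and token database are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token: metadata stored alongside a data object."""
    object_id: str
    version: int  # assumed representation of the version code


class Controller:
    """Hypothetical controller; dictionaries stand in for real storage."""

    def __init__(self):
        self.cache = {}     # object_id -> (Token, data)
        self.token_db = {}  # object_id -> most recent Token

    def store(self, object_id, data):
        previous = self.token_db.get(object_id)
        # New object: assign an initial version code.
        # Existing object: advance its version code.
        version = 1 if previous is None else previous.version + 1
        token = Token(object_id, version)
        # Store the token together with its data object ...
        self.cache[object_id] = (token, data)
        # ... and cross-reference the object with its token.
        self.token_db[object_id] = token
        return token


controller = Controller()
first = controller.store("vol1", b"first contents")
second = controller.store("vol1", b"updated contents")
print(first.version, second.version)
```

Because every store updates the token database, the database entry for an object always carries the latest version code, which is the property the later comparison relies on.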
The data object may be copied from cache to base storage automatically, de-staged from cache to base storage based on lack of frequent or recent use, or according to another desired schedule. Whenever the controller experiences a cache miss, there is danger in blindly retrieving the data object from base storage. In particular, the cache miss may have occurred due to failure of part or all of the cache, and at the time of cache failure the base storage might have contained a down-level version of the data object. The present invention solves this problem by comparing the version code of the data object from base storage to the version code of the data object in the token database. Only if the compared version codes match is the data object read from storage and provided as output. Otherwise, an error message is generated since the data object is stale.
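The cache-miss check can be sketched as follows. This is a hedged illustration only: `read_object`, `StaleDataError`, the integer version codes, and the dictionary-based cache, base storage, and token database are hypothetical names and structures, not taken from the specification.

```python
class StaleDataError(Exception):
    """Raised when base storage holds a down-level copy of a data object."""


def read_object(object_id, cache, base_storage, token_db):
    # Cache hit: the cached copy satisfies the read directly.
    if object_id in cache:
        _version, data = cache[object_id]
        return data

    # Cache miss: compare the version code stored with the base copy
    # against the version code listed in the token database.
    stored_version, data = base_storage[object_id]
    current_version = token_db[object_id]
    if stored_version != current_version:
        raise StaleDataError(
            f"{object_id}: base storage has version {stored_version}, "
            f"token database lists version {current_version}"
        )
    return data


# Simulated cache failure: the cache is empty, and base storage still
# holds version 1 while the token database lists version 2.
token_db = {"vol1": 2}
base_storage = {"vol1": (1, b"old contents")}
cache = {}

try:
    read_object("vol1", cache, base_storage, token_db)
except StaleDataError as err:
    print("read rejected:", err)
```

Raising an error rather than returning the base copy reflects the behavior described above: a mismatch means the only surviving copy is stale, so it is not provided as output.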
As a further enhancement, the invention may utilize a "split" version code, where the version code has a data subpart and properties subpart. The data subpart is advanced solely to track changes to the data, while the properties subpart is advanced according to changes in attributes of the data object other than the data itself. In this embodiment, when the data object's version code from base storage is examined after a cache miss, the data subpart is reviewed without regard to the properties subpart. This avoids the situation where, although the base storage contains a current version of data, this data object would be regarded as stale because a non-split version code that does not make any data/properties differentiation has been advanced due to a change in the data object's properties not affecting the data itself. Accordingly, with this feature, data objects from base storage are more frequently available to satisfy cache misses.
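A short sketch may help show why the split version code improves availability. The names `SplitVersion`, `advance_data`, `advance_properties`, and `is_data_current`, and the choice of two integer counters, are assumptions for illustration rather than the format the specification defines.

```python
from typing import NamedTuple


class SplitVersion(NamedTuple):
    """Hypothetical split version code with two independent subparts."""
    data: int        # advanced only when the data itself changes
    properties: int  # advanced when other attributes change


def advance_data(version):
    return version._replace(data=version.data + 1)


def advance_properties(version):
    return version._replace(properties=version.properties + 1)


def is_data_current(base_version, db_version):
    # After a cache miss, only the data subpart is compared;
    # the properties subpart is ignored.
    return base_version.data == db_version.data


base_copy = SplitVersion(data=3, properties=1)

# A properties-only change leaves the base copy usable.
after_rename = advance_properties(base_copy)
print(is_data_current(base_copy, after_rename))

# A data change makes the base copy stale.
after_write = advance_data(after_rename)
print(is_data_current(base_copy, after_write))
```

With a single, non-split counter, both updates in this example would advance the same value, and the base copy would be rejected after the properties-only change even though its data is still current.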
Accordingly, as discussed above, one embodiment of the invention involves a method of operating a cache-equipped data storage system. In another embodiment, the invention may be implemented to provide an apparatus, such as a data storage system configured as discussed herein. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform operations for operating a data storage system. Another embodiment concerns logic circuitry having multiple interconnected electrically conductive elements configured to perform operations as discussed above.
The invention affords its users a number of distinct advantages. For example, in the event of a cache miss resulting from unintentional loss of the cached data, the invention avoids unknowingly recalling a down-level data object from base storage. Thus, the invention helps ensure data integrity. Furthermore, in the event of a cache miss, the invention increases data availability by using "split" version codes. Despite any changes to the data's properties that leave the data intact, the data object remains available for retrieval if the data subpart of its version code is still current according to the token database. The invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.