1. Field of the Invention
The present invention relates to data storage systems that employ base storage along with a high speed cache. More particularly, the invention concerns a data storage system that assigns tokens to data objects stored in cache or base storage. For each data object, a token database tracks "cascading" tokens that include an "anywhere" token and "base" token. The data storage system uses these cascading tokens to track functions such as grooming the cache, de-staging data from cache to base storage, and processing cache miss events.
2. Description of the Related Art
Many data processing systems require a large amount of data storage, for use in efficiently accessing, modifying, and re-storing data. Data storage is typically separated into several different levels, each level exhibiting a different data access time or data storage cost. A first, or highest, level of data storage involves electronic memory, usually dynamic or static random access memory (DRAM or SRAM). Electronic memories take the form of semiconductor integrated circuits where millions of bytes of data can be stored on each circuit, with access to such bytes of data measured in nanoseconds. The electronic memory provides the fastest access to data since access is entirely electronic.
A second level of data storage usually involves direct access storage devices (DASD). DASD storage, for example, includes magnetic and/or optical disks. Data bits are stored as micrometer-sized magnetically or optically altered spots on a disk surface, representing the "ones" and "zeros" that comprise the binary value of the data bits. Magnetic DASD includes one or more disks that are coated with remnant magnetic material. The disks are rotatably mounted within a protected environment. Each disk is divided into many concentric tracks, or closely spaced circles. The data is stored serially, bit by bit, along each track. An access mechanism, known as a head disk assembly (HDA), typically includes one or more read/write heads, and is provided in each DASD for moving across the tracks to transfer the data to and from the surface of the disks as the disks are rotated past the read/write heads. DASDs can store gigabytes of data, and the access to such data is typically measured in milliseconds (orders of magnitude slower than electronic memory). Access to data stored on DASD is slower than access to electronic memory due to the need to physically position the disk and HDA to the desired data storage location.
A third or lower level of data storage includes tapes, tape libraries, and optical disk libraries. Access to library data is much slower than electronic or DASD storage because a robot or human is necessary to select and load the needed data storage medium. An advantage of these storage systems is the reduced cost for very large data storage capabilities, on the order of terabytes of data. Tape storage is often used for backup purposes. That is, data stored at the higher levels of the data storage hierarchy is reproduced for safekeeping on magnetic tape. Access to data stored on tape and/or in a library is presently on the order of seconds.
Data storage, then, can be conducted using different types of storage, where each type exhibits a different data access time or data storage cost. Rather than using one storage type to the exclusion of others, many data storage systems include several different types of storage together, and enjoy the diverse benefits of the various storage types. For example, one popular arrangement employs an inexpensive medium such as tape to store the bulk of data, while using a fast-access storage such as DASD to cache the most frequently or recently used data.
During normal operations, synchronization between cache and tape is not critical. If a data object is used frequently, it is stored in cache and that copy is used exclusively to satisfy host read requests, regardless of whether the data also resides on tape. Synchronization can be problematic, however, if the cache and tape copies of a data object diverge over time and the data storage system suffers a disaster. In this case, the cache and tape contain different versions of the data object, with one version being current and the other being outdated, and it may be unclear which version is current. At worst, a stale or "down-level" version of a data object may be mistaken for (and subsequently used as) the current version. Thus, in the event of cache failure, data integrity may be questionable and there is some risk of the data storage system incorrectly executing future host read requests by recalling a stale version of the data.
Broadly, the present invention concerns a data storage system that employs base storage along with a high speed cache. Whenever a data object is stored in the cache or base storage, it is assigned (and optionally encapsulated with) an anywhere token. The anywhere token contains a code indicating the data object's version. Whenever the data object is stored in base storage, the data object is assigned a base token with the same value as its current anywhere token. Thus, the base token also contains the data object's latest version code at the time the data object is written in base storage. However, the base token is frozen in time because future cache-only updates of the data object will have the effect of changing the anywhere token without affecting the base token. The anywhere/base tokens of each data object constitute cascading tokens. These cascading tokens are available for use by the data storage system to track functions such as grooming the cache, de-staging data to base storage, and processing cache miss events. All tokens are stored in a token database.
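The relationship between the two tokens can be illustrated with a minimal Python sketch. It is not taken from the specification: the class name `CascadingTokens`, its method names, and the use of an incrementing integer as the version code are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CascadingTokens:
    """Hypothetical token pair for one data object."""
    anywhere: int                # version code of the newest copy, wherever it resides
    base: Optional[int] = None   # version code last written to base storage, if any

    def update_in_cache(self) -> None:
        # A cache-only update changes the anywhere token and leaves
        # the base token frozen at its previous value.
        self.anywhere += 1

    def write_to_base(self) -> None:
        # Writing the object to base storage copies anywhere -> base.
        self.base = self.anywhere

    def base_is_current(self) -> bool:
        # True only when base storage holds the latest version.
        return self.base == self.anywhere


tokens = CascadingTokens(anywhere=1)
tokens.write_to_base()      # base token now matches the anywhere token
tokens.update_in_cache()    # anywhere token advances, base token stays put
```

After the cache-only update, the two tokens differ, which is how the system can tell that the copy in base storage is down-level.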
In more specific terms, the data storage system of this invention includes a controller, cache, base storage, and various organizational data such as a token database, cache directory, base-storage-written list, etc. For each data object, the token database is capable of listing an anywhere token and a base token. When a data object is received for storage, the controller assigns an anywhere token to the data object. The anywhere token contains the latest metadata for the data object, including at least a version code. Optionally, the controller may encapsulate the data object with the version code and some or all of the remaining metadata of the data object's anywhere token. The controller proceeds to store the data object in the cache, base storage, or both. The controller also stores the anywhere token in the token database, cross-referenced against the data object. Whenever the data object is written to base storage, the controller updates the token database by copying the anywhere token into the base token field for that data object. Contents of the token database are written out to base storage in pieces of suitable size, such as tokens of individual data objects, parts of the token database, or the entire token database as a whole.
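The storage sequence described above might be sketched as follows. This is an illustrative assumption rather than the disclosed implementation: `Controller`, `TokenEntry`, `store`, and `token_excerpts` are hypothetical names, storage is modeled with in-memory dictionaries, the version code is a simple counter, and token database contents are written out per data object (one of the granularities the text permits).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TokenEntry:
    anywhere: int                # latest version code for the data object
    base: Optional[int] = None   # version code last written to base storage


@dataclass
class Controller:
    cache: Dict[str, bytes] = field(default_factory=dict)
    base_storage: Dict[str, bytes] = field(default_factory=dict)
    token_db: Dict[str, TokenEntry] = field(default_factory=dict)
    # Assumed model of the token database excerpts kept in base storage.
    token_excerpts: Dict[str, int] = field(default_factory=dict)

    def store(self, name: str, data: bytes,
              to_cache: bool = True, to_base: bool = False) -> int:
        """Assign a new anywhere token, then store the object as requested."""
        entry = self.token_db.get(name)
        if entry is None:
            entry = self.token_db[name] = TokenEntry(anywhere=1)
        else:
            entry.anywhere += 1
        if to_cache:
            self.cache[name] = data
        if to_base:
            self.base_storage[name] = data
            entry.base = entry.anywhere               # copy anywhere -> base
            self.token_excerpts[name] = entry.base    # persist the token with the data
        return entry.anywhere


controller = Controller()
controller.store("vol1", b"first", to_base=True)   # cache and base storage, version 1
controller.store("vol1", b"second")                # cache only, version 2
```

In this usage, base storage still holds version 1 of `vol1` while the cache holds version 2, and the token database records exactly that difference.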
If the storage system experiences a cache failure, normal storage operations are halted until the cache is repaired. Data objects lost from cache can be copied back into cache from base storage. The controller then builds a replacement token database. Specifically, the controller accesses the token database excerpts in base storage to retrieve the base tokens of all data objects that were lost from cache but still exist in base storage. With these base tokens, the controller populates the replacement token database, using each base token as both the anywhere token and the base token for the corresponding data object lost from cache. Then, the replacement token database is used to the exclusion of the previous token database. This avoids any danger of unknowingly recalling down-level data objects from base storage, where newer counterpart data objects had been stored in cache but lost in the cache failure. Also, the cache may be repopulated with the lost data objects all at once, or as needed in response to future cache misses.
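The rebuilding of the token database after a cache failure can be sketched in a few lines of Python. The function name `rebuild_token_db` and the representation of each entry as an `(anywhere, base)` tuple are assumptions for illustration only; the excerpts recovered from base storage are modeled as a mapping from object name to base token.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: object name -> (anywhere token, base token)
TokenDb = Dict[str, Tuple[int, Optional[int]]]


def rebuild_token_db(base_excerpts: Dict[str, int]) -> TokenDb:
    """Build a replacement token database from base tokens read out of base storage.

    Each recovered base token serves as both the anywhere token and the
    base token, so the surviving copy is treated as the current version.
    """
    return {name: (base, base) for name, base in base_excerpts.items()}


# Token database before the failure: vol1 had a newer version (3) in cache only,
# and vol2 had never been written to base storage.
old_db: TokenDb = {"vol1": (3, 2), "vol2": (1, None)}
excerpts_in_base = {"vol1": 2}

replacement_db = rebuild_token_db(excerpts_in_base)
```

Here the replacement database lists `vol1` at version 2 for both tokens, so the system no longer expects a version 3 that was lost with the cache, and `vol2` does not appear because no copy of it survives.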
The controller also oversees de-staging and grooming of the cache. According to a prescribed schedule, the controller repeatedly evaluates data objects stored in the cache to identify data objects suitable for storage in base storage. For each identified data object, the controller writes the identified data object to base storage, and copies the anywhere token to the base token in the token database. Under this or another schedule, the controller may also rid the cache of data objects written to base storage.
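De-staging and grooming might look like the following Python sketch. The function names, the dictionary-based token layout, and the `is_suitable` predicate are hypothetical; the text does not specify the selection criteria, so the predicate is left to the caller. The sketch also assumes that grooming removes only objects whose base token equals their anywhere token, which is one conservative reading of "written to base storage".

```python
from typing import Callable, Dict, List, Optional

Tokens = Dict[str, Dict[str, Optional[int]]]  # name -> {"anywhere": ..., "base": ...}


def destage(cache: Dict[str, bytes], base_storage: Dict[str, bytes],
            tokens: Tokens, is_suitable: Callable[[str], bool]) -> List[str]:
    """Write suitable cached objects to base storage and update their base tokens."""
    written = []
    for name, data in cache.items():
        entry = tokens[name]
        if entry["base"] != entry["anywhere"] and is_suitable(name):
            base_storage[name] = data
            entry["base"] = entry["anywhere"]   # copy anywhere -> base
            written.append(name)
    return written


def groom(cache: Dict[str, bytes], tokens: Tokens) -> List[str]:
    """Remove cached objects whose current version already exists in base storage."""
    removable = [n for n in cache if tokens[n]["base"] == tokens[n]["anywhere"]]
    for name in removable:
        del cache[name]
    return removable


cache = {"a": b"1", "b": b"2"}
base_storage: Dict[str, bytes] = {}
tokens: Tokens = {"a": {"anywhere": 2, "base": 1},
                  "b": {"anywhere": 1, "base": None}}

written = destage(cache, base_storage, tokens, lambda name: name == "a")
removed = groom(cache, tokens)
```

Object `b` stays in the cache in this example because its only copy has not yet been de-staged.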
In one embodiment, the invention may be implemented to provide a method to utilize cascading tokens to manage a cache-equipped data storage system. In another embodiment, the invention may be implemented to provide an apparatus, such as a controller or entire data storage system, employing cascading tokens to manage cache-equipped data storage. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform operations for utilizing cascading tokens to manage a cache-equipped data storage system. Another embodiment concerns logic circuitry having multiple interconnected electrically conductive elements configured to perform operations as discussed above.
The invention affords its users a number of distinct advantages. For example, the invention promotes data integrity because it keeps track of token levels in the cache and in base storage. Additionally, the invention aids in more reliable disaster recovery by using a backup copy of data stored in base storage. Recovery is also more reliable because the levels of tokens in base storage are known. The invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.