The present invention relates to the managing of data sets stored within storage subsystems. More particularly, the invention relates to a method and apparatus for managing update information of data sets stored in a storage subsystem.
A Storage Subsystem (SS) typically consists of a plurality of Direct Access Storage Devices (DASD), a fast Random Access Memory (RAM) device, also known as "Cache memory", and a Non-Volatile RAM (NVRAM). The DASDs are usually implemented with magnetic storage media (e.g., hard disks, tapes, etc.), and this is where the data is eventually stored. The Cache memory device is utilized to enable fast I/O interactions with Hosts and/or other devices. Therefore, it is usually implemented with fast RAM devices (e.g., SRAM), which are volatile. The NVRAM is usually implemented by a battery backed-up RAM, or by types of flash memory, and its functioning and management are of critical importance for the SS operation, as will be explained hereinafter.
Hard Disk Drives (HDD) are commonly utilized as the main storage device for DASD implementations. HDDs are relatively cheap, non-volatile storage devices with a substantially large capacity. These devices usually comprise circular magnetic media (disks) and read/write magnetic heads. To enable the magnetic heads to efficiently locate data stored on the HDD's disks, the stored data is organized in Tracks, Sectors and Clusters. Each disk is divided into a number of concentric circles, so-called Tracks. The HDD's disks are also partitioned into "pie slices", known as Sectors. Each of the disk's Sectors consists of Clusters, which are the smallest storage unit of data on the HDD's disk (typically 256 or 512 Bytes in length).
The HDD disks rotate continuously, and in order to reach a specific location on the disk, the magnetic heads are positioned over the respective disk Track, where they wait for the required Sector and Cluster on the rotating disk. The HDD's characteristics, namely the latency time and the bandwidth, are derived from this type of operation. The latency time is derived from the velocity at which the magnetic heads may be moved from one Track to another. The bandwidth is derived from the rotational velocity of the HDD's disks, and indicates the read/write rate (Bytes per second) once the magnetic heads are properly positioned.
Another important factor, which influences the operation of HDDs, is that data may be read or written only at the Cluster level. This means that in order to read or write a single Byte, the operation is performed on the entire Cluster in which this Byte is located. The limitations imposed by the latency time, the Bandwidth and the Cluster-level read/write operation make it very attractive to perform HDD transactions at the Track level. The performance is substantially improved when consecutive disk Clusters are read or written in a single operation.
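By way of illustration only, the Cluster-level access constraint described above may be sketched as follows. The names `CLUSTER_SIZE` and `write_byte`, and the use of a `bytearray` to stand in for a disk, are assumptions made for this example and do not appear in the description.

```python
CLUSTER_SIZE = 512  # bytes; the description cites 256 or 512 as typical values


def write_byte(disk: bytearray, offset: int, value: int) -> int:
    """Write one byte by rewriting the whole Cluster that contains it.

    Returns the number of bytes transferred to the disk, to show that a
    one-byte update still costs a full Cluster write.
    """
    start = (offset // CLUSTER_SIZE) * CLUSTER_SIZE
    cluster = bytearray(disk[start:start + CLUSTER_SIZE])  # read the entire Cluster
    cluster[offset - start] = value                        # modify the copy in memory
    disk[start:start + CLUSTER_SIZE] = cluster             # write the entire Cluster back
    return len(cluster)
```

In this sketch a single-byte update transfers a full Cluster, which is why grouping updates to consecutive Clusters is attractive.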
The limited Bandwidth and the latency time impose low-resolution operation of the HDDs, and therefore the main data stream is directed to/from the Cache device to reduce access to the DASDs. In general, the Cache memory is utilized as a temporary storage device for incoming and outgoing data. In this way, the data is written to the Cache device first, so that the DASD's I/O transactions are actually performed between an I/O device and a fast Cache memory device. This allows efficient HDD operation, while I/O transactions are performed substantially faster.
However, the Cache memory is volatile, and therefore vulnerable to power failures and other failures that may result in the loss of the Cache memory content. Therefore, a copy of the Cache content is usually stored on the NVRAM, and the data sets on the DASDs are updated on the basis of Least Recently Used (LRU) policies. The NVRAMs, on the other hand, are relatively expensive, and therefore they are usually small in terms of memory size, generally too small to hold all the modified data sets required. To solve the foregoing problem, special algorithms (e.g., LRU algorithms) are utilized to enable efficient Cache management.
This is particularly relevant when dealing with Control Data (CD). These data are utilized to manage and control the SS operations. CD usually has a special structure which enables efficient encoding. For example, bit map images are often utilized to designate changes made to the copies of data sets stored in different locations. In this way, for each data set there is a corresponding bit in the bit map image, such that when changes are applied to the content of this data set, the state of the bit is altered to designate that the copies are no longer identical. Typically, the content of consecutive bits is changed, and changing a single bit is relatively rare. The updates of this type of CD may be easily encoded into a structure consisting of the change (i.e., 0→1 or 1→0) and the range of bits that changed their state.
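By way of illustration only, the range encoding described above may be sketched as follows. The names `UpdateRecord`, `encode_changes` and `apply_records` are hypothetical and are not taken from the description. The sketch assumes that bit maps are represented as equal-length lists of 0 and 1.

```python
from typing import List, NamedTuple


class UpdateRecord(NamedTuple):
    value: int  # new state of every bit in the range (0 or 1)
    start: int  # index of the first changed bit, inclusive
    end: int    # index of the last changed bit, inclusive


def encode_changes(old: List[int], new: List[int]) -> List[UpdateRecord]:
    """Collapse the bit flips between two equal-length bit maps into range records."""
    records = []
    i, n = 0, len(old)
    while i < n:
        if old[i] != new[i]:
            value, start = new[i], i
            # Extend the run while the next bit also flipped to the same value.
            while i + 1 < n and old[i + 1] != new[i + 1] and new[i + 1] == value:
                i += 1
            records.append(UpdateRecord(value, start, i))
        i += 1
    return records


def apply_records(bitmap: List[int], records: List[UpdateRecord]) -> List[int]:
    """Return a copy of the bit map with every record applied in order."""
    result = list(bitmap)
    for rec in records:
        for k in range(rec.start, rec.end + 1):
            result[k] = rec.value
    return result
```

Because consecutive bits usually change together, one short record of this kind can replace many individual bit updates.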
The CD Sets (CDS) are preferably stored on the DASDs. Since changes are frequently made to fractions of the CDSs, they are copied to the Cache memory, and modified on the DASD on an LRU basis. The copy of the CDS on the Cache memory may be updated frequently, but this updated copy is volatile and therefore vulnerable. Therefore, any change applied to the CDS stored on the Cache memory is efficiently encoded into an update record, which is then stored in the NVRAM. As previously discussed, the size of the NVRAM is relatively small, and therefore it cannot contain all the required update information. This problem is typically resolved by applying the modified information, stored in the Cache memory, to the appropriate CDS stored on the DASD, thereby freeing NVRAM memory space and actually updating the original CDSs.
It should be clear that by applying the update information to the CDSs on the DASD, it is meant that the updated CDSs stored on the Cache memory are copied to the appropriate DASD tracks. Thereby the CDSs track on the DASD becomes an up-to-date CDSs copy, and the appropriate updates that are stored on the NVRAM may be removed and reused for storage of further update records.
To enable automatic recovery from a lost or damaged CDS, a journal of CDS changes is maintained. The journal data set (hereinafter referred to as the "journal of changes") contains each change (in encoded format) made to the CDSs since the last time the CDSs were successfully copied from the Cache memory to their DASD tracks. In the event of losing the content of the Cache memory, the CDSs are recovered by applying the changes reflected by the update records in the journal of changes (stored on the NVRAM) to the copy of the CDSs maintained on the DASDs.
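By way of illustration only, the recovery step described above may be sketched as a replay of the journal over the DASD copy. The function name `recover`, the dictionary representation of the DASD copy, and the record layout `(cds_id, value, start, end)` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (CDS identifier, new bit value, first bit, last bit).
JournalRecord = Tuple[str, int, int, int]


def recover(dasd_copy: Dict[str, List[int]],
            journal: List[JournalRecord]) -> Dict[str, List[int]]:
    """Rebuild the lost Cache content from the DASD copy plus the journal.

    Records are replayed in the order in which they were written, so a later
    record overrides an earlier one that touched the same bits.
    """
    recovered = {cds_id: list(bits) for cds_id, bits in dasd_copy.items()}
    for cds_id, value, start, end in journal:
        bits = recovered[cds_id]
        bits[start:end + 1] = [value] * (end - start + 1)
    return recovered
```

The DASD copy itself is left untouched in this sketch, so the replay can be repeated if recovery is interrupted.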
The Track Set Manager (TSM) implementation in IBM's Enterprise Storage Server (ESS) utilizes a similar method, i.e., maintaining a journal of changes. In the ESS's TSM the CDSs are stored in the Cache memory, where they are subject to rapid changes. The changes that are applied to the CDS stored in the Cache memory are encoded and stored on an NVRAM buffer. The TSM implementation divides the NVRAM buffer into two distinct partitions, to enable efficient management of the update information.
FIG. 1 schematically illustrates the method utilized in the ESS's TSM. The CDS is maintained in two different locations on disks, 100 and 101. A Checkpoint process alternately writes the CDS from the Cache memory to one of the DASD copies, 100 and 101 (illustrated by lines 103 and 104). All of the updates to a CDS are written to both of the disk Tracks 100 and 101 over a period of two Checkpoint processes. The NVRAM buffer 112 is utilized to hold the encoded records of the updates that are applied to the copy of the CDSs on the Cache memory. As mentioned hereinbefore, the NVRAM buffer 112 is partitioned into two distinct sections, 113 and 114. This structure enables the concurrent operations of storing update information and freeing NVRAM space (i.e., writing updated CDSs from the Cache memory to the DASD tracks).
More precisely, this type of operation allows the storage of new update records to the NVRAM, and at the same time enables updating the CDSs copy on the DASD tracks, such that while new update records are being stored on one NVRAM partition, the content of the other partition is emptied. It should be obvious that the NVRAM partition may be cleared only when the corresponding CDSs in the Cache memory (i.e., the modified CDSs) are copied to the DASD tracks. In other words, in order to erase update records (i.e., the encoded changes) the modified CDSs to which update records relate, must be copied to the original tracks on the DASD (i.e., where the CDSs are originally stored). Otherwise, the modifications may be permanently lost in the event of system failures.
For example, in FIG. 1, an NVRAM section 114 is holding the recent CDS updates. When the NVRAM partition 114 is filled, the second partition 113, is utilized to hold further CDS updates while the update records on the full partition 114 are cleared by dumping the CDSs (from the Cache memory), associated with its update records, to their DASD tracks. If an NVRAM partition is filled before the other partition is cleared, the filled NVRAM partition is copied temporarily to a special disk Track 115. This enables reuse of the NVRAM buffer for storing new updates, and reconstructing the modified CDSs utilizing the update records stored in the special Track, 115.
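By way of illustration only, the alternation between the two NVRAM partitions and the overflow to the special Track may be sketched as follows. This is a simplified model written for this example, not the TSM implementation: the class `TwoPartitionJournal`, its method names, and the fixed per-partition record count are assumptions, and the later clearing of the special Track is omitted.

```python
class TwoPartitionJournal:
    """Toy model of an NVRAM buffer split into two partitions (113 and 114)."""

    def __init__(self, capacity: int):
        self.capacity = capacity    # records per partition (assumed fixed)
        self.partitions = [[], []]  # stand-ins for partitions 113 and 114
        self.active = 0             # partition currently receiving records
        self.spill_track = []       # stand-in for the special disk Track 115

    def log(self, record) -> None:
        part = self.partitions[self.active]
        if len(part) == self.capacity:
            other = 1 - self.active
            if self.partitions[other]:
                # The other partition has not been cleared yet: copy the full
                # partition to the special Track and reuse it.
                self.spill_track.extend(part)
                part.clear()
            else:
                # The other partition is empty: switch to it, leaving the full
                # one to be cleared by the next Checkpoint.
                self.active = other
                part = self.partitions[other]
        part.append(record)

    def checkpoint_done(self) -> None:
        """Call once the CDSs covered by the inactive partition are on DASD."""
        self.partitions[1 - self.active].clear()
```

In this model, records reach the special Track only when logging outpaces the Checkpoint process, which is the situation that lengthens recovery.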
A recovery process 130 is performed to reconstruct the CDSs, in case of faulty erasing of the Cache memory content. When the recovery is performed, the CDSs are reconstructed from a valid version of one of the disk locations, 100 or 101, that are stored on the DASD, along with the update records that are stored on the NVRAM partitions, 113 and/or 114, and the update records stored on the special disk Track 115.
The method of the TSM allows continuous storage of update records to the NVRAM buffer. However, the Checkpoint and the recovery processes are relatively long and cumbersome. When a checkpoint process is completed, a partition of the NVRAM buffer is cleared from all of its update records. This means that all the CDSs on the Cache memory that are associated with update records stored in one partition must be dumped (from Cache memory to DASD tracks) within a Checkpoint period to enable clearing the partition before the other partition is refilled.
However, it is not always possible to meet this requirement, and therefore the special DASD track (115) is utilized to store update records if an NVRAM partition is filled before a Checkpoint process is concluded. The CDS reconstruction utilizes three different storage locations, and involves determining which of the disk locations is the valid one for recovery. As a result, the update and recovery processes are complex and relatively long. Moreover, in the method of the TSM, the sequence of CDS track updates applied to the DASD strongly depends on the order of update records in the NVRAM partitions, and therefore the update of the DASD tracks is, in general, not continuous.
The methods described above have not yet provided satisfactory solutions to the problems of the storage, management, and recovery of CDSs of DASDs in storage subsystems.
It is therefore an object of the present invention to provide a method and apparatus for fast and reliable recovery of the CDSs in a storage subsystem in the event of system failures, and for an efficient management of the storage devices.
It is another object of the present invention to provide a method and apparatus for fast and efficient storage and update of CDS in a storage subsystem, which allows an independent update of CDS tracks on DASDs and an efficient exploitation of the NVRAM memory space.
It is a further object of the present invention to provide a method and apparatus for a fast and efficient CDS update process in a storage subsystem, utilizing a small update information record.
Other objects and advantages of the invention will become apparent as the description proceeds.
The following terms are defined as follows:
HOST: any computer that has full two-way access to other computers on a communication network.
I/O device: a device that transfers data to or from a computerized system.
In one aspect, the present invention is directed to a system for the storage and maintenance of data sets updates in a storage subsystem, comprising one or more direct access storage device(s) that serves as the main storage of the storage subsystem and on which the data sets are originally stored. The system further comprises a Cache memory storage device that enables fast interaction with the storage subsystem, and on which a copy of the data sets is stored. The system also comprises a non-volatile storage device partitioned into a plurality of fixed size non-volatile memory pages, and an update process in which the changes to the data sets are applied to the data set copy stored on the Cache memory device. A journal of the changes that are being made to the data sets stored on the Cache memory device is maintained, utilizing the non-volatile memory pages to store update records reflecting the changes in the data sets. A reconstruction process is utilized to reconstruct the data sets, utilizing the update records stored on the non-volatile memory pages, and the data sets stored on the direct access storage device. The system further comprises a process for freeing arbitrary non-volatile pages from their prior association with data sets that are stored in the Cache memory.
Optionally, the Journal of data set changes comprises dynamically allocating and associating free empty non-volatile pages for the storage of update records of data sets which are not already associated with any non-volatile pages. The Journal of data set changes may further comprise applying updates to the copy of a data set stored in the Cache memory, and determining whether the data set that has been updated on the Cache memory device is associated with one of the journal's non-volatile memory pages and, if so, determining whether the associated non-volatile memory page is full. If it is determined that the associated non-volatile memory page is full, the original copy of the data set that is stored on the direct access storage device is updated, the content of the associated non-volatile memory page is cleared, and the data set's update record is stored on the non-volatile memory page. In response to a determination that the associated non-volatile memory page is not full, the data set update information is stored on the non-volatile memory page. If it is determined that the data set is not associated with a non-volatile memory page, it is then determined whether there is a free non-volatile memory page available and, if so, the non-volatile memory page is associated with the data set, and the update information is stored on the non-volatile memory page. When it is determined that none of the non-volatile memory pages is available, the content of an arbitrary data set is dumped from the Cache memory into the direct access storage device, and the non-volatile memory page associated with the arbitrary data set is cleared and then associated with the data set for storing the update information on it.
The reconstruction process optionally comprises updating the original data sets that are stored on the direct access storage device by applying the update information stored on the corresponding non-volatile pages with which they are associated.
Optionally, the process for freeing arbitrary non-volatile pages from their prior association with data sets that are stored in the Cache memory may further comprise arbitrarily choosing a non-volatile page, dumping the content of the CDS associated with the non-volatile page from the Cache memory into the appropriate DASD Track, and clearing and freeing the non-volatile page from its prior association.
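By way of illustration only, the journaling, reconstruction and page-freeing steps described above may be sketched as follows. This is a minimal model under stated assumptions, not a definitive implementation: the class `PagedJournal`, its method names, the representation of data sets as lists of bits, and the `(value, start, end)` record layout are hypothetical. The "arbitrary" victim is taken here to be the first associated data set, simply to keep the example deterministic.

```python
from typing import Dict, List


class PagedJournal:
    """Journal that associates fixed-size NVRAM pages with data sets on demand."""

    def __init__(self, num_pages: int, page_capacity: int,
                 cache: Dict[str, List[int]], dasd: Dict[str, List[int]]):
        self.page_capacity = page_capacity  # update records per page
        self.free_pages = [[] for _ in range(num_pages)]
        self.page_of = {}                   # data set id -> associated page
        self.cache = cache                  # volatile working copies
        self.dasd = dasd                    # original copies on DASD

    def _dump(self, ds_id: str) -> None:
        """Copy the cached data set to DASD and clear its page."""
        self.dasd[ds_id] = list(self.cache[ds_id])
        self.page_of[ds_id].clear()

    def free_page(self, ds_id: str) -> None:
        """Dump a data set and release its page from the association."""
        self._dump(ds_id)
        self.free_pages.append(self.page_of.pop(ds_id))

    def update(self, ds_id: str, value: int, start: int, end: int) -> None:
        # The change is first applied to the copy in the Cache memory.
        self.cache[ds_id][start:end + 1] = [value] * (end - start + 1)
        page = self.page_of.get(ds_id)
        if page is not None:
            if len(page) >= self.page_capacity:
                self._dump(ds_id)           # page full: update DASD, clear page
        else:
            if not self.free_pages:
                victim = next(iter(self.page_of))  # arbitrary choice (assumed)
                self.free_page(victim)
            page = self.free_pages.pop()
            self.page_of[ds_id] = page
        page.append((value, start, end))

    def reconstruct(self) -> Dict[str, List[int]]:
        """Rebuild every data set from its DASD copy and its page, if any."""
        result = {ds_id: list(bits) for ds_id, bits in self.dasd.items()}
        for ds_id, page in self.page_of.items():
            for value, start, end in page:
                result[ds_id][start:end + 1] = [value] * (end - start + 1)
        return result
```

In this sketch each data set is dumped independently of the others, and reconstruction reads only the DASD copy and the single page associated with each data set. A record stored immediately after a dump repeats a change already on DASD; replaying it is harmless because setting a range of bits is idempotent.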
The system may further comprise encoding means utilized to encode the changes applied to each of the data sets to obtain an update record to be stored in the non-volatile page associated with the data set, wherein the update record reflects the changes and according to which the updated data set may be reconstructed utilizing its copy in the direct access storage device.