This invention relates generally to a data storage system and a method of storing data and, more particularly, to a system and method implementing a log structured array in a storage subsystem with at least two storage controller processors controlling a shared set of direct access storage devices.
A data storage subsystem having multiple direct access storage devices (DASDs) may store data and other information in an arrangement called a log structured array (LSA).
Log structured arrays combine the approach of the log structured file system architecture, as described in “The Design and Implementation of a Log Structured File System” by M. Rosenblum and J. K. Ousterhout, ACM Transactions on Computer Systems, Vol. 10 No. 1, February 1992, pages 26-52, with a disk array architecture such as the well-known RAID (redundant arrays of inexpensive disks) architecture, which has a parity technique to improve reliability and availability. RAID architecture is described in “A Case for Redundant Arrays of Inexpensive Disks (RAID)”, Report No. UCB/CSD 87/391, December 1987, Computer Sciences Division, University of California, Berkeley, Calif. “A Performance Comparison of RAID 5 and Log Structured Arrays”, Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing, 1995, pages 167-178, gives a comparison between LSA and RAID 5 architectures.
An LSA stores data to an array of DASDs in a sequential structure called a log. New information is not updated in place; instead, it is written to a new location to reduce seek activity. The data is written in strides or stripes distributed across the array and there may be a form of check data to provide reliability of the data. For example, the check data may be in the form of a parity check as used in the RAID 5 architecture which is rotated across the strides in the array.
An LSA generally consists of a controller and N+M physical DASDs. The storage space of N DASDs is available for storage of data. The storage space of the M DASDs is available for the check data. M could be equal to zero in which case there would not be any check data. If M=1 the system would be a RAID 5 system in which an exclusive-OR parity is rotated through all the DASDs. If M=2 the system would be a known RAID 6 arrangement.
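As an illustration of the M=1 case, the following minimal Python sketch computes an exclusive-OR parity block over N data blocks and rebuilds one lost data block from the survivors. The function names `xor_parity` and `rebuild_missing` and the sample byte strings are hypothetical and are not taken from any particular controller implementation.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# N = 3 data blocks (illustrative values) and M = 1 parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(data)

# Pretend the DASD holding data[1] has failed.
recovered = rebuild_missing([data[0], data[2]], parity)
```

Because XOR is its own inverse, the same routine serves both to generate the parity and to reconstruct a lost block.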
The LSA controller manages the data storage and writes updated data into new DASD locations rather than writing new data in place. The LSA controller keeps an LSA directory which it uses to locate data items in the array.
As an illustration of the N+M physical DASDs, an LSA can be considered as consisting of a group of DASDs. Each DASD is divided into large consecutive areas called segment-columns. If the DASDs are in the form of disks, a segment-column is typically as large as a physical cylinder on the disk. Corresponding segment-columns from the N+M devices constitute a segment. The array has as many segments as there are segment-columns on a single DASD in the array. One or more of the segment-columns of a segment may contain the check data or parity of the remaining segment-columns of the segment. For performance reasons, the check data or parity segment-columns are not usually all on the same DASD, but are rotated among the DASDs.
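The rotation of check data among the DASDs can be sketched as follows. The rotation rule used here (the check columns of segment s start at device s modulo the number of devices) is an assumption chosen for simplicity; actual arrays may rotate in a different order. The function name `segment_layout` is hypothetical.

```python
def segment_layout(segment, num_devices, num_check=1):
    """Return (data_devices, check_devices) for one segment.

    Assumed rule: the check segment-columns of a segment occupy
    num_check consecutive devices starting at segment % num_devices.
    """
    check = [(segment + i) % num_devices for i in range(num_check)]
    data = [device for device in range(num_devices) if device not in check]
    return data, check


# Four DASDs (N = 3, M = 1): the check column moves one device per segment.
layouts = [segment_layout(s, num_devices=4) for s in range(5)]
check_devices = [check[0] for _, check in layouts]
```

With this rule no single DASD holds all of the check data, so parity updates are spread over the whole array.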
Logical devices are mapped and stored in the LSA. A logical track is a set of data records to be stored. The data may be compressed or may be in an uncompressed form. Many logical tracks can be stored in the same segment. The location of a logical track in an LSA changes over time. The LSA directory indicates the current location of each logical track. The LSA directory is usually maintained in paged virtual memory.
Whether an LSA stores information according to a variable length format such as a count-key-data (CKD) architecture or according to fixed block architecture, the LSA storage format of segments is mapped onto the physical storage space in the DASDs so that a logical track of the LSA is stored within a single segment.
Reading and writing into an LSA occurs under management of the LSA controller. An LSA controller can include resident microcode that emulates logical devices such as CKD or fixed block DASDs. In this way, the physical nature of the external storage subsystem can be transparent to the operating system and to the applications executing on the computer processor accessing the LSA. Thus, read and write commands sent by the computer processor to the external information storage system would be interpreted by the LSA controller and mapped to the appropriate DASD storage locations in a manner not known to the computer processor. This comprises a mapping of the LSA logical devices onto the actual DASDs of the LSA.
In an LSA, updated data is written into new logical block locations instead of being written in place. Large amounts of updated data are collected as tracks in controller memory and destaged together to a contiguous area of DASD address space called a segment. A segment is usually an integral number of stripes of a parity system such as RAID 5. As data is rewritten into new segments, the old location of the data in previously written segments becomes unreferenced. This unreferenced data is sometimes known as “garbage”. If this were allowed to continue without taking any action, the entire address space would eventually be filled with segments which would contain a mixture of valid (referenced) data and garbage. At this point it would be impossible to destage any more data into the LSA because no free log segments would exist into which to destage data.
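The accumulation of garbage described above can be illustrated with a toy model. This is a sketch under simplifying assumptions: tracks are of uniform size, each track is destaged individually rather than batched in controller memory, and there is no check data. The class `TinyLSA` and its attributes are hypothetical names.

```python
class TinyLSA:
    """Toy log structured array: tracks are appended, never updated in place."""

    def __init__(self, num_segments, tracks_per_segment):
        self.tracks_per_segment = tracks_per_segment
        self.free_segments = list(range(num_segments))
        self.open_segment = self.free_segments.pop(0)
        self.used_slots = 0                                  # slots consumed in the open segment
        self.directory = {}                                  # logical track -> (segment, slot)
        self.live = {s: 0 for s in range(num_segments)}      # referenced tracks per segment
        self.garbage = {s: 0 for s in range(num_segments)}   # unreferenced tracks per segment

    def write(self, track):
        # Open a new segment when the current one is full.
        if self.used_slots == self.tracks_per_segment:
            if not self.free_segments:
                raise RuntimeError("no free segments: free space collection needed")
            self.open_segment = self.free_segments.pop(0)
            self.used_slots = 0
        # The previous copy, if any, becomes garbage in its old segment.
        old = self.directory.get(track)
        if old is not None:
            old_segment, _ = old
            self.live[old_segment] -= 1
            self.garbage[old_segment] += 1
        # The directory now points at the new location.
        self.directory[track] = (self.open_segment, self.used_slots)
        self.live[self.open_segment] += 1
        self.used_slots += 1


lsa = TinyLSA(num_segments=2, tracks_per_segment=2)
for track in ["A", "B", "A", "B"]:
    lsa.write(track)
```

After these four writes, segment 0 holds only garbage and segment 1 is full, so a further write fails even though half of the address space contains no valid data.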
To avoid this problem, a process known as “Free Space Collection” (FSC) or “Garbage Collection” must operate upon the old segments. FSC collects together the valid data from partially used segments to produce completely used segments and completely free segments. The completely free segments can then be used to destage new data. In order to perform free space collection, data structures must be maintained which count the number of garbage and referenced tracks in each segment and potentially also statistics which indicate the relative rate of garbage accumulation in a segment. (See “An Age Threshold Scheme for Garbage Collection in a Log Structured Array” by Jai Menon and Larry J. Stockmeyer, IBM Research Journal 10120.)
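The net effect of free space collection on segment occupancy can be sketched as below. This toy version only models the repacking of valid tracks; it reuses the identifiers of the partially used segments as destinations, whereas an actual controller would presumably copy valid tracks into a separate free segment and update the LSA directory. The function name `collect_free_space` and the sample layout are hypothetical.

```python
def collect_free_space(live_tracks, capacity):
    """Repack valid tracks from partially used segments.

    live_tracks maps segment id -> list of still-referenced tracks.
    Returns (packed, free): packed maps segment id -> tracks after
    collection, free lists the segments that no longer hold valid data.
    """
    full = {s: tracks for s, tracks in live_tracks.items() if len(tracks) == capacity}
    partial = sorted(s for s, tracks in live_tracks.items() if len(tracks) < capacity)
    # Gather every valid track out of the partially used segments.
    survivors = [track for s in partial for track in live_tracks[s]]

    packed = dict(full)
    free = []
    for s in partial:
        chunk, survivors = survivors[:capacity], survivors[capacity:]
        if chunk:
            packed[s] = chunk      # refilled as completely as possible
        else:
            free.append(s)         # nothing left to place here: segment is free
    return packed, free


before = {0: ["A"], 1: ["B", "C", "D"], 2: ["E", "F"], 3: ["G", "H", "I", "J"]}
packed, free = collect_free_space(before, capacity=4)
```

In this example three partially used segments holding six valid tracks are reduced to one full segment, one partially used segment, and one completely free segment.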
Snapshot copy is a facility that is commonly supported by LSA subsystems. Snapshot copy describes a system by which the LSA directory is manipulated so as to map multiple areas of the logical address space onto the same set of physical data on DASDs. This operation is performed as an “atomic event” in the subsystem by means of locking. Either copy of the data can subsequently be written to without affecting the other copy of the data (a facility known as copy on write).
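The directory manipulation behind snapshot copy can be sketched as follows. This is a single-threaded toy model, so the locking that makes the operation atomic is omitted; the class `SnapshotDirectory`, its methods, and the volume names are hypothetical.

```python
class SnapshotDirectory:
    """Toy LSA directory supporting snapshot copy with copy on write."""

    def __init__(self):
        self.mapping = {}        # (volume, track) -> physical location id
        self.store = {}          # physical location id -> data
        self.next_location = 0

    def write(self, volume, track, data):
        # Data always goes to a new location; only this volume's entry moves.
        location = self.next_location
        self.next_location += 1
        self.store[location] = data
        self.mapping[(volume, track)] = location

    def read(self, volume, track):
        return self.store[self.mapping[(volume, track)]]

    def snapshot(self, source, target):
        # Copy directory entries only; no stored data is duplicated.
        for (volume, track), location in list(self.mapping.items()):
            if volume == source:
                self.mapping[(target, track)] = location


directory = SnapshotDirectory()
directory.write("vol0", 0, b"version 1")
directory.snapshot("vol0", "snap")
shared = directory.mapping[("vol0", 0)] == directory.mapping[("snap", 0)]
directory.write("vol0", 0, b"version 2")
```

Immediately after the snapshot both volumes reference the same physical location; the later write to `vol0` redirects only that volume's directory entry, so the snapshot still reads the original data.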
Snapshot copy has several benefits to the customer: (1) It allows the capture of a consistent image of a data set at a point in time. This is useful in many ways including backup and application testing and restart of failing batch runs. (2) It allows multiple copies of the same data to be made and individually modified without allocating storage for the set of data which is common between the copies.
In existing storage subsystems, a redundant storage subsystem is often constructed from a pair of storage controller processors which share a common pool of DASDs to which they are both connected, and the pair of controllers support the same set of logical upstream devices. Each storage controller processor typically comprises the following components. (a) An upstream communication channel to the host computer(s). (b) A non-volatile memory into which data written from the host computer may be stored between the time that completion status for the write is given to the host computer and the time that the data is committed to a DASD for long term storage. (c) Some stored programs which operate upon host data so as to transform and/or replicate it in some way. Examples are RAID modules, LSA modules, compression modules. (d) Connections to a pool of DASDs used for the long term storage of data.
The storage controllers communicate with each other via some means to: (1) co-ordinate the management of the DASDs and any RAID arrays built upon them; (2) replicate “Fast Write Cache” data; and (3) co-ordinate accesses from multiple hosts so that operations are applied to the stored data in the correct order to maintain the integrity of the data.
In this way, the controllers can share the workload from the host computers and co-operate with each other in order to service that workload.
In the event of a failure of either controller, or a breakdown in communication either between a controller and the host or between a controller and the DASDs, the remaining controller will take over the entire workload, resulting in no loss of availability of data to the host computers.
A log structured array within a redundant storage subsystem such as the one described above presents some special factors not faced by non-LSA subsystems.
It is a trivial matter to show that the most advantageous arrangement for an LSA subsystem is for all of the DASDs attached to the controllers to be managed as a single LSA. This single massive LSA may be partitioned into individual smaller “partitions”. These partitions have meaning to the host computers which may use them to partition ownership of data sets between the host computers or to group logically related data.
This single LSA arrangement eliminates skew by flattening the I/O load across all of the DASDs. This results in more concurrent transactions per second and a greater sustained bandwidth than could otherwise be obtained for accesses to a single volume. Also, the single LSA arrangement allows the free space in the LSA to be shared by all volumes.
The single LSA approach also allows for snapshot copy between any arbitrary part of any volume and any other volume. This would not be possible if the DASDs were divided into separate LSAs as snapshot copy between different LSA directories is not possible.
Maintaining a single LSA across all the DASDs connected to the controller pair has the disadvantage that the controllers must co-operate with one another in order to reference and update certain data structures. For example, they must reference and update the LSA directory, the data structure which holds free segments, the segment usage counters and any data structures maintained to allow efficient free space collection.
It will be obvious to those skilled in the art that the co-ordination of these complex interrelated data structures in what is essentially a loosely coupled multiprocessing (LCMP) system involves both significant complexity and also significant locking, which introduces overhead into the I/O path and thus reduces system throughput and increases service time.
An aim of the invention is to provide an LSA in a storage subsystem comprising two or more storage controllers operating together in a redundant “no single point of failure” configuration controlling a shared set of DASDs.
According to a first aspect of the present invention there is provided a data storage system comprising at least two controllers and a storage device with data storage space which is shared by the controllers, wherein the controllers share the workload by dividing the shared storage space into n sets of stripes where the space in each set of stripes is designated to one controller and the stripes are sufficiently small to divide the workload uniformly across the storage device. In a preferred case, n is equal to the number of controllers.
Each controller manages the data in its designated stripes. Preferably, the units of the stripes are sufficiently small so that each portion of a host workload spans multiple stripes.
In the case of two controllers, the shared storage space may be divided into stripes of odd and even tracks, all odd tracks being processed by one controller and all even tracks being processed by the other controller.
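The odd and even division can be expressed as a simple modulo rule, sketched below. Numbering the controllers from 0 and generalising to n controllers by `track % n` are assumptions made for illustration; the function name `owning_controller` and the sample track range are hypothetical.

```python
from collections import Counter


def owning_controller(track, num_controllers=2):
    """Return the index of the controller designated for a track.

    With two controllers, even tracks map to controller 0 and odd
    tracks map to controller 1.
    """
    return track % num_controllers


# A host request touching 16 consecutive tracks is split evenly.
request = range(100, 116)
load = Counter(owning_controller(track) for track in request)
```

Because the stripe unit is a single track, any host workload spanning more than a few tracks is divided almost equally between the controllers, and the designated controller can be computed from the track number alone without consulting shared state.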
The data storage system optimally includes a processor and memory, and the data storage device is an array of storage devices having a plurality of data blocks organized on the storage devices in segments distributed across the storage devices, wherein when a data block in a segment stored on the storage devices in a first location is updated, the updated data block is assigned to a different segment, written to a new storage location, and designated as a current data block, and the data block in the first location is designated as an old data block, and having a main directory, stored in memory, containing the locations of the storage devices of the current data blocks.
Optimally, the data storage system is a log structured array and the storage device is a plurality of direct access storage devices. The log structured array may use check data in a storage device formed of an array of direct access storage devices.
Preferably, write operations are mirrored to the other, or at least one other, controller for redundancy. Each controller may have a primary cache for the data from stripes designated to that controller and a secondary cache for data from stripes designated to another controller.
If one controller fails then another controller can take over the entire workload keeping the data structures separate so that the workload can be moved back when the failing controller has been repaired.
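The mirroring and takeover behaviour can be sketched for a two-controller case as below. This is a toy model under several assumptions: caches are plain dictionaries, every track written remains cached, at most one controller fails, and the survivor services the failed controller's tracks out of its secondary cache so that the two sets of data stay separate. The classes `Controller` and `ControllerPair` are hypothetical.

```python
class Controller:
    def __init__(self, ident):
        self.ident = ident
        self.alive = True
        self.primary_cache = {}     # tracks in stripes designated to this controller
        self.secondary_cache = {}   # mirrored tracks designated to the partner


class ControllerPair:
    def __init__(self):
        self.controllers = [Controller(0), Controller(1)]

    def owner_of(self, track):
        # Even tracks belong to controller 0, odd tracks to controller 1.
        return self.controllers[track % 2]

    def fail(self, ident):
        self.controllers[ident].alive = False

    def write(self, track, data):
        """Store the data and return the id of the controller that serviced it."""
        owner = self.owner_of(track)
        partner = self.controllers[1 - owner.ident]
        if owner.alive:
            owner.primary_cache[track] = data
            if partner.alive:
                partner.secondary_cache[track] = data   # mirror for redundancy
            return owner.ident
        # Takeover: the survivor keeps the partner's tracks in its secondary cache.
        partner.secondary_cache[track] = data
        return partner.ident

    def read(self, track):
        owner = self.owner_of(track)
        if owner.alive:
            return owner.primary_cache[track]
        return self.controllers[1 - owner.ident].secondary_cache[track]


pair = ControllerPair()
served_even = pair.write(4, b"even")
served_odd = pair.write(7, b"odd")
pair.fail(1)
served_after_failure = pair.write(9, b"new")
```

After controller 1 fails, controller 0 services both odd and even tracks, yet its primary cache still contains only its own even tracks, which is what would allow the odd-track workload to be handed back later.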
Each controller may have a directory providing location information for data in stripes designated to that controller. Free space collection may be carried out separately by each controller for data in stripes designated to that controller.
There is no contention between the controllers for access to the storage, the directories or the meta-data and no locking is required.
According to a second aspect of the present invention, there is provided a method of storing data in a system in which at least two controllers share storage space comprising dividing the shared storage space into n sets of stripes where the space of each stripe is designated to one controller, wherein the stripes are sufficiently small to divide the workload uniformly across the storage space.
The problem addressed by the present invention is to use two or more controllers to provide access to the same storage devices without large locking overheads. The invention achieves this and also avoids read cache duplication and divides the workload evenly between the controllers.