This invention relates generally to a method for rebuilding meta-data in a data storage system and to a data storage system, and, more particularly, to a method and system for validating segments during meta-data rebuild of a log structured array.
A data storage subsystem having multiple direct access storage devices (DASDs) may store data and other information in an arrangement called a log structured array (LSA).
Log structured arrays combine the approach of the log structured file system architecture, as described in “The Design and Implementation of a Log Structured File System” by M. Rosenblum and J. K. Ousterhout, ACM Transactions on Computer Systems, Vol. 10 No. 1, February 1992, pages 26-52, with a disk array architecture such as the well-known RAID (redundant arrays of inexpensive disks) architecture, which has a parity technique to improve reliability and availability. RAID architecture is described in “A Case for Redundant Arrays of Inexpensive Disks (RAID)”, Report No. UCB/CSD 87/391, December 1987, Computer Sciences Division, University of California, Berkeley, Calif. “A Performance Comparison of RAID 5 and Log Structured Arrays”, Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing, 1995, pages 167-178, gives a comparison between LSA and RAID 5 architectures.
An LSA stores data to an array of DASDs in a sequential structure called a log. New information is not updated in place; instead, it is written to a new location to reduce seek activity. The data is written in strides or stripes distributed across the array, and there may be a form of check data to provide reliability of the data. For example, the check data may be in the form of a parity check, as used in the RAID 5 architecture, which is rotated across the strides in the array.
An LSA generally consists of a controller and N+M physical DASDs. The storage space of N DASDs is available for storage of data. The storage space of the M DASDs is available for the check data. M could be equal to zero in which case there would not be any check data. If M=1 the system would be a RAID 5 system in which an exclusive-OR parity is rotated through all the DASDs. If M=2 the system would be a known RAID 6 arrangement.
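The M=1 case can be illustrated with a minimal sketch of exclusive-OR check data. The function names `xor_parity` and `reconstruct` and the two-byte columns are illustrative assumptions and are not taken from any particular controller implementation.

```python
from functools import reduce

def xor_parity(columns: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized data columns."""
    if not columns:
        raise ValueError("at least one column is required")
    if len({len(c) for c in columns}) != 1:
        raise ValueError("columns must be equally sized")
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*columns))

def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single lost column from the survivors and the parity."""
    return xor_parity(surviving + [parity])

# N = 3 data columns, M = 1 check column (hypothetical contents).
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(data)
```

Because XOR is its own inverse, the same routine that produces the check data also recovers any one missing column from the remaining N columns.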
The LSA controller manages the data storage and writes updated data into new DASD locations rather than writing new data in place. The LSA controller keeps an LSA directory which it uses to locate data items in the array.
As an illustration of the N+M physical DASDs, an LSA can be considered as consisting of a group of DASDs. Each DASD is divided into large consecutive areas called segment-columns. If the DASDs are in the form of disks, a segment-column is typically as large as a physical cylinder on the disk. Corresponding segment-columns from the N+M devices constitute a segment. The array has as many segments as there are segment-columns on a single DASD in the array. One or more of the segment-columns of a segment may contain the check data or parity of the remaining segment-columns of the segment. For performance reasons, the check data or parity segment-columns are not usually all on the same DASD, but are rotated among the DASDs.
Logical devices are mapped and stored in the LSA. A logical track, sometimes referred to as a data block, is a set of data records to be stored. The data may be compressed or may be in an uncompressed form. Many logical tracks can be stored in the same segment. The location of a logical track in an LSA changes over time. The LSA directory indicates the current location of each logical track. The LSA directory is usually maintained in paged virtual memory.
Whether an LSA stores information according to a variable length format such as a count-key-data (CKD) architecture or according to fixed block architecture, the LSA storage format of segments is mapped onto the physical storage space in the DASDs so that a logical track of the LSA is stored within a single segment.
Reading and writing into an LSA occurs under management of the LSA controller. An LSA controller can include resident microcode that emulates logical devices such as CKD or fixed block DASDs. In this way, the physical nature of the external storage subsystem can be transparent to the operating system and to the applications executing on the computer processor accessing the LSA. Thus, read and write commands sent by the computer processor to the external information storage system would be interpreted by the LSA controller and mapped to the appropriate DASD storage locations in a manner not known to the computer processor. This comprises a mapping of the LSA logical devices onto the actual DASDs of the LSA.
In an LSA, updated data is written into new logical block locations instead of being written in place. Large amounts of updated data are collected as tracks in controller memory and destaged together to a contiguous area of DASD address space called a segment. A segment is usually an integral number of stripes of a parity system such as RAID 5. As data is rewritten into new segments, the old location of the data in previously written segments becomes unreferenced. This unreferenced data is sometimes known as “garbage”. If this were allowed to continue without taking any action, the entire address space would eventually be filled with segments which would contain a mixture of valid (referenced) data and garbage. At this point it would be impossible to destage any more data into the LSA because no free log segments would exist into which to destage data.
To avoid this problem, a process known as “Free Space Collection” (FSC) or “Garbage Collection” must operate upon the old segments. FSC collects together the valid data from partially used segments to produce completely used segments and completely free segments. The completely free segments can then be used to destage new data. In order to perform free space collection, data structures must be maintained which count the number of garbage and referenced tracks in each segment and potentially also statistics which indicate the relative rate of garbage accumulation in a segment. (See “An Age Threshold Scheme for Garbage Collection in a Log Structured Array”, Jai Menon and Larry J. Stockmeyer, IBM Research Journal 10120.)
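A minimal sketch of free space collection is given below. It assumes a greedy policy that empties the segments with the fewest referenced tracks first; this policy, the function name `collect`, and the dictionary representation of segments and of the LSA directory are illustrative assumptions, not the age-threshold scheme of the cited paper.

```python
def collect(segments: dict[int, list[str]], directory: dict[str, int],
            new_id: int, capacity: int) -> list[int]:
    """Copy referenced tracks from the emptiest segments into segment new_id.

    segments  maps segment id -> tracks physically present (valid or garbage)
    directory maps track name -> id of the segment holding its current copy
    Returns the ids of the segments that became completely free.
    Assumption: new_id is not already in use.
    """
    def live(seg_id: int) -> list[str]:
        # A track is referenced only if the directory still points here.
        return [t for t in segments[seg_id] if directory.get(t) == seg_id]

    victims = sorted(segments, key=lambda s: len(live(s)))
    target: list[str] = []
    freed: list[int] = []
    for seg_id in victims:
        survivors = live(seg_id)
        if len(target) + len(survivors) > capacity:
            break  # the new segment cannot absorb this victim
        for track in survivors:
            directory[track] = new_id
        target.extend(survivors)
        freed.append(seg_id)
    for seg_id in freed:
        del segments[seg_id]
    segments[new_id] = target
    return freed
```

For example, with segments `{0: ["a", "b", "c"], 1: ["a", "d"], 2: ["b", "c", "e"]}` where only segments 1 and 2 hold current copies, collecting into a new segment of capacity 3 frees segments 0 and 1 and leaves segment 2 untouched.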
Log structured arrays are direct access storage devices which contain meta-data, which maps a virtual storage space onto physical storage resources. The meta-data maps extents of the virtual storage space (virtual tracks) onto real tracks collated into segments stored on the physical resources. The term meta-data includes, but is not limited to, data which describes or relates to other data. For example: data held in the LSA directory regarding the addresses of logical tracks in the physical storage space; data regarding the fullness of segments with valid (live) tracks; a list of free or empty segments; data for use in free space collection algorithms; the configuration of logical volumes; topology data.
Corruption of the meta-data renders the data stored in the virtual storage space inaccessible as without the meta-data it is impossible to determine which real track corresponds to a particular virtual track. It is therefore desirable to be able to rebuild the meta-data in the event that the meta-data is corrupted in order to restore access to the data.
A secondary source of the meta-data is needed to rebuild the meta-data to allow access to the stored data. A known method of providing a secondary source of the meta-data is to provide a segment directory containing information on the data blocks or tracks in the segment and storing the segment directory as part of the segment on the storage devices. The segment directory contains all the meta-data required to describe the blocks or tracks that have been closed in that segment together with some ordering information.
U.S. Pat. No. 6,052,799 describes a method for recovering the directory of a log structured array with a secondary source of meta-data being stored in segment directories. The method includes periodically writing a checkpoint of the directory to the storage devices. A list is also maintained of closed segments written to the storage devices since the checkpoint directory was written to the storage devices. During a directory recovery procedure, the checkpoint of the directory is read into memory, and for each segment that is indicated by the closed segments list as having been closed since the checkpoint of the main directory was written to the disk, the information in the corresponding segment directory and the ordering information therein is used to update the checkpoint directory.
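The checkpoint-based recovery summarised above can be sketched as follows. The representation is an assumption made for illustration: each segment directory is modelled as a pair of a sequence number (standing in for the ordering information) and a list of track names, and the name `recover_directory` is hypothetical.

```python
def recover_directory(checkpoint: dict[str, int],
                      closed_since: list[int],
                      seg_dirs: dict[int, tuple[int, list[str]]]) -> dict[str, int]:
    """Replay segment directories over a checkpoint of the main directory.

    checkpoint   maps track -> segment id as of the last checkpoint
    closed_since lists ids of segments closed after the checkpoint
    seg_dirs     maps segment id -> (sequence number, tracks in the segment)
    """
    directory = dict(checkpoint)
    # Apply older segments first so that later closes win.
    for seg_id in sorted(closed_since, key=lambda s: seg_dirs[s][0]):
        _, tracks = seg_dirs[seg_id]
        for track in tracks:
            directory[track] = seg_id
    return directory
```

Only the segments named in the closed segments list are read, which is what keeps recovery shorter than a scan of every segment directory.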
A segment of data in a log structured array or a RAID is written across multiple storage devices in an array. Therefore, segments are often not written as an atomic event. Also, the storage devices do not always write data in one piece but may break it up and reorder it. If normal operation of an array is interrupted, a segment which is in the process of being written may be only partially written and may contain inconsistent information. For example, if the segment directory has been written together with some but not all of the segment data, the segment directory of that segment cannot be used as it does not correctly reflect the contents of the segment.
The problem is to provide a meta-data rebuild process with a way of determining which segments can be trusted and which must be ignored as they may have only been partially written when a meta-data rebuild process is invoked after normal operation of an array was interrupted at any point. Only segments that were completely written can be trusted to contain consistent information.
Known solutions to the problem of not considering partial segments during the rebuild of a log structured array include making a journal of segment closes in progress in a fast non-volatile memory. This approach has the following problems. The non-volatile memory is resident in the controller and not in the physical storage resource, so meta-data rebuild is tied to the specific controller containing the non-volatile memory, and this specific controller might also have failed. The memory might be transferable to another controller, which could help, but the meta-data rebuild might be required precisely because the controller, and hence the non-volatile memory, was destroyed. The memory might be mirrored to a remote controller, but this has performance problems. Moreover, dual-controller configurations would be expected not to require meta-data rebuilds unless both controllers had been destroyed or the non-volatile data in both had been lost, for example because of a flat battery, in which case both copies of the journal might be lost.
Another known method of trying to use only fully completed segments in a rebuild process is to mark the first and last sectors with a unique sequence number or signature. All segments with the same number in the first and last sector should then be valid. This method only works if the whole segment is written sequentially and is not broken up into several parts that are written in parallel. If the segment is split into multiple I/Os, the signature is no longer a guarantee that the entire segment has been written.
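The weakness of the first-and-last-sector signature can be shown with a small sketch. For illustration only, every sector is tagged here with the number of the write that last touched it; the scheme described above marks only the first and last sectors, and the name `looks_complete` is hypothetical.

```python
def looks_complete(sector_signatures: list[int]) -> bool:
    """Signature test: accept the segment if first and last sectors agree."""
    return sector_signatures[0] == sector_signatures[-1]

# Write number 7 replaces write number 6 and is interrupted part-way.
sequential_torn = [7, 7, 6, 6]  # written front to back: the tear is detected
parallel_torn = [7, 6, 6, 7]    # split into parallel I/Os: both ends landed first
```

In the second case the test passes even though the middle sectors still hold data from the earlier write, which is why the signature is not a guarantee once a segment is split into multiple I/Os.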
The aim of the present invention is to provide a method for rebuilding meta-data in a data storage system which only uses meta-data from valid fully written segments.
According to a first aspect of the present invention there is provided a method for rebuilding meta-data in a storage system having storage devices in which segments of data are located, wherein data is written in segments to the storage devices from a plurality of flows of data and each segment of data also contains meta-data relating to that segment; the method comprising: scanning the meta-data in each segment to identify the last segment written from each flow; rebuilding the meta-data in the storage system using the meta-data in the segments excluding the meta-data for the segments identified as being the last segments written from each flow.
Preferably, the data storage system includes a processor and memory, and the data storage devices are an array of storage devices having a plurality of data blocks organized on the storage devices in segments distributed across the storage devices, wherein when a data block in a segment stored on the storage devices in a first location is updated, the updated data block is assigned to a different segment, written to a new storage location and designated as a current data block, and the data block in the first location is designated as an old data block, and having a main directory, stored in memory, containing the locations on the storage devices of the current data blocks.
The data storage system may be a log structured array and the storage devices may be a plurality of direct access storage devices.
Optimally, the method includes scanning the meta-data of each segment to identify any segments which do not contain any current data blocks, and excluding any such segments from the rebuilding process.
The meta-data in the segments may include a description of data blocks stored in the segment and ordering information. Preferably, the ordering information includes an identifier of the flow from which the segment was written and a sequence number relating to the order the segments are written from one flow.
Preferably, the description of data blocks in the segment and the ordering information are written atomically in the segment. The description of the data blocks in a segment and the ordering information may be written in a single sector in a segment. Alternatively, the description of the data blocks in a segment and the ordering information may be written in more than one sector in a segment, in which case the writes of the sectors in a segment are made atomic by including the segment sequence number in each sector. Preferably, a segment with a plurality of sectors containing meta-data is ignored in the rebuilding process if the sectors have different segment sequence numbers.
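The multi-sector case can be sketched as a consistency check applied when the meta-data sectors of a segment are read back. The `MetaSector` type and the function name `read_segment_directory` are assumptions for illustration; an on-disk format would carry more fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MetaSector:
    seq: int                 # segment sequence number stamped into this sector
    tracks: tuple[str, ...]  # track descriptions held in this sector

def read_segment_directory(sectors: list[MetaSector]) -> Optional[list[str]]:
    """Return the combined track list, or None if the segment must be ignored."""
    if not sectors:
        return None
    if len({s.seq for s in sectors}) != 1:
        # The sectors were left by different segment writes: a torn update.
        return None
    return [t for s in sectors for t in s.tracks]
```

A segment whose meta-data sectors all carry sequence number 9 is accepted, whereas one in which a sector still carries sequence number 8 from an earlier write is rejected as a whole.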
Preferably, the last segment written from each flow is excluded from a free space collection process.
Preferably, writing a segment from a flow commits the previous segment written from that flow. A flow may be flushed by writing an empty segment in order to commit the previous segment from that flow.
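The commit rule can be illustrated with a minimal in-memory model of one flow. The class `Flow` and its methods are hypothetical names; the sketch only tracks which segment writes would be regarded as committed and performs no real I/O.

```python
class Flow:
    """Model of one flow of segment writes with increasing sequence numbers."""

    def __init__(self, flow_id: int) -> None:
        self.flow_id = flow_id
        self.next_seq = 0
        self.written: list[tuple[int, list[str]]] = []  # (seq, tracks)

    def write_segment(self, tracks: list[str]) -> int:
        """Record a segment write; this commits the previous write."""
        seq = self.next_seq
        self.written.append((seq, list(tracks)))
        self.next_seq += 1
        return seq

    def flush(self) -> int:
        """Write an empty segment so the previous one becomes committed."""
        return self.write_segment([])

    def committed(self) -> list[int]:
        """Every segment except the last written one can be trusted."""
        return [seq for seq, _ in self.written[:-1]]
```

After two segment writes only the first is committed; a subsequent `flush()` commits the second at the cost of one empty segment.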
According to a second aspect of the present invention there is provided a data storage system having storage devices in which segments of data are located, including a plurality of flows provided in the data storage system from which data is written in segments to the storage devices, each segment of data also containing meta-data relating to that segment; wherein meta-data in the storage system can be rebuilt using the meta-data in the segments excluding the meta-data for the segments identified as being the last segments written from each flow.
Preferably, the data storage system includes a processor and memory, and the data storage devices are an array of storage devices having a plurality of data blocks organized on the storage devices in segments distributed across the storage devices, wherein when a data block in a segment stored on the storage devices in a first location is updated, the updated data block is assigned to a different segment, written to a new storage location and designated as a current data block, and the data block in the first location is designated as an old data block, and having a main directory, stored in memory, containing the locations on the storage devices of the current data blocks.
The data storage system may be a log structured array and the storage devices may be a plurality of direct access storage devices. The log structured array may use check data in a storage device formed of an array of direct access storage devices.
According to a third aspect of the present invention there is provided a computer program product stored on a computer readable storage medium, comprising computer readable program code means for performing the steps of: scanning meta-data in each segment in storage devices to identify the last segment written from each of a plurality of flows of data; rebuilding the meta-data in the storage devices using the meta-data in the segments excluding the meta-data for the segments identified as being the last segments written from each flow.
Segments are written in a number of concurrent flows of segment writes. Each flow has increasing segment sequence numbers and each flow has an identifier. A segment write is not started in a flow until the previous segment write in that flow has completed. The meta-data rebuild process performs a pass which examines every segment sequence number/flow identifier to find the last segment written in every flow. These segments are potentially invalid and must be ignored. The meta-data rebuild process performs a second pass to determine which segments are void of valid tracks (i.e. all the tracks in these segments appear in other segments with more recent segment sequence numbers). These segments are also potentially invalid and must be ignored. The remaining segments are guaranteed to be valid.
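The two passes described above can be sketched as follows. Several points are assumptions made for illustration and are not stated in the text: sequence numbers are taken to be comparable across flows (for instance, drawn from one shared counter), the second pass considers only the segments that survive the first pass, and the names `SegmentMeta` and `rebuild_directory` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SegmentMeta:
    seg_id: int
    flow_id: int
    seq: int                 # assumed comparable across flows
    tracks: tuple[str, ...]

def rebuild_directory(segments: list[SegmentMeta]) -> tuple[dict[str, int], set[int]]:
    """Return (track -> segment id, ids of segments that were ignored)."""
    # Pass 1: find the last segment written in every flow and distrust it.
    last_seq: dict[int, int] = {}
    for s in segments:
        if s.flow_id not in last_seq or s.seq > last_seq[s.flow_id]:
            last_seq[s.flow_id] = s.seq
    trusted = [s for s in segments if s.seq != last_seq[s.flow_id]]

    # Pass 2: keep the most recent copy of each track; a segment that holds
    # no most-recent copy is void of valid tracks and is ignored as well.
    newest: dict[str, SegmentMeta] = {}
    for s in trusted:
        for track in s.tracks:
            if track not in newest or s.seq > newest[track].seq:
                newest[track] = s
    directory = {track: s.seg_id for track, s in newest.items()}
    used = set(directory.values())
    ignored = {s.seg_id for s in segments if s.seg_id not in used}
    return directory, ignored
```

Under these assumptions, a flow that has written only one segment contributes nothing to the rebuilt directory, since that segment is by definition its last.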
The advantages of the method and system of the present invention are as follows:
No extra DASD operations are required.
Minimal CPU overhead is required.
No extra hardware is required.
The meta-data rebuild process requires only the information on the physical storage resources (and does not, for instance, require access to a separate non-volatile journal).