The present invention relates to log structured arrays for storage subsystems, and more particularly to handling of destage requests during shutdown in a log-structured array.
In storage subsystems, a redundant array of inexpensive disks (RAID) is one solution to I/O (input/output) bottleneck problems. RAID typically increases disk bandwidth through parallelism for accessing data and provides high data availability through redundancy. One problem associated with some levels of RAID is the write penalty: a write operation actually requires two disk reads (of old data and parity) and two disk writes (of updated data and the newly calculated parity). A log-structured array (LSA) writes all customer data to disk sequentially in a log-like structure, and enables RAID to support data compression. The amount of compression achieved is dependent on the actual data values. After a piece of data is modified, it may not compress to the same number of bytes and thus will not fit into the space originally allocated to it. This problem is encountered in any storage system that assigns a piece of data to a fixed disk location; LSA avoids this problem, since updated data is written to the end of the log structure.
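The write penalty described above can be illustrated with a minimal sketch. It assumes XOR parity over equal-sized blocks, as is common for RAID-5; the function names and the read/write counter are hypothetical and serve only to make the two reads and two writes visible.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def parity_of(blocks):
    """Parity of a full stripe: XOR of all data blocks."""
    return reduce(xor_bytes, blocks)

def small_write(blocks, parity, index, new_data):
    """Update one data block in place and return (new_parity, io_counts).

    Models the read-modify-write cycle: the old data and old parity are
    read, then the new data and new parity are written.
    """
    io = {"reads": 0, "writes": 0}
    old_data = blocks[index]
    io["reads"] += 1                      # read old data
    old_parity = parity
    io["reads"] += 1                      # read old parity
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    blocks[index] = new_data
    io["writes"] += 1                     # write new data
    io["writes"] += 1                     # write new parity
    return new_parity, io

stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = parity_of(stripe)
new_parity, io = small_write(stripe, parity, 1, b"\xaa\xbb")
```

Because the updated parity is derived from the old data and old parity alone, the other data blocks of the stripe need not be read, but the four disk operations remain.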
Through LSA, a logical track (LT), which is the typical unit accessed by I/O programs, is allowed to be updated to a different location on disk. Since the physical address of a logical track changes over time, a directory, called the LSA directory, is necessary to keep track of the current LT's physical address on the array. Each directory entry also records the logical track's current length, as this may vary with compression.
The log-structured array consists of N+P+S physical disk drives, where N is the number of HDDs' (hard disk drives') worth of physical space available for customer data, P is the number of HDDs' worth of physical space for parity data, and S is the number of HDDs' worth of physical space for spare drives. Each HDD is divided into large consecutive areas called segment columns. Typically, a segment column is as large as a logical cylinder. Corresponding segment columns from the N+P+S HDDs constitute a segment. The array has as many segments as there are segment columns on an HDD in the array. An example of the layout for such a system is shown in FIG. 1. In a RAID-5 configuration, one of the segment columns of a segment contains the parity of the remaining data segment columns of the segment.
Referring to FIG. 1, the storage for the partition 52 is arranged as segments 56, where each segment has N data segment columns 58 and one parity segment column 59. The logical tracks 60 are stored within segment columns. A segment directory 62 contains information on each of the logical tracks in the segment, which is used during garbage collection and recovery procedures. The segment directory 62 is stored in a small number of sectors out of a segment's total disk space. As shown, the entire segment directory resides in the same segment column in each of the segments. Alternatively, the segment directory can be spread among the devices. In a RAID-5 system, parity is distributed among the devices as shown.
A segment column is defined as an arbitrary number of contiguous physical tracks as described above. Typically it is desirable to define a segment column to be the same size as a logical cylinder. The collection of disk recording areas comprising corresponding segment columns from each of the HDDs forms what is called a segment.
LSA segments are categorized as one of the following types: free, which refers to a segment that contains no valid data; open, which refers to a segment that is available to hold LTs being destaged; closed, which refers to a segment containing some valid data, but to which no destaged data can be further assigned; and being garbage collected (GC), which refers to a closed segment that is currently being garbage collected, as discussed hereinbelow. A closed segment consists of 'live' LTs and 'holes'. The former are LTs that were assigned to the segment during the segment's open phase and still reside in the segment. The latter is space vacated by LTs that were assigned to the segment but have subsequently been updated and assigned to different open segments. A closed segment's occupancy is the sum of the lengths of the segment's live tracks.
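By way of illustration only, the segment types and the occupancy calculation may be modeled as follows. The class and field names are hypothetical, and representing holes as a set of LT identifiers is an assumption made for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum

class SegmentState(Enum):
    FREE = "free"
    OPEN = "open"
    CLOSED = "closed"
    GC = "being garbage collected"

@dataclass
class Segment:
    state: SegmentState = SegmentState.FREE
    # LT id -> length of every LT assigned during the open phase (hypothetical layout).
    assigned: dict = field(default_factory=dict)
    # Ids of LTs that were later updated and assigned elsewhere ('holes').
    holes: set = field(default_factory=set)

    def occupancy(self) -> int:
        """Sum of the lengths of the LTs that are still live in this segment."""
        return sum(n for lt, n in self.assigned.items() if lt not in self.holes)

    def invalidate(self, lt_id: int) -> None:
        """Record that an LT has moved to another segment, leaving a hole."""
        self.holes.add(lt_id)
        # A closed segment whose occupancy reaches zero holds no valid data.
        if self.state is SegmentState.CLOSED and self.occupancy() == 0:
            self.state = SegmentState.FREE

seg = Segment(SegmentState.CLOSED, {1: 100, 2: 50, 3: 25})
initial = seg.occupancy()
seg.invalidate(2)
after_one_hole = seg.occupancy()
```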
A destage operation provides for the LTs in a logical cylinder to be destaged together from a cache within the storage subsystem to a storage device to enhance the seek affinity of sequential accesses. A logical cylinder is typically called a neighborhood, and a group of logical tracks in a logical cylinder destaged together is called a neighborhood in destage (NID) or neighborhood destage request. Destaging a neighborhood essentially involves the following steps:
1. The neighborhood in destage is assigned to an open segment.
2. An open segment remains available to accept other neighborhoods in destage until it is deemed full enough to close in accordance with a desired algorithm.
3. The data and parity of the segment are written to disk before the segment is considered closed.
4. Each LT in the open segment has an entry in the segment directory that describes the LT's location in the segment. The segment directory is written to disk as part of the segment.
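The steps above can be sketched as follows. This is a simplified model under stated assumptions: an NID is a list of (LT id, length) pairs, "full enough" is a fixed fraction of capacity (the closing algorithm is not specified in the text), no single NID exceeds a segment's capacity, and the disk is a plain list. All names are hypothetical.

```python
class OpenSegment:
    def __init__(self, capacity, close_threshold=0.9):
        self.capacity = capacity
        self.close_threshold = close_threshold  # assumed closing policy
        self.directory = []                     # (lt_id, offset, length) entries
        self.used = 0

    def fits(self, nid):
        return self.used + sum(length for _, length in nid) <= self.capacity

    def assign(self, nid):
        """Step 1: buffer the NID and record each LT in the segment directory."""
        for lt_id, length in nid:
            self.directory.append((lt_id, self.used, length))
            self.used += length

    def full_enough(self):
        """Step 2: decide whether the segment should stop accepting NIDs."""
        return self.used >= self.close_threshold * self.capacity

    def close(self, disk):
        """Steps 3 and 4: write the segment, including its directory, to disk."""
        disk.append({"directory": list(self.directory), "used": self.used})

def destage(nid, segment, disk):
    """Assign one NID and return the segment that is open afterwards."""
    if not segment.fits(nid):
        segment.close(disk)
        segment = OpenSegment(segment.capacity, segment.close_threshold)
    segment.assign(nid)
    if segment.full_enough():
        segment.close(disk)
        segment = OpenSegment(segment.capacity, segment.close_threshold)
    return segment

disk = []
seg = OpenSegment(capacity=100)
seg = destage([(1, 40), (2, 30)], seg, disk)   # buffered, nothing written yet
buffered_only = (len(disk) == 0)
seg = destage([(3, 25)], seg, disk)            # crosses the threshold, segment closes
```

Note that nothing reaches the disk until a segment closes, which is the property that makes outstanding requests a concern at shutdown.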
An LT in a closed segment may be updated and destaged again, at which time it is assigned to another open segment. This causes the previous copy of the LT to become obsolete, thus forming a 'hole' in the closed segment. Garbage collection (GC) is the process of reclaiming 'holes' in closed segments. GC is started when the number of free segments falls below a certain threshold.
The process of garbage collecting a segment involves reading the segment's directory from disk, then scanning each segment directory entry and comparing the LT's address as indicated by the segment directory entry with the address as indicated by the LSA directory entry. If the two entries match, then the LT still resides in the segment and is considered 'live'. All the live LTs are then read from disk into memory and sorted by neighborhood. These neighborhoods in destage then proceed to be destaged in the same manner as described above. These NIDs are assigned to open segments; when such open segments close successfully, the NIDs are garbage collected, thus decreasing the occupancy of the segments in which the NIDs previously resided. When a segment's occupancy declines to zero, either as a result of garbage collection or as a result of movement of tracks from normal destage activity, the segment becomes free.
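The liveness check and the grouping by neighborhood may be sketched as shown below. The directory layouts, the function name, and the rule that maps an LT to its neighborhood (integer division by an assumed number of tracks per logical cylinder) are illustrative assumptions, not details taken from the description.

```python
from collections import defaultdict

TRACKS_PER_CYLINDER = 15  # assumed value, for illustration only

def collect_live_nids(segment_id, segment_directory, lsa_directory):
    """Return the live LTs of a segment grouped into neighborhoods.

    segment_directory: list of (lt_id, offset, length) read from the segment.
    lsa_directory:     lt_id -> (segment_id, offset, length), the current location.
    """
    groups = defaultdict(list)
    for lt_id, offset, length in segment_directory:
        current = lsa_directory.get(lt_id)
        # The LT is live only if the LSA directory still points at this copy.
        if current is not None and current[:2] == (segment_id, offset):
            groups[lt_id // TRACKS_PER_CYLINDER].append((lt_id, length))
    # One NID per neighborhood, in neighborhood order.
    return [sorted(groups[n]) for n in sorted(groups)]

segment_directory = [(0, 0, 10), (1, 10, 20), (16, 30, 5), (17, 35, 8)]
lsa_directory = {
    0: (7, 0, 10),
    1: (9, 0, 22),    # LT 1 was updated and now lives in segment 9
    16: (7, 30, 5),
    17: (7, 35, 8),
}
nids = collect_live_nids(7, segment_directory, lsa_directory)
```

Each returned group would then be destaged to an open segment in the same manner as a destage request from the cache.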
In a multi-nodal data storage system, a data storage controller has multiple nodes; each of the nodes may comprise, for example, an Intel model i960 microprocessor. The multi-processing nodes are interconnected in a torus ring topology. A lower interface (LI) node executes microcode that manages the disk arrays, including an LSA partition. The LSA subcomponent of the LI manages the LSA. A cache node manages the cache memory. The LI node and cache node can be the same physical entity, where microcode running at the same node performs the different functions.
A pair of LI nodes can provide shared management. Referring to FIG. 2, a first node may operate a series of storage devices 1 through N+1, while a second node would operate storage devices N+2 through 2N+2. In the case that the first node fails, the second node can take over for the failed node, and vice versa. The storage devices may also be reassigned from one node to another to balance the nodes' workload. Alternatively, each node is assigned exclusive control over a selected group of storage devices.
The processing activity on a node may shut down for a variety of reasons, such as the turning off of the power switch, loss of AC power to the subsystem, a disruptive EC activation, battery failure, quiescing the subsystem prior to taking a node off-line or adding a new node, removal of a node for servicing or performing diagnostic tests. An LSA subcomponent may receive a shut-down notification while it is performing any of its normal tasks, e.g. garbage collecting segments, assigning NIDs to locations in the array, writing data or parity on the devices, checkpointing the LSA directories, staging tracks, etc. As part of the fundamental operation of a log-structured system, data from several destage requests from the cache component are buffered by the LSA subcomponent; LTs are not written to the array immediately upon assignment to an open segment. When enough data has accumulated, the data is then stored on the device with a single disk write operation. At the time of a shutdown notification, there may be outstanding destage requests, which must be safely stored on disk before the shutdown procedure completes. One aspect that is problematic during shutdown is hardening all NIDs (i.e., writing both data and parity on disk) that have been issued by a cache component prior to the shutdown notification.
The task of hardening destage requests generally involves assigning all outstanding NIDs to open segments, then closing all those segments. The difficulty lies in determining when all outstanding destage requests have been received. While the cache component may have finished issuing all its requests at the time the command to shut-down is received, the requests from the cache component are not guaranteed to arrive in any order, nor are they guaranteed to arrive before the shut-down notification because of the asynchronous nature of inter-component communication. Such problems may be overcome if it can be determined when all such requests have been received, since it can then be determined when all outstanding requests have been assigned to open segments, for subsequently closing all open segments.
Installation of a timer that is set to expire a fixed amount of time after receipt of the shut-down notification is one potential method. With the use of a timer, the LSA subcomponent closes all of its open segments when the timer expires in order to harden all outstanding NIDs. To ensure against expiration of the timer before the arrival of all requests when the network traffic is heavy, the timer value must be set to a relatively large number. Unfortunately, use of a timer with a sufficiently large timer value is problematic in some cases, such as loss of AC power, when time constraints are placed on the shut-down of the subsystem.
Accordingly, a need exists for a method and system of hardening destage requests in an LSA before shutdown of the processor. The present invention addresses such a need.
The present invention provides aspects for handling destage requests during shutdown in a log-structured array storage subsystem. In a method aspect, the method includes receiving a shut-down command, and utilizing at least three data structures for tracking destage requests when the shut-down command is received, wherein closing of open segments before completion of the shut-down is ensured. A further method aspect includes maintaining an outstanding requests list and destage requests list, forming a missing requests list based on the contents of the outstanding requests list and destage requests list when a shut-down command occurs, and tracking destage request processing with the outstanding requests list, destage requests list and missing requests list until all destage requests have been successfully completed.
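One possible reading of the three-list approach is sketched below. The summary does not state which component maintains each list or exactly how the missing requests list is computed, so the following are assumptions: the outstanding requests list holds requests the cache has issued that have not yet completed, the destage requests list holds requests that have arrived at the LSA subcomponent, and the missing requests list is the difference between the two at shut-down. All names are hypothetical.

```python
class ShutdownTracker:
    def __init__(self):
        self.outstanding = set()  # issued by the cache, not yet completed
        self.destage = set()      # arrived at the LSA subcomponent
        self.missing = None       # formed only when shut-down is commanded

    def issued(self, req):
        self.outstanding.add(req)

    def received(self, req):
        self.destage.add(req)
        if self.missing is not None:
            self.missing.discard(req)   # a late arrival is no longer missing

    def shutdown(self):
        # Assumed rule: missing = issued but not yet received.
        self.missing = self.outstanding - self.destage

    def ready_to_close(self):
        """True once every issued request has arrived, so open segments can close."""
        return self.missing is not None and not self.missing

    def completed(self, req):
        """Called when the segment holding the request has closed successfully."""
        self.outstanding.discard(req)
        self.destage.discard(req)

    def all_done(self):
        return self.ready_to_close() and not self.outstanding

t = ShutdownTracker()
for r in ("a", "b", "c"):
    t.issued(r)
t.received("a")
t.shutdown()                       # "b" and "c" are still in flight
missing_at_shutdown = set(t.missing)
t.received("c")
t.received("b")
```

Under this reading, the open segments are closed as soon as the missing requests list empties, rather than after a fixed delay.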
Through the present invention, data structures are efficiently utilized during shutdown to handle destage requests and ensure proper hardening of the destage requests. The present invention further achieves handling of failed destage requests with minimization of the time needed to complete shut-down and avoidance of the possible wasting of free disk space. These and other advantages of the aspects of the present invention will be more fully understood in conjunction with the following detailed description and accompanying drawings.