Many contemporary data processing systems produce, consume and/or process vast quantities of data. Storing this data securely, so that it is unlikely to be lost or corrupted if a hardware failure, power outage or system crash occurs, yet accessibly, so that it can be read and written quickly, presents an ongoing challenge. The problem is particularly acute in a class of computing devices whose principal purpose is to administer data storage for many clients. These devices, called storage servers, may manage terabytes or petabytes of storage space and serve thousands of clients.
FIG. 2 shows an example of functional blocks and operational flows in a storage server processing a request from a client 200 to save data. The client's request 205 is received by a network access module 210, and is passed up to a protocol handling module 215 after any network-specific information (e.g. source and destination addresses) is removed. The request 220 is processed by the protocol handler 215 to verify data integrity, client access permissions, and so on; then the data 225 is passed up to a file system manager 230 for further processing.
File system manager 230 maintains data structures and other information (e.g., a “file system”) that permit it to present the storage space available at the storage server in a convenient form for clients' use. Typically, a storage server appears to a client as an indexed array of uniformly-sized data blocks, or as a hierarchical tree of directories (“folders”) containing other directories and files (“documents”). (Some storage servers present an object-oriented view, where arbitrarily-sized stored data may be identified and accessed via a unique key.)
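The block-array and object-oriented presentations described above can be sketched as follows. This is purely an illustration; the class and method names are hypothetical and are not drawn from any particular storage server:

```python
BLOCK_SIZE = 4096  # bytes; an assumed uniform block size


class BlockArrayView:
    """Presents storage as an indexed array of uniformly-sized blocks."""

    def __init__(self, num_blocks):
        self._blocks = [bytes(BLOCK_SIZE)] * num_blocks  # zero-filled

    def read_block(self, index):
        return self._blocks[index]

    def write_block(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("block writes must be exactly BLOCK_SIZE bytes")
        self._blocks[index] = data


class ObjectView:
    """Presents storage as arbitrarily-sized objects addressed by unique key."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]
```

A hierarchical directory tree can be layered on either presentation; the two sketches above differ only in whether the client addresses data by block index or by key.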
The underlying data storage is often provided by electromechanical devices such as hard disk drives 235, but such devices may operate relatively slowly (or may be heavily utilized) so that forcing client 200 to wait for the data to be stored on the disks 235 would cause unacceptably long delays. Therefore, most storage servers perform some sort of buffering or caching so that a response (acknowledgement) can be sent to the client more quickly. A sophisticated storage server will implement measures to protect client data that has been acknowledged but not yet committed to a long-term mass storage device. In the example system described here, file system manager 230 stores a copy of client data 225 in a temporary memory 240 (client data copy shown as element 245 in FIG. 2), and can immediately return a response 250 to the protocol handler 215, which packages the response 255 and passes it to network access layer 210. The response is further encapsulated 260 for transmission over a network, and is eventually received by client 200.
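The acknowledge-then-commit path described above can be sketched in a few lines. The names (`handle_write`, `flush`) are illustrative assumptions, not the server's actual interface; the point is only the ordering of stage, acknowledge, and commit:

```python
class FileSystemManager:
    """Minimal sketch of the acknowledge-then-commit write path."""

    def __init__(self, commit_to_disk):
        self.temp_memory = []                 # staging area (temporary memory 240)
        self.commit_to_disk = commit_to_disk  # slow path: RAID logic and drivers

    def handle_write(self, client_data):
        # Stage a protected copy (client data copy 245) first...
        self.temp_memory.append(client_data)
        # ...so an acknowledgement (response 250) can be returned at once,
        # before any disk has been touched.
        return "ACK"

    def flush(self):
        # Later, commit each staged write; only after a write is committed
        # may its staged copy be discarded.
        while self.temp_memory:
            self.commit_to_disk(self.temp_memory[0])
            self.temp_memory.pop(0)
```

Note that the staged copy is discarded only after the commit callback returns, preserving the invariant that every acknowledged write exists either on disk or in temporary memory.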
While the response is being prepared and transmitted, file system manager 230 also begins the more time-consuming task of arranging for the client data to be stored on disks 235. For example, the data may be passed to RAID logic 265, where it is prepared for storage on one or more of a group of independent disks operated as a redundant array (a “RAID group,” where “RAID” stands for “Redundant Array of Independent Disks”). The data may be split into pieces 270, and a parity or checksum piece 275 computed, in preparation for writing on the disks of an array. A copy of the parity piece 275 may also be stored in temporary memory 240 (element 280). The prepared pieces 270, 275 are forwarded to storage drivers 285, and each piece 290 is stored on an appropriate one of the disks 235. Once the data is committed, the client data copy 245 and parity/checksum copy 280 in temporary memory 240 can be discarded.
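One common way to compute a parity piece such as element 275 is a bytewise XOR across the data pieces. The sketch below assumes equal-length pieces and simple XOR (RAID-4/5 style) parity; it is not a description of any particular RAID implementation:

```python
def split_into_pieces(data: bytes, n: int) -> list:
    """Split data into n equal-length pieces, zero-padding the tail."""
    piece_len = -(-len(data) // n)  # ceiling division
    padded = data.ljust(piece_len * n, b"\0")
    return [padded[i * piece_len:(i + 1) * piece_len] for i in range(n)]


def xor_parity(pieces) -> bytes:
    """Bytewise XOR of all pieces. Because XOR is its own inverse, any
    single lost piece can be rebuilt by XOR-ing the parity with the
    surviving pieces."""
    parity = bytes(len(pieces[0]))
    for piece in pieces:
        parity = bytes(a ^ b for a, b in zip(parity, piece))
    return parity
```

For example, if one disk of a four-piece stripe fails, the missing piece equals `xor_parity(surviving_pieces + [parity])`.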
Temporary memory 240 serves as a staging area that stores and protects the data between the time the client's write is acknowledged and the time all of the data is actually written to disk. If the storage server crashes or disks 235 become inaccessible, the client data copy 245 in temporary memory 240 permits the system to restart the write processing, and if the RAID data preparation has already been completed, RAID parity data copy 280 permits the RAID disks to be brought up to date.
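The restart behavior can be illustrated with a small sketch (the `recover` routine and its arguments are hypothetical): after a crash, any write still present in the staging area is simply replayed against the disks.

```python
def recover(staged_writes, write_to_disk):
    """Replay every staged write that was acknowledged but may not have
    reached the disks. Replaying an already-committed write is safe in
    this sketch because rewriting the same block contents is idempotent."""
    for block_id, data in staged_writes:
        write_to_disk(block_id, data)


# Simulated crash: the acknowledgement for block 7 went out to the
# client, but the corresponding disk write never completed.
disk = {}
staged = [(7, b"acknowledged-but-unwritten")]
recover(staged, lambda block_id, data: disk.__setitem__(block_id, data))
```

After `recover` runs, every acknowledged write is on disk and the staged copies can be discarded, restoring the normal-operation invariant.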
FIG. 3 shows a detailed view of disks 235, depicted as arrays of blocks from block 0 to the last block of each disk, for disks 310, 320, 330 and 340. If the system crashes or disks become unavailable during RAID writing, so that some new data 350, 360 has been written, but some old data 370, 380 remains, then data 245, 280 in temporary memory 240 may be essential to ensure that the RAID devices can be brought to a consistent state without data loss.
Enterprise-class storage servers commonly use a temporary or staging memory as described above to improve write performance. However, under certain circumstances, the server may encounter a situation where it must either operate with degraded performance or discard acknowledged user data (causing data loss or corruption). These are, of course, both undesirable outcomes. Methods of avoiding these outcomes may be useful for improving storage server performance.