Storage servers can store data redundantly, e.g., across multiple data storage devices. Storage servers may employ various forms of data storage devices, such as hard disk drives, solid state drives, flash drives, or tape devices. The data storage devices are typically organized as one or more storage volumes, each comprising a cluster of data storage devices, in which the volumes define an overall logical arrangement of storage space. For example, a storage server can serve a large number of discrete volumes, each generally associated with its own file system.
To improve performance, storage servers can temporarily store various data storage operations and associated data received from client devices in a region of system memory. By storing the storage operations and data they receive in system memory, the storage servers can immediately return an acknowledgement message to the client devices rather than waiting for slower data storage devices to actually store the data before the acknowledgement is sent. However, system memory can be erased before the data is stored to data storage devices, e.g., in the event of a power (or other) failure.
To reduce the likelihood of data loss in such circumstances, storage servers may also store the storage operations and associated data in non-volatile random access memory (NVRAM), e.g., in a log stored in the NVRAM. By initially storing the storage operations in the log, the storage server can immediately return an acknowledgement to the client devices rather than wait for the operation to complete on one or more data storage devices.
Moreover, in the event of failure of the storage server, the storage operations can be replayed, thereby preventing loss of data. The NVRAM can have various associated circuitry to prevent data loss, e.g., battery backup, flash-type memory, etc. By logging storage operations (e.g., create file, write data, delete data, etc.) as “journal” entries in the log, a storage server can conform to data storage protocols that require that the storage server acknowledge storage operations only after writing data to persistent storage.
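The log-then-acknowledge behavior described above can be illustrated with a minimal Python sketch. The names JournalEntry, StorageServer, and handle are hypothetical, and an ordinary in-memory list stands in for the NVRAM log; this is an illustration under those assumptions, not any particular server's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    seq: int            # position of the entry in the log
    op: str             # e.g. "create", "write", "delete"
    path: str
    offset: int = 0
    data: bytes = b""


@dataclass
class StorageServer:
    # Assumption: a plain list stands in for the log held in NVRAM.
    nvram_log: list = field(default_factory=list)

    def handle(self, op, path, offset=0, data=b""):
        """Record the operation as a journal entry, then acknowledge it."""
        entry = JournalEntry(len(self.nvram_log), op, path, offset, data)
        self.nvram_log.append(entry)      # entry is logged first
        # The acknowledgement is returned without waiting for any disk write.
        return {"ack": True, "seq": entry.seq}


server = StorageServer()
reply = server.handle("write", "/vol0/file", offset=0, data=b"hello")
```

In this sketch the acknowledgement depends only on the append to the log, which mirrors the requirement that an operation be acknowledged only after it reaches persistent storage.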
The log can accumulate storage operations until a consistency point is triggered. Consistency points can be triggered at various time intervals (e.g., fixed time intervals), or when other events arise, e.g., when the NVRAM is almost full. At each consistency point, data is transferred from the storage server system memory (e.g., the NVRAM) to underlying data storage volumes on data storage devices, and the system memory is cleared of the transferred data upon successful transfer.
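The two triggers mentioned above, an elapsed time interval and a nearly full log, can be sketched as follows. The class name NvramLog, the 90% fill threshold, and the 10-second interval are assumptions chosen for illustration only; a list again stands in for both the log and the storage volume.

```python
import time


class NvramLog:
    def __init__(self, capacity_bytes, fill_threshold=0.9, interval_s=10.0,
                 clock=time.monotonic):
        self.capacity_bytes = capacity_bytes
        self.fill_threshold = fill_threshold   # assumed "almost full" ratio
        self.interval_s = interval_s           # assumed fixed time interval
        self.clock = clock                     # injectable for testing
        self.entries = []
        self.used_bytes = 0
        self.last_cp = clock()

    def append(self, payload):
        self.entries.append(payload)
        self.used_bytes += len(payload)

    def consistency_point_due(self):
        timer_expired = self.clock() - self.last_cp >= self.interval_s
        nearly_full = self.used_bytes >= self.fill_threshold * self.capacity_bytes
        return timer_expired or nearly_full

    def consistency_point(self, volume):
        """Transfer logged data to the volume, then clear the log."""
        volume.extend(self.entries)
        self.entries.clear()
        self.used_bytes = 0
        self.last_cp = self.clock()
```

A caller would check consistency_point_due() after each append and invoke consistency_point() when it returns True. A real server would clear the log only after confirming that the transfer succeeded.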
If the storage server's operations are interrupted unexpectedly, e.g., because of a power failure or another subsystem problem, its operating system or file system can recover by using information stored in the log between the time of the last consistency point and the unexpected interruption, e.g., by using a replay operation.
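A sequential replay of this kind might look like the sketch below. It assumes, purely for illustration, that each log entry is a (seq, path, offset, data) tuple describing a write, and that a volume is a dictionary mapping file paths to bytearray contents; the function name replay is likewise hypothetical.

```python
def replay(log, last_cp_seq, volume):
    """Reapply, in log order, every write recorded after the last consistency point."""
    for seq, path, offset, data in sorted(log):
        if seq <= last_cp_seq:
            continue                      # already on disk at the last consistency point
        buf = volume.setdefault(path, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\x00" * (end - len(buf)))   # grow the file if needed
        buf[offset:end] = data
    return volume


log = [
    (1, "/a", 0, b"aaaa"),   # covered by the last consistency point
    (2, "/a", 2, b"bb"),
    (3, "/a", 0, b"c"),
]
recovered = replay(log, last_cp_seq=1, volume={"/a": bytearray(b"aaaa")})
```

Because entries are applied strictly in sequence order, the recovered contents match what the clients would have observed had no interruption occurred.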
Technological advances have caused a significant reduction in the price of NVRAM and processors with a concomitant increase in logic density. Thus, it is now possible to employ much more NVRAM and many more processors (or processor cores) at a lower cost than was previously possible. It can be desirable to have a large amount of NVRAM to increase the throughput of the storage server. On the other hand, saving more storage operations in the log can increase the time required to complete a replay operation.
To make it possible for the storage server to operate at high speed while maintaining an acceptable recovery time, the replay time per operation has been reduced to compensate for the greater number of operations being recorded in the log. The reduction in replay time per operation is made possible by a sequencing process by which storage operations are identified as parallelizable and handled concurrently. In particular, logged storage operations determined to be parallelizable can be transferred to storage volumes concurrently by different processors or processor cores.
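Concurrent handling of a batch of operations already judged parallelizable can be sketched with a thread pool. The function names apply_write and replay_batch and the (path, offset, data) tuple layout are assumptions for illustration, and the worker threads merely stand in for separate processors or processor cores.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_write(volume, op):
    """Apply one logged write to the in-memory stand-in for a volume."""
    path, offset, data = op
    buf = volume[path]
    buf[offset:offset + len(data)] = data


def replay_batch(volume, batch, workers=4):
    """Replay one batch of operations previously identified as parallelizable."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises any worker exception.
        list(pool.map(lambda op: apply_write(volume, op), batch))


volume = {"/a": bytearray(4), "/b": bytearray(4)}
replay_batch(volume, [("/a", 0, b"aa"), ("/b", 2, b"bb")])
```

The result is independent of completion order here only because the two writes touch different files; the sketch does nothing to verify that the batch was identified correctly.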
However, parallelization cannot be performed without regard to what the operations do, because storage operations can modify the same location offset or otherwise overlap or conflict. If logged storage operations are replayed to storage volumes in sequence (e.g., in the same order as they appear in the log), then the storage volumes will be consistent with the order in which client devices transmitted the storage operations. However, when storage operations are handled in parallel, it is possible that a second storage operation can complete before a first storage operation, which is dependent on the second storage operation, resulting in inconsistencies in file system data.
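One simple way to reason about such conflicts is sketched below. It assumes a deliberately narrow rule, namely that two writes conflict only when they touch overlapping byte ranges of the same file, and it uses hypothetical names (conflicts, assign_waves) and (path, offset, data) tuples; real sequencing rules cover many more cases.

```python
def conflicts(a, b):
    """Two writes conflict when they touch overlapping bytes of the same file."""
    (path_a, off_a, data_a), (path_b, off_b, data_b) = a, b
    return (path_a == path_b
            and off_a < off_b + len(data_b)
            and off_b < off_a + len(data_a))


def assign_waves(log):
    """Assign each write a wave number that preserves log order among conflicts.

    A write is placed one wave after the latest earlier write it conflicts
    with; writes sharing a wave number never conflict with one another.
    """
    waves = []
    for i, op in enumerate(log):
        wave = 0
        for j in range(i):
            if conflicts(log[j], op):
                wave = max(wave, waves[j] + 1)
        waves.append(wave)
    return waves


log = [
    ("/a", 0, b"aaaa"),
    ("/b", 0, b"bb"),
    ("/a", 2, b"cc"),   # overlaps bytes 2-3 of the first write
    ("/a", 8, b"dd"),
]
waves = assign_waves(log)
```

In this example the first and third writes overlap: replayed in log order they leave "aacc" in the file, whereas the reverse order leaves "aaaa", which is why the third write is pushed to a later wave.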
The rules for identifying storage operations that are parallelizable are complex. Accordingly, storage operations that conflict are occasionally identified as parallelizable and replayed as such, resulting in inconsistent data in the file system. Currently, there is no effective, programmatic way to detect, in real time, logic errors in the sequencing process that result in a misidentification of storage operations as parallelizable, in order to prevent inconsistent data from being written to data storage devices.