Logging (i.e., the act of recording events as “log records” in a data structure referred to as a “log”) is a commonly used technique in computing environments. The log records that are collected in a log can be used for various purposes, such as ensuring data consistency in the face of system crashes/failures, executing asynchronous processes, and so on.
In some cases, the log records of a log may include pointers to data that is external to the log. For example, consider a log that is configured to keep track of on-disk data blocks created/updated in response to write I/O for the purpose of performing asynchronous storage deduplication. In this scenario, each log record can include a pointer to the on-disk location of a data block outside of the log, along with an indication that the data block has been created/updated due to a write operation. These log records can then be “replayed” at a later time in order to revisit the referenced data blocks and merge duplicate data, such that only a single instance of each unique data block is maintained on the storage tier.
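The deduplication scenario above can be sketched in a few lines of Python. This is only an illustrative model, not an actual implementation: the names `LogRecord`, `block_addr`, `op`, and `replay`, the use of a dictionary as a stand-in for the disk, and the choice of SHA-256 to detect identical blocks are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    # Hypothetical record layout: a pointer to a block that lives outside the log,
    # plus an indication of what happened to that block.
    block_addr: int
    op: str  # e.g. "write": the block was created or updated


def replay(log, disk):
    """Revisit each logged block and map duplicates to one canonical address."""
    canonical = {}  # content digest -> first address seen with that content
    remap = {}      # duplicate address -> canonical address
    for rec in log:
        data = disk[rec.block_addr]  # follow the pointer out of the log
        digest = hashlib.sha256(data).hexdigest()
        first = canonical.setdefault(digest, rec.block_addr)
        if first != rec.block_addr:
            remap[rec.block_addr] = first
    return remap


# A dictionary stands in for the storage tier (address -> block contents).
disk = {0: b"aaaa", 1: b"bbbb", 2: b"aaaa"}
log = [LogRecord(0, "write"), LogRecord(1, "write"), LogRecord(2, "write")]
remap = replay(log, disk)
```

Here blocks 0 and 2 hold identical contents, so replaying the log maps address 2 onto address 0 and leaves a single instance of that data. Note that the replay depends on every pointer in the log referring to a block that was actually written.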
In the above and other similar logging scenarios, when a write operation is received, existing logging implementations will typically wait for the write operation to finish before committing a corresponding log record to disk that points to the written data block. This ensures that the log record is valid (i.e., correctly indicates that the write operation has been successfully completed). If the log record were instead written before write completion could be confirmed, a system crash could occur between the writing of the log record and the completion of the write operation. This, in turn, would cause the log record to indicate that the write operation was successfully executed on the referenced data block, when in fact the system crash prevented that operation from completing.
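The ordering concern described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the `Store` class, the `crash` flag used to inject a failure, and the helper names are all hypothetical, and in-memory Python containers stand in for durable storage.

```python
class Crash(Exception):
    """Simulated system crash."""


class Store:
    def __init__(self):
        self.blocks = {}  # stands in for durable data blocks (address -> data)
        self.log = []     # stands in for durable log records (block addresses)

    def write_block(self, addr, data, crash=False):
        if crash:
            raise Crash("crash before the data block reached disk")
        self.blocks[addr] = data

    def append_log(self, addr):
        self.log.append(addr)


def write_data_then_log(store, addr, data, crash_during_data=False):
    # Conventional ordering: the log record is committed only after the
    # data write has completed.
    store.write_block(addr, data, crash=crash_during_data)
    store.append_log(addr)


def write_log_then_data(store, addr, data, crash_during_data=False):
    # Reversed ordering: the log record is committed before the data write
    # is known to have completed.
    store.append_log(addr)
    store.write_block(addr, data, crash=crash_during_data)


def invalid_records(store):
    """Log records that point at a block that was never written."""
    return [addr for addr in store.log if addr not in store.blocks]


def run(write_fn):
    store = Store()
    try:
        write_fn(store, 7, b"payload", crash_during_data=True)
    except Crash:
        pass  # the system "restarts"; only durable state in `store` survives
    return invalid_records(store)


safe = run(write_data_then_log)    # no log record was committed
unsafe = run(write_log_then_data)  # log record 7 refers to a missing block
```

With the data-first ordering, the crash leaves no log record behind, so the log remains valid. With the log-first ordering, the log contains a record for address 7 even though the block was never written.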
While the foregoing approach of writing the data block first and the log record second ensures validity of the log record, it also suffers from certain drawbacks. For example, since the originator of the write operation must wait for two writes (i.e., the write operation itself and the writing of the log record) to finish in sequence before receiving an acknowledgement, the perceived I/O latency for the write operation is effectively doubled, assuming the two writes take roughly the same amount of time. This can be problematic in environments where low-latency write performance is critical.
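The latency cost of the sequential approach can be made concrete with a short worked calculation. The function name and the 0.5 ms figures below are purely illustrative assumptions and are not measurements of any particular system.

```python
def sequential_ack_latency(data_write_ms, log_write_ms):
    """Latency seen by the originator when the two writes finish in sequence."""
    return data_write_ms + log_write_ms


# Illustrative figures only: assume each write takes 0.5 ms.
data_write_ms = 0.5
log_write_ms = 0.5

sequential = sequential_ack_latency(data_write_ms, log_write_ms)
ratio = sequential / data_write_ms  # relative to the data write alone
```

Under these assumed figures the originator waits 1.0 ms instead of 0.5 ms, i.e. twice the latency of the data write by itself. If the log write were faster or slower than the data write, the ratio would shift accordingly.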