Many applications, in particular databases and filesystems, are expected to guarantee the atomicity of write transactions, i.e., if one part of a transaction fails, then the entire transaction fails, and the database/filesystem state remains unchanged. To ensure atomicity, databases and filesystems typically deploy either a journaling or a copy-on-write scheme.
When the journaling scheme is used, the data to be committed to the database/filesystem are first written to a dedicated journal region on the storage device, and then written to their target locations inside the database/filesystem. As a result, the same content is written to the data storage device twice. The journal region typically occupies a contiguous space on the data storage device. Hence, journaling typically incurs sequential writes to the data storage device (i.e., multiple data blocks are consecutively written to a contiguous storage space on the data storage device). Meanwhile, the data written to the target locations inside the database/filesystem could be scattered throughout the entire storage space, leading to random writes to the data storage device. Therefore, although journaling doubles the amount of data physically written to the data storage device, its impact on overall database/filesystem performance depends on the performance difference between sequential and random writes on that device. When hard disk drives (HDDs) are used, journaling may incur a very small or even negligible performance penalty: HDDs perform sequential writes much faster than random writes, so the extra sequential journal writes cost little relative to the random writes to the target locations. However, when solid-state drives (SSDs) are used, journaling could cause a significant performance penalty, since sequential and random write performance do not differ substantially on SSDs, especially in the presence of a large number of write requests.
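To make the double write and the two access patterns concrete, the following is a minimal Python sketch, assuming a toy in-memory block device in which the journal occupies a fixed contiguous range of logical block addresses (LBAs). The class `JournaledStore`, its fields, and its `commit` method are hypothetical illustrations, not part of any real database or filesystem; a real journal would also write a commit record and flush the device between the two phases.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class JournaledStore:
    """Toy block device with a contiguous journal region (hypothetical model)."""
    journal_start: int              # first LBA of the journal region
    journal_size: int               # number of blocks in the journal region
    device: Dict[int, bytes] = field(default_factory=dict)
    journal_head: int = 0           # next journal slot; wraps around, assuming
                                    # older entries were already checkpointed
    write_log: List[int] = field(default_factory=list)  # LBAs in write order

    def _write(self, lba: int, data: bytes) -> None:
        # Every physical block write is recorded so the access pattern is visible.
        self.device[lba] = data
        self.write_log.append(lba)

    def commit(self, txn: List[Tuple[int, bytes]]) -> None:
        # Phase 1: append every block of the transaction to the journal.
        # Journal slots are consecutive LBAs, so these are sequential writes.
        for _, data in txn:
            self._write(self.journal_start + self.journal_head, data)
            self.journal_head = (self.journal_head + 1) % self.journal_size
        # (A real system would write and flush a commit record here.)
        # Phase 2: write each block again to its home location, which may be
        # anywhere on the device, so these are typically random writes.
        for lba, data in txn:
            self._write(lba, data)


store = JournaledStore(journal_start=0, journal_size=1024)
store.commit([(50_000, b"A"), (7_200, b"B"), (91_300, b"C")])
print(store.write_log)            # [0, 1, 2, 50000, 7200, 91300]
print(len(store.write_log) / 3)   # 2.0 physical writes per logical block
```

Committing a three-block transaction produces six physical writes: three consecutive journal LBAs followed by three scattered home locations, i.e., a write amplification of 2.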
For databases/filesystems that deploy the copy-on-write scheme, data on the storage device are never updated in place. Instead, the database/filesystem writes the updated data to a new location on the storage device and updates the metadata accordingly to record the change. Although copy-on-write avoids doubling the write volume as journaling does, a large amount of stale data may become scattered throughout the storage space as copy-on-write proceeds. This causes significant fragmentation of the storage device, leading to performance degradation. To mitigate this effect, garbage collection (GC) should be invoked periodically to rearrange the data placement and thereby reduce fragmentation. For both HDDs and SSDs, GC incurs a large number of data I/O operations, leading to a noticeable performance penalty.
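Similarly, the following is a minimal Python sketch of copy-on-write with periodic garbage collection, assuming a toy append-only allocator and a compacting collector. `CowStore`, `write`, `garbage_collect`, and the `gc_io` counter are hypothetical names; real copy-on-write filesystems use far more elaborate allocation and cleaning policies. The sketch only illustrates the three effects described above: writes never overwrite existing data, superseded versions accumulate as stale blocks, and reclaiming them costs extra reads and writes.

```python
from typing import Dict, Set


class CowStore:
    """Toy copy-on-write store: logical blocks map to physical blocks (hypothetical model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.mapping: Dict[int, int] = {}  # metadata: logical -> physical block
        self.data: Dict[int, bytes] = {}   # physical block contents
        self.stale: Set[int] = set()       # physical blocks holding outdated data
        self.next_free = 0                 # append-only allocation cursor
        self.gc_io = 0                     # block reads + writes performed by GC

    def write(self, lblock: int, payload: bytes) -> None:
        if self.next_free >= self.capacity:
            self.garbage_collect()
            if self.next_free >= self.capacity:
                raise RuntimeError("device is full of live data")
        # Never update in place: write the new version to a fresh block...
        pblock = self.next_free
        self.next_free += 1
        self.data[pblock] = payload
        # ...then update the metadata; the previous version becomes stale.
        old = self.mapping.get(lblock)
        if old is not None:
            self.stale.add(old)
        self.mapping[lblock] = pblock

    def garbage_collect(self) -> None:
        # Compact live blocks to the front of the device. Each live block
        # costs one read and one write, which is the GC I/O overhead.
        new_data: Dict[int, bytes] = {}
        new_mapping: Dict[int, int] = {}
        for i, (lblock, pblock) in enumerate(sorted(self.mapping.items())):
            new_data[i] = self.data[pblock]
            new_mapping[lblock] = i
            self.gc_io += 2
        self.data, self.mapping = new_data, new_mapping
        self.stale.clear()
        self.next_free = len(new_mapping)

    def read(self, lblock: int) -> bytes:
        return self.data[self.mapping[lblock]]


store = CowStore(capacity=4)
for version in range(5):
    store.write(0, bytes([version]))  # rewrite the same logical block
print(store.read(0))                  # b'\x04'
print(store.gc_io)                    # 2
```

Rewriting one logical block five times on a four-block device fills the device with one live block and three stale ones, so the fifth write triggers GC, which copies the single live block (one read plus one write) before the new version is written.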
In summary, when SSDs are used, both journaling and copy-on-write tend to cause a significant system performance penalty as the price of ensuring write atomicity. This problem has been well recognized, and a variety of solutions have been developed to address it. Regardless of the specific design techniques, all existing solutions require changes to the database/filesystem source code. This unfortunately creates a very high barrier to their adoption in practice. Hence, it is highly desirable to have a design solution that can adequately address the journaling/copy-on-write-induced performance penalty without requiring any modification to the database/filesystem source code.