A computer application would typically store data in a random access memory (RAM) for quick access, as RAM is much faster than a non-volatile disk. However, if a system which hosts the computer application crashes, the contents stored in the RAM cannot be recovered, as RAM is a volatile form of memory. For many applications, such as write-intensive applications, logging data stored in RAM to a form of temporary but persistent memory, or a logging disk, has been utilized so that if a system crash were to occur, a recovery process would be performed to restore data from the logging disk to the volatile memory. The data logged to the logging disk would then be transferred to the system's normal disk drive for permanent storage.
However, there has not been a universal consensus on the best type of persistent memory to use as a logging disk. A flash based memory such as Non-Volatile RAM (NVRAM) would be considered expensive at this point in time, while a hard disk drive (HDD) has been generally dismissed as being slow and unsuitable to be used as a logging disk. Phase Change Memory (PCM) could be a faster alternative to a flash based memory but could not easily be adopted as a logging disk in the near future because of the low density and high cost of PCM. Optimizing the latency and throughput of the disk logging process has also not been a trivial task.
Throughput could be defined as the total number of logging operations, including reads and writes, completed by a logging disk per unit time. Latency could be defined as the time between when a logging request is received by a queue of a logging disk and when the logging request is successfully written to the logging disk platter and ready to be acknowledged to the user application. Latency and throughput would be two of the parameters which a logging system would optimize, as an ideal disk logging system would have low latency and high throughput. The perceived response time of a logging request to a logging system would be dominated by the latency of the logging operation and its associated operations.
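As an illustration only, the two metrics defined above could be computed from per-request timestamps. The following Python sketch assumes a hypothetical trace of (enqueue time, acknowledgement time) pairs in seconds and treats throughput as completed operations per second over the observed window; the function name and the sample values are assumptions made for illustration and do not come from any particular logging system.

```python
from statistics import mean

def latency_and_throughput(requests):
    """requests: list of (enqueue_time, ack_time) pairs in seconds (hypothetical trace).

    Returns (average latency in seconds, throughput in operations per second).
    """
    # Latency of one request: time from entering the queue to being acknowledged.
    latencies = [ack - enq for enq, ack in requests]
    # Observation window: first enqueue to last acknowledgement.
    window = max(ack for _, ack in requests) - min(enq for enq, _ in requests)
    throughput = len(requests) / window
    return mean(latencies), throughput

# Illustrative values only.
trace = [(0.0, 0.5), (0.25, 0.75), (0.5, 1.0), (1.0, 2.0)]
avg_latency, ops_per_sec = latency_and_throughput(trace)
print(avg_latency)   # 0.625
print(ops_per_sec)   # 2.0
```

In this sample, three requests complete in 0.5 seconds each and one takes 1.0 second, so the average latency is 0.625 seconds while four operations complete within a 2.0 second window.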
Providing high throughput and low latency for logging operations with small payloads, such as 64 bytes or 128 bytes, is critical, as many applications only need to log the information associated with high-level operations, such as an update to a record in a B-tree page or a hash table bucket. The size of this information is typically small. Low logging latency is crucial because it directly impacts the user-perceived response time, and because many applications would be bound by the latency of the logging disk, since further requests cannot be processed until previously submitted requests have been completed.
However, it turns out that achieving both high throughput and low latency for logging operations, especially for fine-grained operations, is not at all trivial. Three key challenges have been identified. First, there is a mismatch between fine-grained logging and modern file systems. More concretely, file systems use 4 KB blocks as the basic units of reading and writing, and hence logging a 64-byte or 128-byte record to a log file may require a read of the log file's last block as well as a write of the same block after appending the log record to it. Second, there are multiple processing steps on the data path between the system call interface and the disk platter that a logging operation's payload needs to traverse, and some of these steps may incur a per-operation overhead. Therefore, consecutive logging operation requests need to be properly merged so as to effectively amortize these per-operation overheads and still rein in the average logging latency. Third, to make full use of the raw data transfer capability of modern disks, there needs to be a way to transform high-level logging operation requests into low-level disk access requests for small data sizes in such a way that the logging disks are prevented from sitting idle most of the time.
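The first two challenges can be illustrated with simple arithmetic. The Python sketch below is a minimal model rather than a measurement: it uses the 4 KB block size stated above, assumes the appended record fits within the log file's last block and that this block must be read before it is rewritten, and uses a hypothetical fixed per-operation overhead. All function names and the overhead value are assumptions introduced for illustration.

```python
BLOCK_SIZE = 4096  # bytes; the file system block size mentioned in the text

def bytes_moved_per_append(record_size, block_size=BLOCK_SIZE):
    """Bytes transferred for one append that fits in the log's last block:
    one full block read plus one full block write (read-modify-write)."""
    assert 0 < record_size <= block_size
    return 2 * block_size

def amplification(record_size, block_size=BLOCK_SIZE):
    """Ratio of bytes moved to useful payload bytes for a single append."""
    return bytes_moved_per_append(record_size, block_size) / record_size

def amortized_overhead_us(per_op_overhead_us, batch_size):
    """Fixed overhead charged to each request when batch_size consecutive
    requests are merged into one operation (hypothetical overhead value)."""
    return per_op_overhead_us / batch_size

print(amplification(64))                 # 128.0
print(amplification(128))                # 64.0
print(amortized_overhead_us(100.0, 1))   # 100.0
print(amortized_overhead_us(100.0, 32))  # 3.125
```

Under these assumptions, a single 64-byte append moves 128 times its payload, and merging 32 requests reduces an assumed 100 microsecond fixed overhead to about 3 microseconds per request. The model does not capture the additional time that early requests in a batch spend waiting for later ones, which is why merging must still keep the average logging latency in check.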
Given these problems, existing techniques may not fully address all of these challenges. For instance, delayed writing, involving logging followed by an asynchronous write, would shift the bottleneck to the logging operation. If the logging record size were assumed to be small, the underlying storage would have to sustain high throughput with low latency in cases of small random logging updates. Also, many optimizing techniques would involve having accurate control over disk geometry details such as rotational latency, seek latency, the number of sectors in each track, zone coding, bad sector mapping and other finer details. At this point in time, it would be difficult to implement ideas requiring these details because of advanced disk compaction techniques, and some disk manufacturers no longer supply the inner details of disk layouts because of complicated disk management techniques and a competitive market. It should also be mentioned that disk head prediction techniques could not easily be adopted, since such prediction could be difficult with modern disk drives.
Another approach involves maintaining a map of used and free blocks on disk in order to place the incoming data accurately on an unoccupied block and at the same time avoid track switch delay; however, maintaining mapping information would render the logging scheme unnecessarily complex and may require estimation of the geometry of the disk.
A logging disk array using Redundant Array of Independent Disks (RAID) technology to handle the small-writes problem, together with an NVRAM buffer to provide persistency to the cache, has also been proposed. However, although the latency of writing to the NVRAM buffer is very low (on the order of microseconds), flushing the NVRAM buffer to disk is not a trivial task. Though the optimal size is chosen in units of stripe size, there are various other factors which determine whether the disk is utilized to the best extent. Another important factor to note is that NVRAM is a costly hardware resource, as has already been mentioned. In many situations, writing to NVRAM can yield very slow response times.
Another alternative could be developed to handle small buffer size writes. The entire file system is organized as a sequential log, which takes writes from the user application and appends them to the underlying log structure in the file system. But logging operations require persistent writes to disk, and hence synchronous writes are required, which would obviously yield very low performance on a naive log structured file system (LFS). Although modified techniques use NVRAM or flash to make LFS handle synchronous writes efficiently, both NVRAM and flash are costly hardware alternatives. Though flash based disks provide high throughput and low latency, erase cycles are slow, and hence the performance of the flash disk goes down when its utilization factor goes up. Also, the basic block size of flash ranges from kilobytes to megabytes and is much larger than the sector size of typical magnetic hard disks. The erase operation in flash devices requires a larger block size to get optimal results. However, having a larger block size increases the latency of smaller requests, which need to be aggregated to fill a block of that size.
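The aggregation delay described above can be estimated with simple arithmetic. The Python sketch below assumes fixed-size records arriving at a steady rate and ignores the device's own write and erase times; the function name, block sizes and arrival rate are hypothetical values chosen only to show the trend.

```python
def block_fill_wait_s(block_size, record_size, records_per_sec):
    """Seconds the first record in a block waits until the block is full.

    Assumes fixed-size records arriving at a steady rate (hypothetical model).
    """
    records_per_block = block_size // record_size
    # The first record is already present; wait for the remaining ones.
    return (records_per_block - 1) / records_per_sec

# 128-byte records arriving at an assumed 10,000 records per second.
small_block_wait = block_fill_wait_s(4 * 1024, 128, 10_000)
large_block_wait = block_fill_wait_s(128 * 1024, 128, 10_000)
print(small_block_wait)  # about 0.003 s for a 4 KiB block
print(large_block_wait)  # about 0.1 s for a 128 KiB block
```

With these assumed numbers, the wait grows from roughly 3 milliseconds to roughly 100 milliseconds as the block grows from 4 KiB to 128 KiB, which illustrates why large blocks penalize small synchronous requests.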
Based on the aforementioned reasons, an alternative to disk logging would be proposed.