File systems store files and store information about files and file system objects. The information stored in files or file system objects may be referred to as data. The information about files or file system objects may be referred to as metadata. When the data in a file or file system object changes, a file system may want to update the metadata about that file or file system object. For example, if the contents of a file are changed, the file system may want to memorialize the time at which the change was made and by whom the change was made. A journal may be employed to protect the data and metadata in a file system.
Making a change to a file or file system object may require the file system to update several independently stored pieces of metadata, and the underlying storage may not support those updates as a single atomic operation. This set of changes takes the file system from one consistent state to another. Undesirable conditions may arise if a series of operations is only partially recorded. Thus, a file system may be required to treat a series of operations as a transaction. Example transactions may include allocating space for a file or file system object, creating a file or file system object, updating a file or file system object, deleting a file or file system object, or other operations. While the file system may choose to treat operations as a transaction, an underlying operating system or other actor (e.g., storage system) may only be able to guarantee that individual members of the series of operations are performed as atomic operations.
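The danger of partial completion can be made concrete with a minimal Python sketch. It assumes a toy in-memory "disk" holding a block bitmap and a table of inodes; the names `allocate_block`, `is_consistent`, and `CrashError` are hypothetical. Allocating a block takes two writes, each atomic on its own, and a simulated crash between them leaves the two structures disagreeing.

```python
class CrashError(Exception):
    """Raised to simulate the system stopping between two atomic writes."""


def allocate_block(disk, inode_id, block_no, crash_between=False):
    """Give block_no to inode_id using two separately atomic metadata writes."""
    disk["bitmap"][block_no] = True             # write 1: mark the block in use
    if crash_between:
        raise CrashError("stopped after write 1")
    disk["inodes"][inode_id].append(block_no)  # write 2: record it in the inode


def is_consistent(disk):
    """The blocks marked in use must be exactly the blocks some inode references."""
    referenced = {b for blocks in disk["inodes"].values() for b in blocks}
    in_use = {b for b, used in enumerate(disk["bitmap"]) if used}
    return referenced == in_use


disk = {"bitmap": [False] * 8, "inodes": {1: []}}
allocate_block(disk, 1, 3)
print(is_consistent(disk))  # True
try:
    allocate_block(disk, 1, 4, crash_between=True)
except CrashError:
    pass
print(is_consistent(disk))  # False: block 4 is in use but owned by no inode
```

Each write succeeded or failed as a unit, yet the pair did not, which is exactly the gap a transaction must close.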
Therefore, file systems may use a journal to help support correctly performing a series of operations as a single file system transaction. The journal may be, for example, a disk-based structure that can store information about operations to be performed to transition a file system from a first state to a second state. The journal may be used to store a complete representation of the set of operations that are to be completed for the file system transaction. For example, the journal may store a linear sequence of underlying operations that are to be performed as part of the file system transaction. Once the set of operations to be performed is written in the journal, the individual updates to metadata can be performed safely in the knowledge that, if something goes wrong, the complete set can be recovered and reapplied later using the information stored in the journal.
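The write-ahead ordering described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a specific file system's format: a Python list stands in for the on-disk journal, metadata is a dict keyed by hypothetical strings such as `"inode:7.size"`, and the trailing commit record is an assumed convention for marking that the full set of operations has been written.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    # Stands in for the on-disk journal; a list append models a durable write.
    records: list = field(default_factory=list)

    def log_transaction(self, txid, ops):
        # Record the complete set of operations before any in-place update.
        for key, value in ops:
            self.records.append(("op", txid, key, value))
        # Assumed commit record: marks the transaction as fully described.
        self.records.append(("commit", txid))


def run_transaction(journal, metadata, txid, ops):
    """Journal a transaction's operations, then apply them one at a time."""
    journal.log_transaction(txid, ops)
    for key, value in ops:
        metadata[key] = value  # each individual update is atomic on its own


journal, metadata = Journal(), {}
run_transaction(journal, metadata, 1, [("inode:7.size", 4096), ("bitmap:12", True)])
print(metadata)             # {'inode:7.size': 4096, 'bitmap:12': True}
print(journal.records[-1])  # ('commit', 1)
```

The key design point is ordering: nothing is changed in place until the whole transaction is recorded, so a crash during the in-place phase can always be repaired from the journal.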
A journal may play a record-keeping role that allows the file system to move safely from one stable state to another using only operations whose atomicity the infrastructure underlying the file system can guarantee. A journal provides a persistent structure that the file system can examine after a crash, using its contents to reconstruct recently updated metadata and thereby restore itself to a self-consistent state.
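Crash recovery can be sketched as a replay pass over the journal. This minimal Python version assumes journal records are tuples of the hypothetical form `("op", txid, key, value)` and `("commit", txid)`, and that every operation sets a value, so reapplying it is harmless. Transactions whose commit record never reached the journal are skipped, because under write-ahead ordering their in-place updates never began.

```python
def recover(records, metadata):
    """Replay every journaled transaction whose commit record is present.

    records: ("op", txid, key, value) and ("commit", txid) tuples in log order.
    Transactions without a commit record are ignored; under write-ahead
    ordering their in-place updates never started.
    """
    ops_by_tx = {}   # txid -> [(key, value), ...]
    committed = []   # txids in the order their commit records appear
    for rec in records:
        if rec[0] == "op":
            _, txid, key, value = rec
            ops_by_tx.setdefault(txid, []).append((key, value))
        elif rec[0] == "commit":
            committed.append(rec[1])
    for txid in committed:
        for key, value in ops_by_tx.get(txid, []):
            metadata[key] = value  # reapplying a value-setting op is harmless
    return committed


# Crash after tx 1 applied only its first update, and before tx 2 committed.
log = [("op", 1, "a", 1), ("op", 1, "b", 2), ("commit", 1), ("op", 2, "a", 99)]
on_disk = {"a": 1}
print(recover(log, on_disk))  # [1]
print(on_disk)                # {'a': 1, 'b': 2}
```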
One issue with file systems arises from the difference in latency between memory and non-memory (e.g., disk, tape) storage. This latency can produce conditions where changes made in one area (e.g., memory) are out of sync with changes made in another area (e.g., disk). Additionally, this latency motivates a file system to hold in memory the changes destined for disk and to make the actual changes on disk at a later time. For example, a series of reads and writes to a file may be made virtually in memory and only made physically on disk at a later time. While this delayed-update approach may solve one problem, excessive random input/output (i/o), it may produce another: memory and disk being out of sync. The file system metadata may indicate that a change has been made, and that change may have been performed in memory, but the actual underlying data on disk may not yet have been changed.
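The delayed-update trade-off can be illustrated with a small write-back cache. This Python sketch is hypothetical (a dict plays the disk, and `WriteBackCache` is an invented name): repeated writes to the same block are absorbed in memory and reach the disk as a single write, but until `flush()` runs, what readers see differs from what is on disk.

```python
class WriteBackCache:
    """Buffers block writes in memory and pushes them to disk only on flush()."""

    def __init__(self, disk):
        self.disk = disk      # dict standing in for on-disk blocks
        self.memory = {}      # newest contents of blocks written through the cache
        self.dirty = set()    # blocks changed in memory but not yet on disk
        self.disk_writes = 0  # physical writes issued so far

    def write(self, block, data):
        self.memory[block] = data
        self.dirty.add(block)

    def read(self, block):
        # Readers see the in-memory copy, so the change already looks done.
        return self.memory.get(block, self.disk.get(block))

    def flush(self):
        # Repeated writes to one block collapse into a single disk write.
        for block in sorted(self.dirty):
            self.disk[block] = self.memory[block]
            self.disk_writes += 1
        self.dirty.clear()


disk = {5: "old"}
cache = WriteBackCache(disk)
for data in ("v1", "v2", "v3"):
    cache.write(5, data)
cache.write(9, "new")
print(cache.read(5), disk[5])   # v3 old   <- memory and disk disagree
cache.flush()
print(disk, cache.disk_writes)  # {5: 'v3', 9: 'new'} 2
```

Four logical writes became two physical ones, which is the i/o saving; the window before `flush()` is the out-of-sync state described above.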
A journal may be used to protect the state of things that are only in memory. The journal may be used to record, in persistent storage (e.g., disk, solid state drive), the changes that have been made in memory but that have not yet been propagated to persistent storage. When the changes have been propagated to persistent storage, the journal entries that were protecting the changes can be discarded. More generally, the journal can be used to transfer in-memory state to on-disk state. The in-memory state may be useful to a running program that does not have time to wait for disk i/o, and the on-disk state may be useful as a recovery tool. For example, if the running system terminates unexpectedly, the journal may be used to determine which transactions need to be replayed to return the file system to a stable point before the failure.
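The life cycle of journal entries that protect in-memory state can be sketched in Python as shown below. All names are hypothetical, and dicts and lists stand in for memory, disk, and the persistent journal. A change is journaled before the running system uses it; a checkpoint propagates memory to disk and then discards the entries; after a simulated crash, memory is rebuilt from disk plus whatever journal entries survive.

```python
class JournaledCache:
    """In-memory metadata whose unflushed changes are protected by a journal."""

    def __init__(self):
        self.disk = {}     # durable home locations of metadata blocks
        self.memory = {}   # newest in-memory contents
        self.journal = []  # durable (block, data) entries not yet checkpointed

    def update(self, block, data):
        self.journal.append((block, data))  # make the change durable first
        self.memory[block] = data           # then let the running system use it

    def checkpoint(self):
        """Propagate in-memory state to disk, then discard the journal entries."""
        for block, data in self.memory.items():
            self.disk[block] = data
        self.journal.clear()  # entries are no longer needed for protection

    def crash_and_recover(self):
        """Memory is lost: rebuild it from disk plus the surviving journal."""
        self.memory = dict(self.disk)
        for block, data in self.journal:
            self.memory[block] = data


fs = JournaledCache()
fs.update("inode:7", 10)
fs.checkpoint()
fs.update("inode:7", 20)  # journaled, but not yet on disk
fs.crash_and_recover()
print(fs.disk["inode:7"], fs.memory["inode:7"])  # 10 20
```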
While a journal helps mitigate some issues with a file system, the journal may produce new issues. One issue concerns the journal having a finite size and thus becoming full. Unlike database journals, file system journals are typically implemented as a circular buffer on disk. New transactions are recorded at the ‘head’ of the journal, and the oldest transactions still protected by the journal are at the ‘tail’ of the journal. The ‘head’ of the journal cannot be allowed to overrun the ‘tail’ without first protecting the old transactions by flushing their individual metadata updates out to disk. If the head were allowed to overwrite the tail, journal entries for transactions whose metadata had not yet been flushed would be lost, and the journal would no longer provide consistency protection for the file system.
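The head/tail rule can be sketched as a fixed-capacity circular log. In this minimal Python illustration (class and method names are hypothetical, and sizes are counted in journal blocks), `head` and `tail` are monotonically increasing counters reduced modulo the capacity only to find a physical position, so `head - tail` is always the space in use and "empty" and "full" never look alike.

```python
class CircularJournal:
    """Fixed-size on-disk log addressed modulo its capacity (in blocks)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0  # next block to write (monotonic counter)
        self.tail = 0  # oldest block still protecting un-flushed metadata

    def used(self):
        return self.head - self.tail

    def free(self):
        return self.capacity - self.used()

    def append(self, nblocks):
        """Claim nblocks at the head; refuse to overrun the tail."""
        if nblocks > self.free():
            raise RuntimeError("journal full: flush metadata to advance the tail")
        start = self.head % self.capacity
        self.head += nblocks
        return start  # physical block where the record begins (may wrap)

    def release(self, nblocks):
        """Advance the tail once the oldest transactions' metadata is on disk."""
        if nblocks > self.used():
            raise ValueError("cannot release more than is in use")
        self.tail += nblocks


j = CircularJournal(8)
print(j.append(5), j.free())  # 0 3
try:
    j.append(4)
except RuntimeError as err:
    print(err)  # journal full: flush metadata to advance the tail
j.release(5)        # oldest transactions' metadata is now on disk
print(j.append(4))  # 5  (record occupies blocks 5, 6, 7, 0)
```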
Before starting a journal transaction, a determination may be made as to whether there is sufficient free space to support the transaction. Conventionally, a “worst-case scenario” approach to transaction space usage has been taken. There are at least two problems with the worst-case scenario. First, it is difficult to calculate what the actual worst-case scenario is, and this calculation tends to be an error-prone part of the system. The calculated worst case may be significantly too large or, even worse, might not be big enough. Second, because being too small is calamitous, the worst-case scenario typically significantly over-estimates the amount of free space required. For a transaction with many data structures, the worst-case size may be several orders of magnitude larger than the average use case. Always reserving the largest possible amount of space is inefficient and produces undesired pressure on the journal to flush metadata.
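The flushing pressure from oversized reservations can be estimated with a small simulation. This Python sketch rests on simplifying assumptions: every transaction must find `reserve` free blocks before starting but then occupies only its actual size, and each forced flush checkpoints everything so the whole journal becomes free. The capacity, transaction sizes, and reservation values are hypothetical numbers chosen for illustration, not measurements.

```python
def count_flushes(capacity, reserve, actual_sizes):
    """Count forced journal flushes when every transaction must first find
    `reserve` free blocks but then writes only its actual size.

    Simplification (an assumption): a flush checkpoints everything, so the
    whole journal becomes free again.
    """
    used = 0
    flushes = 0
    for size in actual_sizes:
        if size > reserve:
            raise ValueError("reservation too small: the head could overrun the tail")
        if capacity - used < reserve:
            used = 0
            flushes += 1
        used += size  # only the blocks actually written remain in the journal
    return flushes


sizes = [4] * 1000                      # hypothetical: each transaction uses 4 blocks
print(count_flushes(1000, 4, sizes))    # 3   (reserve exactly what is used)
print(count_flushes(1000, 900, sizes))  # 38  (reserve a hypothetical worst case)
```

The same workload writes the same number of blocks in both runs; only the size of the up-front reservation changes, yet the worst-case policy forces flushes far more often.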
Another issue concerning allocating space in the journal is the required flush of old metadata. Before allocating space in the journal, conventional systems first ensured that the material in the journal about to be overwritten was stable. This requires writing out all the metadata protected by the region of the journal that the worst-case space reservation would overwrite. The larger this reservation is, the more metadata needs to be flushed while new activity in the file system is held up. Since the worst-case reservation can be so much larger than the space actually used, this can cause excessive flushing of state and slower operation than would otherwise be possible.
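The cost of satisfying a reservation can be computed directly: oldest transactions are checkpointed from the tail until enough space is free. In this minimal Python sketch, `blocks_to_flush` is a hypothetical helper, and the journal space of each live transaction serves as a stand-in for the metadata that must be written out before that space can be reused. The capacity and transaction sizes in the example are illustrative only.

```python
from collections import deque


def blocks_to_flush(capacity, live_txns, reservation):
    """Return how many journal blocks of old transactions must be checkpointed
    (flushed) before `reservation` blocks can be claimed at the head.

    live_txns: journal space of each still-protected transaction, oldest first.
    """
    free = capacity - sum(live_txns)
    flushed = 0
    pending = deque(live_txns)
    while free < reservation:
        if not pending:
            raise ValueError("reservation exceeds journal capacity")
        size = pending.popleft()  # the transaction at the tail is flushed first
        free += size
        flushed += size
    return flushed


live = [10] * 8  # hypothetical: eight live transactions, 80 of 100 blocks in use
print(blocks_to_flush(100, live, 4))   # 0
print(blocks_to_flush(100, live, 50))  # 30
```

A small, accurate reservation fits in the existing free space, while a large worst-case one forces three old transactions to be flushed before the new transaction may start.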