File systems store files and store information about files. The information stored in files may be referred to as data. The information about files may be referred to as metadata. When the data in a file changes, a file system may want to update the metadata about that file. For example, if the contents of a file are changed, the file system may want to memorialize the time at which the change was made and by whom the change was made. A journal may be employed to protect the data and metadata in a file system.
Making a change to a file may require the file system to update several independently stored pieces of metadata, and the underlying storage may not support performing those updates as a single atomic operation. This set of changes takes the file system from one consistent state to another. Undesirable conditions may arise if a series of operations is only partially recorded. Thus, a file system may be required to treat a series of operations as a transaction. Example transactions may include allocating space for a file, creating a file, updating a file, deleting a file, or other operations. While the file system may choose to treat operations as a transaction, an underlying operating system or other actor (e.g., storage system) may only be able to guarantee that individual members of the series of operations are performed as atomic operations.
Therefore, file systems may use a journal to help support correctly performing a series of operations as a single file system transaction. The journal may be, for example, a disk-based structure that can store information about operations to be performed to transition a file system from a first state to a second state. The journal may be used to store a complete representation of the set of operations that are to be completed for the file system transaction. For example, the journal may store a linear sequence of underlying operations that are to be performed as part of the file system transaction. Once the set of operations to be performed is written in the journal, the individual updates to metadata can be performed safely in the knowledge that, if something goes wrong, it is possible to recover the complete set of operations and reapply it later using the information stored in the journal.
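The journaling sequence described above can be illustrated with a minimal sketch. It is not any particular file system's implementation: the `Journal` class, the `apply_ops` helper, and the metadata field names are hypothetical, and an in-memory list stands in for the disk-based structure. The sketch records the complete set of operations first, simulates a failure partway through applying them, and then recovers by reapplying the journaled set.

```python
import json


class Journal:
    """Toy journal: an append-only list of recorded transactions (hypothetical)."""

    def __init__(self):
        self.records = []  # stands in for a disk-based structure

    def commit(self, ops):
        # Record the complete set of operations before any of them is applied.
        self.records.append(json.dumps(ops))

    def replay(self, metadata):
        # Reapply every journaled operation. Each operation sets a field to a
        # value, so applying it more than once is harmless.
        for record in self.records:
            for key, value in json.loads(record):
                metadata[key] = value


def apply_ops(metadata, ops, crash_after=None):
    """Apply ops in order; stop early to simulate a crash if crash_after is set."""
    for i, (key, value) in enumerate(ops):
        if crash_after is not None and i >= crash_after:
            return False  # simulated crash: the remaining updates never happen
        metadata[key] = value
    return True


def demo():
    metadata = {"size": 0, "mtime": 0, "blocks": 0}
    journal = Journal()
    ops = [["size", 4096], ["mtime", 1700000000], ["blocks", 1]]

    journal.commit(ops)                      # 1. write the whole set to the journal
    apply_ops(metadata, ops, crash_after=1)  # 2. crash after the first update
    partial = dict(metadata)                 # inconsistent, partially updated state
    journal.replay(metadata)                 # 3. recovery reapplies the complete set
    return partial, metadata
```

Calling `demo()` returns the partially updated metadata alongside the recovered metadata, which shows that the journal alone carries enough information to finish the transaction.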
A journal may play a record-keeping role to allow for safe transitions from one stable state to another stable state in a file system in a manner that can be guaranteed by the infrastructure underlying the file system. A journal provides a persistent structure that allows the file system to restore itself to a self-consistent state after a crash by examining the journal's contents and using them to reconstruct the recently updated metadata components.
A “lock”, as used in computer science and herein, refers to a synchronization mechanism for enforcing limits on access to a resource or other item. A lock may be designed to enforce a mutual exclusion concurrency control policy. A lock may be an advisory lock where threads voluntarily cooperate by acquiring the lock before accessing the protected resource. A lock may be a mandatory lock where an attempt to access the protected resource before the lock has been acquired will force an exception in the entity attempting the access. A lock may be, for example, a binary lock, which is sometimes referred to as a binary semaphore. Different locks may implement different locking strategies. For example, a thread may have its execution blocked until a lock is acquired. A spinlock employs a locking strategy where the requesting thread spins (e.g., busy waits) until the lock is acquired. A spinlock may be efficient if threads are blocked for very short periods of time, but can introduce significant processing overhead when threads are blocked for longer periods of time.
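The difference between a blocking strategy and a spinning strategy can be sketched as follows. This is an illustrative example only: `SharedCounter`, `blocking_add`, `spinning_add`, and `run` are hypothetical names, and the spinning variant is built from a non-blocking acquire on an ordinary Python `threading.Lock` rather than from a true hardware-level spinlock. The lock here is advisory, since nothing prevents a thread from touching the counter without acquiring it.

```python
import threading


class SharedCounter:
    """A resource protected by an advisory lock (hypothetical example type)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0


def blocking_add(counter, n):
    # Blocking strategy: the thread is suspended until the lock is acquired.
    for _ in range(n):
        with counter.lock:
            counter.value += 1


def spinning_add(counter, n):
    # Spinning strategy: the thread busy waits, retrying a non-blocking
    # acquire until it succeeds. It consumes processor time while waiting.
    for _ in range(n):
        while not counter.lock.acquire(blocking=False):
            pass
        try:
            counter.value += 1
        finally:
            counter.lock.release()


def run(worker, n_threads=4, n=1000):
    """Run the given worker in several threads and return the final count."""
    counter = SharedCounter()
    threads = [
        threading.Thread(target=worker, args=(counter, n))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Both `run(blocking_add)` and `run(spinning_add)` produce the same final count because both enforce mutual exclusion. They differ only in what a waiting thread does: it either sleeps or burns cycles, which is why spinning tends to suit only very short waits.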
There are many conventional journals. Many of these conventional journals are associated with database processing. Typically these journals have been single-threaded monolithic applications that have employed numerous locks to control process flow and to provide synchronization. However, using locks may be inefficient because locks can force one process to wait while another process completes. Moving in lock-step where one action cannot begin until another action completes is appropriate in many circumstances, but may lead to inefficiencies when some operations could be performed in parallel.
File systems transfer in-memory state (e.g., file metadata) to disk. The in-memory state may be transient or unprotected while the on-disk state is more permanent and more protected. The journal may be, for example, a disk-based structure. A journal performs a number of different actions associated with protecting metadata until the transfer of state is complete. For example, the journal may protect in-memory-only changes by writing to disk a complete representation of the set of operations that are to be completed for the file system transaction. Once the set of operations to be performed is written in the journal on disk, the set of operations can be started safely in the knowledge that, if something goes wrong, it may be possible to back out of the set of operations using the information stored in the journal on disk. After the in-memory-only changes have been completed and propagated to disk, the protecting journal entries can be deleted.
One issue with file systems arises due to the difference in latency between memory and non-memory (e.g., disk, tape) storage. This latency can produce conditions where changes made in one area (e.g., memory) are out of sync with changes made in another area (e.g., disk). Additionally, this latency motivates a file system to store in memory changes that are to be made to data on disk and then to make the actual changes on disk at a later time. For example, a series of reads and writes to a file may be made virtually in memory at a first time and then only made physically on disk at a second, later time. An efficient journal would be able to hold more metadata in memory. Holding more metadata in memory would improve efficiency by reducing the amount of input/output (I/O) to disk that needs to be performed to maintain the file system state. This efficiency is related to the observed phenomenon of locality of file touching. If a file is touched at a first time, then it is likely that it will be touched again relatively soon. If the metadata for this file can be held in memory until the second or subsequent touches occur, then a disk I/O to record the first touch on disk may be avoided. However, the conventional lock-based monolithic journal approaches may miss opportunities to hold metadata in memory due to lock-step requirements to flush journal items in certain orders controlled by the locks.
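The benefit of holding metadata in memory across repeated touches can be sketched as follows. The `MetadataCache` class, its fields, and the file names are hypothetical, and a dictionary stands in for on-disk metadata. The sketch is intended only to show how several touches of the same file may collapse into a single disk write when the flush is deferred.

```python
class MetadataCache:
    """Toy cache that defers metadata writes until flush() (hypothetical)."""

    def __init__(self):
        self.dirty = {}       # file name -> latest in-memory metadata
        self.disk = {}        # stands in for on-disk metadata
        self.disk_writes = 0  # number of simulated disk I/O operations

    def touch(self, name, mtime):
        # A later touch of the same file overwrites the earlier in-memory
        # entry, so the earlier change never needs its own disk write.
        self.dirty[name] = {"mtime": mtime}

    def flush(self):
        # Write one record per dirty file, however often it was touched.
        for name, meta in self.dirty.items():
            self.disk[name] = meta
            self.disk_writes += 1
        self.dirty.clear()


def demo():
    cache = MetadataCache()
    for mtime in (1, 2, 3):        # three touches of the same file
        cache.touch("a.txt", mtime)
    cache.touch("b.txt", 4)        # one touch of a second file
    cache.flush()
    return cache.disk_writes, cache.disk
```

In this sketch, four touches result in two simulated disk writes. A journal that is forced to flush items in a fixed order after each touch would lose this saving.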