File systems store files and information about files. The information stored in a file may be referred to as data; the information about a file may be referred to as metadata. The metadata may include, for example, a file name, a file size, a file parent, a file descendant, a file access time, a file owner, file permissions, and other information. Some of the metadata for an individual file may be stored in a data structure known as an inode. The inodes and other metadata for a file system may also be stored collectively. The metadata has both structure and content. When the data in a file, or the information about a file, changes, the file system may need to update that file's metadata. For example, if the contents of a file change, the file system may record the time at which the change was made and the user who made it. Thus, actions on a file produce corresponding actions on its inode. To reduce delays caused by file system operations, a file system may store at least part of its metadata in memory.
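To make the data/metadata split concrete, the following is a minimal sketch of how a write to a file's data produces a matching update to its inode. The `Inode` and `TinyFS` classes and their fields are hypothetical illustrations, not any real on-disk inode layout. The `last_modifier` field simply models the "by whom" part of the example above.

```python
import time
from dataclasses import dataclass


@dataclass
class Inode:
    """Hypothetical, simplified inode holding per-file metadata."""
    number: int
    name: str
    owner: str
    permissions: int          # e.g. 0o644
    size: int = 0
    mtime: float = 0.0        # time of the last content change
    last_modifier: str = ""   # who made the last change (illustrative field)


class TinyFS:
    """Toy file system: file data in one dict, metadata (inodes) in another."""

    def __init__(self):
        self.data = {}        # inode number -> file contents (data)
        self.inodes = {}      # inode number -> Inode (metadata)
        self._next = 1

    def create(self, name, owner, permissions=0o644):
        ino = Inode(self._next, name, owner, permissions)
        self.inodes[ino.number] = ino
        self.data[ino.number] = b""
        self._next += 1
        return ino.number

    def write(self, number, payload, user, now=None):
        # A change to the file's data...
        self.data[number] = payload
        # ...produces a corresponding update to the file's inode.
        ino = self.inodes[number]
        ino.size = len(payload)
        ino.mtime = time.time() if now is None else now
        ino.last_modifier = user
```

For example, `fs.write(n, b"hello", "bob")` updates the inode's `size`, `mtime`, and `last_modifier` fields alongside the data itself.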
When file systems were small, their metadata was also relatively small and could be cached entirely in memory. However, as file systems expanded, and as they spread across multiple apparatus into distributed file systems, the metadata grew. In some cases, the metadata for a file system may become so large that it is difficult, if possible at all, to cache all of it in memory. Thus, some metadata may need to be stored on disk and brought into memory as needed. Unfortunately, the random input/output this requires can cause considerable, even unacceptable, delays in file system processing. In particular, startup processing may take an unacceptable amount of time. Failover time may also suffer, because the buffers involved may take an unacceptable amount of time to pre-initialize on failover.
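The "brought into memory as needed" pattern can be sketched as a bounded least-recently-used (LRU) cache in front of on-disk metadata. This is an illustrative sketch only. The `MetadataCache` class, the dict standing in for the disk, and the LRU policy are assumptions, not a description of any particular file system. Each miss models one random metadata I/O, which is the cost the paragraph above describes.

```python
from collections import OrderedDict


class MetadataCache:
    """Hypothetical LRU cache holding at most `capacity` metadata records in
    memory and fetching the rest from a simulated disk on demand."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # stand-in for on-disk metadata: number -> record
        self.cache = OrderedDict()    # in-memory metadata, oldest entry first
        self.disk_reads = 0           # each miss models one random metadata I/O

    def get(self, number):
        if number in self.cache:
            self.cache.move_to_end(number)    # hit: mark as most recently used
            return self.cache[number]
        self.disk_reads += 1                  # miss: read the record from "disk"
        record = self.disk[number]
        self.cache[number] = record
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used entry
        return record


disk = {i: f"inode-{i}" for i in range(10)}
cache = MetadataCache(capacity=3, disk=disk)
for n in [0, 1, 2, 0, 3, 1]:
    cache.get(n)
print(cache.disk_reads)   # 5: only the second access to inode 0 is a hit
```

With a cache smaller than the working set, most lookups fall through to disk. This is why growing metadata turns into growing I/O delay.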
A desirable feature in a distributed file system is fast failover between apparatus in the distributed system. Failover depends, at least in part, on having metadata buffers available. However, large metadata buffers that can hold large amounts of metadata in memory may lengthen failover time, because those buffers must be pre-initialized at startup time.
To reduce the impact of storing metadata on disk, metadata sizes have been reduced. For example, more efficient representations have reduced the disk space used for metadata. One reduction places the contents of small directories directly inside the inode instead of in a separate block. Another reduction may involve supporting only a smaller (e.g., 4 KiB) file system block size. Earlier file systems may have used larger (e.g., 16 KiB, 64 KiB) block sizes that wasted space when only a small percentage of a block was used. While these reductions have eased metadata size issues, the amount of metadata may still exceed the memory or cache dedicated to storing it.
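The savings from smaller blocks and from inline directories can be seen with simple arithmetic. The sketch below computes the disk space a small directory consumes outside its inode. The 300-byte directory size and the 512-byte inline limit are hypothetical values chosen for illustration. Real inline limits depend on the inode size of the particular file system.

```python
KiB = 1024


def directory_bytes_on_disk(dir_size, block_size, inline_limit=0):
    """Disk bytes used by a directory's contents outside the inode.

    If the contents fit within `inline_limit` bytes (a hypothetical limit),
    they are assumed to live inside the inode and need no separate block.
    Otherwise the contents occupy whole blocks.
    """
    if dir_size <= inline_limit:
        return 0
    blocks = -(-dir_size // block_size)   # ceiling division
    return blocks * block_size


for label, block_size, inline in [
    ("64 KiB blocks", 64 * KiB, 0),
    ("4 KiB blocks", 4 * KiB, 0),
    ("4 KiB blocks + inline", 4 * KiB, 512),
]:
    print(label, directory_bytes_on_disk(300, block_size, inline))
# 64 KiB blocks 65536
# 4 KiB blocks 4096
# 4 KiB blocks + inline 0
```

For a 300-byte directory, a 64 KiB block leaves over 99% of the block unused. A 4 KiB block reduces the waste sixteenfold, and inlining removes the separate block entirely.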
As metadata sizes shrink, it may become feasible to store all metadata in memory. Storing metadata in memory improves the performance of metadata operations that only read (e.g., lookup). It may also improve the performance of metadata operations that write, because contention for the disk on which metadata is stored may be reduced. Unfortunately, storing all metadata in memory may have unintended consequences.
Consider a configuration with an 8 GiB cache of 4 KiB buffers. This configuration holds 8 GiB / 4 KiB = 2,097,152 buffers, more than two million, each of which must be initialized. The time required to initialize them may delay file system manager activation, and pre-initializing them may add time to failover processing. Additionally, while a large number of small buffers is useful in some situations (e.g., when locality of access is above a threshold), it may be unwieldy for certain actions (e.g., walking an entire file system during a file system scan). Thus, a balance may need to be struck between buffer size, cache size, and buffer processing.
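The buffer counts behind this trade-off follow directly from dividing cache size by buffer size. The sketch below also estimates pre-initialization time under a hypothetical fixed per-buffer cost (`per_buffer_us`). That is an assumption made for illustration. It models per-buffer overhead such as header setup and ignores any cost that grows with buffer size.

```python
KiB, GiB = 1024, 1024 ** 3


def buffer_count(cache_bytes, buffer_bytes):
    """Number of fixed-size buffers that fit in a cache of the given size."""
    return cache_bytes // buffer_bytes


def init_time_seconds(cache_bytes, buffer_bytes, per_buffer_us):
    """Rough startup/failover cost, assuming each buffer needs a fixed
    `per_buffer_us` microseconds of pre-initialization (hypothetical)."""
    return buffer_count(cache_bytes, buffer_bytes) * per_buffer_us / 1e6


for size_kib in (4, 16, 64):
    print(f"{size_kib} KiB buffers:", buffer_count(8 * GiB, size_kib * KiB))
# 4 KiB buffers: 2097152
# 16 KiB buffers: 524288
# 64 KiB buffers: 131072
```

Under this model, quadrupling the buffer size divides both the buffer count and the fixed per-buffer initialization cost by four. Larger buffers, however, give up the fine-grained caching that helps when locality of access is high.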