Technical Field
The present disclosure relates to storage systems and, more specifically, to an optimized file system layout of a storage system.
Background Information
A cluster of storage systems typically includes one or more storage devices, such as solid state drives (SSDs) embodied as flash storage devices, connected to one or more nodes of the cluster. Information may be entered into the storage devices, and obtained from the storage devices, as desired. Each storage system may implement a high-level module, such as a file system, to logically organize the information stored on the devices as storage containers, such as files. Each storage container may be implemented as a set of data structures, such as data blocks on SSD that store data for the storage container and metadata structures in memory of the storage system that describe the data of the storage container. For example, the metadata may describe, e.g., identify, storage locations on the devices for the data.
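By way of illustration only, the relationship between a storage container, its data blocks on SSD, and the in-memory metadata that identifies their storage locations may be sketched as follows. The names StorageContainer, map_block, and locate, the block size of 4096 bytes, and the device identifiers are hypothetical and are assumed solely for this example; they are not taken from any particular file system.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed block size in bytes (illustrative only)


@dataclass
class StorageContainer:
    """Hypothetical storage container, e.g., a file."""
    name: str
    # In-memory metadata: logical block index -> (device id, block address).
    block_map: dict = field(default_factory=dict)

    def map_block(self, logical_index, device_id, block_address):
        # Record where the data block for this logical index is stored.
        self.block_map[logical_index] = (device_id, block_address)

    def locate(self, byte_offset):
        # Identify the storage location holding the given byte offset.
        return self.block_map[byte_offset // BLOCK_SIZE]
```

For example, after mapping logical block 1 of a container to block address 64 on a device labeled "ssd1", a lookup of byte offset 5000 resolves to that device and block address, because offset 5000 falls within the second 4096-byte block.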
A distributed consensus protocol may be employed in the cluster to maintain configuration information pertaining to a current state of the cluster and stored on local storage, e.g., SSD, of the nodes. Typically, the distributed consensus protocol uses a generic file system having a generic on-disk format or layout of files to store the configuration information, e.g., one file for node membership in the cluster, one file for a snapshot of the configuration information, and multiple files for log entries. Multiple issues may arise in such an implementation, including (i) files used for snapshots of the current state may be fragmented, (ii) if a rate of change is high, too many log entries may accumulate before they are purged, and (iii) log entries, which are usually read in sequence, may be fragmented, necessitating substantial overhead to read sequentially. In addition, each update to a file used for the snapshot, log entries, or membership usually manifests as multiple write and commit operations to disk, e.g., for data and metadata. As a result, some updates may be only partially committed, e.g., if an update is interrupted after some, but not all, of its write and commit operations complete, leaving the data and metadata inconsistent.
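By way of illustration only, the partial commit issue may be sketched with a simplified in-memory model in which an update to a file is performed as two separate commits, one for data and one for metadata. The names GenericFileStore, SimulatedCrash, and demo_partial_commit are hypothetical, and the model assumes that the metadata is merely a recorded file length; an actual generic file system is considerably more involved.

```python
class SimulatedCrash(Exception):
    """Models a node failure between two commit operations."""


class GenericFileStore:
    """Toy model of a file whose data and metadata are committed separately."""

    def __init__(self):
        self.data = b""   # durable file contents
        self.length = 0   # durable metadata: recorded length of the file

    def append(self, payload, crash_after_data=False):
        # Commit 1: the data reaches disk.
        self.data += payload
        if crash_after_data:
            raise SimulatedCrash("failure before metadata commit")
        # Commit 2: the metadata describing the data reaches disk.
        self.length = len(self.data)

    def is_consistent(self):
        # The metadata is consistent only if it describes all committed data.
        return self.length == len(self.data)


def demo_partial_commit():
    store = GenericFileStore()
    store.append(b"entry-1")  # both commits complete
    try:
        store.append(b"entry-2", crash_after_data=True)  # only data commits
    except SimulatedCrash:
        pass
    return store.is_consistent(), store.length, len(store.data)
```

In this sketch, the second update commits its data but not its metadata, so the recorded length (7 bytes) no longer describes the data actually stored (14 bytes).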