1. Field of the Invention
This invention relates to the field of filesystems and, more particularly, to distributed coherent filesystems.
2. Description of the Related Art
Distributed computing systems, such as clusters, may include two or more nodes, which may be employed to perform computing tasks. Generally speaking, a node is a group of circuitry designed to perform one or more computing tasks. A node may include one or more processors, a memory and interface circuitry. Generally speaking, a cluster is a group of two or more nodes that have the capability of exchanging data between the nodes. A particular computing task may be performed upon one node, while other nodes perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among the nodes to decrease the time required to perform the computing task as a whole. Clusters may provide high availability such that if one node fails, the other nodes of the cluster can perform the required tasks. Generally speaking, a processor is a device configured to perform an operation upon one or more operands to produce a result. The operations are performed in response to instructions executed by the processor.
A filesystem may be employed by a computer system to organize files and to map those files to storage devices such as disks. Generally speaking, a storage device is a persistent device capable of storing large amounts of data. For example, a storage device may be a magnetic storage device such as a disk device, or an optical storage device such as a compact disc device. Data on storage devices may be organized as data blocks. The filesystem organizes the data blocks into individual files and directories of files. The data within a file may be non-contiguously stored in locations scattered across the storage device. For example, when a file is created, it may be stored in a contiguous data block on the storage device. After the file has been edited, unused gaps within the file may be created. The filesystem may allocate the unused gaps to other files. Accordingly, the original file may now be stored in several non-contiguous data blocks.
Filesystems include internal data, called meta-data, to manage files. Meta-data may include data that indicates: where each data block of a file is stored, where memory-modified versions of a file are stored, and the permissions and owners of a file. The above uses of meta-data are for illustrative purposes only and are not intended to limit the scope of meta-data. In one conventional filesystem, meta-data includes inode data. Inode data specifies where the data blocks of a file are stored on the storage device, provides a mapping from the filesystem name space to the file, and manages permissions for the file.
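The role of inode data described above may be illustrated by a minimal sketch. The names Inode, read_file, namespace, inodes and device are hypothetical and chosen for illustration only; the storage device is assumed to be a simple mapping from block address to block contents, and the sketch is not the layout of any particular filesystem.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inode:
    """Hypothetical, simplified inode: per-file meta-data."""
    owner: str
    permissions: int  # e.g. 0o644
    # Addresses of the file's data blocks on the storage device, in file
    # order. The addresses need not be contiguous.
    block_addresses: List[int] = field(default_factory=list)


def read_file(path: str,
              namespace: Dict[str, int],
              inodes: Dict[int, Inode],
              device: Dict[int, bytes]) -> bytes:
    """Resolve a name to its inode, then gather the file's data blocks."""
    inode = inodes[namespace[path]]  # name space -> inode
    return b"".join(device[addr] for addr in inode.block_addresses)


# Assumed example: a file whose two blocks are scattered across the device.
device = {7: b"hel", 2: b"lo"}
inodes = {1: Inode(owner="alice", permissions=0o644, block_addresses=[7, 2])}
namespace = {"/docs/a.txt": 1}
contents = read_file("/docs/a.txt", namespace, inodes, device)
```

In this sketch the actual data resides only in the device mapping, while the inode and the name space mapping are meta-data.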
Directories are types of files that store data identifying files or subdirectories within the directory. Directory contents, like plain files, are data blocks. The data blocks that comprise a directory store data identifying the location of files or subdirectories within the directory.
Conventional filesystems, such as the Unix filesystem (UFS) or Veritas filesystem (VxFS), may not have a filesystem interface whereby the operating system can flush meta-data to a storage device. For the purpose of clarity, data that comprises data files and subdirectories will be referred to as "actual data" to distinguish it from "meta-data".
In distributed computer systems, filesystems may be implemented such that multiple nodes may access the same files. Generally speaking, a distributed filesystem is a filesystem designed to operate in a distributed computing environment. The distributed filesystem may handle functions such as data coherency. Distributed filesystems address several important features of distributed computing systems. First, because multiple nodes can access the same file, the system has high availability in the event of a node failure. Additionally, performance may be increased by multiple nodes accessing the data in parallel. The performance increase is especially evident in systems, such as Internet web servers, which service mainly read data requests.
Because multiple nodes may access the same data in a distributed filesystem, data coherency must be addressed. For example, a first node may modify a copy of a file stored in its memory. To ensure consistency, an algorithm must cause the first node to flush file data and meta-data to a storage device, prior to a second node accessing that file. Accordingly, the second node may require a way to cause the first node to flush the data to the storage device. In conventional filesystems, such as UFS and VxFS, external interfaces may be used to cause the filesystem to flush actual data to the storage device. Conventional filesystems, however, typically do not have an external interface to cause the filesystem to flush meta-data to the storage device. Accordingly, it is difficult to use conventional filesystems as building blocks of distributed filesystems, and distributed filesystems are typically developed from scratch or from substantially modified versions of conventional filesystems. Unfortunately, it is a major investment of time and money to develop a filesystem from scratch, and even more of an investment to develop a distributed filesystem from scratch. Additionally, developing a distributed filesystem from scratch makes it difficult to leverage evolving filesystem technology and to take advantage of the familiarity and support of existing filesystems. Users of a new filesystem must learn and adapt to an unfamiliar user interface. Alternatively, if a distributed filesystem is designed to accommodate users of different filesystems, the developer must develop and support many different versions of the distributed filesystem.
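The coherency requirement described above may be illustrated by a minimal sketch in which a second node causes a first node to flush both actual data and meta-data before reading from the shared storage device. The classes Storage and Node and their methods are hypothetical and do not correspond to an interface of UFS, VxFS or any other existing filesystem; the meta-data is reduced to a file size for brevity.

```python
class Storage:
    """Hypothetical shared storage device visible to every node."""

    def __init__(self):
        self.data = {}  # path -> actual data
        self.meta = {}  # path -> meta-data


class Node:
    """Hypothetical node that caches modified files in its own memory."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.dirty_data = {}  # memory-modified actual data
        self.dirty_meta = {}  # memory-modified meta-data

    def write(self, path, contents):
        # The modification stays in memory until a flush occurs.
        self.dirty_data[path] = contents
        self.dirty_meta[path] = {"size": len(contents)}

    def flush(self, path):
        # Both actual data and meta-data must reach the storage device.
        if path in self.dirty_data:
            self.storage.data[path] = self.dirty_data.pop(path)
            self.storage.meta[path] = self.dirty_meta.pop(path)

    def read(self, path, peers):
        # A node sees its own memory-modified copy first.
        if path in self.dirty_data:
            return self.dirty_data[path], self.dirty_meta[path]
        # Otherwise, cause every other node to flush before reading.
        for peer in peers:
            if peer is not self:
                peer.flush(path)
        return self.storage.data.get(path), self.storage.meta.get(path)


storage = Storage()
first, second = Node("first", storage), Node("second", storage)
first.write("/shared/log", b"entry")
result = second.read("/shared/log", peers=[first, second])
```

If the flush call moved only the actual data, the second node in this sketch would read file contents together with stale or missing meta-data, which is the difficulty noted above for conventional filesystems.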
Another shortcoming of current distributed filesystems is that the filesystems typically only support one coherency algorithm. Different algorithms are advantageous in different systems. For example, one coherency algorithm may work well in systems with very few write operations, but not work well in systems with a higher percentage of write operations.
Still another shortcoming of current distributed filesystems is that the granularity of a coherency unit is typically fixed. Generally speaking, a coherency unit is a group of data for which coherency is maintained independent of other data. For example, the granularity of a coherency unit may be specified as a file. Accordingly, a coherency operation may be required if two nodes attempt to access the same file. This granularity may work well in systems with relatively small files that are infrequently written to by different nodes, but it may not work well in systems with large files, different portions of which are written by different nodes. In the latter system, a granularity of one or more pages of a file may be advantageous.
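The effect of coherency unit granularity may be illustrated by a minimal sketch. The function names, the "file" and "page" granularity labels and the 4096-byte page size are assumptions made for illustration only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def coherency_unit(path, offset, granularity):
    """Return an identifier for the coherency unit containing an access."""
    if granularity == "file":
        return (path,)                      # the whole file is one unit
    if granularity == "page":
        return (path, offset // PAGE_SIZE)  # each page is its own unit
    raise ValueError("unknown granularity: %r" % (granularity,))


def needs_coherency_operation(access_a, access_b, granularity):
    """Two accesses by different nodes conflict only within the same unit."""
    unit_a = coherency_unit(access_a[0], access_a[1], granularity)
    unit_b = coherency_unit(access_b[0], access_b[1], granularity)
    return unit_a == unit_b


# Two nodes write different portions of the same large file.
write_by_first_node = ("/data/big.db", 0)
write_by_second_node = ("/data/big.db", 10 * PAGE_SIZE)

file_level = needs_coherency_operation(
    write_by_first_node, write_by_second_node, "file")
page_level = needs_coherency_operation(
    write_by_first_node, write_by_second_node, "page")
```

With file granularity the two writes in this sketch fall within the same coherency unit, so a coherency operation is required; with page granularity they fall within different units and may proceed independently.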
What is desired is a distributed filesystem that uses largely unmodified conventional filesystems as building blocks, such that the distributed filesystem may interface to a conventional filesystem to provide the distributed nature of the filesystem. Additionally, a distributed filesystem in which the coherency algorithm and the granularity of a coherency unit may be selected by a client is desirable.