A cluster file system allows multiple client devices to share access to files over a network. One well-known cluster file system is the Lustre file system. Lustre is a Linux-based, high-performance cluster file system used in computer clusters ranging in size from small workgroup clusters to large-scale, multi-site clusters. Lustre can readily scale to support tens of thousands of clients, petabytes of storage, and hundreds of gigabytes per second of aggregate input-output (IO) throughput. Due to its high performance and scalability, Lustre is utilized in many supercomputers, as well as in other complex computing environments, including large enterprise data centers.
There are a number of drawbacks to conventional Lustre implementations. For example, metadata servers and object storage servers in such arrangements generally do not incorporate an efficient caching mechanism. Instead, IO operations on these servers are generally performed directly against back-end storage arrays. To the extent caching is provided in a metadata server or object storage server of a Lustre file system or other similar cluster file system, it is typically implemented in the Linux kernel of the server and is therefore limited in both size and functionality. Moreover, Lustre does not include efficient failure protection mechanisms, and can therefore suffer from excessive recovery latency upon certain types of failures, such as metadata server failures.
Accordingly, despite the many advantages of Lustre file systems and other similar cluster file systems, a need remains for additional improvements, particularly with regard to IO operations and failure recovery. For example, further acceleration of IO operations, yielding enhanced system performance relative to conventional arrangements, would be desirable.