Distributed file systems store data on remote servers that users access over a network. From the user's perspective, files shared via a distributed file system appear to be stored on, and accessed from, a local storage system. Storage capacity may be added by increasing the number of servers on which data is stored. Distributed file systems are typically designed for transparency, heterogeneity, and scalability. These properties allow for convenient, secure sharing of files among nodes of a network with flexible deployment.
Various implementations of distributed file systems are in use today. Typically, these implementations adhere to one of two primary approaches. A first approach, distributed caching and locking, involves allowing a node access to data while locking (excluding) other nodes from accessing the data by invalidating the caches of the other nodes. A lock manager grants the access when the data is not already locked. This approach usually works well as long as data access is local, but faces challenges when multiple nodes attempt to access the same data concurrently. Specifically, performance may degrade due to excessive locking or caching overhead when multiple lock requests are received around the same time.
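The locking behavior described above can be illustrated with a minimal, single-process sketch. The class name `LockManager`, its methods, and the block and node identifiers are hypothetical and are not taken from any particular file system; a real lock manager would also handle networking, timeouts, and failures.

```python
class LockManager:
    """Hypothetical lock manager: grants exclusive access to a block and
    invalidates the cached copies held by all other nodes."""

    def __init__(self):
        self.owner = {}   # block id -> node id currently holding the lock
        self.caches = {}  # node id -> set of block ids cached on that node

    def register(self, node):
        # Track an (initially empty) cache for the node.
        self.caches.setdefault(node, set())

    def acquire(self, node, block):
        holder = self.owner.get(block)
        if holder is not None and holder != node:
            # Already locked by another node; the caller must retry later.
            return False
        self.owner[block] = node
        # Exclude the other nodes by invalidating their cached copies.
        for other, cached in self.caches.items():
            if other != node:
                cached.discard(block)
        self.caches.setdefault(node, set()).add(block)
        return True

    def release(self, node, block):
        # Only the current holder may release the lock.
        if self.owner.get(block) == node:
            del self.owner[block]
```

In this sketch, every concurrent request for the same block is either refused or triggers a round of invalidations, which is one way to picture the overhead that arises when many lock requests arrive around the same time.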
A second approach, sharding, involves dividing the data among nodes such that each node stores a subset of the data. A drawback of this approach is that, when a node accesses data that it does not store, the data must be transferred from another node. Another drawback of this approach is that frequently accessed files on a node may result in the node being overwhelmed, particularly when transfer is required from multiple other nodes. Some existing solutions for this drawback move frequently accessed files among nodes based on load in order to distribute popular files to different nodes, but these solutions do not effectively deal with short bursts of sudden frequent access.
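As an illustration of the sharding approach, the following sketch assigns each file to a node by hashing its path. Hash-based placement is an assumption made here for brevity, since the placement policy is not specified above, and the names `shard_for`, `read`, and `stores` are hypothetical.

```python
import hashlib


def shard_for(path, nodes):
    """Map a file path to the single node assumed to own it."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def read(path, local_node, nodes, stores):
    """Return (data, transferred), where transferred is True when the
    data had to be fetched from a node other than local_node."""
    owner = shard_for(path, nodes)
    data = stores[owner][path]  # stands in for a network transfer if remote
    return data, owner != local_node
```

Because a given path always maps to the same owner in this sketch, every read of a popular file from any other node lands on that one owner, which pictures how a single node may become overwhelmed.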
Due to these and other challenges, the performance of distributed file systems fails to scale appropriately as the systems grow. At some point, further scaling is effectively no longer possible.
Another challenge faced by existing distributed file systems is that such systems were typically planned to support a single type of application programming interface (API) for data access, such as the POSIX file system interface. These systems typically support a limited set of protocols for the API. As a result, these systems cannot efficiently handle sharing as additional APIs are added. As an example, when the S3 protocol is implemented to provide compatibility with third-party object storage applications and the S3 protocol is not already supported internally by the distributed file system, an S3 protocol server must be added to read the entire directory and sort the file names.
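The cost of the S3 example can be sketched as follows, assuming that a listing must be returned in sorted key order and that the underlying directory yields entries in no particular order. The function `list_keys` and its parameters `start_after` and `max_keys` are hypothetical names chosen to resemble an object-listing request; this is not the code of any actual S3 protocol server.

```python
import os


def list_keys(directory, start_after="", max_keys=1000):
    """Return one sorted page of file names from a directory."""
    # The whole directory is read, even if only one page is requested.
    names = [entry.name for entry in os.scandir(directory) if entry.is_file()]
    # Python sorts strings by code point, which for valid Unicode names
    # matches UTF-8 byte order.
    names.sort()
    # Only after the full sort can the requested page be cut out.
    return [name for name in names if name > start_after][:max_keys]
```

In this sketch, each page request repeats the full read and sort, so the work grows with the size of the directory rather than with the size of the page returned.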
It would therefore be advantageous to provide a solution that would overcome the challenges noted above.