Distributed file systems store data on remote servers that users access over a network. From the user's perspective, files shared via a distributed file system appear as if they were stored on a local storage system. Storage capacity can be expanded by adding servers on which data is stored. Distributed file systems are typically designed for transparency, heterogeneity, and scalability. These properties allow files to be shared conveniently and securely among the nodes of a network, with flexible deployment.
Distributed file systems can store large amounts of data across different storage systems. To allow access to any given portion of the data, the data must be properly indexed. However, existing solutions for indexing data in distributed storage systems face several challenges. These challenges can cause inefficiencies in executing access commands as well as problems arising from the underlying hardware.
One challenge faced by distributed storage systems arises from the coupling of computing and storage resources. In existing distributed storage systems, each server includes both persistent storage resources and computing resources (e.g., CPU, RAM, etc.). As a result, computing resources cannot be increased without also increasing storage resources, and vice versa. Also, when the non-persistent memory (e.g., RAM) of a storage server fails, any indexing metadata stored therein is lost, and the data mapped by that metadata becomes inaccessible at least until the metadata is reconstructed.
Another challenge for distributed storage systems is ensuring consistency during concurrent access operations. To this end, existing solutions may implement locking to prevent the same data from being accessed at the same time. However, locking introduces problems such as lock overhead, lock contention, and deadlocks. Each of these problems becomes more likely as the distributed storage system scales up, thereby degrading performance (e.g., reducing access speed and increasing the memory consumed by locks).
Yet another challenge for distributed storage systems relates to balancing hardware costs against performance. Some types of storage hardware are more expensive than others but provide better performance. Striking this balance requires selecting suitable storage for both the metadata and the data. Large-scale implementations may need high-performance storage to deliver adequate performance, but deploying high-performance storage in large quantities can be prohibitively expensive. For example, persistent storage technologies such as Flash and NVRAM are reliable, fast alternatives to traditional magnetic hard drives and can sustain a high number of write-erase cycles, but they cost significantly more.
Due to these and other challenges, the performance of distributed file systems fails to scale proportionally as the systems grow. At some point, further scaling becomes effectively impossible.
It would therefore be advantageous to provide a solution that would overcome the challenges noted above.