Unless otherwise indicated herein, the materials described herein are not prior art to the claims in the present application and are not admitted to be prior art by inclusion in this section.
The number of computer systems and users connected to the Internet has been growing rapidly. In a very large distributed computing environment such as the Internet, the likelihood that at least one file server fails increases with the number of file servers. Failures may occur due to software or hardware malfunctions, excessive file server load, network congestion, and/or natural disasters. Such failures may lead to data unavailability and therefore to less dependable service for users.
Current distributed file storage systems, such as the Coda file system, the Andrew file system, and the Echo file system, store data objects across multiple file servers. The Coda file system, which inherits largely from the Andrew file system, was developed to address availability issues. Coda does not reconfigure the file system to provide data availability. Instead, it keeps read-only replicas of files at remote sites to cover file server failure and disconnected operation. The Echo file system has various ways of automatically detecting faults, such as file server failure, and can report them through a daemon process that sends messages to the administrators responsible for dealing with faults. However, reconfiguration is done manually. In general, availability of data is provided by keeping the file at a primary site for download and keeping its replicas at other sites in case the primary site fails. The Echo system relies heavily on redundant copies of everything in case of failure. A secondary site monitors the availability of a primary site, and vice versa.
Each of these approaches to enabling high data availability and dependable service in a distributed environment tends to increase the complexity of file server management, increase computational complexity, or provide insufficient redundancy.