The present invention generally relates to distributed file systems, and more particularly to management of a namespace in a distributed file system.
A partition-based approach to achieve high scalability for access to distributed storage services is currently being explored. The partition-based approach addresses the inherent scalability problems of cluster file systems, which are due to contention for the globally shared resources. In a partition-based approach, the resources of the system are divided into partitions, with each partition stored on a different partition server. Shared access is controlled on a per-partition basis.
All implementations of partition-based distributed storage services must maintain namespaces, which generally are distributed and reference objects that reside in multiple partitions. A namespace provides a mapping between names and physical objects in the system (e.g., files). A user usually refers to an object by a textual name. The textual name is mapped to a lower-level reference that identifies the actual object, including a location and object identifier. The namespace is implemented by means of directories, which are persistent files of <name, reference> pairs.
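As an illustration only, a directory of name-to-reference pairs can be sketched as follows. The class names `Reference` and `Directory`, the field names, and the in-memory dictionary are assumptions made for this example; persistence is omitted, and the sketch is not the structure used by any particular file system.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Reference:
    """Lower-level reference to an object (hypothetical layout)."""
    partition: str   # location: which partition server holds the object
    object_id: int   # identifier of the object within that partition


class Directory:
    """In-memory stand-in for a persistent file of name/reference pairs."""

    def __init__(self) -> None:
        self._entries: Dict[str, Reference] = {}

    def insert(self, name: str, ref: Reference) -> None:
        # A name maps to exactly one object, so duplicates are rejected.
        if name in self._entries:
            raise KeyError(f"name already bound: {name}")
        self._entries[name] = ref

    def lookup(self, name: str) -> Optional[Reference]:
        # Returns None when the name is not bound in this directory.
        return self._entries.get(name)

    def remove(self, name: str) -> Reference:
        # Raises KeyError when the name is not bound.
        return self._entries.pop(name)
```

In this sketch, two different names may be bound to the same `Reference`, while a single name can never be bound twice.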
The requirement for consistency of the namespace can be formalized in terms of four properties:
1. One name is mapped to exactly one object.
2. One object may be referenced by one or more names.
3. If there exists a name that references an object, then that object exists.
4. If an object exists, then there is at least one name in the namespace that references the object.
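The four properties can be expressed as a simple check, sketched below under stated assumptions: the namespace is modeled as a Python dictionary from names to object identifiers, and the set of existing objects is given as a set of identifiers. The function name `check_namespace` and the wording of the violation messages are hypothetical.

```python
from typing import Dict, List, Set


def check_namespace(names: Dict[str, int], objects: Set[int]) -> List[str]:
    """Return a list of violations of properties 3 and 4.

    Property 1 holds by construction, since a dictionary key maps to a
    single value. Property 2 is a permission, not a constraint: several
    names may map to the same object identifier.
    """
    violations = []
    # Property 3: every name must reference an object that exists.
    for name, oid in sorted(names.items()):
        if oid not in objects:
            violations.append(f"dangling name {name!r} references missing object {oid}")
    # Property 4: every existing object must be referenced by some name.
    referenced = set(names.values())
    for oid in sorted(objects - referenced):
        violations.append(f"orphan object {oid} has no name")
    return violations
```

For example, a namespace in which two names reference object 7 and no other objects exist yields an empty list, whereas an unreferenced object or a name pointing at a missing object each produce one entry.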
Changes to the global namespace take the form of one of two classes of operations: link operations that insert a reference to an object, for example, a newly created object; and unlink operations that remove a reference to an object. Any of the above operations potentially spans more than one server in a distributed system. The server containing the directory (or “namespace object”) and the server containing the referenced object may be physically separated.
Some systems presently use two-phase commit, an atomic commitment protocol, to implement distributed namespace operations. However, to provide recoverability in the event of system failure during a namespace operation, atomic commitment protocols perform synchronous logging in the critical path of the operations, thereby incurring considerable overhead.
In addition to the overhead, atomic commitment protocols lock system resources across all the sites involved in an operation for the duration of the multi-phase commit, thereby increasing contention for resources such as free block lists and block allocation maps. Atomic commitment protocols also follow a conservative approach for recovery from failure: in the presence of failure, incomplete operations are typically aborted rather than completed.
A system and method that address the aforementioned problems, as well as other related problems, are therefore desirable.
In various embodiments, the present invention performs namespace operations in a distributed file system. The file system is disposed on a plurality of partition servers, and each partition server controls access to a subset of hierarchically-related, shared storage objects. Each namespace operation involves a namespace object and a target object that are part of the shared storage objects. Namespace operations received at each partition server are serialized. In response to an unlink namespace operation, a reference in the namespace object to the target object is removed, and after removal the target object is modified in accordance with the unlink operation. In response to a link operation, the target object is modified consistent with the link operation. After modification of the target object, a reference to the target object is inserted in the namespace object. A log record is stored in association with each namespace operation when the operation is started, and a log record is deleted upon completion of the associated operation.
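The ordering of steps described above can be illustrated with a minimal, single-process sketch. Everything named here is an assumption made for illustration: the class `PartitionServer`, the functions `link` and `unlink`, the use of a reference count as the modification applied to the target object, and the placement of the log record on the server holding the namespace object. Network communication, persistence, and recovery are omitted, and the sketch is single-threaded, so operations are trivially serialized.

```python
class PartitionServer:
    """Hypothetical stand-in for one partition server."""

    def __init__(self, name):
        self.name = name
        self.directory = {}    # namespace object: name -> object id
        self.link_counts = {}  # target objects: object id -> reference count
        self.log = {}          # operation id -> record of an in-flight operation
        self._next_op = 0

    def start_op(self, kind, name, oid):
        # Store a log record when the operation is started.
        self._next_op += 1
        self.log[self._next_op] = (kind, name, oid)
        return self._next_op

    def finish_op(self, op_id):
        # Delete the log record once the operation has completed.
        del self.log[op_id]


def link(ns_server, obj_server, name, oid):
    op_id = ns_server.start_op("link", name, oid)
    # Step 1: modify the target object (here, bump its reference count).
    obj_server.link_counts[oid] = obj_server.link_counts.get(oid, 0) + 1
    # Step 2: only afterwards insert the reference in the namespace object.
    ns_server.directory[name] = oid
    ns_server.finish_op(op_id)


def unlink(ns_server, obj_server, name):
    oid = ns_server.directory[name]
    op_id = ns_server.start_op("unlink", name, oid)
    # Step 1: remove the reference from the namespace object.
    del ns_server.directory[name]
    # Step 2: only afterwards modify the target object.
    obj_server.link_counts[oid] -= 1
    if obj_server.link_counts[oid] == 0:
        del obj_server.link_counts[oid]  # last reference gone: object removed
    ns_server.finish_op(op_id)
```

In this sketch, an interruption between the two steps of either operation can leave a target object without a name, but never a name that references a missing object. A log record still present after such an interruption identifies the operation that was in flight.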
Various example embodiments are set forth in the Detailed Description and Claims which follow.