Network-oriented computing environments utilize high-performance, network-aware file systems for individual system data storage and for data sharing among workgroups and clusters of cooperative systems. One type of high-performance file system is a distributed file system. Traditional distributed file systems decouple computational and storage resources: the clients focus on user and application requests, and the file servers focus on reading, writing, and delivering data.
Another type of distributed file system divides responsibility for the storage resources between a metadata server and a cluster of fileservers. The metadata server maintains a transactional record of high-level file and file system transactions. By way of illustration, file and file system transactions typically include file creation, file deletion, file modification, directory creation, directory deletion, directory modification, etc. The fileserver, on the other hand, is typically responsible for actual file system input/output (I/O), maintaining file allocation data and file size during I/O, etc. Separating the transactional recording from the file manipulation is a more efficient division of labor between computing and storage resources.
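By way of illustration only, the division of labor described above can be sketched in a few lines of Python. The class names `MetadataServer` and `FileServer`, their methods, and the in-memory storage are hypothetical and chosen for this sketch; they are not taken from any particular file system.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Records high-level file and directory transactions; performs no file I/O."""
    journal: list = field(default_factory=list)

    def record(self, operation: str, path: str) -> int:
        # Append a transactional record and return its sequence number.
        self.journal.append((operation, path))
        return len(self.journal)


@dataclass
class FileServer:
    """Performs the actual I/O and tracks file size as data is written."""
    data: dict = field(default_factory=dict)

    def write(self, path: str, offset: int, payload: bytes) -> int:
        buf = self.data.setdefault(path, bytearray())
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(payload)] = payload
        return len(buf)  # file size after the write

    def read(self, path: str, offset: int, length: int) -> bytes:
        return bytes(self.data[path][offset:offset + length])


# A client asks the metadata server to record the transaction,
# then moves the file data through the fileserver.
mds = MetadataServer()
fs = FileServer()
mds.record("create", "/a.txt")
size = fs.write("/a.txt", 0, b"hello")
mds.record("modify", "/a.txt")
```

In this sketch the metadata server never touches file contents and the fileserver never sees the transaction log, which mirrors the separation of transactional recording from file manipulation.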
FIG. 1 illustrates one example of a prior art cluster file system 100 comprising a metadata server and multiple distributed object storage targets as file servers. In FIG. 1, the cluster file system comprises multiple clients 102A-N coupled to multiple distributed object store servers (OSS) 104A-M and a metadata server (MDS) 108 over data network 110. The MDS is attached to a metadata target (MDT) 110, which provides storage for the metadata in the file system. In addition, each OSS 104A-M is coupled to one or more object storage targets (OST) 106A-M. Clients 102A-N are computers that utilize the fileserver cluster, typically personal computers, laptops, handheld devices, computer servers, web servers, application servers, etc. and/or combinations thereof. As described above, MDS 108 maintains a record of high-level file transactions. These transactions are used to preserve file system consistency in case of an interruption to the MDS software stack, which can be caused, for example, by power loss. Each OSS 104A-M manages the file data and file allocation metadata stored in the corresponding OST storage array 106A-M. While for one example OSS 104A-M is a LINUX-based server using disk arrays as its OST, for other examples OST storage array 106A-M can be an integrated device, such as an intelligent storage controller or intelligent disk. Furthermore, while for one example the data network is a transmission control protocol (TCP) based gigabit Ethernet network, other examples may use different data network types (e.g., Quadrics (QSWNet), Myrinet, InfiniBand, wireless, etc. and/or combinations thereof). In addition, cluster file system 100 may include a redundant MDS (not shown) that takes over in the event of MDS 108 going down.
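As a simplified illustration of how a record of high-level transactions can help restore consistency after an interruption, the following Python sketch replays an ordered transaction record to rebuild the set of live files and directories. The record format, the operation names, and the `replay` function are assumptions made for this sketch; an actual metadata server would track far more state.

```python
def replay(journal):
    """Rebuild the set of live paths from ordered (operation, path) records."""
    namespace = set()
    for operation, path in journal:
        if operation in ("create_file", "create_dir"):
            namespace.add(path)
        elif operation in ("delete_file", "delete_dir"):
            namespace.discard(path)
        # Modifications leave the set of paths unchanged in this simplified model.
    return namespace


# Hypothetical transaction record as it might exist at the time of a power loss.
journal = [
    ("create_dir", "/data"),
    ("create_file", "/data/a"),
    ("create_file", "/data/b"),
    ("modify_file", "/data/a"),
    ("delete_file", "/data/b"),
]
recovered = replay(journal)
```

Because the records are applied in their original order, the recovered namespace reflects every transaction that was recorded before the interruption.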
Although cluster file system 100 is an advancement with respect to a traditional client/file server system, having one MDS 108 represents a single point of failure and a computational bottleneck. Even though cluster file system 100 may have a redundant MDS in case MDS 108 fails, redundant metadata servers do not by themselves relieve the computational bottleneck.