As enterprises move toward distributed operations spread over several remote locations, multi-site collaboration and joint product development become increasingly common. Although this technology has proven to be useful, it would be desirable to present additional improvements. Distributed operations require data sharing in a uniform, secure, and consistent manner across the enterprise with acceptable performance. While large amounts of data can be easily shared on a local-area network (LAN) using standard file access protocols, these mechanisms do not scale well when extended to remote offices connected over a wide-area network (WAN). Moreover, enterprises rarely deploy alternate solutions such as a wide-area filesystem geared for global scalability; the cost of maintaining and operating one or more filesystems and protocols for local and wide-area access and integrating data between them can be prohibitive.
Data and file sharing has long been achieved through traditional file transfer mechanisms such as file transfer protocol (FTP) and distributed file sharing protocols such as network file system (NFS) and common Internet file system (CIFS). The file sharing protocols tend to be “chatty”, exchanging many request-response messages per operation, having been designed for LAN environments where clients and servers are located in close proximity.
Data sharing can also be facilitated by a clustered filesystem. While clustered filesystems are designed for high performance and strong consistency, they are neither inexpensive nor easy to deploy and administer. Other filesystem architectures have attempted to solve the file sharing issues of a wide-area network through a distributed architecture that provides a shared namespace by uniting disparate file servers at remote locations into a single logical filesystem. However, these technologies incur substantial deployment expense and have not been widely adopted for enterprise-wide file sharing.
One conventional approach comprises the Andrew file system (AFS), which is a globally distributed filesystem. AFS introduces the concept of a cell as an administrative domain and supports a global namespace. AFS also introduces volumes as an abstraction for data management. AFS has extensive client-side file caching for improving performance and supports cache consistency through callbacks. AFS further allows read-only replication, which is useful for improving performance.
Another conventional approach comprises most of the features of AFS but is also integrated with the Open Software Foundation (OSF) distributed computing environment (DCE) platform. This conventional approach provides improved load balancing and synchronization features along with transparency across domains within an enterprise for easy administration. Other AFS-related filesystems deal with replication for improved scalability while focusing on disconnected operations.
Recently there has been some work on leveraging the features of NFSv4 to provide global naming and replication support. One conventional approach focuses on providing a global namespace and read-write replica synchronization. Other related efforts are geared toward improving performance by using parallel data access.
Orthogonal to the distributed filesystem work, there have been a number of conventional approaches utilizing clustered filesystems. These conventional approaches are geared for high-performance solutions using high-speed network connections and tightly coupled servers.
Additional conventional technologies have explored grouping together servers for a common file service. One conventional approach decentralized the storage services across a set of cooperating servers in a local-area environment. In contrast, another conventional approach comprises an archival system, aimed at storing huge collections of data using worldwide replica groups with security and consistency guarantees. Yet another conventional approach focuses on security and Byzantine faults, where a loose collection of untrusted, unsecured servers is grouped together to establish a virtual file server that is secure and reliable. A further conventional approach couples islands of data for scalable Internet services.
The need, therefore, is not to build yet another globally distributed filesystem but to group together a set of heterogeneous, multi-vendor, independent, and distributed file servers such that the distributed file servers act as one. It is desirable that data remain where it is, possibly in legacy filesystems or on a variety of single-server filesystems, rather than being moved into a new filesystem. A system is needed that allows clients to seamlessly navigate the data without additional client-side software or configuration and that allows the data to be managed at fine granularities for replication, migration, and caching. In conventional methods, by contrast, data management is performed at the granularity of a whole filesystem.
What is therefore needed is a system, a computer program product, and an associated method for emulating a virtual boundary of a file system for data management at a finer fileset granularity. The need for such a solution has heretofore remained unsatisfied.