The rapid growth of file-based information, together with today's fast-expanding and diverse business environment, has led to isolated islands of information and storage within organizations. In these islands, various NAS (Network Attached Storage) devices with different performance characteristics and capacities, often from different vendors, make it very difficult to share information and manage storage. To access files on different NAS devices, end users need to know where the files are located and must map or mount the folders shared through the NFS (Network File System) or CIFS (Common Internet File System) protocol, referred to hereafter as share folders (or simply shares). System administrators, for their part, must spend a great deal of time reconfiguring the system, optimizing storage utilization, and/or migrating data as needs change. These tasks are complicated and may cause system downtime and a corresponding interruption of service to the end users, which is very costly.
A Global Namespace (GNS), which provides the end users with a single-namespace, file-location-independent storage service and allows system administrators to utilize the storage more efficiently, has therefore been proposed and can be found in the existing art. To accommodate the amount of data growing daily, and given that various NAS devices coexist, a GNS design is expected to scale without limitation and to support existing heterogeneous NAS devices. However, existing GNS solutions, such as DFS (Microsoft Distributed File System), NAS Switch ([US20030097454A1], [US2007072917B2]), and P2P (Peer-to-Peer) solutions, either have limited scalability or do not support heterogeneous NAS devices.
In the typical DFS solution, DFS links created for the GNS are shared among domain controllers and root servers. Any modification of the namespace causes the entire DFS metadata to be propagated to all domain controllers and root servers. The number of DFS links that can be created for the namespace is therefore limited in order to reduce the impact on network traffic.
In the typical NAS Switch solution, an appliance device manages the share folders of the NAS devices and constructs a pseudo file system for the clients. The appliance device appears as a single NAS server to the clients and as a single NAS client to the NAS devices. All namespace and user data access must go through the appliance device, making the appliance device a potential performance bottleneck.
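The pseudo file system described above can be pictured as a table that maps directories in the client-visible namespace to share folders on the underlying NAS devices. The following is only a minimal sketch under assumed names: the class `PseudoFileSystem`, its methods, and the host and share names are hypothetical and are not taken from any cited NAS Switch product.

```python
from pathlib import PurePosixPath


class PseudoFileSystem:
    """Hypothetical appliance-side table: pseudo directory -> (NAS host, share)."""

    def __init__(self):
        self._mounts = {}

    def attach(self, pseudo_dir, nas_host, share):
        # Expose a NAS share folder under a directory of the single namespace.
        self._mounts[PurePosixPath(pseudo_dir)] = (nas_host, share)

    def resolve(self, client_path):
        # Walk from the full path up toward the root, so that the longest
        # (most specific) attached directory is matched first.
        path = PurePosixPath(client_path)
        for candidate in [path, *path.parents]:
            if candidate in self._mounts:
                host, share = self._mounts[candidate]
                rest = path.relative_to(candidate)
                return host, str(PurePosixPath(share) / rest)
        raise FileNotFoundError(client_path)


fs = PseudoFileSystem()
fs.attach("/gns/eng", "nas1", "/export/eng")      # assumed example share
fs.attach("/gns/eng/cad", "nas2", "/vol/cad")     # assumed example share
host, real_path = fs.resolve("/gns/eng/cad/part.dwg")
```

In this sketch every lookup passes through the one table held by the appliance, which mirrors why the appliance device can become a bottleneck.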
Both DFS and NAS Switch solutions support heterogeneous NAS devices and maintain the local namespace within a NAS device; however, they manage the GNS information in a centralized manner, which limits their scalability.
On the other hand, structured P2P technologies built on the concept of the DHT (Distributed Hash Table) have recently become increasingly popular for file sharing in large-scale, geographically distributed storage systems. Chord (“Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications”, ACM SIGCOMM, 2001) and Tapestry (“Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing”, UC Berkeley, 2000) are two typical examples of DHT-based P2P technology found in the existing art. In a DHT-based P2P storage system, files and storage nodes (known as peers) are hashed into the same ID space. Each peer manages a portion of the ID space and cooperates with the other peers, through a logical DHT overlay, to share files. Because multiple copies of each file are maintained in the system, peers can join and leave the system dynamically without affecting the file sharing service to the end users. A DHT-based P2P storage system is highly scalable, having no central control point or performance bottleneck, and is highly available, because the system repairs itself when storage nodes join or leave. However, existing NAS devices do not have P2P functionality and cannot participate as peers in a P2P storage system, so they cannot be used in such a system.
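The idea of hashing files and peers into the same ID space can be illustrated with a small sketch. It assumes a Chord-style successor rule (a file belongs to the first peer whose ID equals or follows the file's ID on the ring) and a global sorted list of peers in place of a real overlay with routing tables. The names `to_id`, `DhtRing`, `ID_BITS`, and the peer names are hypothetical and chosen for illustration only.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 32  # assumed size of the ID space; real systems use far more bits


def to_id(name):
    # Hash a peer name or a file name into the shared ID space.
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class DhtRing:
    """Toy ring with global knowledge; a real DHT routes between peers."""

    def __init__(self, peers):
        self._by_id = {to_id(p): p for p in peers}
        if len(self._by_id) != len(peers):
            raise ValueError("duplicate peer or ID collision")
        self._ids = sorted(self._by_id)

    def replicas(self, file_name, copies=2):
        # The owner is the first peer at or after the file's ID; further
        # copies go to the peers that follow it clockwise on the ring.
        start = bisect_left(self._ids, to_id(file_name))
        n = len(self._ids)
        count = min(copies, n)
        return [self._by_id[self._ids[(start + k) % n]] for k in range(count)]

    def owner(self, file_name):
        return self.replicas(file_name, copies=1)[0]


ring = DhtRing(["peer-a", "peer-b", "peer-c", "peer-d"])
holders = ring.replicas("report.doc", copies=2)
```

Under this rule, removing a peer changes the owner only of the files that peer held, and the next peer on the ring already keeps a copy of them, which is one way the self-repair described above can be realized.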
It may be possible to construct a P2P storage system and to utilize the existing heterogeneous NAS devices simply as additional storage capacity for the peers. However, this requires the system administrator to manually map or mount the share folders of the existing NAS devices to the peers, which becomes very burdensome when the number of existing NAS devices is large or when peers fail. Further, files are then placed in the existing NAS devices purely according to their hash values (for example, the hash of the file name), making it impossible to maintain a meaningful local namespace within a NAS device, something that both DFS and NAS Switch solutions support.
Hence, there is an increasing need for a GNS solution that can maximize system scalability, support existing heterogeneous NAS devices, and at the same time, maintain a meaningful local namespace within a NAS device.