1. Field of the Invention
This invention is related to the field of storage management and, more particularly, to management of distributed storage in a multi-server environment.
2. Description of the Related Art
In the file serving and network storage environments, two trends are emerging. The first trend is a movement away from storage being directly attached to servers and towards storage being network attached. In some configurations, this network-attached storage may be directly accessible by clients. Enterprise computing environments are increasingly using configurations such as computer clusters, Storage Area Networks (SANs), Network Attached Storage (NAS), and other centralized storage mechanisms to simplify storage, improve availability, and handle escalating demands for data and applications.
Clustering may be defined as the use of multiple computers (e.g., PCs or UNIX workstations), multiple storage devices, and redundant interconnections to form what appears to external users as a single and highly available system. Clustering may be used for load balancing and parallel processing as well as for high availability.
The storage area network (SAN) model places storage on its own dedicated network, removing data storage from the main user network. This dedicated network most commonly uses Fibre Channel technology as a versatile, high-speed transport. The SAN includes one or more hosts that provide a point of interface with LAN users, as well as (in the case of large SANs) one or more fabric switches, SAN hubs and other devices to accommodate a large number of storage devices. The hardware (e.g. fabric switches, hubs, bridges, routers, cables, etc.) that connects workstations and servers to storage devices in a SAN is referred to as a “fabric.” The SAN fabric may enable server-to-storage device connectivity through Fibre Channel switching technology to a wide range of servers and storage devices.
The versatility of the SAN model enables organizations to perform tasks that were previously difficult to implement, such as LAN-free and server-free tape backup, storage leasing, and full-motion video services. SAN deployment promises numerous advantages, including cost management through storage consolidation, higher availability of data, better performance and seamless management of online and offline data. In addition, the LAN is relieved of the overhead of disk access and tape backup, data availability becomes less server-dependent, and downtime incurred by service and maintenance tasks affects more granular portions of the available storage system.
The second trend is a movement away from deploying expensive high-end servers and towards deploying server appliances that include an aggregation of inexpensive server modules. These server modules are typically not of very high capacity, but the appliance can be scaled to meet performance requirements by aggregating multiple such modules. Customers may start with a configuration that meets their current requirements and later add capacity as needed. To minimize administrative overhead, such an appliance may look and behave like a traditional file server.
Nevertheless, these new trends are not without problems of their own. Due to the tightly coupled nature of a cluster, for example, client machines are typically not configured to be part of the cluster itself. Therefore, in a cluster-based appliance, file system clients do not get direct access to file data and are instead served this data by one of the server nodes using standard protocols such as NFS or CIFS. Moreover, since only one node (the primary) performs metadata operations, that node may quickly become a bottleneck as the load on the file system grows. Even in configurations where a large number of clients may directly access user data, metadata operations are typically still handled by a single server, which is likely to become a bottleneck.
This metadata bottleneck may be present in other network storage configurations, and it may limit the usability of large file systems. For example, extremely large file systems may require unacceptably long times for file system consistency check (“fsck”) operations. In some instances, an entire file system may be offline for hours while undergoing an fsck operation. It is therefore desirable to improve the performance and availability of file systems during file system metadata operations such as consistency checks.