Various forms of network based storage systems are known today. These forms include network attached storage (NAS), storage area networks (SANs), and others. Network storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data, backing up critical data (e.g., by data mirroring), etc.
A network storage system typically includes at least one storage server, which is a processing system configured to store and retrieve data on behalf of one or more client processing systems (“clients”). In the context of NAS, a storage server may be a file server, sometimes referred to as a “filer”. A filer operates on behalf of one or more clients to store and manage shared files in a set of mass storage devices, such as magnetic or optical disks or tapes. The mass storage devices may be organized into one or more volumes of a Redundant Array of Inexpensive Disks (RAID). Enterprise-level filers are made by Network Appliance, Inc. of Sunnyvale, Calif.
In a SAN context, the storage server provides clients with block-level access to stored data, rather than file-level access. Some storage servers are capable of providing clients with both file-level access and block-level access, such as certain filers made by Network Appliance, Inc.
A problem that is associated with many network based storage systems is latency, and more specifically, the latency associated with accessing the storage server's mass storage devices (“access latency”). From the perspective of a user, there is often a noticeable delay when a storage client accesses the mass storage devices through the storage server. This problem may be more evident during read operations, since in many systems a storage server that has a non-volatile buffer memory can acknowledge a write before the data has actually been written to mass storage. The delay is caused primarily by the access latency of the mass storage devices and is inconvenient and often annoying to the user. It is desirable, therefore, to reduce this delay.
Caching is a well-known technique for reducing access latency in general. A cache is a storage device which is normally much faster than the primary storage facility that it supports, although it is also usually much smaller in capacity. Cache memory is commonly employed on a computer motherboard, for example, between a microprocessor and its main memory, to reduce the time required for the microprocessor to access commonly-accessed data. There are often multiple levels of cache memory employed within a computer, i.e., a level 1 (L1) cache, a level 2 (L2) cache, and so forth, where each level of cache is larger and slower than the lower cache level(s). These cache memories are typically very small in capacity compared to main memory (typically several orders of magnitude smaller).
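The general caching principle described above can be illustrated with a minimal sketch. It assumes a least-recently-used (LRU) eviction policy, which is one common choice and is not specified by the text; the names ReadCache, backing_store, and capacity are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class ReadCache:
    """Small LRU read cache in front of a slower backing store (illustrative only)."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # hypothetical dict-like "slow" store
        self.capacity = capacity            # the cache holds far fewer items than the store
        self.entries = OrderedDict()        # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            # Cache hit: serve from fast storage and mark as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Cache miss: pay the access latency of the backing store, then keep a copy.
        self.misses += 1
        value = self.backing_store[key]
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry to stay within capacity.
            self.entries.popitem(last=False)
        return value


store = {block: f"data-{block}" for block in range(8)}
cache = ReadCache(store, capacity=2)
for block in (0, 1, 0, 2, 1):
    cache.read(block)
```

In this sketch only the repeated read of block 0 is served from the cache; a cache this small relative to the working set gives little benefit, which is the capacity concern raised for storage servers later in this section.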
Caching can also be used in a network context, to store larger amounts of data. For example, to reduce latency associated with accessing content over the Internet (e.g., Web pages or multimedia), a caching appliance may be employed somewhere in the path between a content server and its client(s), to cache commonly accessed content. An example of such an appliance is the NetCache, made by Network Appliance, Inc.
A network storage server, such as a filer, generally includes internal cache memory to speed up microprocessor operations, but what is needed is an effective and cost-efficient way to reduce access latency associated with a large array of (typically external) mass storage devices of a storage server. A storage server may support many terabytes of storage and hundreds of clients at a time. Consequently, to be useful in this context a cache must be able to store large amounts of data, but must not be too expensive. The L1 and L2 (etc.) caches typically employed inside a storage server (i.e., on the motherboard) are much too small in aggregate to be useful for this purpose. A storage server can use its main memory to cache relatively small amounts of data for purposes of servicing client requests, but even main memory is too small to be highly effective as a cache for servicing client requests. On the other hand, a separate, external caching appliance, such as a NetCache, might have adequate storage capacity for this type of caching, but it would be too expensive and too slow to be practical for this purpose. What is needed, therefore, is an effective and cost-efficient way to provide caching for a network storage server, for purposes of servicing client requests.
Another problem associated with many processing systems, including large-capacity network based storage systems, is limited memory addressability. Modern programmable microprocessors have limited memory address space. In some cases, a processor's total address space may be limited by the chipset on which it is implemented, so that the actual usable (effective) address space of the processor is smaller than its design maximum address space. This restriction can prove problematic in a very large-capacity storage server system. Furthermore, conventional expansion buses, such as peripheral component interconnect (PCI), provide only limited capability for memory expansion.
Another problem associated with some large-capacity network storage systems relates to the sharing of state between failover partners in a cluster-failover storage system. Cluster-failover is a well-known technique by which multiple storage servers can be used to provide redundancy for each other. In one known prior art cluster-failover configuration, two or more file servers operating as failover partners each locally maintain a separate log, in non-volatile RAM (NVRAM), of all writes requested by clients since the last “consistency point” (a consistency point is the event of committing a set of recently received writes to disk). This non-volatile log, or “NVLog”, is used only in the event of a power loss or other similar failure, to reconstruct the state of the system since the last consistency point.
When one of the file servers receives a write transaction from a client, it updates its local NVLog and also sends a transaction message to its cluster partner to allow the cluster partner to update its NVLog. This sharing of state (i.e., the NVLog) by the cluster partners is problematic, because each storage server must have a priori knowledge of the identity of its failover partner. In addition, the write transaction cannot be acknowledged to the client until both cluster partners have recorded the transaction in their respective NVLogs. Consequently, the entire write transaction from start to finish can be undesirably slow from the user's perspective.
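The write path described above can be sketched as follows. This is a simplified model under stated assumptions, not an actual filer implementation: the class FileServer and its methods are hypothetical, the NVLog is modeled as an in-memory list, and the transaction message to the partner is modeled as a direct method call rather than a network round trip.

```python
class FileServer:
    """Hypothetical model of one failover partner with a local NVLog."""

    def __init__(self, name):
        self.name = name
        self.nvlog = []      # stands in for the log kept in NVRAM
        self.partner = None  # must be configured in advance (a priori knowledge)

    def record(self, write):
        # Append the write to this server's NVLog.
        self.nvlog.append(write)
        return True

    def handle_client_write(self, write):
        # Step 1: update the local NVLog.
        local_ok = self.record(write)
        # Step 2: send the transaction to the partner; in a real system this
        # would be a network message, adding latency to every write.
        partner_ok = self.partner.record(write)
        # Step 3: acknowledge to the client only after both logs are updated.
        return local_ok and partner_ok


server_a = FileServer("A")
server_b = FileServer("B")
server_a.partner = server_b
server_b.partner = server_a

acked = server_a.handle_client_write(("file1", 0, b"hello"))
```

The sketch makes both drawbacks visible: each server holds a hard-wired reference to its partner, and the acknowledgment in step 3 waits on the partner's log update in step 2.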
Yet another problem associated with some network storage systems relates to the acquisition of snapshots. A snapshot is a record of the exact state of a set of data (e.g., an entire volume or a directory within a volume) stored in a storage system at a particular instant in time. Snapshots may be used, for example, for backup purposes to recover the state of the system at a particular time or to identify the source of data corruption.
Snapshotting can be used, for example, in a system which includes multiple file servers, each of which implements a portion of a file system. In such a system, when a snapshot of the entire file system is requested, each of the file servers takes a snapshot of its portion of the file system. One problem with such a system is that the snapshot request may be received at slightly different times at each of the file servers, due to differing and unpredictable communication distances and delays between the file servers. Yet it is critical that each file server take its snapshot at the same instant in time, to maintain data consistency.
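The consistency hazard described above can be illustrated with a small simulation. It is a sketch under assumed conditions: the class PortionServer, the file name report.txt, and the move operation between servers are hypothetical and serve only to show what can go wrong when the snapshot request arrives at different times.

```python
class PortionServer:
    """Hypothetical file server holding one portion of a file system."""

    def __init__(self, files):
        self.files = dict(files)
        self.snapshot = None

    def take_snapshot(self):
        # Record the state of this portion at the moment the request arrives.
        self.snapshot = dict(self.files)


server_a = PortionServer({"report.txt": "v1"})
server_b = PortionServer({})

# The snapshot request reaches server B first.
server_b.take_snapshot()

# Before the request reaches server A, a client moves the file from A to B.
server_b.files["report.txt"] = server_a.files.pop("report.txt")

# The snapshot request now reaches server A.
server_a.take_snapshot()

# Combine the per-server snapshots into a snapshot of the whole file system.
combined = {**server_a.snapshot, **server_b.snapshot}
```

In this run the file existed in the live file system at every instant, yet it appears in neither per-server snapshot, so the combined snapshot does not correspond to any state the file system was actually in.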
What is needed, therefore, is a way to overcome these problems.