The present invention relates to mirroring of storage devices (such as disks) in a network storage system and, more particularly, to circular or bi-directional mirroring of flexible volumes in a network storage system.
Computer workstations and application servers (collectively hereinafter referred to as “clients”) frequently access data that is stored remotely from the clients. In these cases, computer networks are used to connect the clients to storage devices (such as disks) that store the data. For example, Information Systems (IS) departments frequently maintain “disk farms,” tape backup facilities and optical and other storage devices (sometimes referred to as media) in one or more central locations and provide access to these storage devices via computer networks. This centralized storage (commonly referred to as “network storage”) enables data stored on the storage devices to be shared by many clients scattered throughout an organization. Centralized network storage also enables the IS departments to store the data on highly reliable (sometimes redundant) equipment, so the data remains available, even in case of a catastrophic failure of one or more of the storage devices. Centralized data storage also facilitates making frequent backup copies of the data and providing access to backed-up data, when necessary.
Specialized computers (variously referred to as file servers, storage servers, storage appliances, etc., collectively hereinafter referred to as “filers”) located in the central locations make the data on the storage devices available to the clients. Software in the filers and other software in the clients cooperate to make the central storage devices appear to users and application programs as though the storage devices were locally connected to the clients.
In addition, the filers can perform services that are not visible to the clients. For example, a filer can aggregate storage space in a set of storage devices and present all the space in the set of storage devices as a single “volume.” Clients treat the volume as though it were a single, large disk. The clients issue input/output (I/O) commands to read or write data from or to the volume, and the filer accepts these I/O commands. The filer then issues I/O commands to the appropriate storage device(s) of the set to fetch or store the data. The filer then returns status information (and, in the case of a read command, data) to the client. Each block of the volume maps to a particular block of one of the storage devices and vice versa. Thus, a volume represents the storage capacity of a whole number of storage devices.
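The block-to-device mapping of such a traditional volume can be illustrated with a minimal sketch. It assumes the devices are simply concatenated end to end; a real filer may instead stripe or otherwise distribute blocks. The names `make_volume_map` and `locate` are hypothetical and do not come from the text above.

```python
from bisect import bisect_right
from itertools import accumulate


def make_volume_map(device_sizes):
    """Return a function mapping a volume block to (device index, device block).

    Assumption: devices are concatenated in order, so the volume's capacity
    is the sum of the capacities of a whole number of devices.
    """
    # ends[i] is the first volume block number that lies beyond device i.
    ends = list(accumulate(device_sizes))

    def locate(volume_block):
        if not 0 <= volume_block < ends[-1]:
            raise IndexError("block outside the volume")
        device = bisect_right(ends, volume_block)
        start = ends[device] - device_sizes[device]
        return device, volume_block - start

    return locate


# Three devices of 100, 100 and 50 blocks form one 250-block volume.
locate = make_volume_map([100, 100, 50])
print(locate(100))  # first block of the second device
```

Because the mapping is fixed and one-to-one, the inverse lookup (from a device block back to a volume block) is equally simple, which corresponds to the "vice versa" noted above.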
Some filers implement logical volumes (sometimes called “flexible” volumes). A flexible volume does not necessarily represent the storage capacity of a whole number of storage devices. Instead, the flexible volume generally represents a portion of the total storage capacity of the set of storage devices on which the flexible volume is implemented. A filer implements the flexible volume as a container file that is stored on the set of storage devices. When a client issues I/O commands to read or write data from or to the flexible volume, the filer accepts these I/O commands. The filer then issues I/O commands to the container file to fetch or store the data. The filer then returns status information (and, in the case of a read command, data) to the client.
Storage space on the set of storage devices is not necessarily pre-allocated for the container file. Blocks on the storage devices can be allocated to the container file as needed. Furthermore, as additional space is required on the flexible volume, the container file can be extended. Thus, unlike traditional volumes, each block of a flexible volume maps to a block of the container file, but a mapping between the blocks of the flexible volume and blocks on the storage devices does not necessarily occur until these blocks of the container file are written to the storage devices. Several container files can be stored on one set of storage devices. Thus, several distinct flexible volumes can be implemented on the same set, or an overlapping set, of storage devices.
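The on-demand allocation described above can be sketched as follows. This is an illustration only: it folds the container file into a single lazily filled block map, and the class names `StoragePool` and `FlexibleVolume` are hypothetical.

```python
class StoragePool:
    """A set of storage devices, modeled as one pool of physical blocks."""

    def __init__(self, total_blocks):
        self.free = list(range(total_blocks))
        self.data = {}

    def allocate(self):
        if not self.free:
            raise RuntimeError("pool exhausted")
        return self.free.pop(0)


class FlexibleVolume:
    """A volume backed by a container file whose blocks are allocated lazily."""

    def __init__(self, pool):
        self.pool = pool
        # Container-file block -> physical block; an entry appears only
        # when the block is first written.
        self.block_map = {}

    def write(self, vblock, payload):
        if vblock not in self.block_map:
            self.block_map[vblock] = self.pool.allocate()
        self.pool.data[self.block_map[vblock]] = payload

    def read(self, vblock):
        phys = self.block_map.get(vblock)
        return None if phys is None else self.pool.data[phys]


# Two flexible volumes share one four-block pool.
pool = StoragePool(4)
vol_a, vol_b = FlexibleVolume(pool), FlexibleVolume(pool)
vol_a.write(1000, b"x")  # a high block number consumes only one physical block
vol_b.write(0, b"y")
```

In this sketch a volume block number can far exceed the pool size, because physical space is consumed only by blocks that have actually been written.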
“Volume mirroring” is another service provided by some filers. A volume mirror (sometimes referred to as a “destination volume”) is an exact copy of another volume (sometimes referred to as a “source volume”). Typically, the source volume is connected to one filer (a “source filer”) and the destination volume is connected to a different filer (a “destination filer”), and the two filers are connected to each other via a network. When a client writes data to the source volume, the source filer causes a copy of the data to be written to the destination volume. Because the source and destination volumes are mirror copies of each other, volume mirrors can be used to protect against a catastrophic failure of the source or destination volume or of either filer.
For performance reasons, when file write requests are received by a filer (regardless of whether volume mirroring is involved), the filer caches the write requests and occasionally flushes the cache by writing the cache's contents to the appropriate storage device(s). The cache contains data blocks that are to be written to the storage device(s). To enable the filer to continue accepting write requests from clients while the cache is being flushed, the filer divides the cache into two halves and alternates between the halves. That is, while one half of the cache is being flushed, the other half of the cache is used to store write requests, and vice versa.
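The alternation between cache halves can be sketched as below. The class and method names are hypothetical, and the flush is split into two steps only to show that writes continue to be accepted while a flush is in progress.

```python
class DoubleBufferedCache:
    """Write cache split into two halves that alternate roles."""

    def __init__(self):
        self.halves = [{}, {}]
        self.active = 0   # index of the half currently accepting writes
        self.disk = {}    # stands in for the storage device(s)

    def write(self, block, payload):
        self.halves[self.active][block] = payload

    def begin_flush(self):
        """Switch new writes to the other half; return the half to flush."""
        flushing = self.active
        self.active = 1 - self.active
        return flushing

    def finish_flush(self, flushing):
        """Write the flushed half to storage and empty it for reuse."""
        self.disk.update(self.halves[flushing])
        self.halves[flushing].clear()


cache = DoubleBufferedCache()
cache.write(1, "a")
half = cache.begin_flush()
cache.write(2, "b")       # accepted while the first half is being flushed
cache.finish_flush(half)
```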
The filer also stores, in a battery backed-up non-volatile random access memory (“NVRAM”), information about write or modify operations that are to be performed. This record of write/modify operations and corresponding data is arranged in an ordered log of operations called an “NVLOG.” Thus, if the filer experiences a catastrophic or power failure, upon recovery, the information in the NVLOG can be used to update the appropriate storage device(s), as though the cache had been flushed.
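Recovery from the NVLOG amounts to replaying the logged operations in order. The sketch below assumes a deliberately simplified log format, a list of (block, payload) write records; the actual NVLOG layout is not specified here, and the function name `replay_nvlog` is hypothetical.

```python
def replay_nvlog(nvlog, disk):
    """Apply logged writes to storage in their original order.

    Assumption: each log entry is a (block, payload) pair. Replaying in
    order means a later write to the same block overrides an earlier one,
    which is the same result a completed cache flush would have produced.
    """
    for block, payload in nvlog:
        disk[block] = payload
    return disk


# Storage still holds stale data for block 1 when the failure occurs.
disk = {1: "old"}
nvlog = [(1, "a"), (2, "b"), (1, "c")]
replay_nvlog(nvlog, disk)
```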
Volume mirroring is implemented by sending two streams of information from the source filer to the destination filer. The two streams correspond to the cache and to the NVRAM, respectively, on the source filer. The first stream is a stream of modified blocks that are to be written to the destination volume. The second stream is a log of write or modify operations that are to be performed on the destination volume. The first information stream contains much more data than the second information stream. Thus, the first information stream typically lags behind the second information stream.
The source filer directs the second stream of information to one of two log files on a storage device connected to the destination filer. The source filer alternates between the two log files, as the source filer alternates between the two halves of its cache. As with the NVRAM on the source filer, in case of a catastrophic or power failure involving the destination filer, the current log file can be used to update the destination volume, as though all the modified blocks had been written to the destination volume.
During a cache flush on the source filer, the source filer waits until the destination filer reports completion of the write operations involved in the two streams before the source filer considers its cache flush to be complete. This ensures that the destination volume is synchronized with the source volume.
However, if a volume is mirrored to a flexible volume, flushing the cache on a source filer can pose a problem. As noted, flexible volumes are implemented as container files. Therefore, write requests to flexible volumes are treated by the destination filer as ordinary file write operations to the container file, which are cached by the destination filer and reported to the source filer as being complete, before the data is actually written to the corresponding storage device(s). Because the destination filer treats both information streams from the source filer as ordinary file write requests, both streams are cached by the destination filer. The destination filer reports that the write operations involving the two information streams are complete as soon as the write operations are cached by the destination filer, i.e., before this information is actually written to the storage device(s).
Earlier, when the source filer began to flush its cache, the source filer switched to the other half of its cache and began writing to the other (second) log file on the destination filer. The next time the source filer flushes its cache, the source filer switches back to the first half of its cache and begins overwriting the first log file. The source filer assumes it is safe to reuse the first log file, because the source filer was informed that the write operations involving the two information streams were completed during the earlier cache flush. Thus, the source filer assumes the destination volume is synchronized with the source volume.
However, if the destination filer's cache is not flushed after the source filer's cache is flushed, the destination volume becomes unsynchronized with the source volume. That is, modified data blocks remain in the destination filer's cache, without having been written to the destination volume.
Furthermore, if the source filer flushes its cache a second time, the first log file begins to be overwritten. Thus, if the destination filer experiences a catastrophic or power failure, the log files do not contain enough information to update the destination volume.
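The failure scenario can be reproduced with a small simulation. It is a sketch under generous assumptions: the two log files are treated as durable, and recovery replays both of them. The class `DestinationFiler` and its methods are hypothetical.

```python
class DestinationFiler:
    """Destination that acknowledges writes once they are cached."""

    def __init__(self):
        self.cache = {}        # acknowledged blocks not yet on disk
        self.disk = {}         # the destination volume
        self.logs = [[], []]   # the two alternating log files

    def receive(self, log_index, blocks):
        # The source reuses (overwrites) the chosen log file.
        self.logs[log_index] = list(blocks.items())
        self.cache.update(blocks)
        return "complete"      # reported before the data reaches disk

    def crash_and_recover(self):
        self.cache.clear()     # volatile cache contents are lost
        for log in self.logs:
            for block, payload in log:
                self.disk[block] = payload


dest = DestinationFiler()
dest.receive(0, {1: "a"})   # first source flush, first log file
dest.receive(1, {2: "b"})   # second source flush, second log file
dest.receive(0, {3: "c"})   # third source flush overwrites the first log
dest.crash_and_recover()
print(1 in dest.disk)       # block 1 cannot be recovered
```

Because the destination never flushed its own cache, block 1 existed only in that cache and in the first log file, and both copies are gone by the time of the failure.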
To avoid this problem, a cache flush on the source filer is not deemed to be complete until the destination filer flushes its cache. However, this creates a dependency, i.e., the source filer's cache flush completion is dependent upon the destination filer's cache flush completion.
If the destination filer handles a source volume that is mirrored on the source filer (i.e., each filer mirrors a volume that is sourced from the other filer, a situation known as “bi-directional mirroring”), completion of each filer's cache flush depends upon completion of the other filer's cache flush. This creates a deadlock, because, to flush its cache, each filer must wait until the other filer flushes its cache. Similarly, if filer A mirrors a volume on filer B, and filer B mirrors another volume on filer C, and filer C mirrors yet another volume on filer A, this “circular mirroring” creates a deadlock.
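Both deadlocks correspond to a cycle in the graph of flush dependencies, where each source filer waits on each of its destination filers. The following sketch detects such a cycle with a depth-first search; the function name `find_flush_cycle` and the dictionary representation of the mirroring relationships are assumptions made for illustration.

```python
def find_flush_cycle(mirrors):
    """Return a list of filers forming a dependency cycle, or None.

    mirrors maps each source filer to the destination filers it mirrors to.
    A source filer's cache flush waits on each destination filer's flush.
    """
    visiting, done = set(), set()

    def visit(filer, path):
        if filer in done:
            return None
        if filer in visiting:
            # Reached a filer already on the current path: a cycle.
            return path[path.index(filer):] + [filer]
        visiting.add(filer)
        for destination in mirrors.get(filer, ()):
            cycle = visit(destination, path + [filer])
            if cycle:
                return cycle
        visiting.discard(filer)
        done.add(filer)
        return None

    for filer in mirrors:
        cycle = visit(filer, [])
        if cycle:
            return cycle
    return None


bidirectional = {"A": ["B"], "B": ["A"]}
circular = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_flush_cycle(bidirectional))  # ['A', 'B', 'A']
print(find_flush_cycle(circular))       # ['A', 'B', 'C', 'A']
```

A mirroring configuration with no cycle, such as A mirroring to B and B mirroring to C, yields `None`, meaning that the flush dependencies can be satisfied in some order.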