In cloud computing and other computing environments, access to storage devices is often provided by one or more storage controllers running on storage servers, computing devices, or nodes within the computing environment. Each storage controller may be configured to operate in a client-server model that allows client devices or applications to send storage requests to it. In this model, the clients are typically coupled to the storage controllers over a computer network, such as a network link, a local area network (LAN), or a wide area network (WAN). The clients typically send the storage requests to the storage controllers using application programming interface (API) calls, remote procedure calls, web services, and/or the like.
When a storage controller receives a storage request, it examines the request to identify the file, block, or extent to be accessed. These files, blocks, and extents are typically located on one or more storage devices coupled to the storage controller either directly or indirectly through a network. The storage devices may include disk drives, flash memories, storage arrays, and/or the like. The storage devices are typically organized using file systems composed of one or more volumes. The volumes, in turn, may be further organized into one or more aggregates or storage objects, each of which may be managed by the storage controller as a single logical group. Each storage object may be assigned a storage unit identifier that may be used within storage requests to identify the desired storage object.
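The lookup a controller performs on an incoming request can be sketched as follows. This is a minimal illustration, not an implementation from any real storage product: the names `StorageObject`, `StorageRequest`, `StorageController`, `register`, and `examine`, and the in-memory `catalog` mapping storage unit identifiers to storage objects, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StorageObject:
    """A logical group (e.g. an aggregate) managed as a unit by a controller."""
    storage_unit_id: str
    volumes: List[str] = field(default_factory=list)


@dataclass
class StorageRequest:
    """A client request naming a storage object and the item to access."""
    storage_unit_id: str
    kind: str    # "file", "block", or "extent"
    target: str  # e.g. a path or a block/extent address (illustrative)


class StorageController:
    def __init__(self) -> None:
        # Hypothetical catalog: storage unit identifier -> storage object.
        self.catalog: Dict[str, StorageObject] = {}

    def register(self, obj: StorageObject) -> None:
        self.catalog[obj.storage_unit_id] = obj

    def examine(self, request: StorageRequest) -> Tuple[StorageObject, str, str]:
        """Identify the storage object and the file/block/extent requested."""
        if request.kind not in ("file", "block", "extent"):
            raise ValueError(f"unsupported request kind: {request.kind}")
        obj = self.catalog.get(request.storage_unit_id)
        if obj is None:
            raise LookupError(f"unknown storage unit {request.storage_unit_id}")
        return obj, request.kind, request.target
```

The storage unit identifier carried in the request is what lets the controller find the right logical group without scanning its devices.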
The storage servers and their storage controllers may be networked or otherwise connected together as a storage system. Having multiple storage servers and storage controllers within the storage system may provide several advantages. For example, multiple storage servers and storage controllers allow flexibility in handling the load created by storage requests: whenever one of the storage servers becomes busy, another storage server in the storage system may be able to handle some of the storage requests. As another example, multiple storage servers and storage controllers provide redundancy in the storage system. Whenever one of the storage servers or storage controllers is unavailable, whether because of a failure or because it has been taken offline for maintenance, the other storage servers and storage controllers may be able to handle the storage requests that the unavailable storage server or storage controller would otherwise have handled.
To support this load balancing and/or redundancy, the storage servers and storage controllers coordinate the management and handling of the storage objects to which they provide access. This may include managing the “ownership” of the storage objects: the storage server and storage controller that “own” a storage object are responsible for handling the storage requests associated with that storage object. Thus, for the handling of a storage object's storage requests to move from a first (source) storage server and storage controller to a second (target) storage server and storage controller, the “ownership” of the storage object is migrated or transferred to the target storage server and storage controller using a takeover operation.
The migration or takeover operation may be accomplished by copying the data from a source storage object owned by the source storage server and storage controller to a target storage object owned by the target storage server and storage controller. Using copy operations to transfer ownership has two main disadvantages. First, copy operations are typically time-consuming, especially when large quantities of data are involved. Second, copy operations consume resources such as network bandwidth and the resources of the storage devices involved.
A better migration solution involves “zero-copy” migration. Zero-copy migration may be used in a distributed architecture in which each of the storage servers and storage controllers has access to a shared pool of storage devices containing the storage objects managed by the storage system. Because the storage devices are shared, each storage server and storage controller may have its own access to every storage object, no matter which storage device stores it. In this scenario, migrating a storage object from the source storage server and storage controller to the target storage server and storage controller involves only changing its ownership; no data is copied. Once the ownership is changed, storage requests for the migrated storage object are directed to the target storage server and storage controller for handling.
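The contrast with copy-based migration can be sketched as follows, assuming ownership is tracked as a simple mapping from storage unit identifier to owning controller. `SharedStoragePool`, `zero_copy_migrate`, and `route` are hypothetical names chosen for illustration; real systems record ownership in their own on-disk or cluster metadata.

```python
from typing import Dict


class SharedStoragePool:
    """Storage devices visible to every controller (hypothetical model)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytearray] = {}  # storage unit id -> contents
        self.owner: Dict[str, str] = {}       # storage unit id -> controller name

    def create(self, unit_id: str, owner: str, contents: bytes) -> None:
        self.data[unit_id] = bytearray(contents)
        self.owner[unit_id] = owner

    def zero_copy_migrate(self, unit_id: str, source: str, target: str) -> None:
        """Transfer ownership by rewriting the owner entry; no data is copied."""
        if self.owner.get(unit_id) != source:
            raise PermissionError(f"{source} does not own {unit_id}")
        self.owner[unit_id] = target

    def route(self, unit_id: str) -> str:
        """Return the controller that should handle requests for unit_id."""
        return self.owner[unit_id]
```

The cost of `zero_copy_migrate` does not depend on how much data the storage object holds, because the stored contents are never touched; only the routing decision changes.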
One possible arrangement for supporting zero-copy migration uses two storage servers organized as a high-availability (HA) pair. In an HA pair, the two storage servers and their respective storage controllers are coupled together by a network with a management or control layer, and both are coupled to the storage devices where the storage objects are stored. While the storage servers and storage controllers operate, each monitors the status of the other, and they exchange status and other management messages. Whenever a storage object is migrated from one storage server and storage controller to the other using a takeover operation, both the ownership of the storage object and the responsibility for handling its storage requests change. A migration may occur as a result of an unscheduled event, such as the failure of one of the storage servers or storage controllers, or in a planned fashion, such as when one of the storage servers or storage controllers is taken offline for maintenance and/or an update.
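One common way for each member of an HA pair to monitor its partner is a timeout on the status messages it receives. The sketch below assumes that approach; the class `PartnerMonitor`, its methods, and the 5-second default timeout are hypothetical and are not taken from the text.

```python
import time
from typing import Callable


class PartnerMonitor:
    """Tracks status messages from the HA partner (hypothetical sketch)."""

    def __init__(self, timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout    # assumed value, not from the text
        self.clock = clock        # injectable so the logic is testable
        self.last_seen = clock()  # start the silence window at construction

    def on_status_message(self) -> None:
        """Record that a status or management message arrived from the partner."""
        self.last_seen = self.clock()

    def partner_failed(self) -> bool:
        """Presume failure once the partner has been silent longer than timeout."""
        return self.clock() - self.last_seen > self.timeout
```

When `partner_failed()` returns `True`, the surviving storage server and storage controller would begin a takeover of the partner's storage objects. The timeout trades detection speed against the risk of declaring a merely slow partner failed.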
During a takeover operation, each storage object whose ownership is being migrated becomes temporarily unavailable. This period of unavailability helps avoid ambiguity about which storage server and storage controller should handle storage requests for the storage object while its ownership is being migrated. In practice, the unavailability often begins when the source storage server and storage controller take the storage object offline. Ownership of the storage object is then changed to the target storage server and storage controller, which then bring the storage object back online. While a storage object is unavailable, storage requests made to it are not handled, and the storage system may refuse to accept such requests from clients.
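The offline, change-ownership, online sequence can be sketched as follows. `ManagedObject`, `ObjectUnavailable`, `takeover`, and the `while_offline` hook are hypothetical names. The text says the storage system *may* refuse requests during the unavailable window; this sketch assumes it always refuses them.

```python
from typing import Callable, Optional


class ObjectUnavailable(Exception):
    """Raised when a request targets a storage object that is offline."""


class ManagedObject:
    def __init__(self, unit_id: str, owner: str) -> None:
        self.unit_id = unit_id
        self.owner = owner
        self.online = True

    def handle_request(self, client: str) -> str:
        # Assumed policy: requests are refused, not queued, while offline.
        if not self.online:
            raise ObjectUnavailable(f"{self.unit_id} is offline")
        return f"{self.owner} served {client}"


def takeover(obj: ManagedObject, source: str, target: str,
             while_offline: Optional[Callable[[], None]] = None) -> None:
    """Take the object offline on the source, change owner, bring online on target."""
    if obj.owner != source:
        raise PermissionError(f"{source} does not own {obj.unit_id}")
    obj.online = False       # 1. source takes the object offline
    if while_offline is not None:
        while_offline()      # hook for observing the unavailable window
    obj.owner = target       # 2. ownership changes to the target
    obj.online = True        # 3. target brings the object back online
```

Because ownership changes only while the object is offline, no request can be served by the source after the target has become the owner.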
When the source storage server or storage controller becomes unavailable because of an unplanned event, such as a device failure, the storage objects owned by the source storage server and storage controller typically all go offline at the same time. This is because, once the source storage server and storage controller are unavailable, no storage server or storage controller is designated to handle the storage requests for those storage objects. These storage objects remain offline until the unavailability of the source storage server and storage controller is detected, a target storage server and storage controller are selected, the ownership of the storage objects is migrated to the target storage server and storage controller, and the target storage server and storage controller bring the storage objects back online. This may result in a significant delay, even after the unavailability is detected; in some cases, the delay caused by the concurrent migration of several storage objects may last a minute or longer.
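The post-detection steps of an unplanned failover (select a target, migrate ownership of every orphaned storage object, bring them back online) can be sketched as follows. The function `fail_over`, its dictionary-based state, and the "first healthy controller" selection policy are all assumptions for illustration.

```python
from typing import Dict, List


def fail_over(owner: Dict[str, str], online: Dict[str, bool],
              failed: str, healthy: List[str]) -> str:
    """Move every object owned by `failed` to a selected healthy controller."""
    candidates = [c for c in healthy if c != failed]
    if not candidates:
        raise RuntimeError("no healthy controller available for takeover")
    target = candidates[0]  # assumed selection policy: first healthy controller
    orphaned = [uid for uid, o in owner.items() if o == failed]
    for uid in orphaned:
        online[uid] = False  # offline since the failure: nobody serves them
    for uid in orphaned:
        owner[uid] = target  # ownership migrated as one batch
    for uid in orphaned:
        online[uid] = True   # target brings them back online
    return target
```

Because every orphaned object is handled in the same batch, each one stays unavailable until the whole batch completes, which is the source of the delay described above.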
When the source storage server or storage controller becomes unavailable because of a planned or negotiated event, such as a scheduled upgrade or other maintenance, the detection time may be eliminated. The storage objects must still be migrated from the source storage server and storage controller to the target storage server and storage controller, however, and, as in the unplanned case, they may all be migrated at the same time. It would therefore be helpful if the resulting period of unavailability could be better managed.
Accordingly, it would be desirable to provide methods and systems for effectively and efficiently changing the ownership of storage objects.
In the figures, elements having the same designations have the same or similar functions.