1. Field of the Invention
The present invention relates generally to computer storage systems, and more particularly to remote mirroring in distributed computer storage systems.
2. Description of the Background
In a common computer system architecture, a host computer is coupled to a network that includes storage devices which provide non-volatile storage for the host computer. This is typically known as a computer storage system. The computer storage system includes, among other things, a number of interconnected storage units, each of which includes a number of physical or logical storage media (for example, a disk array). For convenience, a group of one or more physical disks that are logically connected to form a single virtual disk is referred to hereinafter as a “Logical Unit” (LU). Data from the host computer is stored in the computer storage system, and specifically in the various storage units within the computer storage system.
One problem in a computer storage system is data loss or unavailability caused, for example, by maintenance, repair, or outright failure of one or more storage units. In order to prevent such data loss or unavailability, a copy of the host data is often stored in multiple storage units that are operated at physically separate locations. For convenience, the practice of storing multiple copies of the host data in physically separate storage units is referred to as “remote mirroring.” Remote mirroring permits the host data to be readily retrieved from one of the storage units when the host data at another storage unit is unavailable or destroyed.
Therefore, in order to reduce the possibility of data loss or unavailability in a computer storage system, a “remote mirror” (or simply a “mirror”) is established to manage multiple images. Each image consists of one or more LUs, which are referred to hereinafter collectively as an “LU Array Set.” It should be noted that the computer storage system may maintain multiple mirrors simultaneously, where each mirror manages a different set of images.
Within a particular mirror, one image on one storage system is designated as a primary image, while each other image on one storage system within the mirror is designated as a secondary image. For convenience, the storage unit that maintains the primary image is referred to hereinafter as the “primary storage unit,” while a storage unit that maintains a secondary image is referred to hereinafter as a “secondary storage unit.” It should be noted that a storage unit that supports multiple mirrors may operate as the primary storage unit for one mirror and the secondary storage unit for another mirror.
A mirror must provide data availability such that the host data can be readily retrieved from one of the secondary storage units when the host data at the primary storage unit is unavailable or destroyed. In order to do so, it is imperative that all of the secondary images be synchronized with the primary image such that all of the secondary images contain the same information as the primary image. Synchronization of the secondary images is coordinated by the primary storage unit.
Under normal operating conditions, the host, i.e., a server running an operating system and an assortment of programs, writes host data to the primary storage unit. The primary storage unit stores the host data in the primary image and also coordinates all data storage operations for writing a copy of the host data to each secondary storage unit in the mirror and verifying that each secondary storage unit receives and stores the host data in its secondary image.
Today, data storage operations for writing the copy of the host data to each secondary storage unit in the mirror can be handled in either a synchronous manner or an asynchronous manner. In conventional synchronous remote mirroring, the primary storage unit ensures that the host data has been successfully written to all secondary storage units in the mirror before sending an acknowledgment to the host. This results in relatively high latency, but ensures that all secondary storage units are updated before the host is informed that the write operation is complete. In asynchronous remote mirroring, the primary storage unit sends an acknowledgment message to the host before ensuring that the host data has been successfully written to all secondary storage units in the mirror. This results in relatively low latency, but does not ensure that all secondary storage units are updated before the host is informed that the write operation is complete.
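The difference between the two modes comes down to where the acknowledgment falls relative to the secondary updates. The following is a minimal Python sketch of that ordering only, not an implementation of any particular storage product. The names `Image`, `Mirror`, `host_write`, `flush`, and `events` are hypothetical and chosen for illustration; failures and network transport are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """A data image modeled as a block-number -> data mapping (hypothetical)."""
    blocks: dict = field(default_factory=dict)

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data


class Mirror:
    """Illustrative mirror that records the order of its own actions."""

    def __init__(self, primary: Image, secondaries: list, synchronous: bool):
        self.primary = primary
        self.secondaries = secondaries
        self.synchronous = synchronous
        self.pending = []  # secondary updates deferred in asynchronous mode
        self.events = []   # ordered record: "primary", "secondaries", "ack"

    def host_write(self, block: int, data: bytes) -> None:
        self.primary.write(block, data)
        self.events.append("primary")
        if self.synchronous:
            # Synchronous: every secondary is updated before the host is told.
            self._update_secondaries(block, data)
            self.events.append("ack")
        else:
            # Asynchronous: the host is told first; secondaries catch up later.
            self.events.append("ack")
            self.pending.append((block, data))

    def flush(self) -> None:
        """Apply deferred secondary updates (asynchronous mode only)."""
        while self.pending:
            block, data = self.pending.pop(0)
            self._update_secondaries(block, data)

    def _update_secondaries(self, block: int, data: bytes) -> None:
        for secondary in self.secondaries:
            secondary.write(block, data)
        self.events.append("secondaries")
```

In this sketch, a synchronous mirror records the secondary update before the acknowledgment, while an asynchronous mirror records the acknowledgment first and leaves the secondary stale until `flush` is called. That window of staleness is what makes a failure in asynchronous mode harder to reason about.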
In both synchronous and asynchronous remote mirroring, it is possible for a number of failures to occur between receiving a write request from the host and updating the primary image and all of the secondary images. One such failure may involve writing to the primary storage unit, but being unable to write to the secondary storage unit due to an actual hardware or software failure between the primary storage unit and the secondary storage unit. Another possible cause of an inability to write is a failure of the secondary storage unit. If the primary storage unit was in the process of completing one or more write operations at the time of the failure, the primary storage unit may have updated the primary image, but may not have updated any secondary image.
After the failure, it may not be possible for the primary storage unit to determine the status of each secondary image, and specifically whether a particular secondary image matches the primary image. Therefore, the primary storage unit will resynchronize all of the secondary images by copying the primary image block-by-block to each of the secondary storage units.
Unfortunately, copying the entire primary image to all the secondary storage units can take a significant amount of time depending on the image size, the number of secondary storage units, and other factors. It is not uncommon for such a resynchronization to take hours to complete, especially for very large images.
Thus, there is a need for a system and method for quickly resynchronizing primary and secondary images following a failure.
In one aspect there is provided a method for synchronizing a plurality of data images in a computer system. The plurality of data images include a primary image and at least one secondary image. In accordance with the method, a write request is received from a host computer at a primary image site. A write operation is conducted on the primary image at the primary image site, and attempted on at least one secondary image at at least one secondary image site. If the attempt to write to the at least one secondary image at the at least one secondary image site fails, a fracture log is created at the primary image site, which is representative of changed regions in the primary image at the primary image site, whereby the log can be used to synchronize the primary image and the secondary image once it becomes possible to write to the at least one secondary image.
In a more specific aspect, the fracture log, which is maintained only in the event of a failure, is a bitmap of the changed regions that have been affected on at least one LU as a result of the write request. In a still more specific aspect, the primary image at the primary image site is updated at the same time that the at least one secondary image is updated at the at least one secondary image site in response to the write request. After the updates are made, specifically in the case of synchronous mirrors, the primary image site communicates to the host that the update to both sites is complete. Yet more specifically, if the write request to the at least one secondary image site fails, the fracture log, which is representative of changed regions in the image at the primary image site, is created at the primary image site and is used to effect writing to the at least one secondary image at the at least one secondary image site when it becomes possible to write to the at least one secondary image, thereby ensuring that the images at the primary image site and the at least one secondary image site are synchronized.
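The fracture log described above can be pictured as one bit per fixed-size region of the primary image. The Python sketch below is illustrative only and rests on several assumptions that the text does not state: the region size (`REGION_BLOCKS`), the in-memory list representation of the images, and all class and method names are hypothetical. Secondary failure is simulated with a simple flag.

```python
REGION_BLOCKS = 4  # blocks per region; an arbitrary value chosen for illustration


class FractureLog:
    """Bitmap with one bit per region of the primary image."""

    def __init__(self, num_regions: int):
        self.num_regions = num_regions
        self.bits = bytearray((num_regions + 7) // 8)

    def mark(self, region: int) -> None:
        self.bits[region // 8] |= 1 << (region % 8)

    def is_set(self, region: int) -> bool:
        return bool(self.bits[region // 8] & (1 << (region % 8)))

    def marked(self) -> list:
        return [r for r in range(self.num_regions) if self.is_set(r)]


class FracturedMirror:
    """Primary plus one secondary; the log exists only after a failed write."""

    def __init__(self, num_blocks: int):
        self.primary = [b""] * num_blocks
        self.secondary = [b""] * num_blocks
        self.secondary_reachable = True  # simulated link/unit health
        self.fracture_log = None         # created on the first failure
        self.num_regions = (num_blocks + REGION_BLOCKS - 1) // REGION_BLOCKS

    def host_write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.secondary_reachable and self.fracture_log is None:
            self.secondary[block] = data
            return
        # The secondary write failed (or the mirror is already fractured):
        # remember which region of the primary changed.
        if self.fracture_log is None:
            self.fracture_log = FractureLog(self.num_regions)
        self.fracture_log.mark(block // REGION_BLOCKS)

    def resynchronize(self) -> int:
        """Copy only the marked regions; return the number of blocks copied."""
        if not self.secondary_reachable:
            raise RuntimeError("secondary is still unreachable")
        if self.fracture_log is None:
            return 0
        copied = 0
        for region in self.fracture_log.marked():
            start = region * REGION_BLOCKS
            end = min(start + REGION_BLOCKS, len(self.primary))
            self.secondary[start:end] = self.primary[start:end]
            copied += end - start
        self.fracture_log = None  # images match again; the log is discarded
        return copied
```

With a 16-block image, three writes during an outage that land in two regions cause `resynchronize` to copy 8 blocks rather than all 16. The coarser the region, the smaller the bitmap but the more unchanged blocks get copied along with changed ones; the choice of granularity here is an assumption, not something the text specifies.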
In a yet more specific aspect, it is possible that the write request may have failed at the primary image site and thus at the secondary image site. In such a case, a write intent log, which is a bitmap representative of the blocks affected by the write request at the primary storage unit, is created at the primary image site. The write intent log is used to write the blocks identified at the primary image to the secondary image when recovery occurs. It is therefore possible that the original write did or did not occur at the primary image. The write intent log identifies those blocks so that only those blocks are copied to the secondary image to ensure synchronization, irrespective of whether or not those blocks at the primary image were changed as a result of the original write request. The fracture log is then created at the primary image site when the write is effectuated if there is an additional failure to write to the secondary image.
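The recovery behavior of the write intent log can be sketched as follows. This is a hedged illustration in Python, not the method as claimed: it assumes that a block is recorded in the log before any image is touched and removed once both images have been updated, a lifecycle the text does not spell out. The class `WriteIntentMirror`, the `crash_after` parameter used to simulate a failure, and the use of a set in place of a persistent bitmap are all hypothetical.

```python
class WriteIntentMirror:
    """Primary plus one secondary with a simulated write intent log."""

    def __init__(self, num_blocks: int):
        self.primary = [b""] * num_blocks
        self.secondary = [b""] * num_blocks
        # Stands in for a bitmap kept in non-volatile storage at the primary.
        self.intent = set()

    def host_write(self, block: int, data: bytes, crash_after=None) -> bool:
        """Perform a write; `crash_after` simulates a failure at a given step."""
        self.intent.add(block)  # assumption: intent is recorded first
        if crash_after == "intent":
            return False        # failed before the primary was updated
        self.primary[block] = data
        if crash_after == "primary":
            return False        # failed before the secondary was updated
        self.secondary[block] = data
        self.intent.discard(block)  # both images agree; intent is cleared
        return True

    def recover(self) -> list:
        """Copy every block named in the log from primary to secondary."""
        blocks = sorted(self.intent)
        for block in blocks:
            # Copied whether or not the primary actually changed: the log
            # cannot tell, and copying an unchanged block is harmless.
            self.secondary[block] = self.primary[block]
        self.intent.clear()
        return blocks
```

Whether the simulated failure occurs before or after the primary is updated, `recover` copies the same small set of blocks and leaves the two images identical. This is the property the paragraph above relies on: the log only needs to say which blocks might differ, not whether they do.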
In another aspect, there is disclosed a computer system for maintaining a plurality of data images therein. The plurality of data images include a primary image and at least one secondary image. The computer system includes non-volatile storage for storing at least the primary image. A network interface serves to access the at least one secondary image. There is logic for creating a fracture log which identifies changed regions in the primary image effected as the result of a write to the primary image, and for creating the fracture log only if a write request to the primary image and the secondary image fails with respect to the secondary image. There is also included a write logic for writing to the primary image and to the at least one secondary image to maintain the primary image and the at least one secondary image synchronized, and for writing to the at least one secondary image based on the contents of the fracture log upon the failure of a write request to the at least one secondary image.
In a more specific alternative aspect, the fracture log is made up of a bitmap of the changed regions that have been effected on at least one disk containing the primary image. The fracture log is maintained at the primary image site in which the primary image is maintained, and the logic is configured for updating the primary image at the primary image site and the at least one secondary image at the at least one secondary image site, and for communicating to a host issuing the write request that the update to the primary image at the primary image site and the at least one secondary image at the at least one secondary image site is complete, specifically in the case of synchronous mirrors.
Yet more specifically, the write logic is configured for using the fracture log in the event of a failure of a write request to the at least one secondary image, to write the same changes to the at least one secondary image upon the ability to write being restored, as previously written to the primary image, to ensure synchronization between the primary image and the at least one secondary image.
Yet still further, the system includes a write intent log in the primary image for maintaining a bitmap indicative of regions on the primary image possibly affected as a result of write requests in the event of a failure to write. The write logic is further configured for writing the blocks on the primary image identified by the write intent log to the secondary image.