Disaster recovery strategies for computer systems generally involve copying data stored at a primary site (the primary copy) to a secondary site, typically located some distance from the primary site, where the data is maintained as a secondary copy. Copying from the primary copy to the secondary copy may be performed either synchronously or asynchronously. Where copying is performed synchronously, each time an update is written to the primary copy, the update is also sent to the secondary site to be written to the secondary copy. Only after the secondary site informs the primary site that the secondary copy has been updated does the primary site acknowledge the update to the primary copy and stand ready to write the next update. Thus, updates are written to the primary and secondary copies in the same order. Where copying is performed asynchronously, the primary copy is maintained independently of the secondary copy, so multiple updates may be written to the primary copy and acknowledged before any of them are sent to the secondary site. The updates are sent to the secondary site periodically, typically as a set of writes referred to herein as a “color,” and are written to the secondary copy, not necessarily in the same order in which they were written to the primary copy.
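By way of illustration only, the difference between the two copying modes can be sketched in Python. The class and method names below are hypothetical, the secondary site is simulated in-process, and the asynchronous sketch assumes that a later write to the same location within a color simply replaces the earlier one; applying a color in sorted order stands in for the fact that writes within a color need not reach the secondary copy in their original order.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryCopy:
    """Hypothetical secondary copy; apply() returning True models the acknowledgement."""
    data: dict = field(default_factory=dict)

    def apply(self, location, value):
        self.data[location] = value
        return True


class SyncPrimary:
    """Synchronous copying: acknowledge only after the secondary copy is updated."""

    def __init__(self, secondary):
        self.data = {}
        self.secondary = secondary

    def write(self, location, value):
        self.data[location] = value
        # Wait for the secondary site's acknowledgement before acknowledging the write.
        if not self.secondary.apply(location, value):
            raise RuntimeError("secondary did not acknowledge")
        return "ack"


class AsyncPrimary:
    """Asynchronous copying: acknowledge at once, send the current color later."""

    def __init__(self, secondary):
        self.data = {}
        self.secondary = secondary
        self.color = {}  # writes accumulated in the current color, keyed by location

    def write(self, location, value):
        self.data[location] = value
        # Assumption: a later write to the same location replaces the earlier one.
        self.color[location] = value
        return "ack"  # acknowledged before anything is sent to the secondary site

    def send_color(self):
        batch, self.color = self.color, {}
        # Write order within a color is not preserved; sorted order stands in for that.
        for location in sorted(batch):
            self.secondary.apply(location, batch[location])
```

With `AsyncPrimary`, the secondary copy is unchanged after `write("b", 1)` and `write("a", 2)` and receives both updates only when `send_color()` is called.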
Where a single color is maintained across multiple storage controllers at the primary site, all storage controllers must switch to the next color in a coordinated fashion in order to maintain the integrity of “dependent writes” across color boundaries. For example, consider the following typical sequence of dependent writes for a data base update transaction:
1. execute a write to update the data base log indicating that a data base update is about to take place, then
2. execute a second write to update the data base, and finally
3. execute a third write to update the data base log indicating that the data base update has completed successfully.
It is imperative that these dependent writes either all belong to the same color or, if they cross a color boundary, that the earlier write(s) belong to the old color and the later write(s) belong to the new color. In this example, assume that writes 1, 2, and 3 are each written by a different storage controller, that writes 1 and 3 are written as part of color group “red,” and that write 2 is written as part of the next color group “blue.” Should the primary copy be lost after the “red” group is written to the secondary copy but before the “blue” group is written to the secondary copy, the data base log in the secondary copy would incorrectly show that the data base update had completed successfully, when in fact the data base in the secondary copy was never updated.
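The color-boundary rule can be expressed as a simple check: listing the dependent writes in the order they were issued, no later write may belong to an older color than an earlier write. The following Python sketch is illustrative only; the color ranking and the function names are hypothetical, and the second function shows which writes would survive at the secondary copy if only the “red” group arrived before the primary copy was lost.

```python
# Hypothetical ranking of colors: "red" is sent to the secondary site before "blue".
COLOR_ORDER = {"red": 0, "blue": 1}


def preserves_dependency(colors, color_order=COLOR_ORDER):
    """colors[i] is the color of dependent write i, listed in the order issued.

    Returns True if no later write belongs to an older color than an earlier write.
    """
    ranks = [color_order[c] for c in colors]
    return all(earlier <= later for earlier, later in zip(ranks, ranks[1:]))


def secondary_after_loss(writes, applied_colors):
    """Writes present at the secondary copy if only applied_colors arrived in time."""
    return [name for name, color in writes if color in applied_colors]


# Writes 1 and 3 in "red", write 2 in "blue": the boundary is violated.
bad = [
    ("log: update starting", "red"),
    ("data base update", "blue"),
    ("log: update complete", "red"),
]
# Earlier write in the old color, later writes in the new color: allowed.
good = [
    ("log: update starting", "red"),
    ("data base update", "blue"),
    ("log: update complete", "blue"),
]
```

For `bad`, `secondary_after_loss(bad, {"red"})` yields both log entries without the data base update, which is exactly the inconsistent state described above; for `good`, only the “update starting” log entry survives, which a recovery procedure can interpret correctly.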
In one technique for maintaining colors and color boundaries across multiple storage controllers at the primary site, each storage controller, before associating a write with a color, queries a color control node that maintains the current color. The color control node informs the storage controller of the current color, and the storage controller associates the write with that color. Although this ensures that all storage controllers switch colors at effectively the same point in time, and thereby preserves the integrity of dependent writes across the color boundary, every write operation is delayed by a round trip to the color control node, and the color control node may become a bottleneck.
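A minimal Python sketch of this polling technique follows, assuming a single in-process color control node that cycles through a hypothetical list of color names. The lock models the single point through which every query passes, and the `queries` counter makes visible that each write costs one round trip to the node.

```python
import threading


class ColorControlNode:
    """Hypothetical central node holding the single current color."""

    def __init__(self, colors):
        self._colors = list(colors)
        self._index = 0
        self._lock = threading.Lock()  # every query and switch serializes here
        self.queries = 0

    def current_color(self):
        with self._lock:
            self.queries += 1
            return self._colors[self._index]

    def switch_color(self):
        with self._lock:
            self._index = (self._index + 1) % len(self._colors)


class StorageController:
    """Hypothetical storage controller that polls the node before every write."""

    def __init__(self, name, control_node):
        self.name = name
        self.control_node = control_node
        self.log = []  # (color, data) pairs in the order written

    def write(self, data):
        # One round trip to the color control node per write, before tagging it.
        color = self.control_node.current_color()
        self.log.append((color, data))
        return color
```

Because every controller reads the color from the same node under the same lock, a write issued after `switch_color()` always receives the new color, so dependent writes issued in order never move backward across a color boundary. The cost is that the number of queries to the node grows with the number of writes, one per write.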
A method for maintaining colors and color boundaries across multiple storage controllers at the primary site that reduces write delay and the risk of a bottleneck would therefore be advantageous.