Disaster recovery strategies for computer systems generally involve copying data stored at a primary site to a secondary site which is typically located some distance from the primary site. Copying between the primary and secondary copies may be performed either synchronously or asynchronously. Where copying is performed synchronously, each time an update is written to the primary copy, the update is also sent to the secondary site to be written to the secondary copy. Only after the secondary site informs the primary site that the secondary copy has been updated does the primary site acknowledge the update to the primary copy and stand ready to write the next update. Thus, updates are written to the primary and secondary copies in the same order. Where copying is performed asynchronously, multiple updates may be written to the primary copy and acknowledged before any updates are sent to the secondary site, as the primary copy is maintained independently from the secondary copy. The updates are sent periodically to the secondary site, typically as a set of writes referred to herein as a “color,” and are written to the secondary copy, not necessarily in the same order as they were written to the primary copy.
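The difference between the two copy modes can be illustrated with a minimal sketch. The class and method names below (`Secondary`, `SyncPrimary`, `AsyncPrimary`, `ship_color`) are hypothetical and do not come from any particular product. The sketch also assumes, for simplicity, that a color is applied to the secondary copy in arrival order, although the text above notes that this order is not guaranteed.

```python
class Secondary:
    """Hypothetical secondary site holding the secondary copy."""

    def __init__(self):
        self.copy = []

    def apply(self, update):
        self.copy.append(update)
        return True  # confirmation sent back to the primary site


class SyncPrimary:
    """Synchronous copying: acknowledge only after the secondary confirms."""

    def __init__(self, secondary):
        self.copy = []
        self.secondary = secondary

    def write(self, update):
        self.copy.append(update)
        confirmed = self.secondary.apply(update)
        return confirmed  # acknowledgement to the requestor


class AsyncPrimary:
    """Asynchronous copying: acknowledge at once, ship updates later as a color."""

    def __init__(self, secondary):
        self.copy = []
        self.secondary = secondary
        self.current_color = []  # updates acknowledged but not yet shipped

    def write(self, update):
        self.copy.append(update)
        self.current_color.append(update)
        return True  # acknowledged before the secondary sees the update

    def ship_color(self):
        # Send the accumulated set of writes to the secondary site.
        batch, self.current_color = self.current_color, []
        for update in batch:
            self.secondary.apply(update)
        return len(batch)
```

In this sketch, the secondary copy of an `AsyncPrimary` stays empty until `ship_color()` is called, which is the window in which updates acknowledged at the primary site can be lost.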
Where a single color is maintained across multiple storage controllers at the primary site, it is necessary when switching to the next color that all storage controllers switch to the next color in a coordinated fashion to maintain the consistency of “dependent writes” across color boundaries. For example, given the following typical sequence of dependent writes for a data base update transaction:
1. execute a write to update the data base log indicating that a data base update is about to take place, then
2. execute a second write to update the data base, and finally
3. execute a third write to update the data base log indicating that the data base update has completed successfully,
it is imperative that these dependent writes either all belong to the same color, or, if they cross a color boundary, that the earlier write(s) belong to the old color and the later write(s) belong to the new color. In this example, assume that writes 1, 2, and 3 are each written by a different storage controller, that writes 1 and 3 are written as part of color group “red,” and that write 2 is written as part of the next color group “blue.” Should the primary copy be lost after the “red” group is written to the secondary copy but before the “blue” group is written to the secondary copy, the data base log in the secondary copy would incorrectly show that the data base update completed successfully, when in fact the data base in the secondary copy was never updated.
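The consistency rule for dependent writes can be expressed as a small check. This is only an illustrative sketch: the function names and the representation of colors as strings in a known order are assumptions made for the example.

```python
def violates_dependent_write_order(write_colors, color_order):
    """Return True if a later dependent write has an earlier color than its predecessor.

    write_colors: colors assigned to dependent writes, in the order the writes were issued.
    color_order:  all colors, oldest first.
    """
    rank = {color: i for i, color in enumerate(color_order)}
    ranks = [rank[c] for c in write_colors]
    return any(later < earlier for earlier, later in zip(ranks, ranks[1:]))


def writes_on_secondary(write_colors, color_order, colors_applied):
    """List the (1-based) writes present on the secondary copy once the
    first `colors_applied` colors have been written to it."""
    applied = set(color_order[:colors_applied])
    return [i + 1 for i, c in enumerate(write_colors) if c in applied]
```

For the scenario described above, `violates_dependent_write_order(["red", "blue", "red"], ["red", "blue"])` reports a violation, and `writes_on_secondary(["red", "blue", "red"], ["red", "blue"], 1)` yields writes 1 and 3 without write 2, which is the inconsistent state the secondary copy would be left in.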
In one technique for maintaining colors and color boundaries across multiple storage controllers at the primary site, before associating a write with a color, each storage controller polls a color control node which maintains the current color. The color control node apprises the storage controller of the current color, and the storage controller performs the write as part of that color.
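A minimal sketch of the polling technique follows. The names `ColorControlNode` and `PollingController` are hypothetical, and colors are represented as increasing integers purely for illustration. The sketch does not address writes that are in flight at the moment the color changes.

```python
import threading


class ColorControlNode:
    """Hypothetical node that maintains the current color."""

    def __init__(self, initial_color=0):
        self._color = initial_color
        self._lock = threading.Lock()

    def current_color(self):
        with self._lock:
            return self._color

    def advance(self):
        # Switch to the next color; later polls see the new value.
        with self._lock:
            self._color += 1
            return self._color


class PollingController:
    """Storage controller that polls the node before every write."""

    def __init__(self, node):
        self.node = node
        self.writes = []  # (color, payload) pairs

    def write(self, payload):
        color = self.node.current_color()  # poll before associating the write
        self.writes.append((color, payload))
        return color
```

Because every controller asks the same node immediately before each write, a write issued after a color change cannot be tagged with the old color in this sketch, at the cost of one poll per write.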
In another technique for maintaining colors and color boundaries across multiple storage controllers at the primary site, a color control node maintains the current color, but the storage controllers do not poll the color control node for the current color. Rather, when the color control node wishes to form a new color, it sends a “freeze” command to all the storage controllers indicating the new color. When a storage controller receives the “freeze” command, it withholds the acknowledgement of write operations from the requestors. The storage controller then sends an acknowledgement of the “freeze” command to the color control node. Once the color control node receives an acknowledgement from all the storage controllers, it sends a “thaw” command to all the storage controllers. When a storage controller receives the “thaw” command, it may acknowledge write operations to their requestors, and all write operations for which acknowledgements are sent after the “freeze” command is received are considered to have been written as part of the new color.
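The freeze/thaw exchange can be sketched as a single-threaded simulation. The names `FreezeThawController` and `switch_color` are hypothetical, colors are modeled as integers, and message passing is reduced to direct method calls. A real system would exchange these commands over a network and would have to handle timeouts and failures.

```python
class FreezeThawController:
    """Hypothetical storage controller that obeys freeze/thaw commands."""

    def __init__(self, color=0):
        self.color = color
        self.frozen = False
        self.pending = []  # writes whose acknowledgements are withheld
        self.acked = []    # (color, payload) pairs already acknowledged

    def write(self, payload):
        if self.frozen:
            self.pending.append(payload)
            return None  # acknowledgement withheld from the requestor
        self.acked.append((self.color, payload))
        return self.color

    def freeze(self, new_color):
        self.frozen = True
        self.color = new_color
        return True  # acknowledgement of the freeze command

    def thaw(self):
        # Withheld writes are acknowledged now, as part of the new color.
        self.frozen = False
        released = [(self.color, p) for p in self.pending]
        self.acked.extend(released)
        self.pending.clear()
        return released


def switch_color(controllers, new_color):
    """Color control node logic: freeze all, await all acks, then thaw all."""
    acks = [c.freeze(new_color) for c in controllers]
    if not all(acks):
        raise RuntimeError("a controller did not acknowledge the freeze")
    for c in controllers:
        c.thaw()
```

The point of withholding acknowledgements is that a requestor cannot issue a dependent write until the preceding write is acknowledged, so no dependent write can be tagged with the old color once any controller has begun tagging writes with the new one.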
It would be advantageous for a storage system to employ both polling controllers and freeze/thaw controllers, such as in support of system scaling or migration. Doing so, however, requires a method for maintaining colors and color boundaries across multiple heterogeneous storage controllers at the primary site.