1. Technical Field
The present invention relates generally to data storage. More particularly, the present invention relates to a system and method for providing access to replicated data.
2. Description of Related Art
Companies today rely to an unprecedented extent on online, frequently accessed, constantly changing data to run their businesses. Unplanned events that inhibit the availability of this data can seriously damage business operations. Additionally, any permanent data loss, from natural disaster or any other source, will likely have serious negative consequences for the continued viability of a business. Therefore, when disaster strikes, companies must be prepared to eliminate or minimize data loss, and recover quickly with useable data.
Replication is one technique utilized to minimize data loss and improve the availability of data. In replication, a replicated copy of data is distributed to and stored at one or more remote sites or nodes. In the event of a site migration, or of a failure of one or more physical disks storing data or of a node or host data processing system associated with such a disk, the remote replicated data copy may be utilized, preserving data integrity and availability. Replication is frequently coupled with other high-availability techniques, such as clustering, to provide an extremely robust data storage solution.
Replication may be performed by hardware or software and at various levels within an enterprise (e.g., database transaction, file system, or block-level access) to reproduce data from a replication source volume or disk within a primary node to a remote replication target volume or disk within a secondary node. Replication may be synchronous, where write operations are transmitted to and acknowledged by one or more secondary node(s) before completing at the application level of a primary node, or asynchronous, in which write operations are performed at a primary node and persistently queued for forwarding to each secondary node as network bandwidth allows.
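The distinction between the two modes can be illustrated with a minimal sketch. The class and method names below (Primary, Secondary, write, drain) are hypothetical and are not drawn from any particular replication product; the forwarding queue is held in memory for brevity, whereas the asynchronous mode described above queues writes persistently.

```python
from collections import deque


class Secondary:
    """Hypothetical secondary node holding a replication target volume."""

    def __init__(self):
        self.volume = {}  # block number -> data

    def apply(self, block, data):
        self.volume[block] = data
        return True  # acknowledgement returned to the primary


class Primary:
    """Hypothetical primary node supporting both replication modes."""

    def __init__(self, secondaries, synchronous):
        self.volume = {}
        self.secondaries = secondaries
        self.synchronous = synchronous
        # One forwarding queue per secondary (in memory here; assumed
        # to be persistent in a real system).
        self.queues = [deque() for _ in secondaries]

    def write(self, block, data):
        self.volume[block] = data
        if self.synchronous:
            # Complete only after every secondary has acknowledged.
            return all(s.apply(block, data) for s in self.secondaries)
        # Asynchronous: complete at once and forward later.
        for q in self.queues:
            q.append((block, data))
        return True

    def drain(self):
        """Forward queued writes, in order, as bandwidth allows."""
        for q, s in zip(self.queues, self.secondaries):
            while q:
                s.apply(*q.popleft())


sync_target = Secondary()
Primary([sync_target], synchronous=True).write(0, b"A")

async_target = Secondary()
async_primary = Primary([async_target], synchronous=False)
async_primary.write(0, b"A")
lag_before_drain = 0 not in async_target.volume  # target lags the source
async_primary.drain()
```

In the synchronous case the target holds the block as soon as the write returns; in the asynchronous case the target lags the source until the queue is drained.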
One drawback associated with conventional data replication systems is that a replication target volume or disk typically cannot be accessed or mounted reliably while replication is occurring. Consequently, replication operations must often be at least temporarily suspended when accessing replication target volumes or disks to prevent potentially severe errors from occurring. As a result, such volumes or disks are often underutilized or remain completely idle until site migration or device failure occurs, rather than being used to perform useful “off-host processing” operations such as backup, data mining, monitoring, or the like.
Stopping replication abruptly to provide access to the replicated data not only creates a time window in which a consistent and up-to-date copy of data is unavailable, but may also cause file system data inconsistencies because, for example, one or more file system data buffer(s) may not have been flushed to a replication target volume before a sudden break in the replication. Although it is anticipated that in most cases a file system can self-recover from inconsistencies due to a break in replication, or can be manually recovered using repair tools, such reliance on file system recoverability is undesirable. Moreover, such recovery would take considerable time, and the resulting changes would make a replication target volume an inexact replica of the replication source volume.
FIG. 1 illustrates a replication system block diagram providing access to a replicated target volume utilizing a read-only volume mount according to the prior art. Primary node 100a of the illustrated prior art embodiment includes an application 102a (e.g., a database, mail server, web server, etc.), a file system 104a, file system data buffer(s) 106a, a replication facility 108a, and a replication source volume 110a as shown. Replication facility 108a of primary node 100a receives data from application 102a and/or file system 104a, for example, in conjunction with the performance of a write operation, to be stored within replication source volume 110a. Replication facility 108a of primary node 100a then stores the received data within replication source volume 110a and transfers a replicated copy of the data at a block level to a corresponding replication facility 108b within secondary node 100b over a communication link 114 (e.g., an IP network, LAN or WAN) coupled between primary node 100a and secondary node 100b. Read access to data stored within replication source volume 110a by file system 104a or by application 102a via file system 104a is provided through a volume mount 112a local to primary node 100a as illustrated.
A given node can serve as a primary node/replication source volume for one application and as a secondary node/replication target volume for another application. Furthermore, for the same application program, a given node can serve as a secondary node at one point in time and as a primary node at another point in time to “cascade” replication of the data to other nodes connected via communications links. For example, a first replication may be made between nodes in different cities or states, and a node in one of the cities or states can in turn act as the primary node in replicating the data worldwide.
Each replication primary node may also have more than one replication secondary node. As used herein, a reference to the secondary node implicitly refers to all secondary nodes associated with a given primary node unless otherwise indicated, as identical replication operations are typically performed on all secondary nodes.
Secondary node 100b, coupled to primary node 100a via communications link 114, includes an application 102b (e.g., a duplicate copy of application 102a or alternatively a distinct application such as a data mining, mirroring or backup application), a file system 104b, file system data buffer(s) 106b, replication facility 108b, and replication target volume 110b including a replicated copy of data stored on replication source volume 110a. Replication facility 108b receives a replicated copy of data from replication facility 108a over communications link 114 and stores the replicated data copy within replication target volume 110b. Read access to data stored within replication target volume 110b is provided through a read-only volume mount 112b local to secondary node 100b as illustrated.
When a write operation is performed on replication source volume 110a within primary node 100a, file system data (e.g., volume metadata, pending delayed writes, etc.) associated with the write operation may be temporarily stored or “cached” within file system data buffer(s) 106a prior to being stored within replication source volume 110a and replicated to replication target volume 110b. Consequently, the transfer of a replicated copy of data (e.g., application data) from replication source volume 110a to replication target volume 110b may precede the transfer of a replicated copy of associated file system data, resulting in inconsistencies between some of the data stored within replication target volume 110b and file system data stored therein or within file system data buffer(s) 106b. This may occur, for example, if communications link 114, replication facilities 108a and/or 108b, and/or primary node 100a become unavailable prior to the replication of the described file system data. Such inconsistencies can cause undesirable system panics, exceptions, faults or other errors to occur within secondary node 100b and preclude reliable access to replication target volume 110b.
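The ordering hazard described above can be illustrated with a minimal sketch. All names below (ReplicatedVolume, FileSystem, orphaned_blocks) are hypothetical, and the model is deliberately simplified: application data is assumed to be written through to the volume at once, while the associated file system metadata is held in a buffer until a later flush.

```python
class ReplicatedVolume:
    """Hypothetical source volume that replicates each write to a target."""

    def __init__(self):
        self.source = {}
        self.target = {}
        self.link_up = True

    def write(self, key, value):
        self.source[key] = value
        if self.link_up:  # replication only succeeds while the link is up
            self.target[key] = value


class FileSystem:
    """Hypothetical file system: data written through, metadata cached."""

    def __init__(self, volume):
        self.volume = volume
        self.buffer = {}  # file system data not yet on the volume

    def write_file(self, name, block, data):
        self.volume.write(("data", block), data)
        self.buffer[("meta", name)] = block  # delayed write

    def flush(self):
        for key, value in self.buffer.items():
            self.volume.write(key, value)
        self.buffer.clear()


def orphaned_blocks(blocks):
    """Return data blocks that no metadata entry refers to."""
    referenced = {v for (kind, _), v in blocks.items() if kind == "meta"}
    return sorted(k[1] for k in blocks
                  if k[0] == "data" and k[1] not in referenced)


volume = ReplicatedVolume()
fs = FileSystem(volume)
fs.write_file("report.txt", 7, b"payload")  # data replicated, metadata cached
volume.link_up = False                      # link fails before the flush
fs.flush()                                  # metadata reaches the source only

source_orphans = orphaned_blocks(volume.source)
target_orphans = orphaned_blocks(volume.target)
```

In this sketch the source volume ends up self-consistent, whereas the target volume holds a data block with no corresponding metadata, which is the kind of inconsistency that can preclude reliable access at the secondary node.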
FIG. 2 illustrates a replication system block diagram providing access to a replicated target volume utilizing a static point-in-time volume copy according to the prior art. Primary node 200a of the illustrated prior art embodiment includes an application 202a, a file system 204a, file system data buffer(s) 206a, a replication facility 208a, and a replication source volume 210a having read access via a volume mount 212a local to primary node 200a as described above with respect to FIG. 1. Primary node 200a is similarly coupled to secondary node 200b via communications link 214.
Secondary node 200b of the illustrated embodiment includes an application 202b (e.g., a duplicate copy of application 202a or alternatively a distinct application such as a data mining, mirroring or backup application), a file system 204b, file system data buffer(s) 206b, replication facility 208b, a replication target volume 210b including a replicated copy of data stored on replication source volume 210a, and a point in time copy 216 of replication target volume 210b. Replication facility 208b receives a replicated copy of data from replication facility 208a over communications link 214 and stores the replicated data copy within replication target volume 210b. When access to the replication target volume is desired, a point in time copy 216 or “snapshot” of replication target volume 210b is created and accessed utilizing volume mount 212b local to secondary node 200b.
While the described technique allows replication to continue on the replication target volume 210b as a “snapshot” copy of the replicated data is accessed, the “snapshot” is merely a static, point-in-time volume image which may differ greatly from the replication target volume 210b, particularly in environments where write operations and replication operations occur frequently. Moreover, the creation of such a static point in time copy 216 typically requires that no inconsistencies exist between file system data stored within secondary node 200b (e.g., within replication target volume 210b and file system data buffer(s) 206b) and data stored within replication target volume 210b. With some file systems (e.g., Microsoft Windows® NT File System) this requires that the primary node's file system 204a be dismounted to ensure that all file system data caches/buffers are flushed and that the point in time copy 216 is consistent. In almost all cases, dismounting the primary node file system 204a requires all open files on replication source volume 210a to be closed, which is extremely undesirable as it typically involves stopping any applications utilizing replication source volume 210a and ensuring there are no pending updates (e.g., write operations) to be replicated.
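The staleness of a static point-in-time copy can be illustrated with a minimal sketch. The dictionaries below are hypothetical stand-ins for replication target volume 210b and point in time copy 216; no particular snapshot mechanism is implied.

```python
import copy

# Hypothetical stand-in for the replication target volume
# (block number -> data).
target = {0: b"a", 1: b"b"}

# Static point-in-time copy taken when access is desired.
snapshot = copy.deepcopy(target)

# Replication continues on the target after the snapshot is taken.
for block, data in [(1, b"b2"), (2, b"c")]:
    target[block] = data

# Blocks whose contents the snapshot no longer reflects.
stale_blocks = sorted(b for b in target if snapshot.get(b) != target[b])
```

After only two replicated writes, two of the three blocks on the target are no longer reflected by the snapshot; the divergence grows with the rate of write and replication operations.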