The invention is directed to methods and systems for determining and validating accessibility and currency, i.e., the actual status, of data replicated in a networked environment.
Data replication is a technique commonly used for achieving high data availability. When multiple replicas of a data set are created and stored at different locations, a replica of the data is more likely to be available to a client application even when some components fail or some data sets are corrupted.
In computing systems, many techniques exist for copying data and for managing multiple replicas. Replication techniques can be classified into two main categories: synchronous and asynchronous replication. Synchronous replication processes enforce continuous full synchronization between the source data set and the replicas. This involves strong transactional guarantees and ensures that any update to a source data set is consistently and immediately reflected in all the synchronous replicas. However, achieving synchronous replication can in some environments be prohibitively expensive in terms of the overhead it imposes on computing resources, and in some cases may not be possible at all (for example, due to temporary failure of some component in the environment).
Asynchronous replication, on the other hand, imposes a much less stringent time-consistency requirement between replicas, because copies are created only periodically. Thus a replica may represent some past state of the data source rather than its current state. Depending on how far back in the past that reference point is, such a discrepancy may still be acceptable for some client applications under some exceptional circumstances (e.g., when recovering from a catastrophic failure). Asynchronous replication imposes a much lower overhead on the computing resources and is commonly used in many environments, for example to maintain geographically remote copies of application data for Disaster-Recovery (DR) purposes.
However, ensuring that the data sets and their replicas continuously conform to the applications' requirements is a difficult challenge for a number of reasons: different applications may have different minimal currency requirements for replicated data (that is, there are typically differences in their cost/currency trade-off considerations); multiple data copiers may be executing concurrently in a typical environment; copy activities may be based on replicas (which may themselves not be fully current) rather than on the original data set, thus creating chains of dependencies; individual copy activities may fail entirely; and a replica at a remote site may be inaccessible to a host due, for example, to a network or component configuration problem.
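The effect of such chains of dependencies on currency can be illustrated with a minimal sketch. It is offered for illustration only and is not the method of the invention. All names in it (`CopyEvent`, `effective_time`, `meets_requirement`, `max_age`) are hypothetical, and it assumes that each copy captures the state of its source at a single instant `taken_at` and that a complete history of successful copies is available.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class CopyEvent:
    """Hypothetical record of one successful copy.

    Assumption: `target` received the state that `source` had at `taken_at`.
    """
    source: str
    target: str
    taken_at: datetime


def effective_time(replica: str, origin: str,
                   events: List[CopyEvent], now: datetime) -> Optional[datetime]:
    """Return the instant of the origin's state that `replica` reflects.

    Walks the chain of copies backwards. At each hop, only copies taken no
    later than the state already reached can have contributed to it.
    Returns None if the chain never reaches the origin.
    """
    as_of = now
    current = replica
    # A chain cannot be longer than the number of recorded copies.
    for _ in range(len(events) + 1):
        if current == origin:
            return as_of
        candidates = [e for e in events
                      if e.target == current and e.taken_at <= as_of]
        if not candidates:
            return None
        latest = max(candidates, key=lambda e: e.taken_at)
        as_of = latest.taken_at
        current = latest.source
    return None


def meets_requirement(replica: str, origin: str, events: List[CopyEvent],
                      now: datetime, max_age: timedelta) -> bool:
    """Check a replica against an application's (assumed) maximum data age."""
    as_of = effective_time(replica, origin, events, now)
    return as_of is not None and now - as_of <= max_age
```

For instance, if replica B was copied from replica A at 09:00, and the most recent copy into A before that was taken from the original data set at 08:00, then B reflects the original data set as of 08:00, even if A has since been refreshed. An application that tolerates data at most two hours old would therefore not be satisfied by B at noon.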
Consequently, an application may not have a replica of sufficient currency accessible to it at a remote site when one is required. Currently, such a deficiency may not be detected until an application actually requires that replica. Present replication technologies focus on the correctness of individual copy mechanisms, but not on continuous end-to-end validation of the currency and accessibility of multiple replicas of data.
It would therefore be desirable to provide systems and processes for continuously validating that replicated data sets in networks conform to defined application requirements for currency and accessibility, and for identifying any discrepancies and notifying a user of them so that corrective actions can be taken before any undesirable consequences occur.