1. Technical Field
The present invention relates to the field of object replication, and more particularly to a redundancy technique for object replication that groups objects based on a probability score, thereby minimizing retransmission of objects.
2. Description of the Related Art
Remote mirroring is a data redundancy technique for coping with storage system failures. A copy of data, sometimes referred to as a ‘primary’ or ‘local’ copy, is updated, for example, as it is accessed by an application program. A redundant copy of the data, sometimes referred to as a ‘secondary’ or ‘slave’ copy of the data, usually at a remote site, is updated as well. When a failure occurs that renders the primary copy unusable or inaccessible, the data can be restored from the secondary copy, or accessed directly from there.
Conventional schemes for remote mirroring tend to keep the primary and secondary copies of the data synchronized. Thus, when a failure occurs at the primary site, data loss is minimized because the secondary copy matches the data that was stored at the primary site. However, when an error that results in data corruption occurs at the primary site, such as a software error, these schemes tend to propagate the error quickly, which results in corrupted data at the secondary site as well.
U.S. Pat. No. 7,120,825 describes a technique for adaptive batching for asynchronous data redundancy. A sequence of write transactions is adaptively arranged into a sequence of send batches at a first storage facility. The transactions are received at a second storage facility and applied to a redundant data copy at the second storage facility. The second storage facility may arrange the write transactions according to a sequence of receive batches. The batch sizes may be adaptively adjusted or completed. The batch sizes or adaptive completion of the batches may be based on, for example, availability of a communication medium between the first storage facility and the second storage facility. Each send batch may be forwarded to the second storage facility upon completion.
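By way of illustration only, the general idea of completing a send batch according to the availability of the communication medium can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method claimed in the cited patent: the names `adaptive_batches`, `link_available`, and `max_batch`, and the rule of closing a batch when the link is free or a size cap is reached, are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical write transaction: (block address, data).
Write = Tuple[int, bytes]


def adaptive_batches(
    writes: Iterable[Write],
    link_available: Callable[[], bool],
    max_batch: int = 4,
) -> Iterator[List[Write]]:
    """Group writes into send batches, preserving write order.

    Assumed rule: a batch is completed as soon as the link reports it is
    available, or when the batch reaches max_batch writes.
    """
    batch: List[Write] = []
    for write in writes:
        batch.append(write)
        if link_available() or len(batch) >= max_batch:
            yield batch
            batch = []
    if batch:
        # Flush whatever remains once the write stream ends.
        yield batch


# Simulated link availability, polled once per write (hypothetical data).
availability = iter([False, False, True, False, False, False, False])
writes = [(address, b"x") for address in range(7)]
batches = list(adaptive_batches(writes, lambda: next(availability)))
print([len(b) for b in batches])  # [3, 4]
```

In this sketch the first batch closes early because the link becomes available at the third write, and the second batch closes only when the size cap is reached.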
Replication of data objects from a source location to a destination location over any communication network or protocol is subject to a number of constraints, some of which are indicated here. First, the objects are active, in the sense that they are being dynamically updated; for example, active files in a file system. Second, a point-in-time image of the objects is not possible. Third, one or more objects are grouped together, and these groups are the logical units of work. Fourth, the complete operation at the group level is atomic. Typically, such an operation involves (i) reading from the source location, (ii) transferring the data over the network to the destination location, and (iii) storing the data at the destination location. Any change or update in the characteristics of any component object of a group means a complete restart of the whole group operation. Several other constraints may be applied during the replication process. A disadvantage of the prior art is that the whole group of objects must be retransmitted whenever any component object within the group is changed or updated. A further disadvantage is that the replication either interferes with the file system or requires its operations to be shut down.
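The cost of the atomic group-level operation described above can be illustrated with a small simulation. This is a minimal sketch, assuming that each object carries a version counter that is incremented on every update; the names `SourceObject`, `replicate_group`, `on_transfer`, and `max_attempts` are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SourceObject:
    """Hypothetical active object whose version grows on each update."""
    name: str
    data: bytes
    version: int = 0

    def update(self, data: bytes) -> None:
        self.data = data
        self.version += 1


def replicate_group(
    group: List[SourceObject],
    destination: Dict[str, bytes],
    on_transfer: Optional[Callable[[SourceObject, int], None]] = None,
    max_attempts: int = 10,
) -> int:
    """Replicate a group atomically; return the number of object transfers."""
    transfers = 0
    for _ in range(max_attempts):
        versions = {obj.name: obj.version for obj in group}
        staged: Dict[str, bytes] = {}
        for obj in group:
            payload = obj.data            # (i) read from the source location
            staged[obj.name] = payload    # (ii) transfer (simulated in memory)
            transfers += 1
            if on_transfer is not None:
                on_transfer(obj, transfers)  # hook that simulates concurrent updates
        if all(obj.version == versions[obj.name] for obj in group):
            destination.update(staged)    # (iii) store at the destination
            return transfers
        # A component object changed: discard the staged data and restart.
    raise RuntimeError("group kept changing; replication did not complete")


group = [SourceObject("a", b"A"), SourceObject("b", b"B"), SourceObject("c", b"C")]
destination: Dict[str, bytes] = {}


def touch_b_once(obj: SourceObject, transfers: int) -> None:
    # Simulate an application updating object "b" during the first attempt.
    if transfers == 2:
        group[1].update(b"B2")


transfers = replicate_group(group, destination, on_transfer=touch_b_once)
print(transfers)  # 6: three discarded transfers plus three for the retry
```

A single update to one object doubles the number of transfers for a three-object group in this sketch, which is the retransmission overhead identified as a disadvantage above.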
Without an improved method of replicating objects over a communication network that reduces the failure of operations during replication, the promise of this technology may never be fully achieved.