The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for improving performance of asynchronous replication in hierarchical storage management (HSM) integrated storage systems.
Hierarchical storage management (HSM) is a data storage technique that automatically moves data between high-cost and low-cost storage media. HSM is sometimes referred to as tiered storage. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.
In a typical HSM scenario, data files that are frequently used are stored on hard disk drives (HDDs), or in some cases solid state drives (SSDs), but are eventually migrated to tape if they are not used for a certain period of time, such as a few months. If a user does reuse a file that is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the disk storage available, but since only rarely used files are on tape, most users will usually not notice any slowdown.
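As an illustration only, the migrate-and-recall behavior described above might be sketched as follows. The names `FileEntry`, `migrate_idle_files`, and `access`, as well as the 90-day threshold, are hypothetical and chosen for this sketch; they are not taken from any particular HSM product.

```python
from dataclasses import dataclass

# Hypothetical threshold: files idle this long are candidates for tape.
MIGRATE_AFTER_DAYS = 90


@dataclass
class FileEntry:
    name: str
    tier: str       # "disk" or "tape"
    idle_days: int  # days since the file was last accessed


def migrate_idle_files(files, threshold_days=MIGRATE_AFTER_DAYS):
    """Move disk-resident files idle for at least threshold_days to tape.

    Returns the names of the files that were migrated.
    """
    migrated = []
    for f in files:
        if f.tier == "disk" and f.idle_days >= threshold_days:
            f.tier = "tape"
            migrated.append(f.name)
    return migrated


def access(f):
    """Access a file, recalling it to disk if it currently resides on tape.

    Returns True if a recall from tape was needed.
    """
    recalled = f.tier == "tape"
    f.tier = "disk"
    f.idle_days = 0
    return recalled
```

In this sketch, a file idle for 120 days would be migrated to tape by `migrate_idle_files`, and a later call to `access` on that file would return it to the disk tier, which corresponds to the automatic recall a user would experience.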
A basic method of storage replication is disk mirroring, which is typical for locally-connected disks. Replication can be extended across a computer network so that the disks can be located in physically distant locations; in that case, a master-slave database replication model is usually applied. A purpose of replication is to prevent damage from failures or disasters that may occur in one location or, if such events do occur, to improve the ability to recover. For replication, latency is a key factor because latency determines either how far apart the sites can be or the type of replication that can be employed.
Synchronous replication guarantees “zero data loss” by means of atomic write operations, i.e., a write either completes on both sides or not at all. A write is not considered complete until completion of the write operation is acknowledged by both the primary storage and the remote storage. Most applications wait for a write transaction to complete before proceeding with further work; therefore, overall performance decreases considerably. Inherently, performance drops proportionally to distance.
In asynchronous replication, a write is considered complete as soon as the primary storage acknowledges completion. Remote storage is updated, but typically with a small lag. Performance is greatly increased relative to synchronous replication, but if the local storage is lost, the remote storage is not guaranteed to have the current copy of the data, and the most recent data may be lost.
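The difference between the two acknowledgement models may be illustrated with the following minimal sketch. The classes `Storage`, `SyncReplicator`, and `AsyncReplicator` are hypothetical and exist only for this illustration; network transfer is simulated by direct method calls, and the rollback needed to make a synchronous write fully atomic is omitted.

```python
from collections import deque


class Storage:
    """Hypothetical in-memory stand-in for a storage site."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, value):
        self.blocks[key] = value
        return True  # acknowledgement of the completed write


class SyncReplicator:
    """A write completes only after both sites acknowledge it."""

    def __init__(self, primary, remote):
        self.primary = primary
        self.remote = remote

    def write(self, key, value):
        ack_primary = self.primary.write(key, value)
        ack_remote = self.remote.write(key, value)  # caller waits for this
        return ack_primary and ack_remote


class AsyncReplicator:
    """A write completes once the primary acknowledges; remote lags."""

    def __init__(self, primary, remote):
        self.primary = primary
        self.remote = remote
        self.pending = deque()  # updates not yet applied at the remote site

    def write(self, key, value):
        ack = self.primary.write(key, value)
        self.pending.append((key, value))
        return ack  # returned before the remote site is updated

    def drain(self):
        """Apply queued updates to the remote site, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.remote.write(key, value)
```

In this sketch, any update still held in `pending` when the primary site fails corresponds to the most recent data that may be lost under asynchronous replication, whereas the synchronous variant never returns to the caller with an update the remote site has not received.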