The present invention relates generally to garbage collection in a data storage system, and more particularly to garbage collection in the context of backups such as mirroring and taking point-in-time copies.
Garbage collection is used in data storage systems as a background memory management function which cleans up a physical storage medium by making contiguous blocks of address space available for future write operations. This is typically done by deleting no-longer-needed data and by grouping smaller blocks of still-needed data into larger contiguous blocks of address space to defragment use of the storage medium, in a process called coalescing or compaction.
Within storage controllers, it is known to provide a replication function which backs up local data in a non-disruptive way to another set of local storage devices by using mirroring or point-in-time copies. Another form of replication is to back up the data to a remote site.
Terminology in the art refers to a primary site and a secondary site for data storage, where the primary site is where the original or master copy is located and the secondary site is where the backup copy is located. Terminology in the art also refers to a source volume and a target volume, where data is transferred from the source to the target when performing a backup or mirroring operation. The term destination volume is a synonym for target volume.
Examples of storage controllers with a replication function are the IBM SAN Volume Controller and storage RAID arrays such as the IBM Storwize® products. Examples of mirroring or point-in-time copy technology are IBM FlashCopy® and IBM Global Mirror with Change Volumes (GMCV). Examples of remote site data backup technology are IBM HyperSwap®, Global Mirror® and Metro Mirror®. IBM® refers to International Business Machines Corporation of Armonk, N.Y.
FlashCopy® implements a bitmap to track differences between the source and target volumes that are related by FlashCopy®. The bitmap records, per address space unit, referred to as a grain, whether the data stored on the source is also stored on the target. That is, there is one bit in the bitmap for each grain. At an initial point in time, after full replication has taken place, all bits in the bitmap are unset, or “not split”, indicating that the two volumes are identical copies of each other. As the storage system evolves, certain grains in the source may diverge from those of the target, e.g., as a result of a host write to one or the other volume, and the bits for those grains are set in the bitmap. Those grains are said to be “split”. Reads and writes to the storage system can then refer to the bitmap to determine whether to read from the source or target volume, or, in the case of a write, whether a grain update in respect of an unsplit grain needs to be performed before the write can take place. Two types of bitmap are maintained. The first is the bitmap just discussed, which relates to the split; this split bitmap is a bitmap of grains that have already been copied to the target. There is additionally a bitmap relating to the differences or increments, called the difference bitmap or incremental bitmap. This is the bitmap of grains that have changed on the source since the initial trigger, so on subsequent re-triggers the copy process only needs to copy these grains instead of running a full copy again. It is additionally noted that, as an alternative to “grain” terminology, the units of address space may be referred to as “address blocks” or just “blocks”.
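The interplay of the split bitmap, the difference bitmap, and copy-on-write of unsplit grains may be illustrated by the following minimal sketch. All names are illustrative and do not correspond to the actual FlashCopy® implementation; the copy-on-write of the old data before a source over-write is the assumed behaviour being modelled.

```python
# Minimal model of split and difference (incremental) bitmaps.
# Illustrative only; not the FlashCopy(R) implementation.

NUM_GRAINS = 8

# split[i] == True means grain i has already been copied to the target.
split = [False] * NUM_GRAINS
# diff[i] == True means grain i changed on the source since the trigger,
# so an incremental re-trigger need only copy these grains.
diff = [False] * NUM_GRAINS

source = ["s%d" % i for i in range(NUM_GRAINS)]
target = [None] * NUM_GRAINS

def host_write_source(grain, data):
    """A host write to the source: preserve old data on the target first."""
    if not split[grain]:
        target[grain] = source[grain]   # grain update before the write
        split[grain] = True
    source[grain] = data
    diff[grain] = True                  # record the increment

def read_target(grain):
    """Reads of the target consult the split bitmap to pick a volume."""
    return target[grain] if split[grain] else source[grain]

host_write_source(3, "new")
```

After the host write, grain 3 is split, so a target read of grain 3 returns the preserved old data while unsplit grains are still read from the source.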
GMCV uses FlashCopy® internally to guarantee the consistent copy, but offers a tunable recovery point objective (RPO), called a cycling period. With GMCV, a FlashCopy® mapping, called a change volume, exists on both the source and target. When replication begins, all data is sent from the source to the target, and then changes are tracked on the source change volume. At the end of each cycle period, the changes accumulated in the source change volume are sent to the target change volume, which then stores that set of data as a consistent copy.
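The cycling behaviour of GMCV may be sketched as follows. The sketch is a simplification under the assumption that each cycle ships only the accumulated changes; the names are hypothetical and not GMCV code.

```python
# Illustrative sketch of GMCV-style cycling: changes accumulate in a
# source change volume and are shipped at the end of each cycle period.

source = {"a": 1, "b": 2}
target = dict(source)     # initial full copy sent when replication begins
changes = {}              # source change volume (accumulated deltas)

def write_source(key, value):
    source[key] = value
    changes[key] = value  # track the change for the next cycle

def end_of_cycle():
    """Ship accumulated changes; the target then holds a consistent copy."""
    global changes
    target.update(changes)
    changes = {}

write_source("a", 10)
write_source("c", 3)
# The target lags behind until the cycle period ends (the tunable RPO).
end_of_cycle()
```

Between cycle boundaries the target is stale but internally consistent, which is why the cycle period acts as the recovery point objective.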
Replication may be taking place on storage volumes that use thin provisioning. Thin provisioning is a virtualization scheme whereby a volume appears to be the size a user would like it to be from an application's perspective, but in which the amount of physical storage used at the back end is only sufficient to store the data actually contained on the volume from the host. Blocks of data on the physical storage medium are allocated as needed, rather than in advance during formatting. This optimizes the resources used and allows the unutilized storage to be used for other purposes. Thin provisioning may be used on either the source data or the copy of the data, or both.
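The allocate-on-write behaviour of thin provisioning can be sketched as below. This is a minimal model assuming a simple dictionary-based forward lookup; a real controller would use a tree structure and fixed-size extents.

```python
# Sketch of thin provisioning: the virtual volume advertises a large
# size, but physical blocks are allocated only when first written.

VIRTUAL_BLOCKS = 1_000_000   # size presented to the application

phys = []                    # physical backing store, grows on demand
fwd = {}                     # forward lookup: virtual block -> phys index

def write_block(vblock, data):
    if vblock not in fwd:
        fwd[vblock] = len(phys)   # allocate a physical block lazily
        phys.append(None)
    phys[fwd[vblock]] = data

def read_block(vblock):
    # Unallocated regions read back as zeros, costing no physical space.
    return phys[fwd[vblock]] if vblock in fwd else b"\x00"

write_block(123456, b"hello")
write_block(999999, b"world")
```

Although a million virtual blocks are advertised, only two physical blocks exist after these writes.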
De-duplication is another technology that is becoming increasingly common. De-duplication allows the system to identify, as data is written, whether the same data is already present elsewhere and, if so, instead of storing a new copy of the data, to add a reference to the existing data rather than processing the write as a fresh allocation. The benefit is greatly reduced cost and storage utilization, and the user can use the saved space for other purposes.
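A content-addressed store with reference counting is one common way to realize de-duplication; the following sketch assumes that approach (the hash-keyed dictionary and names are illustrative, not a particular product's design).

```python
import hashlib

# Sketch of de-duplication: identical data is stored once, and later
# writes of the same content add a reference instead of a new copy.

store = {}   # content hash -> (data, reference count)

def dedup_write(data):
    """Return the content address; bump the refcount if already stored."""
    key = hashlib.sha256(data).hexdigest()
    if key in store:
        stored, refs = store[key]
        store[key] = (stored, refs + 1)   # reference only, no new copy
    else:
        store[key] = (data, 1)
    return key

k1 = dedup_write(b"same block")
k2 = dedup_write(b"same block")   # duplicate: only a reference is added
k3 = dedup_write(b"other block")
```

The reference count is also why garbage collection matters here: a physical block can only be reclaimed once no volume references it.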
Compression is another technology often used to reduce storage requirements, in which data stored in an original representation is converted into another representation that requires less memory. Compression can be used independently of, i.e., alongside, thin provisioning and de-duplication.
The data stored at the local and remote sites may use any of these technologies. Commonly, for fast access, the primary site may choose not to use these technologies due to the additional overhead of maintaining metadata to manage the compressed and/or de-duplicated data.
With compression, over-writes to previously written data are often written elsewhere, since at the time of the write the compressed user data may have changed in size and therefore the controller does not know the size of the old data. This requires garbage collection technologies to reclaim the space occupied by previous versions of the data. Additionally, if the region of physical storage is fragmented, then the garbage collection also needs to coalesce (or compact) the current data to another location to allow larger areas of the physical space to become free, thereby minimizing fragmentation of the backend physical space.
De-duplication often operates at a much wider level than the user volume level. Often it is system wide, or at storage pool level, which means the metadata and algorithms that are used have to operate at a wider level and many user volumes need to be included within the scope of the de-duplication. For de-duplication, garbage collection is required for different reasons. Depending on the implementation, multiple user volumes may be referencing the same piece of physical data on the backend. If an over-write occurs on the source of the user data, the new write has to be written elsewhere. Additionally, the controller often chooses to implement de-duplication together with thin provisioning, coalescing smaller chunks of sparsely populated user data into larger chunks of data; therefore, over time, fragmentation will mean that garbage collection needs to gather together smaller chunks into larger chunks, each of which needs a commensurately large chunk of free space for its storage. Since de-duplication occurs across multiple user volumes, the garbage collection also has to operate at the same level (such as storage pools) to be effective. Storage pools often maintain slack space in case there is a sudden workload of new write data, since delaying user I/O while waiting for garbage collection to free up space for the new write is undesirable. Garbage collection operations thus have the task of compacting data from small writes into larger chunks, thereby freeing up larger contiguous blocks of physical storage.
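The compaction task described above may be illustrated by the following sketch: live chunks scattered across a fragmented region are copied to the front, leaving a large contiguous free extent, and the moves are recorded so that the lookup metadata can be patched. The representation is illustrative only.

```python
# Sketch of garbage collection by compaction: live chunks in a
# fragmented physical region are gathered together, freeing the
# remainder as one large contiguous block.

FREE = None
# Fragmented region: live chunks interleaved with reclaimable garbage.
region = ["A", FREE, "B", FREE, FREE, "C", FREE, "D"]

def compact(region):
    """Copy live chunks to the front; return the new layout plus the
    old-index -> new-index moves needed to patch lookup metadata."""
    moves = {}
    new_region = [FREE] * len(region)
    dst = 0
    for src, chunk in enumerate(region):
        if chunk is not FREE:
            new_region[dst] = chunk
            moves[src] = dst
            dst += 1
    return new_region, moves

region, moves = compact(region)
```

After compaction the four free slots are contiguous, which is the property the slack-space reserve of a storage pool relies on.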
I/O may be communication from the host site to the remote site (or from the source to the target volume) that is connected with updating writes and other activities, including garbage collection. I/O also includes communication between a disk controller and the disk, e.g., the communications between the disk controller of the target volume and the target volume to carry out garbage collection or updating writes.
An I/O can be a read operation or a write operation. The operation can be, for example, from the host to the storage controller, or from the storage controller to the backend drive. In the case of a read operation/request from the host to the controller, data is transferred from the storage controller back to the host. In the case of a write operation/request, data is transferred from the host to the storage controller, since the host is requesting to send data and store it on the storage controller. A user I/O is an application operation and a host I/O is a host operation. Other I/O types may originate from the storage controller itself, such as a garbage collection request. A cleaning I/O is a FlashCopy® term applicable to IBM SAN Volume Controllers. Cleaning is a process of making the target copy independent of the source volume, which is effected by copying the dependent data from the source to the target. This involves reading the dependent data from the source and writing it to the target. Therefore, these I/O operations are generated internal to the storage controller.
User volumes have a forward lookup tree that maps the user volume to physical storage. Data replication of a volume operates at a user volume level, where the data is stored within the same storage pool. A storage pool encompasses many user volumes over which a joint garbage collection operation is being performed. In order for a garbage collection algorithm to work efficiently it is preferable to scan the storage at a physical level, rather than at a user's virtualized volume level. This means that garbage collection works from the other end compared with the user. A reverse lookup algorithm is therefore needed to translate the physical data movements caused by the garbage collection algorithm into the virtualized space that the user forward lookup mechanism is referring to. Garbage collection operations thus also have this task of manipulating the forward lookup tree.
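The relationship between the forward lookup and the reverse lookup during a garbage collection move can be sketched as follows. Flat dictionaries stand in for the forward lookup tree; the names are hypothetical.

```python
# Sketch of forward and reverse lookup: garbage collection moves data
# at the physical level, then uses the reverse lookup to find which
# virtual address maps there so the forward lookup can be repaired.

fwd = {"volA:0": 17, "volA:1": 42, "volB:0": 5}   # virtual -> physical
rev = {p: v for v, p in fwd.items()}              # physical -> virtual

def gc_move(old_phys, new_phys):
    """Physically relocate a block and patch both mappings."""
    vaddr = rev.pop(old_phys)   # reverse lookup: which virtual address?
    fwd[vaddr] = new_phys       # repair the forward lookup
    rev[new_phys] = vaddr

gc_move(42, 6)
```

Without the reverse lookup, the garbage collector, which scans at the physical level, would have no efficient way to find the forward-lookup entry that must be updated.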
Generally, a storage system has to balance the user I/O workload with garbage collection scheduling rates to avoid overloading the physical storage, otherwise the performance of the user I/O will degrade.
When a volume copy is triggered using either of the replication technologies (mirroring or point-in-time copying), the target volume is likely to receive a burst of over-writes of previously written data in those areas on the target.