This invention relates to the field of storage systems. In particular, the invention relates to predictive point-in-time copy for storage systems.
Point-in-time copy is used in storage systems to create a second copy of data that can be used independently. Such a copy may also be referred to as a snapshot copy. A point-in-time copy may be generated by copy-on-write or redirect-on-write methods.
In storage systems that have a point-in-time copy feature using copy-on-write or redirect-on-write, there will be a means of tracking the location of the data, typically a bitmap with a granularity such that one bit represents more than one physical storage allocation block.
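The tracking bitmap described above can be sketched as follows. This is a minimal illustration only, not the structure used by any particular storage system; the class name `RegionBitmap` and the parameters `volume_blocks` and `blocks_per_region` are hypothetical names introduced for the example.

```python
class RegionBitmap:
    """One bit per region, where a region spans several allocation blocks.

    All names here are hypothetical and chosen for illustration.
    """

    def __init__(self, volume_blocks: int, blocks_per_region: int):
        if volume_blocks < 0 or blocks_per_region < 1:
            raise ValueError("invalid volume or region size")
        self.blocks_per_region = blocks_per_region
        # Ceiling division: a partial trailing region still needs a bit.
        self.num_regions = -(-volume_blocks // blocks_per_region)
        self.bits = bytearray((self.num_regions + 7) // 8)

    def region_of(self, block: int) -> int:
        """Map an allocation block number to the region that contains it."""
        return block // self.blocks_per_region

    def is_set(self, region: int) -> bool:
        """True if the region has already been written since the copy was taken."""
        return bool(self.bits[region // 8] & (1 << (region % 8)))

    def set(self, region: int) -> None:
        """Record that the first write to the region has been processed."""
        self.bits[region // 8] |= 1 << (region % 8)


bitmap = RegionBitmap(volume_blocks=1024, blocks_per_region=64)
region = bitmap.region_of(130)   # block 130 falls in region 2
first_write = not bitmap.is_set(region)
bitmap.set(region)
```

With 64 blocks per region, a 1024-block volume needs only 16 bits, which is why a coarse granularity keeps the bitmap small at the cost of more data handled per first write.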
Some costly processing is generally required on the first write to each region represented by a bit. The processing actions are performed serially, beginning with reading the region.
For copy-on-write solutions, the processing actions include: writing the read data to the copy target volume, merging the data written to the storage system with a second copy of the read data, and writing the merged data to the copy source volume.
For redirect-on-write solutions, the processing actions include: merging the read data with the data written to the storage system, and writing the merged data to the copy redirection volume.
In both cases, the final action is completing the write operation to the host system that submitted it to the storage system.
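The serial first-write actions described above can be sketched as follows. This is a simplified in-memory model under stated assumptions: volumes are Python lists of `bytes` objects (one per region), a `set` stands in for the tracking bitmap, and the function names `merge`, `cow_write` and `row_write` are hypothetical. It is not intended to represent any specific product's implementation.

```python
def merge(old: bytes, offset: int, data: bytes) -> bytes:
    """Overlay the host's data onto the region contents at the given offset."""
    if offset < 0 or offset + len(data) > len(old):
        raise ValueError("write does not fit inside the region")
    return old[:offset] + data + old[offset + len(data):]


def cow_write(source, target, copied, region, offset, data):
    """Copy-on-write: preserve the old region on the target, then update the source."""
    old = source[region]                 # read the region
    if region not in copied:             # first write to this region
        target[region] = old             # write the read data to the copy target
        copied.add(region)
    # Write the merged data to the copy source. In this in-memory model the
    # merge is repeated on later writes; real storage would write in place.
    source[region] = merge(old, offset, data)
    return "complete"                    # complete the write to the host


def row_write(source, redirect, redirected, region, offset, data):
    """Redirect-on-write: leave the source untouched, write merged data elsewhere."""
    if region not in redirected:         # first write to this region
        old = source[region]             # read the region
        redirect[region] = merge(old, offset, data)
        redirected.add(region)
    else:
        redirect[region] = merge(redirect[region], offset, data)
    return "complete"                    # complete the write to the host


source, target, copied = [b"AAAAAAAA", b"BBBBBBBB"], [None, None], set()
cow_write(source, target, copied, region=0, offset=2, data=b"xy")
```

After the call, the target holds the original contents of region 0 and the source holds the merged data, so a single small host write has caused one read and two region-sized writes on the physical storage.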
These actions are slow and significantly increase the load on the physical storage. A cache between the host system submitting the writes and the point-in-time copy feature can minimise the increase in write latency visible to the host system, but the increase in load on the physical storage will still be present.
This additional load will typically halve the number of operations per second that the physical storage would normally support, until a significant proportion of the regions have been written to. The number of operations supported in a RAID volume or aggregation of RAID volumes is typically proportional to the number of hard disk devices that the data is spread over. This in turn implies that, in an environment where point-in-time copies are regularly triggered for a given volume, the user would have to specify double the number of storage devices they would otherwise specify to provide the required performance.
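The sizing consequence can be illustrated with a short calculation. This is a rough sketch assuming the simple proportional model stated above; the function name `disks_required`, the parameter names, and the numeric figures are hypothetical and chosen only for illustration.

```python
import math


def disks_required(target_iops: float, iops_per_disk: float,
                   throughput_factor: float = 1.0) -> int:
    """Disks needed if supported operations scale with the number of disks.

    throughput_factor is the fraction of normal throughput that remains
    available (hypothetical parameter; 0.5 models the halving described above).
    """
    if iops_per_disk <= 0 or not 0 < throughput_factor <= 1:
        raise ValueError("invalid sizing parameters")
    return math.ceil(target_iops / (iops_per_disk * throughput_factor))


# Illustrative figures only: 2000 operations per second, 100 per disk.
without_copy = disks_required(2000, 100)
with_copy = disks_required(2000, 100, throughput_factor=0.5)
```

Under these assumed figures, the device count doubles when the available throughput is halved, which matches the doubling described in the text.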
Therefore, there is a need in the art to address the aforementioned problems.