1. Field of the Invention
This description relates in general to distributed computing systems, and more particularly, to a method, system and computer program product for managing a point-in-time snap copy of a storage unit in a de-duplication environment of a distributed computing system.
2. Description of Related Art
One or more hosts may store large quantities of data in a group of storage units, which is typically controlled by a storage controller. Examples of such storage controllers include the IBM TotalStorage® Enterprise Storage Server® (ESS) and the IBM System Storage DS8000 series. A storage controller such as the ESS or DS8000 may provide a number of functions accessible by the hosts for protecting data, backing the data up, and making the data available for use.
Amongst the functions which may be provided by a storage controller is a data preservation function which can preserve an identified set of data at a particular point in time. For example, data may be preserved in a “read/write copy” operation in which data is copied from one location to another location by reading the data from the one location and writing the data to the other location. The ESS and DS8000 series storage controllers support another data preservation function, a point-in-time snap copy function referred to as “FlashCopy”, which enables an instant copy to be made of a set of tracks in a source volume. One feature of such point-in-time snap copy functions is that the data of the copy is frequently made immediately available for read or write access. The identified data may be, for example, a set of tracks which can consist of an entire volume, a data set, or just a selected set of tracks.
In one mode of a point-in-time snap copy function, a copy of all of the data to be preserved at the particular point in time is eventually made by read/write copying the identified data from the source volume to the target volume, typically in a background read/write copy mode. If a host attempts to read data from the target volume before that data has been read/write copied over to the target volume, the read operation is directed to the source volume containing the original data. If a host attempts to update the data on the source volume which is to be preserved on the target volume, that update is typically temporarily delayed until the old data to be updated is read/write copied to the target volume for preservation. Once a particular data location of the set of identified data on the source volume has been successfully read/write copied to the target volume by the background read/write copy operation, that data location on the source volume is freed for subsequent immediate updating by a host.
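The read redirection, delayed update, and background copy behavior described above may be illustrated by the following minimal sketch. It is a toy model only and is not the FlashCopy implementation: the class name `SnapCopy`, its methods, and the use of Python lists indexed by track number to stand in for volumes are all assumptions made for illustration.

```python
class SnapCopy:
    """Toy point-in-time snap copy between two track-indexed volumes.

    Hypothetical model: volumes are plain lists, one entry per track.
    """

    def __init__(self, source):
        self.source = source                  # source volume (list of track data)
        self.target = [None] * len(source)    # target volume, initially empty
        self.copied = [False] * len(source)   # per-track "already copied" flag

    def _copy_track(self, track):
        # Read/write copy one track from source to target, at most once.
        if not self.copied[track]:
            self.target[track] = self.source[track]
            self.copied[track] = True

    def read_target(self, track):
        # A read of a track not yet copied is directed to the source volume.
        if self.copied[track]:
            return self.target[track]
        return self.source[track]

    def write_source(self, track, data):
        # The update is held until the old data is preserved on the target.
        self._copy_track(track)
        self.source[track] = data

    def background_copy_step(self):
        # Copy the next uncopied track; return its number, or None when done.
        for track, done in enumerate(self.copied):
            if not done:
                self._copy_track(track)
                return track
        return None
```

In this sketch, a call to `write_source` on an uncopied track first preserves the old data on the target, so a later `read_target` of that track still returns the point-in-time contents rather than the update.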
A storage controller typically has a memory, often referred to as a cache, to temporarily store data accessed from the storage units. Read and write operations are frequently performed more quickly in the cache memory as compared to read or write operations for data stored in the storage units. Thus, data is often cached in the cache in anticipation of a read operation from a host requesting that data. Similarly, write operations are frequently performed on the data in the cache, which subsequently “flushes” the new write data to the storage units for storage.
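The caching behavior described above may be sketched as a simple write-back cache. The names `WriteBackCache`, `read`, `write` and `flush`, and the use of a dictionary keyed by block identifier to stand in for the storage units, are hypothetical and chosen only for illustration; an actual controller cache would also handle eviction, prefetching and failure recovery.

```python
class WriteBackCache:
    """Toy write-back cache in front of a slower backing store (hypothetical)."""

    def __init__(self, storage):
        self.storage = storage   # dict: block id -> data on the storage units
        self.cache = {}          # dict: block id -> data held in cache memory
        self.dirty = set()       # block ids written in cache but not yet flushed

    def read(self, block):
        # On a miss, stage the data from the storage units into the cache.
        if block not in self.cache:
            self.cache[block] = self.storage[block]
        return self.cache[block]

    def write(self, block, data):
        # The write completes in cache; the storage units are updated later.
        self.cache[block] = data
        self.dirty.add(block)

    def flush(self):
        # Destage all modified blocks from the cache to the storage units.
        for block in sorted(self.dirty):
            self.storage[block] = self.cache[block]
        self.dirty.clear()
```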
In order to reduce storage requirements and improve bandwidth efficiency, duplicate data may be eliminated by a deduplication engine which may be implemented in hardware, software or both. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored. This single copy is often referred to as the “master” copy and the redundant copies which are deleted are often referred to as “secondary” copies. For the redundant or secondary copies which are deleted, a reference pointer which points to the master copy is typically maintained.
The reference pointer is typically calculated by processing a set of data, often referred to as a “chunk” of data, using a hash function or other algorithm. If a set of data produces the same reference pointer value as a previously stored set of data, it is assumed that the two sets of data are copies of each other and only one copy of the two sets of data may be retained.
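The deduplication process described above may be sketched as follows. The choice of SHA-256 as the hash function, the class name `DedupStore`, and the in-memory dictionary holding master copies are assumptions made for illustration; the description above does not specify a particular hash function or data structure.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store keyed by a hash-derived reference pointer."""

    def __init__(self):
        # reference pointer (hex digest) -> single retained master chunk
        self.masters = {}

    def store(self, chunk: bytes) -> str:
        # Compute the reference pointer by hashing the chunk (SHA-256 assumed).
        pointer = hashlib.sha256(chunk).hexdigest()
        # A chunk producing an already-seen pointer is assumed to be a
        # duplicate, so only the first (master) copy is retained.
        if pointer not in self.masters:
            self.masters[pointer] = chunk
        return pointer

    def load(self, pointer: str) -> bytes:
        # Secondary copies are resolved through the pointer to the master copy.
        return self.masters[pointer]
```

In this sketch, storing the same chunk twice returns the same reference pointer and leaves a single master copy in `masters`. Equal pointer values are treated as proof of equal data, which mirrors the assumption stated above.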
A storage controller frequently maintains a file system which includes a user component which manages files within directories, file path traversals, and user access to the files, for example. A storage component of the file system determines how a file is physically stored on a storage unit.
The file system often breaks up a file into smaller units, such as file blocks. Each file block may be mapped by the file system to a logical file unit such as a logical block which in turn is mapped to an actual physical file unit such as a physical block of data stored on a storage unit. The mapping of logical blocks to physical blocks facilitates separating file management from storage management.
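The two-level mapping described above, from file blocks to logical blocks and from logical blocks to physical blocks, may be sketched as follows. The block size, the class name `BlockMapper`, the `relocate` helper and the block numbering scheme are all hypothetical and chosen only to keep the example small.

```python
BLOCK_SIZE = 4  # assumed, unrealistically small block size for illustration


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    # Break a file's contents into fixed-size file blocks.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BlockMapper:
    """Toy file system mapping: file block -> logical block -> physical block."""

    def __init__(self):
        self.file_to_logical = {}      # (file name, block index) -> logical block
        self.logical_to_physical = {}  # logical block -> physical block
        self.physical = {}             # physical block -> stored data
        self._next_logical = 0
        self._next_physical = 100      # arbitrary offset so the numbers differ

    def write_file(self, name: str, data: bytes):
        for index, block in enumerate(split_into_blocks(data)):
            logical = self._next_logical
            physical = self._next_physical
            self._next_logical += 1
            self._next_physical += 1
            self.file_to_logical[(name, index)] = logical
            self.logical_to_physical[logical] = physical
            self.physical[physical] = block

    def read_file(self, name: str) -> bytes:
        parts = []
        index = 0
        while (name, index) in self.file_to_logical:
            logical = self.file_to_logical[(name, index)]
            parts.append(self.physical[self.logical_to_physical[logical]])
            index += 1
        return b"".join(parts)

    def relocate(self, logical: int, new_physical: int):
        # Storage management moves data without touching the file-level map.
        old = self.logical_to_physical[logical]
        self.physical[new_physical] = self.physical.pop(old)
        self.logical_to_physical[logical] = new_physical
```

The `relocate` helper illustrates the separation noted above: a physical block can be moved by updating only the logical-to-physical map, while the file-to-logical map and the file contents seen by a reader are unchanged.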