1. Technical Field
The present invention relates generally to an improved data processing system and in particular to a data storage subsystem for use with a data processing system. Still more particularly, the present invention provides a method and apparatus for managing snapshot copy operations in a data processing system.
2. Description of Related Art
In computer systems and data storage subsystems, one problem is performing a data file copy operation in a manner that minimizes the use of processing resources and data storage memory. Previously, data files were copied in their entirety by the processor, such that two exact copies of the selected data file were resident in the data storage memory. This operation consumed twice the amount of memory for the storage of two identical copies of the data file. Additionally, this operation required the intervention of the processor to effect the copy of the original data file.
A data file snapshot copy is an improvement over this type of copy process. This snapshot copy process includes a dynamically mapped virtual data storage subsystem. This subsystem stores data files received from a processor in back-end data storage devices by mapping the processor assigned data file identifier to a logical address that identifies the physical storage location of the data. This dynamically mapped virtual data storage subsystem performs a copy of a data file by creating a duplicate data file pointer to a data file identifier in a mapping table to reference the original data file. In this dynamically mapped virtual data storage subsystem, the data files are referred to as a collection of “virtual tracks” and each data file is identified by unique virtual track addresses (VTAs). The use of a mapping table provides the opportunity to replace the process of copying the entirety of a data file in the data storage devices with a process that manipulates the contents of the mapping table. A data file appears to have been copied if the name used to identify the original data file and the name used to identify the copy data file are both mapped to the same physical data storage location.
This mechanism enables the processor to access the data file via two virtual track addresses while only a single physical copy of the data file resides on the back-end data storage devices in the data storage subsystem. This process minimizes the time required to execute the copy operation and the amount of memory used since the copy operation is carried out by creating a new pointer to the original data file and does not require any copying of the data file itself.
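The mapping-table copy described above can be illustrated with a minimal sketch. The class name `MappingTable`, its method names, and the use of in-memory dictionaries are assumptions made for illustration only; they are not taken from any actual subsystem.

```python
class MappingTable:
    """Toy mapping table: virtual track address (VTA) -> physical location.

    All names here are hypothetical and chosen for illustration.
    """

    def __init__(self):
        self.vta_to_physical = {}   # VTA -> physical location id
        self.physical_store = {}    # physical location id -> stored data
        self._next_location = 0

    def write(self, vta, data):
        # Store the data at a fresh physical location and map the VTA to it.
        location = self._next_location
        self._next_location += 1
        self.physical_store[location] = data
        self.vta_to_physical[vta] = location

    def snapshot_copy(self, source_vta, target_vta):
        # The "copy" only duplicates the pointer; no data is moved.
        self.vta_to_physical[target_vta] = self.vta_to_physical[source_vta]

    def read(self, vta):
        return self.physical_store[self.vta_to_physical[vta]]


table = MappingTable()
table.write("vol0/track0", b"payload")
table.snapshot_copy("vol0/track0", "vol1/track0")
```

After the snapshot copy, both virtual track addresses resolve to the same data while `physical_store` still holds a single physical copy.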
One implementation of the snapshot copy process provides a two-table approach. One table has an entry for each virtual device track, pointing to a second table that contains the physical track location for the entry. Each physical track table entry identifies the number of virtual track entries that point to it by use of a reference count mechanism. Each virtual track entry that points to the physical track is called a “reference.” The reference count increments when a new virtual track table entry pointer points to this physical entry (e.g., a snap) and decrements when a virtual track table entry pointer is removed (e.g., an update of the source after a snap). When a reference count is zero, that physical track can be deleted from the back-end, since it is known that there are no references to the physical track.
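A minimal sketch of the two-table arrangement with reference counting follows. The class `TwoTableStore`, its method names, and the dictionary layout are hypothetical; the sketch only assumes that a physical track is deleted once its count is decremented to zero.

```python
class TwoTableStore:
    """Toy two-table model: a virtual track table pointing into a
    physical track table that carries a reference count per entry."""

    def __init__(self):
        self.virtual_table = {}    # VTA -> physical track id
        self.physical_table = {}   # physical track id -> {"location", "refcount"}

    def add_physical_track(self, track_id, location):
        self.physical_table[track_id] = {"location": location, "refcount": 0}

    def point(self, vta, track_id):
        # Re-pointing a VTA at the track it already references is a no-op.
        if self.virtual_table.get(vta) == track_id:
            return
        self.unpoint(vta)                       # drop any previous reference
        self.virtual_table[vta] = track_id
        self.physical_table[track_id]["refcount"] += 1

    def unpoint(self, vta):
        track_id = self.virtual_table.pop(vta, None)
        if track_id is None:
            return
        entry = self.physical_table[track_id]
        entry["refcount"] -= 1
        if entry["refcount"] == 0:
            # No references remain, so the physical track can be deleted.
            del self.physical_table[track_id]

    def snap(self, source_vta, target_vta):
        # A snap adds one more reference to the source's physical track.
        self.point(target_vta, self.virtual_table[source_vta])


store = TwoTableStore()
store.add_physical_track("P1", "disk0:100")
store.point("A", "P1")
store.snap("A", "B")                  # P1 now has two references
count_after_snap = store.physical_table["P1"]["refcount"]
store.add_physical_track("P2", "disk0:200")
store.point("A", "P2")                # update source after a snap
count_after_update = store.physical_table["P1"]["refcount"]
store.unpoint("B")                    # last reference to P1 is removed
```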
Although each back-end track's reference counter is normally maintained, there are times when the counters are known to be inaccurate and must be recalculated (regenerated) by scanning the entire list of virtual volume tracks. For example, situations occur during snapshot copy operations and recovery operations that require the regeneration of reference counts. Reference count regeneration is the process performed to determine how many virtual tracks refer to (i.e., reference) each track stored on the back-end.
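In the simplest case, regeneration amounts to one pass over the virtual track table that tallies how many entries reference each back-end track. The sketch below assumes the virtual track table is available as a dictionary from virtual track address to physical track identifier; the function name is hypothetical.

```python
from collections import Counter


def regenerate_reference_counts(virtual_table):
    """Recount references by scanning every virtual track entry once.

    virtual_table: dict mapping VTA -> physical track id (assumed layout).
    Returns a Counter mapping physical track id -> number of references.
    """
    return Counter(virtual_table.values())


regenerated = regenerate_reference_counts({"A": "P1", "B": "P1", "C": "P2"})
```

This single-pass form needs one counter in memory for every referenced back-end track at the same time.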
As the number of virtual volumes grows, the time it takes to perform reference count regeneration grows at a significant rate, resulting in warm-start times that are unacceptable. Even for 1024 volumes and larger back-end capacities, the available processor memory is usually insufficient to hold all of the counters simultaneously, as would be needed to scan the virtual volume tracks only once. As a result, to perform reference count regeneration in this case, multiple scans through the virtual volume tracks are needed. Multiple scans can cause a performance penalty in the storage subsystem.
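The cost of a limited counter budget can be sketched as follows. The function `regenerate_in_passes` and its `max_counters` parameter are hypothetical, and the final `counts` dictionary merely stands in for results that a real subsystem would write back to the physical track table after each pass.

```python
def regenerate_in_passes(virtual_table, physical_ids, max_counters):
    """Regenerate reference counts with at most max_counters live counters.

    Each pass covers one window of physical track ids and requires a full
    scan of the virtual track table, so the number of scans grows as the
    counter budget shrinks relative to the number of physical tracks.
    """
    counts = {}   # stand-in for counts persisted after each pass
    passes = 0
    ids = sorted(physical_ids)
    for start in range(0, len(ids), max_counters):
        window_counts = dict.fromkeys(ids[start:start + max_counters], 0)
        for track_id in virtual_table.values():   # one full scan per pass
            if track_id in window_counts:
                window_counts[track_id] += 1
        counts.update(window_counts)
        passes += 1
    return counts, passes


virtual_table = {"A": "P1", "B": "P1", "C": "P2", "D": "P3"}
counts, passes = regenerate_in_passes(virtual_table, ["P1", "P2", "P3"], max_counters=2)
```

With three physical tracks and room for only two counters, the virtual track table is scanned twice.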
In addition to problems with reference counts, presently available snapshot copy operations are unable to provide quotas on the number of operations or restrictions based on the locality of sub-systems in a distributed network.
Therefore, it would be advantageous to have an improved method and apparatus for managing a data file storage system using snapshot copy operations.
The present invention provides a method and apparatus in a data processing system for managing data access to a plurality of storage devices. In particular, the present invention may be applied to copy operations involving virtual tracking. The plurality of storage devices is grouped into a set of groups. Responsive to a request for a copy operation to copy data from a first storage device to a second storage device, a determination is then made as to whether the first storage device and the second storage device are in the same group within the set of groups. Responsive to the first storage device and the second storage device being in the same group, a data file pointer to the original data is stored in a data structure for the second storage device. Responsive to an absence of the first storage device and the second storage device being in the same group, occurrence of the copy operation is prevented. With these groups, regenerating reference counts may be performed by snap group, minimizing the resources needed because the number of counters needed to track references is at most the number of virtual tracks in a group. In this manner, all of the virtual tracks need not be scanned at once. Also, only a single scan of physical tracks is required.
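The group check and per-group regeneration summarized above can be sketched as follows. The class `SnapGroupManager`, the use of `PermissionError` to prevent a cross-group copy, and the dictionary keyed by (device, track) are all assumptions made for illustration rather than the claimed implementation.

```python
from collections import Counter


class SnapGroupManager:
    """Toy model of snap groups: copies are allowed only within a group."""

    def __init__(self, device_to_group):
        self.device_to_group = dict(device_to_group)   # device -> group id
        self.pointers = {}   # (device, track) -> physical track id

    def same_group(self, first_device, second_device):
        return self.device_to_group[first_device] == self.device_to_group[second_device]

    def copy(self, source_device, target_device, track):
        if not self.same_group(source_device, target_device):
            # Assumed way of preventing the copy operation from occurring.
            raise PermissionError("devices are in different snap groups")
        # Store a pointer to the original data for the target device.
        self.pointers[(target_device, track)] = self.pointers[(source_device, track)]

    def regenerate_group(self, group):
        # Count references using only the virtual tracks of one group.
        counts = Counter()
        for (device, _track), physical_id in self.pointers.items():
            if self.device_to_group[device] == group:
                counts[physical_id] += 1
        return counts


manager = SnapGroupManager({"dev0": "g1", "dev1": "g1", "dev2": "g2"})
manager.pointers[("dev0", 0)] = "P1"
manager.copy("dev0", "dev1", 0)          # same group: pointer is stored
try:
    manager.copy("dev0", "dev2", 0)      # different groups: prevented
    cross_group_blocked = False
except PermissionError:
    cross_group_blocked = True
group_counts = manager.regenerate_group("g1")
```

Because copies never cross a group boundary in this sketch, the counters held during `regenerate_group` are bounded by the number of virtual tracks in that one group.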