1. Field
The present description relates to a computer program product, system, and method for backing up data including selecting data for movement from a source to a target.
2. Description of Related Art
There are various known techniques for backing up data. These backup techniques are often implemented using a storage-management server which can store data objects such as user files in one or more locations often referred to as storage pools. The storage-management server frequently uses a database for tracking information about the stored objects, including the attributes and locations of the objects in the storage pools.
One backup technique referred to as “deduplication” is a method of reducing the storage space used to store data by eliminating redundant data in files sharing common data. In deduplication systems, typically only one unique instance of the data is actually retained on storage media, such as disk or tape, and additional instances of the data in different files or databases may be replaced with a pointer to the unique data copy. Thus, if only a few bytes of a new file being added are different from data in other files, then only the new bytes may be stored for the new file, and pointers that reference the common data in other files or databases are included in the added file.
Thus, deduplication provides a method to remove redundant data during a backup operation, thereby reducing required storage and potentially conserving network bandwidth. A deduplication system often operates by dividing a file into a series of chunks, or extents. The deduplication system determines whether any of the chunks are already stored, and then stores only the non-redundant chunks. Redundancy may be checked against other chunks in the file being stored or against chunks already stored in the system.
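By way of illustration only, the chunk-based deduplication described above may be sketched as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the fixed chunk size, the use of a SHA-256 digest to identify a chunk, and the names `DedupStore`, `put`, and `get` are all hypothetical choices made for the example and are not drawn from any particular storage-management server.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; assumed value, deliberately tiny for illustration


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Divide data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Hypothetical in-memory store keeping one copy of each unique chunk."""

    def __init__(self) -> None:
        self.chunks = {}  # digest -> chunk bytes (the single retained instance)
        self.files = {}   # file name -> ordered list of digests ("pointers")

    def put(self, name: str, data: bytes) -> int:
        """Store a file and return the number of chunks newly written."""
        newly_written = 0
        pointers = []
        for piece in split_into_chunks(data):
            digest = hashlib.sha256(piece).hexdigest()
            # Store the chunk only if no identical chunk is already held,
            # whether it came from this file or from an earlier file.
            if digest not in self.chunks:
                self.chunks[digest] = piece
                newly_written += 1
            pointers.append(digest)
        self.files[name] = pointers
        return newly_written

    def get(self, name: str) -> bytes:
        """Rebuild a file by following its pointers to the stored chunks."""
        return b"".join(self.chunks[d] for d in self.files[name])
```

In this sketch, storing a first file containing three distinct chunks writes three chunks, while storing a second file that shares two of those chunks writes only the one chunk that differs; both files remain fully recoverable through their pointer lists.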
Caches are frequently used to temporarily store data retrieved from storage. Such caches can provide faster access to data that is frequently used or is otherwise anticipated to be needed. There are various known caching algorithms for selecting data for retention in the cache or for flushing from the cache. Such caching techniques include the first in first out (FIFO) technique, which can flush the oldest data from the cache. Another caching technique is the least recently used (LRU) technique, which can flush the least recently used (or least recently read) data from the cache.
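By way of illustration only, the difference between the FIFO and LRU techniques described above may be sketched as follows. This is a minimal sketch, assuming a small in-memory cache with a fixed capacity measured in entries; the class names `FIFOCache` and `LRUCache` and their methods are hypothetical and chosen solely for the example.

```python
from collections import OrderedDict


class FIFOCache:
    """Flushes the entry that was inserted earliest, regardless of reads."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from oldest to newest

    def __contains__(self, key) -> bool:
        return key in self._data

    def get(self, key):
        # A read does not change the flush order under FIFO.
        return self._data.get(key)

    def put(self, key, value) -> None:
        if key not in self._data and len(self._data) >= self.capacity:
            self._data.popitem(last=False)  # flush the oldest entry
        self._data[key] = value


class LRUCache(FIFOCache):
    """Flushes the entry that was used least recently."""

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
        return self._data.get(key)

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)  # an update also counts as a use
        super().put(key, value)
```

With a capacity of two entries, inserting `a` and `b`, reading `a`, and then inserting `c` flushes `a` from the FIFO cache (the oldest entry) but flushes `b` from the LRU cache (the least recently used entry).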