System clients may periodically back up data to a remote location. This allows files on a given system to be restored in the event of a loss. Storing the data remotely further reduces the impact of environmental catastrophes, and may allow backups from multiple clients to be managed from a central location. Remote storage may, however, require unnecessary data transmission. For example, a client may query remote storage to determine whether the storage already contains a piece of data. Such queries may increase network traffic, latency, server and client workloads, and backup duration. One approach to solving this issue is a client cache.
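The effect of a client cache on query volume may be illustrated with a minimal sketch. The sketch is illustrative only and does not describe any particular embodiment. The names RemoteStore, BackupClient, and count_queries are hypothetical, and identifying a piece of data by its SHA-256 digest is an assumption made for the example rather than something stated above.

```python
import hashlib


class RemoteStore:
    """Hypothetical stand-in for remote backup storage; counts existence queries."""

    def __init__(self):
        self.chunks = {}
        self.queries = 0

    def contains(self, digest):
        # Each call models one network round trip to remote storage.
        self.queries += 1
        return digest in self.chunks

    def put(self, digest, data):
        self.chunks[digest] = data


class BackupClient:
    """Hypothetical client that optionally remembers which digests are stored remotely."""

    def __init__(self, store, use_cache):
        self.store = store
        # Assumption: the cache is a simple in-memory set of digests.
        self.cache = set() if use_cache else None

    def backup(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if self.cache is not None and digest in self.cache:
            return  # Already known to be stored remotely; no query is sent.
        if not self.store.contains(digest):
            self.store.put(digest, data)
        if self.cache is not None:
            self.cache.add(digest)


def count_queries(use_cache, runs=3):
    """Back up the same three pieces of data `runs` times and return the query count."""
    store = RemoteStore()
    client = BackupClient(store, use_cache)
    pieces = [b"alpha", b"beta", b"gamma"]
    for _ in range(runs):
        for piece in pieces:
            client.backup(piece)
    return store.queries
```

In this sketch, repeated backups of unchanged data without a cache issue one query per piece of data per run, whereas with a cache only the first run queries remote storage. The in-memory set used here is the structure that may outgrow client memory as the amount of backed-up data increases.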
Client caches present their own problems, however. In recent years, disk capacity has doubled nearly every 18 months, while memory capacity has grown at a much slower rate. Further, the cost of additional memory can greatly exceed the cost of additional disk space. The exponential growth of disk capacity and the cost of memory may make it difficult to load a backup cache into client memory, since the cache on disk may exceed memory capacity. On large storage systems, the cache may begin to miss and cause unnecessary server queries, or may require a memory upgrade. This trend may continue as long as disk growth continues to outstrip memory growth.
There is a need, therefore, for an improved method, article of manufacture, and apparatus for managing a client cache in backup systems.