On a backup storage appliance, system administrators often divide storage among different collections of backup data, e.g., by creating a backup directory for each user group. For example, separate backup directories may be created for sales, development, and customer support. The prevailing solution to dividing storage is to assign a quota to each backup directory. Conventionally, there are two basic types of disk quotas. The first, known as a usage quota or block quota, limits the amount of disk space that can be used by provisioning a maximum byte usage for the backup directory. The second, known as a file quota or inode quota, limits the number of files and directories that can be created in the backup directory. Through the use of quotas, system administrators are better able to manage the space on a backup storage appliance. Setting a quota for a backup directory allows the storage appliance to report the available space for that directory, i.e., the difference between the allotted and the used storage space. The report of available space allows system administrators to recognize when storage usage approaches a quota limit and to take action before the limit is reached. Additionally, the report enables administrators to prevent overuse of storage space, e.g., by stopping backup file writes when a quota limit is reached. This prevents the allotted/provisioned storage space from being exceeded, thus ensuring that one backup directory does not encroach on another backup directory's space.
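The two conventional quota types described above can be illustrated with a minimal sketch. The class, field, and method names below are hypothetical and chosen only for illustration; they do not correspond to any particular appliance's interface.

```python
from dataclasses import dataclass


@dataclass
class BackupDirectory:
    """Hypothetical per-directory quota record (all names are illustrative)."""
    name: str
    block_quota_bytes: int  # usage/block quota: maximum bytes allowed
    file_quota: int         # file/inode quota: maximum number of files
    used_bytes: int = 0
    used_files: int = 0

    def available_bytes(self) -> int:
        # Available space is the difference between allotted and used space.
        return self.block_quota_bytes - self.used_bytes

    def write_file(self, size_bytes: int) -> bool:
        """Record a new backup file; refuse it if either quota would be exceeded."""
        if self.used_bytes + size_bytes > self.block_quota_bytes:
            return False  # block quota would be exceeded
        if self.used_files + 1 > self.file_quota:
            return False  # file quota would be exceeded
        self.used_bytes += size_bytes
        self.used_files += 1
        return True
```

In this sketch a write is refused as soon as either limit would be crossed, which is what keeps one directory from encroaching on another directory's allotment.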
Block quotas and file quotas work well for conventional non-deduplicated storage systems. However, with deduplicated backup storage, a new type of quota is needed because physical storage is shared across backup directories. A deduplicating storage system consists of several levels of logical data abstraction above the physical disk storage. At the highest level, a logical namespace exists which allows a user to access data stored on the disk through an external application that resides on a client. A user can access data through any of the following protocols: virtual tape libraries (VTL), Data Domain BOOST, Common Internet File System (CIFS), and Network File System (NFS). Each namespace references/represents one or more hierarchies of one or more directories, and stored within each directory are files, e.g., user text files, audio files, or video files. Files, in turn, are segmented into a collection of data segments/chunks which are stored on a physical disk. In a deduplicated storage system, each data segment is hashed to create a fingerprint, which is used to determine whether the data segment already exists on the physical disk. If the generated fingerprint does not match any fingerprint in the collection currently stored on the storage system (i.e., the data segment does not currently exist on the storage system), the data segment is written to the physical disk storage, and the new fingerprint is added to the existing collection of fingerprints representing the data segments already on the physical disk storage. On the other hand, if the fingerprint of a new data segment matches a fingerprint in the collection of existing fingerprints, then the data segment is not written to the physical disk storage again.
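The fingerprint lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular product: the text does not specify the hash function or the segmentation scheme, so SHA-256 and fixed-size segments are assumed here, and the class and method names are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store (illustrative; names and choices are assumptions)."""

    def __init__(self, segment_size: int = 4):
        self.segment_size = segment_size  # assumed fixed-size segmentation
        self.segments = {}                # fingerprint -> stored segment bytes

    def write_file(self, data: bytes) -> list:
        """Segment the data, store only unseen segments, return the fingerprint list."""
        recipe = []
        for i in range(0, len(data), self.segment_size):
            segment = data[i:i + self.segment_size]
            fingerprint = hashlib.sha256(segment).hexdigest()  # assumed hash
            if fingerprint not in self.segments:
                # New fingerprint: write the segment and remember its fingerprint.
                self.segments[fingerprint] = segment
            # Known fingerprint: nothing is written; the file only references it.
            recipe.append(fingerprint)
        return recipe

    def read_file(self, recipe: list) -> bytes:
        """Rebuild a file from its ordered list of fingerprints."""
        return b"".join(self.segments[fp] for fp in recipe)

    def physical_bytes(self) -> int:
        """Bytes actually stored, counting each unique segment once."""
        return sum(len(s) for s in self.segments.values())
```

If two files from different backup directories share segments, both of their fingerprint lists point at the same stored bytes, which is why physical usage cannot be attributed cleanly to one directory.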
Deduplication is performed across all data on the entire storage system. Thus, a backup directory cannot be clearly associated with the physical storage space it occupies, because the same storage space may be shared by multiple backup directories. Conventional block quota and file quota systems do not report storage usage in a way that accounts for deduplication, resulting in under-provisioning of the storage system. Some backup applications have started using the deduplication ratio in an effort to enforce quotas more efficiently. A deduplication ratio is derived by dividing the used logical space by the used physical space. This deduplication ratio is then applied to the logical size of the files to estimate the physical space the files will require. However, there are two problems with this quota system. First, the deduplication ratio must be calculated using the logical and physical space of the entire backup appliance, since there is no measurement of the physical space used by an individual backup directory. Second, the deduplication ratio of all data on the storage system changes dynamically as new backup files are added and old backup files expire. Thus, an erroneously high deduplication ratio leads to over-provisioning, which may cause backup file writes to fail as physical space usage on the storage system reaches full capacity. Conversely, an erroneously low deduplication ratio leads to under-provisioning, and the storage system will not utilize all available physical storage space. Thus, using the deduplication ratio to derive a fixed block quota is problematic.
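The ratio-based estimate described above amounts to two divisions, sketched below. The function names and the numbers in the usage note are hypothetical and serve only to make the arithmetic concrete.

```python
def dedup_ratio(logical_used: float, physical_used: float) -> float:
    """Appliance-wide deduplication ratio: used logical space / used physical space."""
    if physical_used <= 0:
        raise ValueError("physical_used must be positive")
    return logical_used / physical_used


def estimate_physical(logical_size: float, ratio: float) -> float:
    """Estimated physical space for files of the given logical size."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return logical_size / ratio
```

For instance, with 1000 units of logical space stored in 100 units of physical space, the appliance-wide ratio is 10, so a directory holding 500 logical units is estimated to need 50 physical units. If that directory's data actually deduplicates at only 5 to 1, it really occupies 100 units, twice the estimate, which is the over-provisioning risk the paragraph describes.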