Modern corporate enterprises have large volumes of critical data, such as work documents, emails, and financial records, that require backup and recovery to prevent data loss. During a typical backup procedure, data stored on client workstations and servers is sent to backup storage. During a typical recovery procedure, backup data is retrieved from the backup storage and reconstructed on client workstations and servers. Since the amount of data that requires backup can be very large, measured in hundreds of terabytes even for a medium-size company, the backup process can be very resource intensive and time consuming. Furthermore, since the data backup process must be performed frequently, e.g., daily or semi-weekly, the backup process can be quite onerous on the corporate network.
FIG. 1A illustrates a conventional data backup process for a computer file. As shown, in a first state, a full data backup 10 of the system, computer, and/or the like is created. Next, in a second state, a first increment 20A of all new and/or modified data is added to the archive, i.e., appended to the end of the data backup 10. Next, in a third state, a second increment 20B of all new and/or modified data is added to the archive. This process is repeated continuously. A fourth state is also shown in which a portion of the data 11A in the original backup 10 is no longer needed. This data portion 11A can be removed from the archive, leaving a fragmented file 10A with randomly spaced occupied and unoccupied blocks. Over time, a large percentage of the archive's space can be wasted, and access to such fragmented files 10A, for example, when reading or searching data, can slow down considerably.
FIG. 1B illustrates an exemplary fragmented file 10A resulting from the backup process described above. As shown, the fragmented file 10A can have many unused sectors 11A (shown as the cross-hatched boxes). These unused sectors 11A are a considerable waste of resources that can lead to overly expensive data storage and increasingly slow access to the remaining usable data.
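The states of FIG. 1A and the resulting fragmented file of FIG. 1B can be illustrated with a minimal sketch. The `Archive` class, its method names, the block counts, and the freed block positions are all hypothetical and chosen only for illustration; they are not taken from any particular backup product.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Archive:
    """Toy model of an append-only backup archive (hypothetical, for illustration)."""

    # Each slot holds a label for an occupied block, or None for a freed block.
    blocks: List[Optional[str]] = field(default_factory=list)

    def append(self, label: str, count: int) -> None:
        """Append `count` blocks tagged with `label` to the end of the archive."""
        self.blocks.extend([label] * count)

    def free(self, indices: Iterable[int]) -> None:
        """Mark blocks as no longer needed; the archive does not shrink."""
        for i in indices:
            self.blocks[i] = None

    def unused_fraction(self) -> float:
        """Fraction of the archive occupied by freed (unused) blocks."""
        if not self.blocks:
            return 0.0
        return self.blocks.count(None) / len(self.blocks)

    def layout(self) -> str:
        """One character per block: '.' for unused, else the label's first letter."""
        return "".join("." if b is None else b[0] for b in self.blocks)


archive = Archive()
archive.append("F", 8)    # first state: full data backup 10 (8 blocks, assumed)
archive.append("A", 2)    # second state: first increment 20A appended
archive.append("B", 2)    # third state: second increment 20B appended
archive.free([1, 4, 5])   # fourth state: data portion 11A no longer needed

print(archive.layout())           # F.FF..FFAABB
print(archive.unused_fraction())  # 0.25
```

In this toy model the archive keeps its full length after blocks are freed, so a quarter of its space holds no usable data, and a reader scanning the file still has to step over the unused blocks.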
Existing solutions to address the problems of such fragmented files include sparsing operations that remove unused sectors within data files (e.g., fragmented file 10A). However, sparsing algorithms have limitations. For example, the file system of the archive may not support the sparsing operations. Moreover, if the file is transferred from an archive with a file system that supports sparsing to another location that does not support sparsing operations, the unused regions in the “sparse” file are filled with digital zeroes. Furthermore, even in file systems that support sparsing (e.g., New Technology File System or “NTFS”), the use of sparsing functions may be accompanied by a number of restrictions so that the benefit of sparsing is not always realized.
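The zero-filling behavior described above can be sketched as follows. This is an assumption-laden illustration: the helper names `make_file_with_hole` and `plain_copy` and the 1 MiB hole size are hypothetical, and whether the source file is actually stored sparsely on disk depends on the operating system and file system. What the sketch shows is only that a plain byte-for-byte copy reads the unwritten region as zeroes and writes those zeroes out explicitly.

```python
import os
import tempfile

HOLE_SIZE = 1024 * 1024  # 1 MiB unwritten region (arbitrary, for illustration)


def make_file_with_hole(path: str, hole_size: int = HOLE_SIZE) -> None:
    """Write a header, skip ahead without writing, then write a trailer."""
    with open(path, "wb") as f:
        f.write(b"head")
        f.seek(hole_size, os.SEEK_CUR)  # leaves a gap that was never written
        f.write(b"tail")


def plain_copy(src: str, dst: str, chunk_size: int = 64 * 1024) -> None:
    """Copy every logical byte; the gap is read back as zeroes and written out."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sparse.bin")
    dst = os.path.join(tmp, "copy.bin")
    make_file_with_hole(src)
    plain_copy(src, dst)
    logical_size = os.path.getsize(src)
    with open(dst, "rb") as f:
        copied = f.read()

# The unused region now exists in the copy as explicit zero bytes.
print(logical_size == len(copied))
print(copied[4:-4] == bytes(HOLE_SIZE))
```

After such a copy, the destination stores the full logical size of the file, so any space that sparsing saved at the source is lost.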
Therefore, there exists a need to reduce the overhead (e.g., time, system resources, etc.) associated with searching for existing blocks and adding new blocks.