This application relates generally to file backup systems and more particularly to the administration of a differential file backup system in a client-server environment.
Client-server network systems are well known and widely used in many industries and for many applications. In a typical client-server system, a user operating a client machine sends data to one or more central computers (the server) for processing. The processed data may be stored locally on the client or centrally on the server. In either case, a single-point failure, i.e., the failure of the primary data storage system on either the client or the server, can result in a catastrophic loss of data. To protect against such a failure, a file backup system is commonly employed to allow recovery of the client data.
Traditional file backup systems perform a full backup of each file designated to be backed up and thereafter save a full backup version of that file each time it changes. These systems require large amounts of storage space, and over time the storage requirements can become untenable. Differential backup systems reduce the storage required over time by not repeating a full backup of a file after its initial save. In these systems only the changes, i.e., the differential between the original file and the new file, are saved. A file can then be reconstructed by combining its components: the initial, fully saved file (the base) and the plurality of differential files (the delta files). This incremental approach can reduce both the backup time and the storage requirements of the overall system.
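The base/delta relationship can be illustrated with a minimal Python sketch. The source does not define a delta format, so this assumes a hypothetical copy/insert encoding: each delta is a list of `Copy` operations (reuse a byte range of the previous version) and `Insert` operations (new literal bytes). Reconstruction applies each delta in order, starting from the base.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Copy:
    """Hypothetical op: copy `length` bytes at `offset` of the previous version."""
    offset: int
    length: int


@dataclass
class Insert:
    """Hypothetical op: insert literal bytes absent from the previous version."""
    data: bytes


Delta = List[Union[Copy, Insert]]


def apply_delta(previous: bytes, delta: Delta) -> bytes:
    """Produce the next version of a file from the previous one and one delta."""
    out = bytearray()
    for op in delta:
        if isinstance(op, Copy):
            end = op.offset + op.length
            if end > len(previous):
                # A delta that points past the previous version cannot be applied.
                raise ValueError("copy runs past end of previous version")
            out += previous[op.offset:end]
        else:
            out += op.data
    return bytes(out)


def reconstruct(base: bytes, deltas: List[Delta]) -> bytes:
    """Rebuild the newest version by applying each delta, oldest first, to the base."""
    version = base
    for delta in deltas:
        version = apply_delta(version, delta)
    return version
```

For example, with a base of `b"hello world"`, the delta `[Copy(0, 6), Insert(b"there")]` yields `b"hello there"`. Because each delta is interpreted against the version before it, a single damaged delta makes every later version unrecoverable.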
Differential or incremental backup file systems do, however, have several problems. First, differential backup systems are not as robust as full file backup systems, because reconstructing a file depends on the base file and every delta file in its chain being intact. Second, a large number of old versions of a file can accumulate within the backup storage device, occupying potentially valuable storage space and increasing the recovery time. Third, differential backups received over a long period of time can fragment the various file components across the media; in the case of a tape backup system, the components of a single file may be distributed across several different tapes.
It would therefore be desirable to back up files so that the components of each file are contained on a small number of tapes, and to process those components so that the file backup system can reclaim storage space.
A method for administering a differential file backup system in a client-server environment is disclosed. In one embodiment, the method reduces the number of access points associated with the components of a file stored on the file backup system, where those components include a base file and at least one delta file. A server reads data, including the base file and the at least one delta file of a backup file of interest, from a first memory device used by the file backup system and writes the data to a second memory device. The server then processes the data in the second memory device to reduce the number of access points that the components of the backup file have across the first memory device.
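One way to picture access-point reduction is the Python sketch below. It is an illustration under stated assumptions, not the disclosed implementation: a placement is a hypothetical `(volume, slot)` pair, an access point is counted as one maximal run of consecutive slots on a single volume, and the hypothetical `consolidate` helper stages the components in a dictionary standing in for the second memory device before writing them back contiguously. It assumes the target slots are free.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical placement of one component: (volume or tape id, slot number).
Placement = Tuple[str, int]


def count_access_points(placements: List[Placement]) -> int:
    """Count maximal runs of consecutive slots on the same volume.

    Each run can be read after a single positioning operation, so the run
    count is used here as a stand-in for the number of access points.
    """
    runs = 0
    prev: Optional[Placement] = None
    for vol, slot in sorted(placements):
        if prev is None or vol != prev[0] or slot != prev[1] + 1:
            runs += 1  # a new volume or a gap starts a new run
        prev = (vol, slot)
    return runs


def consolidate(components: Dict[str, Placement],
                first_device: Dict[Placement, bytes],
                target_volume: str,
                start_slot: int) -> Dict[str, Placement]:
    """Rewrite a file's components contiguously; assumes the target slots are free."""
    # Read every component into a staging area (the second memory device).
    staging = {name: first_device[pl] for name, pl in components.items()}
    # Write the components back, in their original order, to adjacent slots.
    new_placements: Dict[str, Placement] = {}
    for offset, name in enumerate(staging):
        slot = (target_volume, start_slot + offset)
        first_device[slot] = staging[name]
        new_placements[name] = slot
    # Release old slots that the rewrite did not reuse.
    reused = set(new_placements.values())
    for pl in components.values():
        if pl not in reused:
            first_device.pop(pl, None)
    return new_placements
```

A base on tape `T1` and two deltas on tapes `T2` and `T3` count as three access points; after `consolidate(..., "T4", 0)` they occupy slots 0 to 2 of `T4` and count as one.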
In one aspect of the invention, the component files of the backup file of interest are rearranged so that they are adjacent to one another when written back to the first memory device after processing. In another aspect, the component files of the backup file of interest are grouped according to the date of their last modification.
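Grouping by last-modification date can be sketched as a sort followed by bucketing. The `Component` type and the per-day granularity are assumptions made for illustration; the source does not say how dates are bucketed.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby
from typing import Dict, List


@dataclass
class Component:
    """Hypothetical record for one component file (base or delta)."""
    name: str
    last_modified: date


def group_by_modification_date(components: List[Component]) -> Dict[date, List[str]]:
    """Bucket component names by last-modification date, oldest date first."""
    # groupby only merges adjacent items, so sort by date (then name) first.
    ordered = sorted(components, key=lambda c: (c.last_modified, c.name))
    return {
        day: [c.name for c in group]
        for day, group in groupby(ordered, key=lambda c: c.last_modified)
    }


def write_order(components: List[Component]) -> List[str]:
    """Flatten the date groups into the order the components would be written back."""
    groups = group_by_modification_date(components)
    return [name for names in groups.values() for name in names]
```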
In another embodiment of the present invention, a subset of the component files, comprising the base file and one or more delta files, is coalesced to form a new base file. In one aspect of this embodiment, the files are selected according to one or more file expiration rules. In another aspect, the subset of files is selected according to the number of delta files that exist after the last base file was created. In another aspect, the server determines the size of the files that are to be coalesced and estimates the size of the new base file after coalescing; the coalescing operation is performed only if the difference between the two sizes is greater than a predetermined value.
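The delta-count and size rules can be combined into a single decision function, sketched below in Python. The `StoredFile` record, the `delta_threshold` and `min_savings` parameters, and the assumption that the new base size is estimated elsewhere and passed in are all hypothetical; file expiration rules are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredFile:
    """Hypothetical record of one stored component; chains are listed oldest first."""
    name: str
    size: int      # bytes occupied on the backup media
    is_base: bool


def coalesce_candidates(chain: List[StoredFile]) -> List[StoredFile]:
    """Return the most recent base file and every delta stored after it."""
    for i in range(len(chain) - 1, -1, -1):
        if chain[i].is_base:
            return chain[i:]
    return []  # no base file: nothing can be coalesced


def should_coalesce(chain: List[StoredFile], estimated_new_base_size: int,
                    delta_threshold: int, min_savings: int) -> bool:
    """Apply the delta-count rule, then the size-difference rule."""
    subset = coalesce_candidates(chain)
    if not subset or len(subset) - 1 < delta_threshold:
        return False
    current_size = sum(f.size for f in subset)
    # Coalesce only if the space saved exceeds the predetermined value.
    return current_size - estimated_new_base_size > min_savings
```

Checking the cheap delta-count rule first avoids estimating sizes for chains that are too short to be worth coalescing.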
In another embodiment of the present invention, the server detects whether a coalesced file contains corrupted data and, if so, requests that the appropriate client retransmit an uncorrupted copy of the file to the server.
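The source does not specify how corruption is detected; a common approach is to compare a stored digest with one recomputed from the data. The sketch below assumes a SHA-256 digest was recorded for the file version that the coalesced file represents, and uses a hypothetical `request_retransmit` callback for the request to the client.

```python
import hashlib
from typing import Callable


def verify_or_request(name: str, data: bytes, expected_sha256: str,
                      request_retransmit: Callable[[str], bytes]) -> bytes:
    """Return `data` if its digest matches; otherwise fetch a fresh copy.

    `expected_sha256` is assumed to have been recorded when the version now
    represented by the coalesced file was backed up. `request_retransmit`
    is a hypothetical callback that asks the owning client to resend `name`.
    """
    if hashlib.sha256(data).hexdigest() == expected_sha256:
        return data
    # Digest mismatch: treat the coalesced file as corrupted and ask the client
    # to retransmit. The client may now hold a newer version, so the
    # replacement is not checked against the old digest.
    return request_retransmit(name)
```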
In another embodiment, the files to be backed up are further stored in archive files that are written to the first memory device of the file backup system. The archive files are processed to reduce the number of access points relative to a backup file of interest: the server reads the archive files from the first memory device, writes them to a second memory device, and rearranges the archive files that contain components of the backup file so that they are adjacent when written back to the first memory device. In another aspect, the component files within the archive files can be rearranged so that files that have not been modified recently are grouped together and files that have been modified recently are grouped together.
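The archive rearrangement can be sketched as two small functions. The `Archive` type, the choice to move matching archives to the front (any contiguous placement would serve), and the numeric `cutoff` for what counts as recently modified are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Archive:
    """Hypothetical archive file: member name -> last-modified time (epoch seconds)."""
    archive_id: str
    members: Dict[str, float]


def order_archives(archives: List[Archive], components: Set[str]) -> List[Archive]:
    """Place archives holding any component of the file of interest next to each other.

    The relative order within each group is preserved (a stable partition).
    """
    def holds_component(a: Archive) -> bool:
        return any(name in a.members for name in components)

    hits = [a for a in archives if holds_component(a)]
    rest = [a for a in archives if not holds_component(a)]
    return hits + rest


def split_by_recency(archives: List[Archive], cutoff: float
                     ) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Partition members into recently and not recently modified groups.

    Entries are (archive_id, member name) so that same-named members in
    different archives stay distinct.
    """
    recent: List[Tuple[str, str]] = []
    stale: List[Tuple[str, str]] = []
    for a in archives:
        for name, mtime in a.members.items():
            (recent if mtime >= cutoff else stale).append((a.archive_id, name))
    return recent, stale
```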
In another embodiment, the server selects a backup file of interest and reads its components from the first memory device of the backup system into the second memory device. The server reconstructs the backup file of interest and detects whether the reconstruction fails. If the reconstruction fails, the server requests that the client retransmit the most recent version of the file corresponding to the backup file of interest. The server receives the retransmitted file, stores that version, and deletes the corrupted file.
In another embodiment, the server selects a backup file of interest and reads its components from the first memory device of the backup system into the second memory device. The server reconstructs the backup file of interest and detects whether the reconstruction fails. If the reconstruction fails, the server requests that a mirror server transmit an uncorrupted version of the backup file of interest. The server receives the transmitted file, stores that version, and deletes the corrupted file.
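The client-based and mirror-based recovery embodiments differ only in where the replacement comes from, so a sketch can take the source as a parameter. Here `reconstruct` and `fetch_replacement` are hypothetical callbacks (the latter would contact either the client or a mirror server), a failed reconstruction is signalled by a hypothetical `ReconstructionError`, and the replacement is stored as a single new base that supersedes the corrupted chain.

```python
from typing import Callable, Dict, List


class ReconstructionError(Exception):
    """Hypothetical error raised when a base/delta chain cannot be rebuilt."""


def refresh_backup(name: str,
                   store: Dict[str, List[bytes]],
                   reconstruct: Callable[[List[bytes]], bytes],
                   fetch_replacement: Callable[[str], bytes]) -> bytes:
    """Rebuild `name`; on failure, replace its corrupted chain with a fresh copy."""
    components = store[name]  # components staged from the first memory device
    try:
        return reconstruct(components)
    except ReconstructionError:
        # Ask the client (or a mirror server) for an uncorrupted version.
        replacement = fetch_replacement(name)
        # Store the new version as a lone base; the corrupted chain is dropped.
        store[name] = [replacement]
        return replacement
```

Catching only the dedicated error type keeps unrelated faults (for example, a missing entry in `store`) from silently triggering a retransmission.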