Maintaining the integrity of files stored in a computer system is imperative. This is as true for a single user computer as it is for commercial enterprises that support many computers that operate alone or that are interconnected by some sort of computer network. A good practice that is commonly followed by computer users is to copy the files stored on a computer either to a removable medium, e.g., floppy disk or zip drive, or, if available, to mass storage devices on a computer network, e.g., file servers. This process is sometimes referred to as a “backup” process.
This practice may be adequate when the volume of files that are backed up is small or if the files are only maintained locally and there is no need to share files among multiple users. Present day computing, however, is not that simple. To the contrary, present day computer users no longer do business while tethered to a stationary work station in a traditional office environment. Technological progress has led to a surge in mobile and remote computing. Mobile and remote users need to be as productive away from the office as they are when they work in a traditional office setting. To accomplish this desired level of productivity, users need access to network resources and up-to-date information. As a result, enterprise data and information are being stored beyond the traditional office environment and are spread across remote offices, remote personal computers (“PCs”), mobile PCs such as laptops, and Personal Digital Assistants (“PDAs”). Thus, critical data stored on mobile and remote PCs, for example, documents, presentations and e-mail files, which can grow to hundreds of megabytes, are neither properly protected nor always available to other users. As a result, there is even more of a need to ensure the integrity of files and the accessibility of current copies of files to all users now that the files may be spread out among remote and mobile computers.
The problem of file integrity is particularly acute for remote and mobile computers in that the information stored on a mobile or remote user's computer may not be stored anywhere else. In addition, in instances where files are maintained on a server in a network environment, the server copy of a file may not reflect the latest changes if a mobile or remote user has been working on the file locally on his mobile or remote computer. Because typical synchronization of such large files (for example, 200 to 300 megabytes), even over a local area network, can take about 10 to 20 minutes, users are discouraged from creating copies of this information and thereby synchronizing local copies of files with copies stored in the network.
A number of solutions have been proposed to overcome these shortcomings and facilitate the backup and synchronization of files. Traditional methods for backup and synchronization of files include, for example, copying network files and databases to the hard disk of the local PC and then, if appropriate, synchronizing the stored copies with the network copies of the files maintained on one or more network servers. This “copy and synchronize” approach, however, is an inefficient use of network bandwidth in that entire files are copied and transmitted during the backup and synchronization process.
Other techniques utilized by backup and synchronization processes are known as “delta technologies.” Known techniques employing delta technologies are so-called “block level differencing” (illustrated in FIGS. 1a and 1b) and “byte level differencing” (illustrated in FIGS. 2a and 2b). These techniques are described by James J. Hunt, Kiem-Phong Vo and Walter F. Tichy in “An Empirical Study of Delta Algorithms,” Sixth International Workshop on Software Configuration Management in Berlin, 1996, and by Andrew Tridgell and Paul Mackerras in “The Rsync Algorithm,” Technical Report TR-CS-96-05, Department of Computer Sciences, Australian National University, 1996.
In block level differencing, a local copy 14 and a remote copy 16 of a file are divided into “delta” blocks 18 and 20 on a client computer 10 and a server computer 12, respectively. A comparison is made of the respective blocks and the differences between the local and remote delta blocks 18 and 20 are generated and stored in a data structure 22. The data structure 22 is then transferred during synchronization from the client computer 10 to the server computer 12 where the differences are applied to the server copy of the file 16 by a software process running on the server computer 12.
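By way of illustration only, the block level differencing described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any cited reference: the fixed block size, the function names `split_blocks`, `block_diff` and `apply_block_diff`, and the dictionary used to stand in for data structure 22 are all hypothetical, and both copies are assumed to be available for direct comparison.

```python
BLOCK_SIZE = 4  # hypothetical block size; real systems use far larger blocks


def split_blocks(data, size=BLOCK_SIZE):
    """Divide a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_diff(local, remote, size=BLOCK_SIZE):
    """Compare local and remote copies block by block.

    Returns a structure (standing in for data structure 22) that holds only
    the local blocks that differ from the remote blocks at the same index.
    """
    local_blocks = split_blocks(local, size)
    remote_blocks = split_blocks(remote, size)
    changed = {}
    for index, block in enumerate(local_blocks):
        if index >= len(remote_blocks) or remote_blocks[index] != block:
            changed[index] = block
    return {"length": len(local), "size": size, "changed": changed}


def apply_block_diff(remote, diff):
    """Apply the differences to the remote (server) copy of the file."""
    blocks = split_blocks(remote, diff["size"])
    for index, block in sorted(diff["changed"].items()):
        if index < len(blocks):
            blocks[index] = block   # overwrite a block that changed
        else:
            blocks.append(block)    # the local copy grew past the remote copy
    # Truncate in case the local copy is shorter than the remote copy.
    return b"".join(blocks)[:diff["length"]]


local_copy = b"AAAABBBBCCCCDD"
remote_copy = b"AAAAXXXXCCCC"
diff = block_diff(local_copy, remote_copy)
updated_remote = apply_block_diff(remote_copy, diff)
```

In this sketch only the blocks at indices 1 and 3 are placed in the structure that is transferred, rather than the entire file. Note that a change of a single byte still causes the whole enclosing block to be sent.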
In byte level differencing, the local copy 14 and the remote copy 16 of the file being synchronized are compared and differences down to the byte level are generated and stored in a data structure 34. This approach produces much smaller differences than block level differencing. The data structure 34 is then transferred from the client computer 10 to the server computer 12 during synchronization so that the differences can be applied to the server copy of the file 16 by a software process running on the server computer 12.
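By way of illustration only, byte level differencing might be sketched as follows. This is a minimal sketch, assuming both copies can be compared directly; it uses the general-purpose `difflib.SequenceMatcher` class of the Python standard library in place of the specialized delta algorithms of the cited references. The function names `byte_diff` and `apply_byte_diff` and the list of operations standing in for data structure 34 are hypothetical.

```python
from difflib import SequenceMatcher


def byte_diff(remote, local):
    """Build a list of operations that turns the remote copy into the local copy.

    Each operation is either ("copy", start, end), which reuses bytes already
    present in the remote copy, or ("data", literal_bytes), which carries new
    bytes. Only the "data" operations carry file content.
    """
    matcher = SequenceMatcher(None, remote, local, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("data", local[j1:j2]))
        # "delete": remote bytes are simply not copied, so nothing is emitted
    return ops


def apply_byte_diff(remote, ops):
    """Reconstruct the local copy from the remote copy and the operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += remote[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


remote_copy = b"The quick brown fox"
local_copy = b"The quick red fox jumps"
ops = byte_diff(remote_copy, local_copy)
rebuilt = apply_byte_diff(remote_copy, ops)
literal_bytes = sum(len(op[1]) for op in ops if op[0] == "data")
```

Here the unchanged runs of bytes are represented by short copy instructions, so the literal data carried is smaller than the file itself. The comparison is quadratic in the worst case, which is consistent with the computational cost discussed below.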
The potential inefficiencies in these processes are apparent. Both require two communication sessions between the client computer 10 and the server computer 12: the first to ascertain the differences between the files, and the second to transmit the differences to the server computer 12 so they can be applied to the server copy of the file 16. In addition, because the processes that compute the differences are computationally intensive, they consume a significant amount of time and a substantial amount of processing resources.
Another known technique that has been utilized to track changes made to database files is known as “database journaling.” This technique requires the database application program to keep a journal of all changes made to a database file. These changes are then utilized during synchronization to incorporate changes made in the local copy of the database file to a remote copy of the database file.
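By way of illustration only, database journaling might be sketched as follows. This is a minimal sketch, assuming a simple key-value table; the class `JournaledTable`, its methods, and the function `replay` are hypothetical and do not represent the journal format of any particular database product.

```python
class JournaledTable:
    """A toy table whose application code records every change in a journal."""

    def __init__(self):
        self.rows = {}
        self.journal = []  # ordered list of (operation, key, value) entries

    def put(self, key, value):
        self.rows[key] = value
        self.journal.append(("put", key, value))

    def delete(self, key):
        self.rows.pop(key, None)
        self.journal.append(("delete", key, None))


def replay(journal, remote_rows):
    """Apply journaled changes, in order, to a remote copy of the table."""
    for operation, key, value in journal:
        if operation == "put":
            remote_rows[key] = value
        else:
            remote_rows.pop(key, None)
    return remote_rows


# The remote copy starts out identical to the local table's initial contents.
local_table = JournaledTable()
local_table.rows = {"a": 1, "b": 2}
remote_rows = {"a": 1, "b": 2}

local_table.put("c", 3)
local_table.put("a", 10)
local_table.delete("b")

remote_rows = replay(local_table.journal, remote_rows)
```

Only the three journal entries need to be transferred during synchronization, and no comparison of the two copies is required. The sketch also shows the dependency on the application: the journal exists only because `put` and `delete` were written to record it.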
This technique, however, is application specific in that it cannot be used to back up and synchronize files for which the application programs modifying the files do not themselves create change journals. In practice, the change journals are applied to synchronize a remote copy of the database. Typically, only high end database applications create change journals. Most popular software application programs, including Microsoft® PowerPoint®, Access, Outlook and Word (all products of the Microsoft Corporation located in Redmond, Wash.), do not create change journals.
None of the foregoing techniques addresses the issue of how to perform the synchronization process when one or more of the files to be synchronized are still in use by an application program. Normally, only files that are inactive or closed at the time the synchronization process is run are included. This is because application programs typically open files in an “exclusive” mode so that other application programs or processes cannot read or write them while they are open.
A typical computer user keeps his mail client, such as Microsoft Outlook or Netscape, always running. Similarly, if the user were primarily working on an application such as AutoCAD, which is available from Autodesk of Cupertino, Calif., the relevant database may always be open. These open files cannot be read by other processes and application programs because the application program operating on the files has opened them in “exclusive” mode. Therefore, current techniques (direct copy and the various delta differencing methods) used to back up and synchronize files will fail, as they need to read the contents of the source file to perform their task.