1. Field of the Invention
The present invention relates to data processing system file storage subsystems and in particular to methods and structures for reducing storage requirements and administrative tasks involved with maintaining individual copies of a common base set of files by automatically storing only changes (deltas) made to individual files for each particular user's session.
2. Discussion of Related Art
In the computing arts, it is common that a base set of files stored in a mass storage file system are shared by a number of users or processes. For example, a group of software developers may share access to a common base set of files which represent the product/project under development. Or, for example, groups of analysts may share access to a common base set of financial data files for which they each perform requisite analysis.
Each user or process may, for analysis or development purposes, wish to independently make relatively small changes in the base set of files. However, frequently in such cases the common base set of shared files is intended to be read only or intended to be updated only in a global manner such that all files will contain a consistent set of information. For example, where a group of software developers work jointly on a software project, the base set of files may represent the current development working version of the software project or product. It is therefore desirable that this base set of files be updated only in a consistent manner. This assures that the common base set of files may be relied upon to be internally consistent as the present operable or released version of the project or product.
A typical solution to this problem as presently known in the art provides that each individual developer in the group maintains his or her own complete snapshot of the base set of files or, at a minimum, a snapshot of each file from the base set of files affected by that developer's efforts. Periodically, e.g., nightly or weekly, all developers in the group will coordinate the update of the base set of files to incorporate changes made in their private copies of particular files into the base set of files.
Maintaining a complete snapshot copy of the entire base set of files is costly in terms of storage requirements in the computing enterprise. Where each user/process maintains a complete duplicate set of the base set of files, storage requirements can grow dramatically in proportion to the number of users/processes accessing the common set of files.
This storage cost can be reduced by copying only the subset of files in the base set of files which are actually impacted by the individual's efforts. Other files which are unaffected need not be copied. However, the unaffected files are typically required for use of the base set of files. For example in the software development environment noted above, the build process for the software product or project requires modified (i.e., locally stored) files as well as the remaining unmodified files from the globally stored base set of files. The remaining unmodified files from the base set of files must therefore be "linked" in some manner with the subset of locally copied files affected by the developer's efforts.
This particular problem of linking to unaffected files in the base set of files has been partly addressed in certain development environments (e.g., Unix and Unix dialect systems) by providing for "symbolic links" to portions of the base set of files which are not affected by the individual developer's efforts. Symbolic links provide an indirect reference to a file in the base set of files. The indirect reference is a directory entry in the area of storage used to store the local copies of files affected by the developer's efforts. The indirect reference in the local directory points to the actual physical storage in the common repository for the set of base files. The storage for the file is therefore not duplicated for files of the base set of files which are unaffected by the user's modifications.
However, such solutions tend to be uniquely applicable to Unix and Unix dialect systems rather than globally applicable to a wider variety of computing enterprises (such as Microsoft Windows environments). In addition, creating the requisite links is largely a manual process left to the individual engineers (or at best left to an administrator for the engineering group). Such manual processes are prone to error. Further, initially setting up a large number of symbolic links can be time consuming. A large number of such links also uses additional directory (v-node) entries which can be another valuable resource in storage subsystems of Unix environments.
Further, tools which may modify files in the base set of files need to be modified to understand the nature of such symbolic links to create and destroy them as necessary in accordance with the changes made by the user. For example, text editors used for modifying source code files would need to delete a symbolic link in the local directory when the corresponding source code file is modified and re-create a symbolic link if changes to a file are discarded to return the file to its original form. Similarly, a compiler tool would need to destroy symbolic links when a compilation process produces a new object module or in the alternative, all object modules would have to be stored locally thereby again increasing the storage space requirements. Similar problems would arise in applying symbolic links to other exemplary applications as noted above.
Similar problems arise where a base set of files is intended for read-only access but small modifications for each user may be desired. For example, a base set of financial records shared by a group of analysts may be generally intended for read-only use. However, each analyst may wish to evaluate a particular option by experimenting with trial versions of files in the base set of files. Such "what if" analysis might be unique to each particular analyst's area of expertise and operation. Similar solutions have been applied to this environment and similar problems arise in such solutions.
Still another example of a related aspect of the above problems arises where a user wishes to use data stored in a true read-only medium (e.g., CD-ROM data) and to modify the data for their particular application. As above, present solutions involve copying the read-only data in its entirety or at least copying those files which are affected by the user's specific activities and applications and manually linking to unaffected files.
The above identified problems may be viewed more generally as specific examples of a broader problem. Namely, there is a need to provide for changing portions of a base set of files which are not permitted to be changed while minimizing the requirements for storage capacity and minimizing potential for human error in identifying modified and unmodified files.
It is therefore evident from the above discussion that a need exists for an improved architecture to permit individual users or processes read-write access to individual files in a common set of files which, for any of several reasons, are not generally accessible for read-write access.
The present invention solves the above and other problems, thereby advancing the state of useful arts, by providing an incremental file system (also referred to herein as IFS) structure and associated methods which enable read-write access to individual files in a common base set of read-only files while minimizing the amount of storage required for each individual user's session and minimizing potential for human error by automating the process of creating and destroying links between modified and unmodified versions of files. In the preferred embodiment of the present invention, the incremental file system is integrated with the file system services of the underlying operating system so as to operate in a manner transparent to the user processes which require read/write access to the common base set of read-only files.
In particular, in the preferred embodiment, the incremental file system of the present invention is implemented as a file system filter module which intercepts file requests for file systems which are managed by the IFS. The intercepted requests are then processed in accordance with the methods of the present invention to provide a user's session with full, read/write access to local, modified versions of files from a base set of read-only files as well as the unmodified files in the base set, transparently as compared to prior techniques requiring manual procedures.
More specifically, methods and structures of the present invention manage a base set of read-only files (also referred to herein as a "shadow drive") by storing copies of files which are modified by the user in a local directory (distinct from the shadow drive). The local directory is also referred to herein as the "shadow directory." The copy of a file stored in the shadow directory for purposes of user modification is also referred to herein as the "shadow file."
When the user attempts to change the contents of a file in the shadow drive, the incremental file system of the present invention creates a "delta file" associated with the original file from the shadow drive. The delta file is stored in the shadow directory and serves only as a flag indicating that the corresponding original file has been changed. The changes made by the user are stored in a shadow file in the shadow directory. The presence of a delta file corresponding to an original file coupled with the absence of a shadow file is indicative of a user modification which deleted the file in its entirety.
The delta file, the corresponding shadow file (if any), and the shadow directory are preferably all stored in a storage region unique to the particular user's session. As used herein, a session refers to a group of related processes as defined by a particular user or a system administrator. For example, in the program development environment noted above, a session may be defined as all processes operating on behalf of a single developer working on a single task (i.e., a program enhancement). The same developer might have a second session defined for working on a second task (i.e., a program bug to be fixed independent of the first task). As is known in the computing arts, such a session may be comprised of several processes such as a program text editor, a compiler, an assembler, a linker/loader, a debugger, etc. Each of these exemplary processes may, in turn, perform its assigned task by spawning still other "child" processes to perform specific subtasks, etc. Session as used herein is therefore intended to broadly encompass a single user, a single process, or any combination of processes that a user or administrator may define as a "session."
Each session therefore has its own "copy" of the base set of read-only files with the changes made by that session. However, unlike prior techniques, storage is reduced by eliminating the need for copying all files of the base set of read-only files and by eliminating the need to create large numbers of symbolic links.
When a user reads data from a file, the IFS of the present invention first attempts to locate a delta file in the shadow directory. If no such delta file is located, the user's read request is satisfied by reading requested data from the original file in the common base set of read-only files. If, on the other hand, a delta file is so located in the shadow directory in response to a user's read request, the IFS of the present invention satisfies the user's read request from the corresponding shadow file stored in the shadow directory.
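The write and delete behavior described above can be illustrated with a minimal user-space sketch in Python. It is not the filter-module implementation of the preferred embodiment. The function names, the ".delta" suffix used to name delta files, the flat (single-directory) layout, and the whole-file write are all assumptions made for illustration only.

```python
from pathlib import Path

DELTA_SUFFIX = ".delta"  # hypothetical naming convention, not taken from the source


def delta_path(shadow_dir: Path, name: str) -> Path:
    """Location of the delta (flag) file for a given base file name."""
    return shadow_dir / (name + DELTA_SUFFIX)


def write_file(shadow_dir: Path, name: str, data: bytes) -> None:
    """Record a session's change; the read-only base file is never touched."""
    shadow_dir.mkdir(parents=True, exist_ok=True)
    delta_path(shadow_dir, name).touch()    # empty flag: the original has been changed
    (shadow_dir / name).write_bytes(data)   # shadow file holds the modified contents


def delete_file(shadow_dir: Path, name: str) -> None:
    """Record a deletion: the delta file is present, the shadow file is absent."""
    shadow_dir.mkdir(parents=True, exist_ok=True)
    delta_path(shadow_dir, name).touch()
    (shadow_dir / name).unlink(missing_ok=True)  # missing_ok requires Python 3.8+
```

In this sketch each session would pass its own shadow directory, so changes made by one session are never visible to another.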
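The read path just described might be sketched as follows, again in Python and in user space rather than as an operating system filter module. The function name read_file, the ".delta" suffix, and the use of FileNotFoundError to report a file deleted within the session are illustrative assumptions and are not taken from the source.

```python
from pathlib import Path

DELTA_SUFFIX = ".delta"  # hypothetical naming convention for delta files


def read_file(base_dir: Path, shadow_dir: Path, name: str) -> bytes:
    """Resolve a read against the session's shadow directory, then the base set."""
    delta = shadow_dir / (name + DELTA_SUFFIX)
    if not delta.exists():
        # No delta file: the session has not changed this file,
        # so the original in the read-only base set satisfies the read.
        return (base_dir / name).read_bytes()
    shadow = shadow_dir / name
    if not shadow.exists():
        # Delta file without a shadow file marks a deletion by this session.
        raise FileNotFoundError(name)
    # Delta file and shadow file both present: serve the modified copy.
    return shadow.read_bytes()
```

The delta file is checked first so that a deletion made by the session hides the original file, even though the original still exists in the base set.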
The IFS of the present invention solves the problems noted above with respect to prior techniques in that, as compared to prior techniques, substantially less storage is required to store changes associated with particular files in the common base set of read-only files. Furthermore, since the IFS of the present invention is integrated with the operating system's file system services (preferably as a filter module), the present invention obviates the need for using and managing explicit symbolic links as described above with respect to Unix based systems. Each session perceives that a private copy of the base set of read-only files is available and is both readable and writeable by the session.
The IFS of the present invention provides further benefits as compared to prior techniques in that incremental changes made to a common base set of read-only files may be easily deleted ("rolled back"). Simple deletion of the delta files stored locally for a particular session eliminates the changes made by that session. Deletion of individual delta files rolls back the changes to the corresponding files while deletion of all delta files in the shadow directory rolls back all changes made by the session.
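Rollback as described above can be sketched in a few lines of Python. The function names and the ".delta" suffix are hypothetical. Removing the shadow file together with its delta file is an assumption made here to reclaim storage; the text above states only that deleting the delta file rolls back the change.

```python
from pathlib import Path

DELTA_SUFFIX = ".delta"  # hypothetical naming convention for delta files


def roll_back(shadow_dir: Path, name: str) -> None:
    """Discard the session's changes to one file."""
    # Removing the delta file makes later reads fall through to the base set.
    (shadow_dir / (name + DELTA_SUFFIX)).unlink(missing_ok=True)  # Python 3.8+
    # Assumption: also remove the now-unreferenced shadow file to free storage.
    (shadow_dir / name).unlink(missing_ok=True)


def roll_back_all(shadow_dir: Path) -> None:
    """Discard every change made by the session that owns shadow_dir."""
    for delta in list(shadow_dir.glob("*" + DELTA_SUFFIX)):
        roll_back(shadow_dir, delta.name[: -len(DELTA_SUFFIX)])
```

Because the base set is never modified, no restore step is needed in this sketch; the original files simply become visible to the session again.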
The above, and other features, aspects and advantages of the present invention will become apparent from the following descriptions and attached drawings.