1. Field of the Invention
The present invention generally relates to computer systems, and more particularly to a method of displaying information associated with computer file systems, such as directories, files, and symbolic links.
2. Description of the Related Art
To an end user, most computer systems have the same general structure for storing and accessing data, that is, by placing the data in "files" whose names have a particular format, and placing files in "folders" or "directories" to further organize them. These file objects are physically encoded into the machine's storage device, e.g., hard disk. Computer operating systems such as UNIX or MS-DOS use this type of filing system ("UNIX" is a trademark of UNIX System Laboratories; MS-DOS is a trademark of Microsoft Corp.). In these systems, each file has a unique path name which identifies its location within the file structure. UNIX and MS-DOS computers have a "root" directory from which all other directories or sub-directories branch out; in a UNIX operating system, the root directory is designated by the forward slash symbol ("/"), which is also used to separate parts of the path name. For example, the path name "/pdir/sdir/myfile" refers to a file named "myfile" that is located in the "sdir" subdirectory, which is, in turn, located in the primary directory "pdir" at the root level.
Processes interact with the file system using a specific set of commands, such as "open," "read," "copy," etc. For example, the command line "copy \user\oldfile \system\newfile" instructs the operating system to create a duplicate of the file named "oldfile" in the "user" directory, name the duplicate copy "newfile," and place it in the "system" directory. Such interaction can extend between cooperating computer systems, i.e., two or more UNIX machines that are interconnected. These distributed systems provide different types of access to remote computers based on the particular system architecture. In a fully "transparent" distributed system, standard path names are used to refer to files on other machines, as the operating system recognizes that they are remote. In other words, users on one machine can access files on another machine without realizing that they cross a machine boundary. This configuration is convenient for users who commonly share files, whether program or data files. Files can also be shared among users using a single machine whose physical storage device has been partitioned into two or more sections that are specifically assigned to each user when the system is mounted (so there are multiple root directories, one for each user).
Typically a software product comes in many versions. It might evolve over time, adding new features or improving performance, etc. Each new version is typically built from different versions of the same source code, header files, etc. The file system objects utilized to build a given version of software will herein be referred to as "snapshots," given that collectively they are used to build one of many possible versions of a software product. At the user level, four basic schemes have traditionally been employed to make snapshots visible to tools: (a) those that populate tool working directories with snapshot copies prior to tool invocation, (b) those that prepopulate tool working directories with symbolic links pointing to snapshots, (c) those that feed snapshot pathnames to tools at tool invocation time, and (d) those that resort to intercepting tool-generated system (input/output, or I/O) calls by way of special tool-linkable libraries.
Variations of the first schema are impractical if a sizeable number of snapshots must be made visible, due to time constraints and storage capacity (disk space), and due to the necessity of recopying objects each time the tool is invoked. If a build tool (e.g., UNIX make) is using "dependency" checking to ensure that all source files are up to date, the timestamps must be retained on copies, a requirement that is easily overlooked and can result in unnecessary rebuilding. Confusion is also likely to result if copied files are updated. For example, if a build is partially successful, and one or more copied files have been updated in order to correct the error(s) that caused that build to fail, those updated files must be recopied. If the builder makes a mistake, perhaps because it is not cognizant of all files that have in fact been updated, the wrong files may be processed when that partially successful build is restarted.
The obvious virtue of the second schema is that it completely eliminates the time and space problems associated with the aforementioned simple copying schema. The new problem is that if the manifestation of a snapshot is a symbolic link, issuance of every possible modification system call against snapshot manifestations cannot be permitted. With two notable exceptions, namely remove( ) and rename( ) (in UNIX), symbolic links do not protect their target objects from arbitrary modification system calls. If given a symbolic link, remove( ) does not remove the object pointed to by that symbolic link, but rather the symbolic link itself. Similarly, rename( ) operates on a passed link rather than the object pointed to by that link. But snapshots manifested by way of symbolic links are not necessarily protected from other modification system calls. All remaining modification system calls ultimately follow links (e.g., creat( ), mkdir( ), mknod( ), truncate( ), write( )), applying the requested operation to the object pointed to by that link. In this case the only way to protect these objects is to deny write access to them, but then tools are not permitted to issue any of the aforementioned system calls against any of the symbolic links that might refer to these objects. System call anomalies are the norm for circumstances of this sort. If a snapshot of "x.o" exists, is made visible by way of a symbolic link, and is protected in the aforementioned way, a simple command like "cc -c x.c" will fail because that "x.o" cannot be overwritten; the snapshot must first be explicitly copied into the working directory before the compile will succeed. If the newly generated "x.o" file is to become a member of some snapshoted library made visible by way of a symbolic link, say the library with name "libfoo.a," a simple subsequent command like "ar x.o libfoo.a" will also fail if that snapshot is not writable. The file "libfoo.a" must also be explicitly copied.
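The link-following behavior described above can be observed directly. The following is a minimal sketch, assuming a POSIX system where symbolic links can be created; the file names "snapshot_x.o", "x.o", and "y.o" are hypothetical and chosen only for illustration.

```python
import os
import tempfile

def demo_symlink_semantics():
    """Show which calls act on a symbolic link and which act on its target."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = os.path.join(tmp, "snapshot_x.o")  # hypothetical snapshot file
        link = os.path.join(tmp, "x.o")               # its symbolic-link manifestation
        renamed = os.path.join(tmp, "y.o")
        with open(snapshot, "w") as f:
            f.write("original")
        os.symlink(snapshot, link)

        # Opening the link for writing follows it: the snapshot is overwritten.
        with open(link, "w") as f:
            f.write("clobbered")
        with open(snapshot) as f:
            content_after_write = f.read()

        # rename() operates on the link itself; the snapshot stays in place.
        os.rename(link, renamed)
        rename_moved_link = os.path.islink(renamed) and os.path.exists(snapshot)

        # remove() deletes the link itself; the snapshot survives.
        os.remove(renamed)
        snapshot_survives_remove = os.path.exists(snapshot)

    return content_after_write, rename_moved_link, snapshot_survives_remove
```

In this sketch the write is the only operation that reaches the snapshot, which is why write access to the target must be denied if snapshots are to be protected.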
Symbolic links have always been troublesome because they are indistinguishable from the objects they point to. A symbolic link that points to a directory is not itself a directory, though it may under certain circumstances appear to be one. Symbolic links cannot be assumed to be, for all intents and purposes, directories or files. Symbolic links masquerading as files or directories may or may not fool tools that expect to operate on files or directories. It depends upon which system calls those tools use, and when they are not fooled, they are likely to be confused, a condition that usually results in abnormal termination.
Given the quirks associated with symbolic links, one might be tempted to simply make use of some variant of the third schema: passing pathnames of snapshots directly to the build tools that manipulate them, rather than referencing them in a roundabout fashion via symbolic links.
If a make-like build tool determines that some buildable item is out of date, it either directly or indirectly executes a user-supplied build rule to (re)build that item from its dependencies. In general, each such build rule consists of one or more lines of shell commands, each of which, when put into execution, invokes some build tool (e.g., compiler, linker, loader) via its shell interface. One such line might for example invoke a C compiler to (re)build an "x.o" object file from a source dependency with name "x.c." The make-like tool may be modified in such a way that, just prior to executing the build rule containing this compile line, it replaces the dependency basename "x.c" with the complete pathname of the snapshot that should be made manifest as "x.c." Thus when the compiler is invoked, it will see the proper snapshoted version of x.c. The make-like tool can be further modified so that it scans the entire build rule prior to execution, replacing all dependency basenames with complete snapshot pathnames. In this way, the appropriate snapshots of all dependencies may be explicitly named in a build rule visible to the tools that must manipulate them, manifested by explicitly feeding the pathnames to the tools that use them.
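The basename substitution described above might look roughly like the following. This is only an illustrative sketch, not the behavior of any particular make implementation; the function name expand_rule, the mapping snapshot_paths, and the example pathname are all hypothetical.

```python
import shlex

def expand_rule(command, snapshot_paths):
    """Replace each dependency basename in one build-rule line with the
    complete pathname of the snapshot that should stand in for it.

    snapshot_paths is a hypothetical mapping: basename -> snapshot pathname.
    Tokens that are not known dependencies are left untouched.
    """
    tokens = shlex.split(command)
    expanded = [snapshot_paths.get(tok, tok) for tok in tokens]
    return " ".join(shlex.quote(tok) for tok in expanded)
```

For example, with a mapping of "x.c" to "/snapshots/rev3/x.c", the line "cc -c x.c" would be rewritten to "cc -c /snapshots/rev3/x.c" before execution.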
The foregoing might seem sufficient for making snapshots visible, but it is not. The files that a UNIX tool operates on are often named on the command line used to invoke that tool, but not always. Many extremely important exceptions exist. A C compiler might for example read a large number of header files, none of which are explicitly named on the command line responsible for invoking that compiler. Instead it finds the names of these header files in the source text it is asked to compile, embedded within "include" preprocessor statements. In order to find the file associated with a header file name, it searches the set of include file directories specified on the command line via the "-I" flag. In order to make visible the appropriate snapshoted versions of these header files, the make-like tool must therefore not only expand dependency basenames appearing in a build rule, but must also insert "-I" flags that list the appropriate set of snapshot directories a C compiler should search for snapshoted header files. This is processing that requires knowledge of a specific tool, in this case a C compiler. Every build rule step that invokes a C compiler must be found and expanded in a C compiler specific way, namely by the insertion of a list of header file directories to search. It is not difficult to see the general problem this might lead to. If a C compiler command line must be expanded in a C compiler specific way, a command line for some arbitrary tool X might need to be expanded in an X tool specific way. In this case the make-like tool must also be modified to scan for and expand tool X command lines appearing in our build rules, which is a difficult undertaking. Matters are made yet more complex when one considers that a program invoked by a build rule might need to invoke yet another program in order to do its job. 
If that other program must manipulate a dependency, the first program that invoked it must explicitly pass the complete expanded pathname of that dependency to the other program. Otherwise, the correct snapshoted version of that dependency will not be visible to the other program. This requirement must be noted as a convention, a convention that users must be made aware of and must diligently follow. Finally, if we manifest a snapshot by way of its complete pathname, and snapshots are read only, tools that see snapshots via this mechanism will not be able to modify them.
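The C-compiler-specific expansion discussed above can be sketched as follows, which also shows why the approach does not generalize: the expander has to recognize each tool by name. The function name add_include_dirs, the default compiler list, and the directory names are hypothetical.

```python
import shlex

def add_include_dirs(command, snapshot_include_dirs, compilers=("cc",)):
    """Insert -I flags for snapshot header directories into a C compiler line.

    Lines that invoke any other tool are returned unchanged, because this
    sketch has no knowledge of how such tools locate their inputs.
    """
    tokens = shlex.split(command)
    if not tokens or tokens[0] not in compilers:
        return command
    flags = ["-I" + d for d in snapshot_include_dirs]
    expanded = [tokens[0], *flags, *tokens[1:]]
    return " ".join(shlex.quote(tok) for tok in expanded)
```

Supporting an arbitrary tool X would require adding another tool-specific branch of this kind for every such tool.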
Variations of the fourth schema also suffer a number of disadvantages. All tools that might be used in a build must be linked with a library providing alternate implementations for all I/O system calls, but it is not always possible to do this. It should also be kept in mind that the tools relinked to this special library may be used by others for purposes other than building software. In such a context, outside of any build environment, it might be the case that those alternate system call implementations result in unacceptable performance problems, or fail with some strange error. Additionally, tools often come prelinked to the standard I/O library. The I/O system calls issued by these tools cannot therefore be intercepted.
Rather than rely on user level schemes to make snapshots visible, a more recent approach has been to implement special file systems that are specifically designed to make snapshots visible to tools in novel ways.
Examples include Sun Microsystems' TFS, Bell Labs' 3-D file system, and Atria's MVFS. These systems, though similar at a high level of abstraction, differ markedly in detail, largely due to the fact that each has been designed with a very different stipulated user community in mind.
At its simplest, a directory viewpath mechanism manifests an illusory patchwork directory that might be seen along some ordered list of specified directories, an ordered list that constitutes a "viewpath." All children of the foremost directory in that list are brought into view, as are all children of the second directory without named counterparts in the foremost directory, as are all children of the third directory without named counterparts in either the second or the first directory, and so on. If the viewpath mechanism happens to be implemented by a build file system, it is likely that all of the directories in that ordered list are themselves snapshots of a given directory, snapshots that represent revisions of that directory as it evolves over time.
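The name-resolution rule just described (a child is visible from the first directory in the viewpath that contains its name) can be sketched in a few lines. This is a user-level illustration under stated assumptions, not how any of the file systems discussed here is implemented; the function name viewpath_listing is hypothetical.

```python
import os

def viewpath_listing(viewpath):
    """Return a mapping from each visible child name to the path that supplies it.

    viewpath is an ordered list of directories, foremost first. A name found
    in an earlier directory masks the same name in every later directory.
    """
    visible = {}
    for directory in viewpath:
        for name in sorted(os.listdir(directory)):
            # setdefault keeps the entry from the earliest directory.
            visible.setdefault(name, os.path.join(directory, name))
    return visible
```

The keys of the returned mapping form the illusory patchwork directory; the values record which physical directory each entry actually comes from.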
Directory viewpath mechanisms come in a variety of forms. The more flexible the file system viewpath mechanism, the fewer restrictions need be placed on the structure of the directories searched for snapshots. Of the three aforementioned file systems, TFS is likely the closest prior art. TFS comes packaged with Sun's Network Software Environment (NSE). Within that environment, it is used to construct the directory trees that software developers do virtually all of their work in, from prebuild activities like source code editing to build activities like compiling and linking to post-build activities like debugging. NSE's "activate" command is used to perform the TFS mounts needed to make a TFS directory visible. The snapshots made visible by TFS are found by way of an extremely simple viewpath mechanism that essentially manifests virtually all snapshots at mount time, rather than by way of a search on reference by snapshot name. In effect, TFS permits but a single mountpoint viewpath, statically defined by physically linking all of the physical directories appearing in that path. The n+1th physical directory appearing in the viewpath is specified by way of a textfile named "tfs_info" appearing in the nth physical directory of that same viewpath. The textfile itself is referred to as a "searchlink" and simply contains the pathname of the n+1th physical directory. The viewpath itself behaves in precisely the manner described above. Tools simply see the illusory patchwork directory along the set of physical directories making up the viewpath. Though not explicitly assigned one, each mountpoint subdirectory might be thought of as having its own viewpath as well, derived from the viewpath of its parent in a very simple manner, by appending the subdirectory name to the pathname of each directory appearing in the parent viewpath. 
Thus, the resulting TFS directory takes on the semblance of the directory that might logically be constructed by "layering" the searchlink connected viewpath directories, one over the other over the other, from the last directory in the viewpath to the first.
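The searchlink chaining and the derivation of subdirectory viewpaths can be sketched as below. This is a hypothetical user-level model, not TFS code: the function names, the max_layers guard, and the cycle check are assumptions added for illustration, and the searchlink file name is passed as a parameter whose default follows the name given in the text.

```python
import os

def follow_searchlinks(front_dir, searchlink_name="tfs_info", max_layers=64):
    """Build a viewpath by following one searchlink file per directory.

    Each directory may hold a text file whose content is the pathname of
    the next directory (layer) in the viewpath.
    """
    viewpath = [front_dir]
    current = front_dir
    while len(viewpath) < max_layers:
        link_file = os.path.join(current, searchlink_name)
        if not os.path.isfile(link_file):
            break  # last layer: no searchlink present
        with open(link_file) as f:
            current = f.read().strip()
        if not current or current in viewpath:
            break  # empty or cyclic searchlink (assumed handling)
        viewpath.append(current)
    return viewpath

def derive_subdir_viewpath(parent_viewpath, subdir_name):
    """Derive a subdirectory's viewpath by appending its name to each layer."""
    return [os.path.join(layer, subdir_name) for layer in parent_viewpath]
```

Because each directory holds at most one searchlink file, the chain is strictly linear, which is the source of the single-viewpath restriction.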
With this view in mind, TFS refers to each viewpath directory as a "layer," where each layer is to be thought of as a revision of the mounted TFS tree; a snapshot of its contents at some point in time. The first layer appearing in the viewpath is envisaged as the newest revision of the mounted directory, while the last layer is envisaged as the oldest. If multiple revisions of a file exist along the viewpath, the revision appearing in the foremost layer is made manifest, while the remaining revisions are masked. The names appearing in the TFS directory thus consist of the union of the file names appearing along this single viewpath. If a tool attempts to reference a name that does not currently appear in a TFS directory, TFS will not dynamically search for and make manifest an appropriate revision with that name. It will simply report the name as nonexistent. Files in a common directory cannot be manifestations of revisions found by traversing two or more independent dynamically selected viewpaths. This makes it extremely difficult to make manifest "alternate" versions of files, versions that do not appear in any viewpath layer, but perhaps instead in a special set of directories that have been set up to hold revisions of alternate versions of those same files.
In TFS, only the first layer in the viewpath, referred to as the "front" layer, can actually be written to; all other directory layers are read only. From the tool point of view, however, all manifested objects appear to be writable. If the manifestation of a revision appearing in some back layer (i.e., some layer other than the front layer) is made the target of a modification system call, TFS copies the revision to the front directory, and redirects that and all subsequent system calls to the resulting copy. If a TFS directory is asked to create a file, the file made manifest as a result of that request is actually created within the front layer directory. Thus, the front layer contains all files that have been modified/created since mount time.
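The copy-to-front behavior described above is a form of copy-on-write. A minimal user-level sketch follows, assuming plain files and an explicit viewpath list; the function name resolve_for_write is hypothetical, and a real file system would perform this redirection inside the system call path rather than in a helper.

```python
import os
import shutil

def resolve_for_write(name, viewpath):
    """Return a front-layer path for name that is safe to modify.

    viewpath[0] is the writable front layer; all later layers are treated
    as read-only back layers and are never modified.
    """
    front_path = os.path.join(viewpath[0], name)
    if os.path.exists(front_path):
        return front_path  # already present in the front layer
    for layer in viewpath[1:]:
        back_path = os.path.join(layer, name)
        if os.path.isfile(back_path):
            # Copy the foremost back-layer revision forward, keeping timestamps.
            shutil.copy2(back_path, front_path)
            break
    # If no revision exists anywhere, the caller creates the file here.
    return front_path
```

A caller opens the returned path for writing; the back-layer revision remains untouched, and the front layer accumulates everything modified or created.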
In general, it is assumed that all back layers in a viewpath are public shareable directories. A special command is used to create a user private front layer that points via searchlink to the first of these public back layers. Thus, each user can arbitrarily modify shared files made visible by virtue of this mechanism without concern. In all such cases, they are copied by TFS to the user's private front layer without her awareness. If these public back layers belong to two or more viewpaths at the same time, only the front layer of each such viewpath may differ; a back layer cannot be made to simultaneously point to two or more different following layers. This is a direct consequence of the fact that each such directory may contain but a single searchlink.
The 3-D file system introduces the notion of a "version file", where a version file is conceptually organized like a directory. Whenever a process references a 3-D version file by name, the file system selects and makes manifest a specific "instance" of that file conceptually contained within that directory, based on per-process information maintained in tabular form. Tools that are aware of the fact that they are manipulating a 3-D file may also explicitly select a specific instance of that file by tacking a "/X" to the end of that file's pathname, where X is the "name" of that specific instance (assigned at instance creation time). Unlike TFS, the 3-D file system thus provides file revisioning support for user-level version control systems.
Much like TFS, a 3-D directory might be assigned a single viewpath. Also, like TFS, each and every subdirectory possesses a viewpath implicitly derived from the viewpath of its parent, by appending the name of that subdirectory to the pathname of each directory appearing in that parent viewpath. Unlike TFS, however, these derived viewpaths can be overridden by explicit assignment. All explicit viewpath assignments are specified in simple tabular form via a file that associates the pathname of a 3-D directory with the set of directories in its viewpath. The table can be set for a particular shell via a special shell command. Once set, the directories specified by that table will, for all intents and purposes, appear to be layered in the manner of TFS. Similarly, a table of slightly different form might be used to set instance mappings for version files, so that the correct instance of these files will be made manifest.
With MVFS, versioned data is stored in something Atria refers to as a network-accessible Versioned Object Base or VOB. A VOB is simply a mountable file system. Like any other mountable file system, many VOBs can be mounted, as desired. A VOB will version files containing virtually anything (binaries, libraries, directories, text, etc.). Unknown to the user, when appropriate, text files might be stored in delta or compressed formats in order to save disk space. A database manager associated with each VOB is used to maintain meta-data for every versioned object (referred to as an "element") within that VOB. Recorded meta-data might for instance note when an element was initially created. It also supports traditional checkin/checkout/branch creation/element deletion/compare/merge type facilities that might be applied to elements. Elements are made visible to UNIX tools when a VOB is first mounted. From the point of view of a standard UNIX tool, a mounted VOB appears to be no more than a standard UNIX directory tree, a directory tree filled with standard UNIX files, directories, and symbolic links, all of which the knowledgeable user knows to be MVFS element versions. Thus, a VOB provides transparent tool access to element versions. Like 3-D, however, the file system must make use of a special table in order to dynamically resolve tool element references to particular versions of those elements. In the case of MVFS, this table, referred to as a "configuration specification", contains a number of user-specified version selection rules that the file system tests on element reference. Once "set" by a user for a particular VOB, a configuration specification is said to present that user with a "view" of that VOB, simply because it determines what version of the VOB's elements that user will in fact see and work with. 
The collection of bound elements (bound in the sense of having been mapped to particular versions) is itself collectively referred to as a view "configuration", and might be modified by the developer at any time. Each selection rule, appearing within a given configuration specification, maps a qualifying expression of some kind (typically a wildcard expression) to a description of some version. When an element is referenced, the file system attempts to match the element name with the qualifying expression of each and every rule, from first rule to last. If the name matches (i.e., meets the criteria of) some qualifying expression, the element version that meets the criteria of the associated version description will be fetched by the file system. In the case of MVFS, the manifestations of elements are read-only. Thus, MVFS transparently makes manifest files for build tools by way of a special file system.
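The ordered rule matching described above can be sketched as follows. The rule format used here, a list of (wildcard pattern, version description) pairs, is a simplification invented for illustration and is not the syntax of an actual MVFS configuration specification; stopping at the first matching rule is likewise an assumption, as is the function name select_version.

```python
from fnmatch import fnmatchcase

def select_version(element_name, rules):
    """Return the version description of the first rule whose wildcard
    pattern matches element_name, or None if no rule matches.

    rules is an ordered list of (pattern, version_description) pairs,
    tested from first rule to last.
    """
    for pattern, version_description in rules:
        if fnmatchcase(element_name, pattern):
            return version_description
    return None
```

With hypothetical rules [("*.h", "release-1"), ("*", "latest")], a reference to "foo.h" would select "release-1", while "foo.c" would fall through to "latest".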
As can be seen, while some of the prior-art file systems provide superior transparency, they still suffer from other restrictions. It would, therefore, be desirable to devise an improved mechanism for making the appropriate versions of files and/or directories transparently visible to users, and effectuating that transparency in such a way that the user may read, write, truncate, remove, rename, or otherwise modify them without modifying in any way the original (snapshot) files and directories made visible by that mechanism, but without the foregoing restrictions (e.g., preventing back layers from simultaneously pointing to two or more different following layers). It would be further advantageous if the mechanism required no specialized knowledge on the part of the user to implement and provided for dynamic selection of the appropriate viewpath.