1. Field of the Invention
This invention relates to the large scale compilation of software source code, and more particularly, to optimizing the sequence of source code compilations involving source code files residing in numerous different directories.
2. Description of the Background Art
Software programs are generally written in high level languages which are understandable by humans, and which use a set of arbitrary terms to represent computer instructions. These instructions are called "source code." This source code must be converted into a series of instructions which are executable by the computer on which the software is to operate. The converted set of instructions is called the "object code," and the process of conversion to object code is called "compiling." An efficient compilation process requires identifying only those source code files that have been modified in the most recent design cycle, and compiling them in the proper order. Generally, the compiling process for relatively small software projects requires the conversion of several different source code files residing in a single directory. In this instance, determining which files have been modified and the order in which to compile them is a relatively straightforward process.
The aspect of the compilation process that determines which files to process and their order is called dependency analysis. Dependency analysis determines the causal relationship between two independent objects. An object depends on another object where a change in the latter requires a change in the former in order to bring each object into the current state. The instruction which effects the change in the object is called the rule associated with this dependency. All dependent relations have rules which express how to bring the dependent object up to date with the object on which it depends. In the software context, an object file such as "file1.object" is dependent on a source code file "file1.source" where "file1.source" must be compiled to create "file1.object." The instruction to "compile" "file1.source" to produce "file1.object" is the rule associated with this dependency. Any change in "file1.source" requires the execution of the rule to bring "file1.object" into currency with "file1.source." We can express this relationship diagrammatically as: ##STR1##
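The dependency and its rule can be illustrated with a minimal sketch. It assumes, as a convention not stated above, that "changed" is detected by comparing modification times, and it uses an in-memory table of times rather than a real file system. The names is_out_of_date, apply_rule, and mtimes are hypothetical and chosen only for illustration.

```python
def is_out_of_date(target, source, mtimes):
    """Return True when the target is missing or older than its source.

    Assumption: mtimes maps a file name to a numeric modification time.
    """
    if target not in mtimes:
        return True
    return mtimes[target] < mtimes[source]


def apply_rule(target, source, mtimes, rule, now):
    """Run the rule only if the target is out of date; report whether it ran."""
    if not is_out_of_date(target, source, mtimes):
        return False
    rule(source, target)   # e.g. "compile" source to produce target
    mtimes[target] = now   # the target is now current with its source
    return True


# Usage: "file1.object" does not exist yet, so the compile rule runs once.
compiled = []
mtimes = {"file1.source": 10}
ran = apply_rule("file1.object", "file1.source", mtimes,
                 lambda src, dst: compiled.append((src, dst)), now=11)
```

A second call with unchanged times runs nothing, which is the behavior the dependency relation is meant to guarantee.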
Most software development systems incorporate some method for determining these dependencies for small sets of source code files residing in relatively few directories, and work with sufficient efficiency. However, in the creation of large scale software projects, such as computer operating systems, the various source code files can reside in hundreds of directories organized into a complicated hierarchical structure, where each directory contains only a fragment of the global hierarchy. In this case, the dependencies become increasingly complex, because a single file can be both dependent on numerous files in various directories, and one of the dependencies of files in other directories. These inter-directory relationships are not defined in any single directory and thus there is no global description of the system which can be used to optimally sequence the compilation process. Traditional approaches to expressing the dependencies for an entire system often either cannot determine the dependencies of all the files involved at all, or cannot do so in an acceptable amount of time. This results in inefficient iterative compilations of various fragments of a software system: certain files are compiled unnecessarily, repeatedly, or both. This substantially increases compilation times. What is needed is a system for determining the minimal set of dependencies for files residing in numerous directories, and for specifying their compilation sequence in order to avoid unnecessary or duplicative compilations.
In accordance with the present invention, a tree abstraction apparatus and method are described for determining the dependencies of source code files residing in multiple directories, and for determining the optimal compilation order.
Tree abstraction identifies the "by-products" of a set of dependencies and then expresses the top-level dependency in terms of the bottom-level inputs on which it depends. The by-products are the intermediate dependencies between the bottom level inputs (or "leaf dependencies," which depend on nothing and are thus assumed to be always up to date) and the top level output. By expressing the top-level dependency only in terms of up-to-date leaf dependencies, one automatically ensures that all intermediate dependencies will also be kept up to date.
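As an illustration only, the separation of by-products from leaf dependencies for one tree can be sketched as a simple graph walk. The representation (a dictionary mapping each target to the list of things it depends on) and the name abstract_tree are assumptions made for this sketch, not the claimed apparatus.

```python
def abstract_tree(deps, top):
    """Split everything the top-level target depends on into two sets.

    deps maps a target name to the list of names it depends on.
    A name with no dependencies of its own is a leaf dependency;
    any other name below the top-level target is a by-product.
    """
    leaves, by_products = set(), set()
    seen = set()
    stack = list(deps.get(top, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        children = deps.get(node, [])
        if children:
            by_products.add(node)    # intermediate dependency
            stack.extend(children)
        else:
            leaves.add(node)         # depends on nothing
    return leaves, by_products


# Usage: a program built from two object files that share one header.
deps = {
    "prog": ["a.o", "b.o"],
    "a.o": ["a.c", "defs.h"],
    "b.o": ["b.c", "defs.h"],
}
leaves, by_products = abstract_tree(deps, "prog")
# The abstracted tree states that "prog" depends directly on its leaves.
abstracted = {"prog": sorted(leaves)}
```

In this example the by-products are the two object files, and the abstracted dependency for "prog" names only the three source inputs.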
The basic goal of tree abstraction is to filter out as much of the dependency tree as possible by taking advantage of two standard conventions used when maintaining separate trees within a system. First, there are a small number of top-level targets that are defined in each tree that will produce "the system" when they are all made up-to-date. In the software context, this means that there are relatively few files which are the ultimate output of the compilation of many files. Second, the intermediate targets (by-products) of some trees are the leaf dependencies of others. That means that the relationships between trees can be determined given only the list of top-level targets, by-products, and leaf dependencies of each tree. In the software context, this means that the global hierarchical relations between the directories can be expressed in terms of the main output files for the directories, the intermediate input files, and the bottom level source code or other input files which must be compiled to create the software program.
The other important aspect is that from the higher level point of view, the by-products depend on their top-level target, which reverses their real causal relationship, but not their logical relationship. That is because the top-level targets are the only interface to the individual trees, and a by-product can only be brought up-to-date by processing its corresponding top-level target. The abstracted tree can determine whether the top-level target is up-to-date because it directly depends on all its leaf dependencies. If anything at all is out of date, the individual trees must be processed to determine the exact commands to run.
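The up-to-date test on an abstracted tree might look like the following sketch. It again assumes that currency is judged by modification times held in a hypothetical mtimes table; the function name top_level_is_current is likewise illustrative.

```python
def top_level_is_current(target, leaves, mtimes):
    """Return True when the target exists and no leaf dependency is newer.

    Assumption: mtimes maps a name to a numeric modification time.
    A leaf with no recorded time is treated as newer than everything,
    so the target is reported as out of date.
    """
    if target not in mtimes:
        return False
    newest_allowed = mtimes[target]
    return all(mtimes.get(leaf, float("inf")) <= newest_allowed
               for leaf in leaves)


# Usage: one leaf was modified after the target was last produced.
mtimes = {"prog": 50, "a.c": 10, "b.c": 60, "defs.h": 5}
current = top_level_is_current("prog", {"a.c", "b.c", "defs.h"}, mtimes)
```

When the result is False, the sketch says nothing about which commands to run; as described above, that is determined by processing the individual tree.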
Extracting by-products enables the determination of the minimal set of updating actions required to bring the top-level target up to date. When applied over a large number of dependencies this can result in a significant savings in computing time. In addition, abstraction allows one to express the minimal set of dependencies between two or more trees. If each tree represents one source code directory, applying the abstraction to a set of trees will determine the order in which the compilations should be run in each of the directories and will also identify any directories which are up to date and do not need to be processed.
Tree abstraction therefore provides a means of efficiently calculating the minimal set of compilations which need to be run in order to bring a specified target up to date. Without this ability, it would be necessary to iterate through all directories, compiling in each one. In fact, one would have to perform this iteration multiple times, since compiling in one directory might make a previous directory out of date. The tree abstraction method puts a bound on the problem, and also assures minimal processing time.
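A single-pass directory order can be sketched from the per-directory lists of top-level targets, by-products, and leaf dependencies. The sketch below assumes that one directory must be processed before another whenever it produces a file that the other lists as a leaf dependency, and it swaps in the topological sorter from Python's standard graphlib module (Python 3.9 or later) for the ordering step. The directory names, file names, and the function directory_order are hypothetical.

```python
from graphlib import TopologicalSorter


def directory_order(trees):
    """Return directories in an order that allows a single pass.

    trees maps a directory name to a dict with three sets of file names:
    "targets", "by_products", and "leaves".
    """
    # Record which directory produces each target or by-product.
    producer = {}
    for directory, tree in trees.items():
        for name in tree["targets"] | tree["by_products"]:
            producer[name] = directory

    # A directory depends on every other directory that produces
    # one of its leaf dependencies.
    graph = {directory: set() for directory in trees}
    for directory, tree in trees.items():
        for leaf in tree["leaves"]:
            source_dir = producer.get(leaf)
            if source_dir is not None and source_dir != directory:
                graph[directory].add(source_dir)

    # static_order() yields each directory after the ones it depends on.
    return list(TopologicalSorter(graph).static_order())


trees = {
    "app": {"targets": {"app"}, "by_products": {"main.o"},
            "leaves": {"main.c", "libutil.a"}},
    "lib": {"targets": {"libutil.a"}, "by_products": {"util.o"},
            "leaves": {"util.c"}},
    "gen": {"targets": {"gen_tool"}, "by_products": {"util.c"},
            "leaves": {"util.tmpl"}},
}
order = directory_order(trees)
```

Here "gen" produces a by-product that "lib" consumes as a leaf, and "lib" produces the target that "app" consumes, so the three directories form a chain. Combining this order with an up-to-date test on each directory would also let current directories be skipped; that step is omitted from the sketch.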
In summary, the tree abstraction method extracts from a series of incomplete local hierarchical directories the critical inputs and intermediate inputs for a given set of main outputs from each directory, and creates a minimal description of the global hierarchy that expresses the logical dependent relations between the directories. This global description is a minimal description because it expresses only the logical inter-directory dependency relations; it is optimized because it establishes the order in which the objects in various directories must be processed in order to update the entire system in a single pass, without repeated or unnecessary processing.
The apparatus for executing the preferred embodiment of the method of the present invention contains a processor, a keyboard, a display terminal and a plurality of uniquely configured registers and data storage devices connected to the processor along a common data bus. These memory registers include: a directory file register which contains the names of the directories to be processed; a directory description register which contains the hierarchical relations in each directory to be processed; a main output file register containing the names of the main output files for a given directory; a by-products register, which contains the names of the files identified as intermediate input files for a given main output file of a directory; a leaf dependency register which contains the names of the files determined to be the primary inputs for a given main output file of a directory; an abstracted tree register for maintaining the accumulated description of the directories as they are processed; and a match register for temporarily holding two items for comparison.