Large software systems are commonly divided into components called modules. These modules, usually organized into files called object files (one module per file), must be tied to one another to form executable files before they operate. Before a program can be loaded into an end-user's computer for use it must be compiled, that is, converted from source code format in which the programmer typically writes the program to object code format which will be recognized by the end-user's computer. It must also be linked, which includes computing proper addresses of all modules which comprise the compiled program. The output of compiling is usually one or more relocatable object files which are combined during linking into a single executable program. Linking cannot be completed until all modules (object files) are fully defined.
Like most programs, an object file comprises lines of code which instruct the computer to carry out specific functions. Quite often an instruction within the object file will make reference to another instruction within the same or another object file. For example, an instruction might tell the computer to go to a particular address in the object file (e.g., line 4) and perform a function.
A problem arises when two or more object files are linked. Since each object file is created independently of the other object files, the addresses used in one file will bear no relationship to those of another. For example, if an object file has a starting address of 50, the code addresses for that file will progress upward in sequence (e.g., 51, 52, 53, etc.). When several object files are linked to form an executable file, each line of code in the object files that make up the executable file is assigned a new address within the executable file. Thus, when a line of code from the object file tells the computer to go to line 52 and perform an operation, line 52 of the executable file may not be the desired line.
To solve this problem, relocatable object files are used. A relocatable object file is assigned addresses relative to memory location zero. Thus, since the first memory location is a known constant, the location of each line of code in a particular object file can be calculated and the instruction that directs the computer to a particular line can be adjusted accordingly to reflect the location of the line within the executable file in which the object file is used. This allows a programmer to code sections of programs without being concerned about the final arrangement of the code when it is linked to form the executable file.
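The adjustment described above can be illustrated with a minimal sketch. It assumes a toy instruction format in which each "line of code" is an (opcode, operand) pair and only a hypothetical JUMP opcode carries an address; the names ObjectFile and link are illustrative and do not correspond to any real tool.

```python
from dataclasses import dataclass


@dataclass
class ObjectFile:
    """Toy relocatable object file; addresses inside it start at zero."""
    name: str
    code: list  # list of (opcode, operand) pairs


def link(objects):
    """Concatenate object files and rebase every JUMP target.

    Each file's base is the position at which its first line lands in
    the executable; zero-relative targets are shifted by that base.
    """
    executable = []
    for obj in objects:
        base = len(executable)
        for opcode, operand in obj.code:
            if opcode == "JUMP":  # hypothetical address-bearing opcode
                operand += base
            executable.append((opcode, operand))
    return executable


a = ObjectFile("a", [("NOP", 0), ("JUMP", 0)])
b = ObjectFile("b", [("NOP", 0), ("NOP", 0), ("JUMP", 1)])
image = link([a, b])
```

In this sketch the jump in file b, written as a jump to its own line 1, ends up targeting line 3 of the executable, because file b is placed after the two lines of file a.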
An assembler is used to generate a relocatable object file, and the linker then combines the data from the object files. If an instruction or data item within an object file makes reference to another instruction or data item within the same or another object file, this reference may have to be updated. This is traditionally accomplished through the use of relocation entries.
Each relocation entry consists of three or four fields. The fields identify what data is going to have to be updated, identify a symbol that will eventually point to the correct address for updated data, define how to change the data to be relocated, and identify the offset value (the amount to add or subtract from the data to be relocated) if needed. Relocation entries are capable of being "extended" so that the files in which the relocation entries reside can be customized to function with a particular piece of hardware (e.g., a Pentium processor).
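As a rough sketch of the fields just listed, a relocation entry and its application might be modeled as follows. The field names, the two relocation kinds ("ABS" and "ADD"), and the use of a list of integers to stand in for section data are all assumptions made for illustration; real object file formats define their own layouts and relocation types.

```python
from dataclasses import dataclass


@dataclass
class RelocationEntry:
    offset: int      # which data item in the section must be updated
    symbol: str      # symbol that will resolve to the final address
    kind: str        # how to change the data: "ABS" or "ADD" (hypothetical)
    addend: int = 0  # optional offset value to add to the symbol's address


def apply_relocations(section, entries, symbol_table):
    """Return a copy of the section with every relocation entry applied."""
    patched = list(section)
    for entry in entries:
        value = symbol_table[entry.symbol] + entry.addend
        if entry.kind == "ABS":    # overwrite the data with the value
            patched[entry.offset] = value
        elif entry.kind == "ADD":  # add the value to the existing data
            patched[entry.offset] += value
        else:
            raise ValueError(f"unknown relocation kind: {entry.kind}")
    return patched


section = [0, 0, 5]
entries = [
    RelocationEntry(offset=1, symbol="buf", kind="ABS", addend=4),
    RelocationEntry(offset=2, symbol="start", kind="ADD"),
]
symbols = {"buf": 100, "start": 200}
patched = apply_relocations(section, entries, symbols)
```

Note that each entry can express only a symbol's address plus an optional constant, which is the limitation discussed in the paragraphs that follow.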
It is desirable to run digital signal processors (DSP's) and other microprocessors typically used in embedded applications as fast as possible; thus, it is preferable to reduce the number of operations that are required during the execution stage (run time) of operation. To achieve this, as many operations as possible should be performed during "build time", which is the compile time and/or link-time, as opposed to the run time, since timing is not as critical during the building of the application as it is during run time.
Arbitrarily-complex expressions are algebraic expressions that have no artificial limits on nesting depth or the use of particular operators or data items. In software systems, arbitrarily-complex expressions involving relocatable labels can only be resolved at link-time or at run-time because the final values of those labels are not known until the object files are linked together. Generally, expressions in source or object files that involve symbolic labels and that are resolved at link time are limited to simple expressions (e.g., a label plus an immediate value), if such expressions are able to be resolved at all. Linking is the last step in the creation (building) of an executable program and is normally performed by a tool called a linker or loader. Since most object file formats are not capable of representing arbitrarily-complex expressions as relocation values, current development tool sets usually require that arbitrarily-complex expressions be independently specified to the linker through a linker control file, if they are to be performed at link-time.
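The difference between a simple expression and an arbitrarily-complex one can be sketched with nested tuples, evaluated only once the final label values are known. The tuple encoding, the operator set, and the label names here are assumptions chosen for illustration; they are not a representation used by any particular object file format.

```python
import operator

# Hypothetical operator set; a real tool could support more or fewer.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "&": operator.and_,
    ">>": operator.rshift,
}


def evaluate(expr, symbols):
    """Evaluate an expression tree against final label addresses.

    An expression is an int (immediate value), a str (label name),
    or a tuple (operator, left, right) nested to any depth.
    """
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return symbols[expr]
    op, left, right = expr
    return OPS[op](evaluate(left, symbols), evaluate(right, symbols))


# Label values that only become known when the object files are linked.
symbols = {"table": 0x1000, "start": 0x2000, "end": 0x2100}

# Simple form: a label plus an immediate value.
simple = ("+", "table", 4)

# Nested form: ((end - start) >> 2) & 0xFF
nested = ("&", (">>", ("-", "end", "start"), 2), 0xFF)

simple_value = evaluate(simple, symbols)
nested_value = evaluate(nested, symbols)
```

The simple form fits the symbol-plus-offset shape of a conventional relocation entry, whereas the nested form has no such direct representation.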
A linker control file is an additional file that independently instructs the linker how to compute the arbitrarily-complex expressions. Relocation entries in the object file produced by the assembler instruct the linker how to use the results of the computation. Using linker control files requires additional user intervention to specify the arbitrarily-complex expression when linking. Whereas object files are deliberately written so that they can be used repeatedly for different functions, the linker control file is an extra file that has to be written, debugged and edited each time a group of object files is linked. Additional symbols are required to support the linker control file expressions. Further, the linker control files cannot be used within object files that are stored as modules within library archives, and the linker control files cannot be used in dynamically linked libraries (DLL's), since both contain only module data and the values in the linker control file will change with each application. Use of linker control files also makes programs more error-prone, since they require the programmer to write the linker control file in order to create "dummy" symbols. The programmer also has to modify the source program to make reference to the dummy symbols. The linker has to obtain information from both the linker control file and the relocation entry. This places a limitation on the ability of the linker to determine if there are any conflicts between the obtained information and other information in the program (e.g., multiple uses of the same symbols or definitions of a symbol in an object file instead of in a linker control file).
If arbitrarily-complex expressions could be built into the object file during build time, then there would be better performance during run-time. Conventional relocation entries are incapable of representing such expressions, however. Thus, if such expressions can be specified at all, they must be explicitly specified to the linker through some other means, such as the above-mentioned linker control file method.
Specifying arbitrarily-complex expressions directly in the object files would make building programs easier and less error-prone, and would allow libraries of object files to be self-contained and include any expressions that they use in the object file itself. It would also allow expressions to be used in conjunction with deferred (e.g., dynamic) linking, such as using a dynamically linked library (DLL) whose final address is not known until just before it is used.
Previous attempts to encode expressions into traditional object files have revolved around using strings of characters to represent expressions. This requires reserving a place in the object file to store the strings, and the linker must parse and reinterpret the strings in order to resolve them. This is a time-consuming operation which may introduce errors, since the linker is reinterpreting the expression, including symbol names which may not be unique. Conventional standard object file formats do not support such expression strings, which leads to non-standard variants.