1. Field of the Invention
The invention relates to the loading of software into memory for execution by a computer system, and more particularly, to techniques for updating relocatable address pointers in such software to reflect the actual memory addresses at which the designated symbols are loaded.
2. Description of Related Art
Computer programs are typically written originally in source code in a computer language such as C or Pascal, or in an assembly language. To prepare the program for execution on a computer system, one or more source code modules are passed through a compiler (or assembler) which is specific to the language used and the computer system on which it will be executed, and which generates an object code file as its output. A linker routine, which is either a separate program or part of the compiler, combines the resulting object code files into a single output file, known as an "executable" object code file. One or more executables are then loaded together into memory by a loader program, and control is transferred to a start address to initiate program execution.
An executable object code file typically includes, among other things, a header section which contains information about the structure of the file; one or more code sections which contain binary instructions which are directly executable by the system's CPU; one or more data sections; and a loader section, the contents of which are described below.
A data section typically contains data which was initialized by the compiler in response to the source code, descriptors describing various procedure pointers, and several other types of pointers. The various pointers which are contained in the data section may include some which refer to the address in memory of other data objects or of specific computer instructions. For example, a pointer may refer to specific objects in a code section, such as the entry point of a procedure. Other pointers in the data section may contain the addresses of other objects in the same data section. (As used herein, an address may be real or virtual, depending on the computer system used.) Further, in systems where programs may be compiled into two or more executable files and subsequently loaded together, a data section in one file may contain pointers to objects in a code or data section of another file.
All of these references to absolute addresses must be "relocatable" since, at the time of compilation and linking, the compiler/linker has no way of knowing the ultimate addresses in memory at which the various referenced objects will be loaded. Thus references in an executable object code file to an address in a code section are often represented in a form which is merely relative to the start of the code section, and references to an object in a data section are represented in a form which is merely relative to the starting address of the data section. Once a referenced section has been loaded into memory and its start address is known, the loader program can relocate these references simply by adding that start address to every reference to an object within that section.
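A minimal sketch of this section-relative relocation, assuming each reference is a 32-bit slot already holding its offset from the start of the referenced section. The function name relocate_section_relative is hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: each element of `refs` holds an offset relative to
 * the start of the section it refers to. Once that section's load address
 * is known, relocating a reference is a single addition. */
void relocate_section_relative(uint32_t *refs, size_t count,
                               uint32_t section_start)
{
    assert(refs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        refs[i] += section_start;   /* section-relative offset -> absolute address */
    }
}
```

The same routine serves code-relative and data-relative references; only the start address passed in differs.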
References to external symbols are typically represented in an executable object code file as indices into a symbol import table which is also contained in the file, each entry in the import table identifying both the name of one of the symbols and the external file which should contain that symbol. The indices are often numbered consecutively. When the loader program encounters a reference to an external symbol, it loads the external file and determines the address of the referenced symbol. The loader program then relocates the reference by adding in the address of the referenced symbol.
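The import-table scheme can be sketched as follows, assuming the external file's exports are already available to the loader as a simple list of (file, symbol, address) records. The names struct import_entry, struct export_entry, resolve_import and relocate_external are hypothetical; a real loader would read these tables from the files themselves and report unresolved symbols as errors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One import table entry: a symbol name and the file expected to export it. */
struct import_entry {
    const char *symbol;
    const char *file;
};

/* Hypothetical record of a symbol exported by an already loaded file. */
struct export_entry {
    const char *file;
    const char *symbol;
    uint32_t address;   /* address at which the symbol was loaded */
};

/* Resolve import table entry `index` against the loaded exports.
 * Returns 1 and stores the address on success, 0 if the symbol is missing. */
int resolve_import(const struct import_entry *imports, size_t n_imports,
                   size_t index,
                   const struct export_entry *exports, size_t n_exports,
                   uint32_t *address_out)
{
    assert(index < n_imports);
    const struct import_entry *imp = &imports[index];
    for (size_t i = 0; i < n_exports; i++) {
        if (strcmp(exports[i].file, imp->file) == 0 &&
            strcmp(exports[i].symbol, imp->symbol) == 0) {
            *address_out = exports[i].address;
            return 1;
        }
    }
    return 0;
}

/* Relocate one external reference: whatever the reference already holds
 * (often 0, or an offset into the symbol) plus the symbol's address. */
void relocate_external(uint32_t *ref, uint32_t symbol_address)
{
    *ref += symbol_address;
}
```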
The loader section of an executable object code file typically includes a relocation table containing entries which specify how each relocatable reference is to be relocated upon loading into memory. For example, for a relocatable reference to an object which is within a code section, the relocation table contains a specification that the number to be added to the reference is the start address of the code section, rather than the start address of some other section. Similarly, for a relocatable reference to an object which is contained within a data section, the relocation table contains an entry specifying that the number to be added to the relocatable reference is the start address of the data section rather than of a code section. For a relocatable reference to an external symbol, the relocation table contains a corresponding specification of the index to the desired entry in the symbol import table.
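A sketch of how a loader might turn one relocation table entry into the number to add, assuming a hypothetical three-way encoding (code section, data section, or import index). The names enum reloc_target, struct reloc_entry, struct load_state and reloc_addend are illustrative and not taken from any particular file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a relocation entry says to add (hypothetical encoding). */
enum reloc_target { TARGET_CODE, TARGET_DATA, TARGET_IMPORT };

struct reloc_entry {
    uint32_t offset;          /* location of the reference to patch */
    enum reloc_target target; /* which known address to add */
    uint32_t import_index;    /* meaningful only for TARGET_IMPORT */
};

/* Addresses known to the loader once sections are placed and imports resolved. */
struct load_state {
    uint32_t code_start;
    uint32_t data_start;
    const uint32_t *import_addresses;
    size_t n_imports;
};

/* Choose the value to add to the reference governed by entry `e`. */
uint32_t reloc_addend(const struct reloc_entry *e, const struct load_state *s)
{
    switch (e->target) {
    case TARGET_CODE:
        return s->code_start;
    case TARGET_DATA:
        return s->data_start;
    case TARGET_IMPORT:
        assert(e->import_index < s->n_imports);
        return s->import_addresses[e->import_index];
    }
    assert(0 && "unknown relocation target");
    return 0;
}
```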
When the loader program begins operation, it retrieves the desired executable object code file from a mass storage device such as a disk. If the computer system permits multiple tasks to be resident simultaneously using a shared object code section, then a separate copy of each data section is made for each task which will use the loaded file. In one example of memory organization, the loader may first check whether the desired file is already present in memory for another task. If not, the loader loads the header, code, data and loader sections of the file into a portion of memory which is read-only to individual users. In either case, the loader then makes a copy of the data section(s) in read/write memory for the new task.
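The memory organization just described can be sketched as a lookup of already resident images followed by a per-task copy of the data section, while the code stays shared. This is a minimal illustration under stated assumptions: the structures struct shared_image and struct task_instance and the functions find_resident and instantiate are hypothetical, and reading the file from disk on a miss is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A file image loaded once into read-only memory and shared by all tasks. */
struct shared_image {
    const char *path;
    const unsigned char *code;   /* shared, read-only */
    const unsigned char *data;   /* data section as loaded from the file */
    size_t data_len;
};

/* Per-task view: the shared image plus a private read/write data copy. */
struct task_instance {
    const struct shared_image *image;
    unsigned char *data;
};

/* Return the resident image for `path`, or NULL if it must be loaded. */
const struct shared_image *find_resident(const struct shared_image *const *resident,
                                         size_t n, const char *path)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(resident[i]->path, path) == 0)
            return resident[i];
    return NULL;
}

/* Give a new task its own copy of the data section; code is not copied.
 * Returns 0 on success, -1 if memory for the copy cannot be allocated. */
int instantiate(const struct shared_image *img, struct task_instance *out)
{
    assert(img != NULL && out != NULL);
    out->image = img;
    out->data = NULL;
    if (img->data_len == 0)
        return 0;
    out->data = malloc(img->data_len);
    if (out->data == NULL)
        return -1;
    memcpy(out->data, img->data, img->data_len);
    return 0;
}
```

Because each task writes only to its own copy, relocations applied to one task's data section cannot disturb another task using the same shared code.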
If the loader has been invoked to load several files or modules into memory at the same time, then these files, too, are loaded into memory in the same manner as the first file. All the references to external symbols are resolved at this time, by inserting into each file's symbol import table the address at which each symbol has been loaded. Symbol imports may be resolved recursively. That is, when one module (e.g. an application program) references a symbol in a second module (e.g. a library), the loader may load and perform all relocations on the second module before returning to resolve the symbol in the first module. Similarly, if the second module references a symbol in a third module, the loader may load and perform all relocations in the third module before returning to the second, and so on.
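The recursive resolution order can be sketched as a depth-first traversal of the modules' import dependencies: a module is finished only after every module it imports from has been loaded and relocated. The struct module record, its state field, and load_recursive are hypothetical; treating a dependency cycle as an error is a simplification made here, not something the text specifies.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical module record listing the modules it imports symbols from. */
struct module {
    const char *name;
    struct module *deps[MAX_DEPS];
    size_t n_deps;
    int state;   /* 0 = not loaded, 1 = in progress, 2 = loaded and relocated */
};

/* Depth-first load. `order` (sized by the caller for all modules) records
 * the sequence in which modules finish. Returns -1 on a dependency cycle. */
int load_recursive(struct module *m, const char **order, size_t *n_order)
{
    assert(m != NULL);
    if (m->state == 2) return 0;     /* already loaded during this launch */
    if (m->state == 1) return -1;    /* cycle: treated as an error here */
    m->state = 1;
    for (size_t i = 0; i < m->n_deps; i++)
        if (load_recursive(m->deps[i], order, n_order) != 0)
            return -1;
    /* Every exporter is now in place: resolve m's imports, then relocate m. */
    order[(*n_order)++] = m->name;
    m->state = 2;
    return 0;
}
```

For an application that imports from a library, which in turn imports from a system module, the system module finishes first, then the library, then the application.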
After the various sections of the file have been loaded into memory, and imports have been resolved, the loader performs the relocation process. The relocation process is performed by traversing the relocation table in the loader section, and performing the specified relocation operation for each of the relocatable references contained within the current file.
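The traversal can be sketched as one pass over the table that patches each 32-bit reference in the task's data copy. Assumptions: a hypothetical compact entry (struct reloc) holding a byte offset and an index into a single array of known addresses (section starts followed by resolved import addresses), and references stored in host byte order. apply_relocations is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compact entry: where to patch, and which known address to add. */
struct reloc {
    uint32_t offset;   /* byte offset of a 32-bit reference in the data copy */
    uint32_t base;     /* index into `bases` */
};

/* Apply every entry in order. memcpy is used so that unaligned offsets are
 * safe. Returns 0 on success, -1 on an out-of-range entry. */
int apply_relocations(unsigned char *data, size_t data_len,
                      const struct reloc *table, size_t n,
                      const uint32_t *bases, size_t n_bases)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].base >= n_bases || data_len < 4 ||
            table[i].offset > data_len - 4)
            return -1;
        uint32_t word;
        memcpy(&word, data + table[i].offset, sizeof word);
        word += bases[table[i].base];            /* the relocation itself */
        memcpy(data + table[i].offset, &word, sizeof word);
    }
    return 0;
}
```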
One popular format for executable object code files is known as XCOFF. XCOFF is described in the following articles published in IBM, "R6000 InfoExplorer" (CD-ROM, 1992):
"a.out File Format",
"Optional Auxiliary Header for the a.out File",
"Section Headers for the a.out File",
"Raw Data Sections for the a.out File",
"Special Data Sections for the a.out File",
"Relocation Information for the a.out File",
"xcoff.h",
"filehdr.h",
"reloc.h",
"scnhdr.h",
"loader.h";
all incorporated herein by reference. In XCOFF, each entry in the relocation table is 12 bytes long and contains the following fields:
TABLE I
Field Name   Length (Bytes)   Description
l_vaddr      4                Offset, within the section whose number is specified in l_rsecnm, of an information item to be relocated.
l_symndx     4                External symbol import table index of the object that is being referenced.
l_rtype      2                Type of relocation.
l_rsecnm     2                Number of the section containing the relocatable item governed by this table entry.
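The 12-byte entry of Table I can be mirrored by a C structure and decoded from raw file bytes. This is an illustrative sketch: the field names follow the table, but the exact C types used in IBM's own loader.h header may differ, and the decoder assumes the fields are stored big-endian, as on the AIX/POWER systems for which XCOFF was designed. decode_reloc and the be32/be16 helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout of one relocation table entry per Table I. */
struct xcoff_reloc_entry {
    uint32_t l_vaddr;   /* offset of the item within section l_rsecnm */
    uint32_t l_symndx;  /* 0/1/2 = .text/.data/.bss; 3 and up = import index */
    uint16_t l_rtype;   /* type of relocation */
    uint16_t l_rsecnm;  /* number of the section containing the item */
};

/* 4 + 4 + 2 + 2 bytes; no padding is needed on common ABIs. */
_Static_assert(sizeof(struct xcoff_reloc_entry) == 12, "entry must be 12 bytes");

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Decode one entry from its 12 on-disk bytes (big-endian assumed). */
struct xcoff_reloc_entry decode_reloc(const uint8_t b[12])
{
    struct xcoff_reloc_entry e;
    assert(b != 0);
    e.l_vaddr  = be32(b);
    e.l_symndx = be32(b + 4);
    e.l_rtype  = be16(b + 8);
    e.l_rsecnm = be16(b + 10);
    return e;
}
```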
The l_symndx field of a relocation table entry specifies whether the item to be relocated is a reference to an external symbol, or to an object in one of the code or data sections. Specifically, values of 1 and 2 indicate that the reference is to an object in a .data or .bss section respectively (both of which are considered "data sections" as the term is used herein), and a value of 0 indicates that the reference is to an object in a .text section (code). Values of 3 or higher constitute indices into the external symbol import table for the file, and indicate that the relocatable reference in the information item is a reference to the corresponding external symbol. In this case, the relocatable reference in the information item itself may contain 0, or an offset value to which the address of the external symbol will be added. Note that while relocation table entries have the capacity to control relocations of information items contained in the code section, this capacity is rarely used on computer systems which support relative branching. For these systems, when a compiler generates a branch instruction for the code section, it typically uses the relative branch format so as to obviate any need for a relocation. When the compiler generates an instruction which references a data object, it typically uses an indexed addressing mechanism for which only the offset from the base address of the data section need be included in the ultimately executed code. The software pre-loads the starting address of the desired data section into a register to use as the base address.
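The interpretation of l_symndx described above can be sketched as a function returning the value to add to the reference. Assumptions, labeled loudly: the section start addresses and resolved import addresses are supplied by the caller, and l_symndx value 3 is taken to select slot 0 of the caller's import address array; the real XCOFF numbering of import entries may differ. struct section_bases and addend_for_symndx are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load addresses of the file's own sections (hypothetical). */
struct section_bases {
    uint32_t text;
    uint32_t data;
    uint32_t bss;
};

/* Value to add to a reference, selected by l_symndx:
 * 0 = .text, 1 = .data, 2 = .bss, 3 and above = external import.
 * Mapping l_symndx 3 to import_addrs[0] is an assumption. */
uint32_t addend_for_symndx(uint32_t l_symndx, const struct section_bases *s,
                           const uint32_t *import_addrs, size_t n_imports)
{
    switch (l_symndx) {
    case 0: return s->text;
    case 1: return s->data;
    case 2: return s->bss;
    default:
        assert(l_symndx - 3 < n_imports);
        return import_addrs[l_symndx - 3];
    }
}
```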
Further, where code sections are sharable, relocations in the code section are also avoided because the relocation appropriate for one task may differ from the relocation appropriate for another task sharing the same code section.
The l_rtype field indicates the type of relocation which is to be performed, and most commonly contains a value indicating that the reference is an absolute 32-bit reference to the virtual address of the object.
The l_rsecnm field indicates the number of the section containing the information item to be relocated. As with the l_symndx field, certain predefined values are implicit references to the .text, .data and .bss sections, respectively.
The XCOFF file format is extremely inefficient in terms of space occupied in the mass storage device, in terms of memory usage at launch time, and in terms of the time required to launch an application. The space which an XCOFF file occupies in mass storage is in large part due to the fact that XCOFF requires 12 bytes of relocation information for each 4-byte word in the data section that requires relocation. Thus in an executable object code file containing 1.5 megabytes, as much as 300 k bytes might be occupied by the relocation table. The relocation table space overhead is also a large factor in the inefficient usage of memory at launch time. The slow launch is due in part to the need to retrieve and interpret 12 bytes for every relocation to be performed.
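Taking the relocation table size above as 300 × 1024 bytes (an assumption; a decimal reading gives the same ratio), the overhead works out as follows:

```latex
\[
\frac{300 \times 1024\ \text{bytes}}{12\ \text{bytes per entry}} = 25\,600\ \text{relocations},
\qquad
25\,600 \times 4\ \text{bytes} = 102\,400\ \text{bytes} = 100\ \text{KB},
\qquad
\frac{12}{4} = 3 .
\]
```

The table thus occupies roughly a fifth of the 1.5 megabyte file and is three times the size of the data words it relocates.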
Another conventional format for executable object code files is used in the GEM disk operating system for Atari ST computers. See K. Peel, "The Concise Atari ST 68000 Programmer's Reference Guide" (Glentop Publishers: 1986), especially pp. 2-21 through 2-24. The entire Peel guide is incorporated herein by reference.
In the GEM format, the loader section of an executable object code file consists of a series of bytes, each of which specifies at most a single relocation. A loader routine maintains a pointer into the program being loaded and updates the pointer according to each byte in the loader section. Specifically, if a byte in the loader section contains any value from 2 to 255 inclusive, the loader routine advances the pointer by that number of bytes and adds the program's start address to the 32-bit relocatable reference then pointed to by the pointer. If the byte in the loader section contains the value 1, the loader routine advances the pointer by 254 bytes without performing a relocation. A zero byte in the loader section indicates the end of relocations.
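The byte-coded scheme can be sketched as a small decoding loop. Assumptions: the pointer starts at offset 0 of the program image (the text does not say where it starts), and the 32-bit references are big-endian, as on the 68000. gem_relocate and its helpers are hypothetical names, and the bounds check is an addition for safety rather than part of the described format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian 32-bit access, matching the 68000's byte order. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Interpret the loader-section bytes. Returns the number of relocations
 * performed, or -1 if a relocation would fall outside the image. */
int gem_relocate(uint8_t *image, size_t image_len,
                 const uint8_t *reloc, size_t reloc_len,
                 uint32_t start_address)
{
    assert(image != NULL && reloc != NULL);
    size_t pos = 0;   /* assumption: the pointer starts at offset 0 */
    int count = 0;
    for (size_t i = 0; i < reloc_len; i++) {
        uint8_t b = reloc[i];
        if (b == 0)
            break;                    /* end of relocations */
        if (b == 1) {
            pos += 254;               /* skip ahead, no relocation */
            continue;
        }
        pos += b;                     /* 2..255: advance, then relocate */
        if (pos + 4 > image_len)
            return -1;
        put_be32(image + pos, get_be32(image + pos) + start_address);
        count++;
    }
    return count;
}
```

Note that, as the text observes, every relocation still costs at least one byte of loader section.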
The GEM executable object code file format and loader routine are extremely primitive, lacking any capability for symbol imports and exports, for separate code and data sections, or for any kind of relocation other than the addition of the start program address to a 32-bit relocatable reference. Additionally, like the XCOFF format, the GEM format still contains a relocation table entry (byte) for each relocation to be performed.