1. Field of the Invention
The present invention generally relates to the art of digital computers, and more specifically to an Application Binary Interface (ABI), a structure and method of interfacing a binary application program to a digital computer system.
2. Description of the Related Art
An application binary interface includes linkage structures by which a user written and compiled binary application program can interface with a specific digital computer and operating system. In order to set out the background of the present invention, for illustrative purposes, the following description relates to a Motorola PowerPC 601 Reduced Instruction Set Computer (RISC) microprocessor running under the UNIX System V, Release 4, operating system (PowerPC is a trademark of International Business Machines Corporation, UNIX is a registered trademark of UNIX System Laboratories, Inc.).
The disclosed arrangement is also applicable to the Solaris operating system running on the PowerPC microprocessor (Solaris is a trademark of Sun Microsystems, Inc.). It will be understood, however, that the present invention is applicable to any computer architecture having suitable characteristics and is not limited to any specific computer or combination of processor and operating system.
In recent years, an application binary interface has been developed for the UNIX System V operating system for most commercially available microprocessors. These interfaces include generic sections which are applicable to all processor architectures, and other sections which are specific to each processor.
Several aspects of the PowerPC architecture have not lent themselves well to the conventional solutions utilized in the prior art relating to other processors. These aspects include the following:
1. Acquisition of Global Offset Table (GOT) Pointer
Machine code instructions in powerful microprocessors such as the PowerPC RISC instruction set are generally of two types, position-dependent and position-independent. Position-dependent instructions can include absolute addresses. To execute properly, a module containing position dependent machine code must be loaded at a specific virtual address, in order to make the program's absolute addresses coincide with the process's virtual addresses.
Position-independent instructions (also called relocatable code) typically include relative addresses, but not absolute addresses. Consequently, the code is not fixed to a specific load address. This allows a position-independent code module to execute properly at various positions in virtual memory.
When a process image is determined by the system, the executable file containing the main program portion of the process (position-dependent) may have fixed addresses, and the system chooses object library virtual addresses to avoid conflicts with other segments in the process.
To maximize text sharing, shared objects conventionally use position-independent code. Shared object text segments can be loaded at various virtual addresses without having to change the segment images. Thus, multiple processes can share a single shared object text segment, even if the segment resides at a different virtual address in each process.
Instructions that reference memory typically require a base address in a general purpose register, and an offset or displacement field in the instruction or an index value in a second general purpose register. On the PowerPC, the offset is a signed 16-bit quantity. Therefore, absolute addressing of the entire virtual address space (which is typically 32 bits) is not possible. Relative branching is also limited to a range of ±32 megabytes from a branch instruction by the limited offsets in these instructions.
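The addressing limits described above can be illustrated with a minimal C sketch. The helper names `signed_reach` and `d16_reachable` are hypothetical and are introduced here for illustration only. The sketch assumes that the ±32 megabyte branch range arises from a 24-bit signed field counted in 4-byte instruction words, which is consistent with the range stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Range reachable by a signed two's-complement field. */
typedef struct {
    int64_t min;  /* most negative reachable byte offset */
    int64_t max;  /* most positive reachable byte offset */
} reach_t;

/* Reach of a signed field of `bits` bits, where each step is `unit` bytes.
   Hypothetical helper, for illustration only. */
static reach_t signed_reach(unsigned bits, int64_t unit)
{
    reach_t r;
    assert(bits >= 1 && bits <= 32);
    r.min = -((int64_t)1 << (bits - 1)) * unit;
    r.max = (((int64_t)1 << (bits - 1)) - 1) * unit;
    return r;
}

/* Nonzero if `target` lies within a signed 16-bit displacement of `base`. */
static int d16_reachable(uint32_t base, uint32_t target)
{
    int64_t delta = (int64_t)target - (int64_t)base;
    reach_t r = signed_reach(16, 1);
    return delta >= r.min && delta <= r.max;
}
```

Under these assumptions, `signed_reach(16, 1)` spans -32768 to +32767 bytes, which is far smaller than a 32-bit address space, and `signed_reach(24, 4)` spans approximately ±32 megabytes.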
Since an address must be loaded into a register to perform any type of memory access in the PowerPC architecture, a Global Offset Table (GOT) is provided in each position-independent shared object module in the process image. The GOT contains addresses of global data such as constants and variables that are identified by symbols and are located outside the module.
The global offset table stores the absolute virtual addresses of these data, and data within it is referenced by adding the absolute base address of the global offset table (GOT pointer) and the index or relative offset of the data into the table. This method enables the module to load the absolute address of a data item into appropriate registers and read or write the data from memory using a conventional RISC relative address read or write instruction.
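The table lookup described above may be modeled in C as follows. This is a simplified sketch rather than an actual ABI data layout: the type `got_t`, the function `got_load` and the table size `GOT_SLOTS` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define GOT_SLOTS 4  /* hypothetical table size */

/* Model of a module's global offset table: each slot holds the
   absolute address of one global data item. */
typedef struct {
    void *slot[GOT_SLOTS];
} got_t;

/* Fetch the absolute address stored at `byte_offset` from the GOT base.
   This corresponds to a single base-plus-displacement load. */
static void *got_load(const got_t *got_base, size_t byte_offset)
{
    const unsigned char *p;
    assert(got_base != 0);
    assert(byte_offset % sizeof(void *) == 0);
    assert(byte_offset / sizeof(void *) < GOT_SLOTS);
    p = (const unsigned char *)got_base + byte_offset;
    return *(void *const *)p;
}
```

In this model, the module first obtains the address of a data item with `got_load`, and then reads or writes the item through the returned pointer. Only the table contents depend on where the data was placed; the code performing the lookup is the same at any load address.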
In order for a module to access its global offset table, it is necessary to know the absolute base address of the table. However, PowerPC instructions cannot contain absolute code, and must access memory through registers as described above.
The prior art method for loading the GOT pointer uses the equivalent of a branch and link to the next instruction, followed by moving the saved address to a register, followed by adding the difference between the address of the GOT and the address of the instruction to the register. On the PowerPC, this requires four instructions.
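The arithmetic of the prior art sequence may be sketched in C as shown below. The module layout constants and the function `acquire_got` are hypothetical. The grouping of steps in the comments is one plausible reading of the four-instruction count, on the assumption that a full 32-bit difference must be added as two 16-bit immediates; it is not taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Link-time layout of a hypothetical module, as offsets from its start. */
#define NEXT_INSN_OFFSET 0x0120u  /* instruction after the branch and link */
#define GOT_OFFSET       0x4000u  /* start of the module's GOT */

/* Difference fixed by the link editor; it does not depend on where
   the module is loaded. */
#define GOT_DELTA (GOT_OFFSET - NEXT_INSN_OFFSET)

/* Compute the run-time GOT address for a module loaded at `load_base`. */
static uint32_t acquire_got(uint32_t load_base)
{
    uint32_t link;  /* models the saved return address */
    uint32_t reg;   /* models a general purpose register */

    assert(load_base <= UINT32_MAX - GOT_OFFSET);
    link = load_base + NEXT_INSN_OFFSET;  /* step 1: branch and link */
    reg = link;                           /* step 2: move saved address */
    reg += GOT_DELTA;                     /* steps 3 and 4: add difference */
    return reg;
}
```

Because only the difference between two addresses within the module is encoded, the same code yields the correct GOT address at any load base, for example `acquire_got(0x10000000)` and `acquire_got(0x7FF00000)`.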
2. Calls to Functions in Shared Object Modules
Much as the global offset table redirects position-independent address calculations to absolute locations, the application binary interface comprises a procedure linkage table (PLT) which redirects position-independent function calls to absolute locations. A link editor, also known as a static linker, cannot resolve execution transfers (such as function calls) from one executable or shared object module to another. Consequently, the link editor arranges to have the program transfer control to entries in the procedure linkage table.
At run time, the dynamic linker determines the destination's absolute address and modifies the procedure linkage table's memory image accordingly. The dynamic linker can thus redirect the entries without compromising the position-independence and shareability of the program's text. Position-dependent executable files and position-independent shared object files have separate procedure linkage tables.
Modifying an entry in a conventional procedure linkage table involves changing more than one instruction. This must be done in a specific order, and constitutes a non-atomic operation. If one instruction is changed but not the others, the instruction sequence becomes invalid. This can occur if the dynamic linker is modifying the entry, and a call from another processor or asynchronous event handling code is made through the same entry. This situation is known in the art as re-entrancy.
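The hazard described above may be illustrated with a simplified C model. The structure `plt_entry_t` and its functions are hypothetical: the model assumes, for illustration only, that an entry encodes its destination as two 16-bit immediates held in two separate instructions, so that the entry cannot be rewritten in a single store.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a two-instruction linkage table entry. */
typedef struct {
    uint16_t hi;  /* immediate of first instruction: upper address bits */
    uint16_t lo;  /* immediate of second instruction: lower address bits */
} plt_entry_t;

/* Destination that a call through the entry would reach right now. */
static uint32_t plt_target(const plt_entry_t *e)
{
    assert(e != 0);
    return ((uint32_t)e->hi << 16) | e->lo;
}

/* First half of the update: rewrite the upper immediate only. */
static void plt_patch_hi(plt_entry_t *e, uint32_t target)
{
    assert(e != 0);
    e->hi = (uint16_t)(target >> 16);
}

/* Second half of the update: rewrite the lower immediate only. */
static void plt_patch_lo(plt_entry_t *e, uint32_t target)
{
    assert(e != 0);
    e->lo = (uint16_t)(target & 0xFFFFu);
}
```

In this model, an entry that initially targets 0x10002000 and is being redirected to 0x2FF08000 transiently targets 0x2FF02000 after `plt_patch_hi` and before `plt_patch_lo`. A call made through the entry during that interval would reach neither the old nor the new destination.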
3. Variable Argument List Function Calls
Variable argument list functions are generally designated as "varargs" functions in the C programming language. The prime example of a variable argument list function is "printf", which causes specified data to be output to a monitor screen, printer or the like.
A calling process can pass a variable number of arguments of different types to a varargs function, the arguments being broadly classifiable as "floating point" and "non-floating point". The PowerPC microprocessor comprises a large number (more specifically, thirty-two) of 64-bit floating point registers FPR in addition to the floating-point status and control register FPSCR.
In the prior art, a calling function does not tell the varargs function whether or not floating point arguments are passed. Therefore, the varargs function itself must save the floating point argument registers because floating point arguments might have been passed. This operation constitutes a waste of operating time if no floating point arguments are actually passed to the varargs function.
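By way of illustration, the following C function is a varargs function that is only ever passed integer arguments. The function name `sum_ints` is hypothetical. It uses the standard `va_start`, `va_arg` and `va_end` macros of the C language.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments. No floating point argument is ever
   passed to or read by this function. */
static long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;
    int i;

    assert(count >= 0);
    va_start(ap, count);
    for (i = 0; i < count; i++) {
        total += va_arg(ap, int);  /* each variable argument is an int */
    }
    va_end(ap);
    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` involves no floating point values. Under the prior art convention described above, the function would nevertheless have to save the floating point argument registers on entry, because it cannot determine from the call that none were used.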
Also in the prior art, a program will incidentally acquire a floating point state merely because it calls a varargs function, even though it never "constructively" uses floating point. In such a case, the operating system must store the entire floating point state (all of the floating point registers) when operation is switched to another task, and restore the floating point state when it is called again. This operation also constitutes a waste of operating time if the program does not actually use floating point.
4. Removing Address Mappings for Terminated Processes
The PowerPC memory management architecture is unusual in several respects. First, it has one large page table containing translations for all address spaces at once rather than having separate page tables for each address space and switching between them on context switches as in conventional designs.
Second, the page table structure itself is different from conventional designs which employ hierarchical page tables, with various portions of the virtual address being used to index into the page tables at various levels.
A virtual address space is assigned to each process that is to be run on the microprocessor, and physical addresses in the processor's memory are mapped to the virtual addresses by a memory management unit. These mappings are implemented as Page Table Entries (PTEs) in the page table.
The page table maps virtual addresses to real addresses. A virtual address consists of a Virtual Segment Identifier (VSID) concatenated with an offset within the segment, and the PTE for a given offset within a given VSID is obtained by hashing the offset/VSID and searching the indicated portion of the page table.
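The formation of a virtual address and the hashed lookup may be sketched in C as follows. The bit widths used here are assumptions made for illustration: a 24-bit VSID, a 28-bit offset within the segment consisting of a 16-bit page index and a 12-bit byte offset, and a hash formed by an exclusive-OR of the low 19 bits of the VSID with the page index. The function names are hypothetical, and the sketch is not intended as a definitive statement of the hardware hash function.

```c
#include <assert.h>
#include <stdint.h>

#define VSID_BITS        24  /* assumed width of the VSID */
#define SEG_OFFSET_BITS  28  /* assumed width of the offset in a segment */
#define BYTE_OFFSET_BITS 12  /* assumed 4 KB pages */

/* Concatenate the VSID with the offset within the segment. */
static uint64_t make_virtual(uint32_t vsid, uint32_t seg_offset)
{
    assert(vsid < (1u << VSID_BITS));
    assert(seg_offset < (1u << SEG_OFFSET_BITS));
    return ((uint64_t)vsid << SEG_OFFSET_BITS) | seg_offset;
}

/* Hash selecting the portion of the page table to search. */
static uint32_t pte_hash(uint32_t vsid, uint32_t seg_offset)
{
    uint32_t page_index;
    assert(vsid < (1u << VSID_BITS));
    assert(seg_offset < (1u << SEG_OFFSET_BITS));
    page_index = seg_offset >> BYTE_OFFSET_BITS;
    return (vsid & 0x7FFFFu) ^ page_index;
}
```

Because the hash mixes the VSID with the page index, the entries belonging to one VSID are scattered across the table rather than stored contiguously.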
After processes have been terminated, their corresponding PTEs must be unmapped or removed from the page table to make room for other processes. Due to the manner in which virtual to physical address translation is performed on the PowerPC, unmapping of the PTEs for each process requires that the entire page table be searched for entries having corresponding VSIDs.
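The cost of the conventional unmapping operation may be illustrated with the following C sketch. The structure `pte_t` is a simplified, hypothetical model of a page table entry and does not reflect the actual hardware entry format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of a page table entry. */
typedef struct {
    uint32_t vsid;       /* virtual segment identifier of the mapping */
    uint32_t real_page;  /* real page the mapping resolves to */
    int valid;           /* nonzero if the entry is in use */
} pte_t;

/* Invalidate every entry belonging to `vsid`. Every one of the `n`
   entries is visited, however few of them match. Returns the number
   of entries removed. */
static size_t unmap_vsid(pte_t *table, size_t n, uint32_t vsid)
{
    size_t i;
    size_t removed = 0;

    assert(table != 0 || n == 0);
    for (i = 0; i < n; i++) {
        if (table[i].valid && table[i].vsid == vsid) {
            table[i].valid = 0;
            removed++;
        }
    }
    return removed;
}
```

The running time of `unmap_vsid` is proportional to the size of the whole page table, not to the number of pages that the terminated process had mapped.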
The PowerPC includes a 32 bit effective address space (addressable by the program), and a 52 bit virtual address space. Even with the extremely high processing speed of the PowerPC, unmapping of the PTEs for a single process requires a significant fraction of a second. The PTE unmapping operation, if performed in the conventional manner, constitutes unacceptably excessive overhead in the operation of the system.
5. Conclusions
There still exists a need for an application binary interface and memory mapping system that can (1) efficiently acquire the absolute address of the global offset table, (2) efficiently manage linkage to functions in shared object modules without reentrancy problems, (3) support variable argument functions without unnecessarily acquiring floating point state and avoid the unnecessary saving of floating point registers, and (4) avoid high overhead for page table entry deletion at process termination.
The present invention fills this need.