The following is a brief description of conventional hardware, single-threaded processes, multi-threaded processes, thread-local storage, and lazy relocation. The description provides context for the present invention.
1. General Hardware Architecture
FIG. 1 is a block diagram that illustrates a conventional computer 101. The computer 101 may include a bus 103 and a processor 105 coupled with the bus 103 for processing information. The processor 105 is also referred to as a central processing unit or CPU. The computer 101 also includes a main memory 107, such as a random access memory (RAM) or other dynamic storage device, coupled to the bus 103 for storing data and instructions to be executed by the processor 105. The computer 101 may also include a read-only memory (ROM) 109 or other static storage device coupled to the bus 103 for storing static information and instructions for the processor 105. A storage device 111, such as a magnetic disk or optical disk, can also be provided and coupled to the bus 103 for storing information and instructions. The computer 101 may also be coupled, via the bus 103, to a number of peripheral devices 113, such as a display monitor, input devices (e.g., a keyboard), and other input/output devices.
The computer 101 also includes an external communication interface 115 coupled to the bus 103. The external communication interface 115 provides a two-way data communication coupling to a network link 117 that is connected to, for example, a local network. The communication interface 115 may be, for instance, a cable or Digital Subscriber Line (DSL) card, a modem, or a wireless network interface card that provides a data communication connection to a corresponding type of communication link.
2. Single-Threaded Processes
Conventionally, a programmer writes source code in a particular programming language (e.g., C++). A compiler then transforms the source code into a set of object files. A link editor (also known as a linker) then combines the object files into a loadable module. The loadable module is then loaded into main memory and executed on a computer similar to the one illustrated in FIG. 1.
A static executable is a loadable module that contains no references to external data or procedures. Such an executable wastes disk and memory space because the library routines it uses are copied into the loadable module. In contrast, dynamic libraries can be used by many different processes without their contents being copied into each main executable, thereby saving disk space. Moreover, dynamic libraries containing only position-independent code do not need multiple copies of their code in memory, even when several different programs use them. Position independence enables the operating system to share a single copy of the code in physical memory among many processes. For this reason, such libraries are also known as shared libraries.
Position-independent modules use relative addressing where possible and otherwise use indirect addressing through the Global Offset Table (GOT) and the Procedure Linkage Table (PLT), as known in the art. Position-dependent modules contain position-dependent instructions, which use absolute addresses in a virtual memory space. Position-dependent instructions are suitable for executables because an executable is generally expected to be loaded into a predetermined portion of the virtual memory. Dynamic libraries, on the other hand, should not assume a specific load location, because their load locations may overlap with those of the executable or of other dynamic libraries; they must instead be loadable at arbitrary addresses. For this reason, and to enable sharing, position-independent code is preferred for dynamic libraries.
Unlike code, data can be, and often is, modified by processes. To give each program the illusion that it is running on its own, the operating system gives each process a separate copy of the data from the main executable and the dynamic libraries.
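The per-process copy of data can be observed with a short C sketch, assuming a POSIX system. It uses `fork()` only as a convenient way to obtain a second process running the same code; the global `counter` and the function names are illustrative and are not part of any described embodiment.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A global variable in the program's data segment. */
static int counter = 0;

/* Forks a child process that adds 41 to `counter` and reports the value it
 * sees through its exit status. Returns that value, or -1 on error. */
int child_value_after_increment(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter += 41;          /* modifies the child's copy only */
        _exit(counter & 0xff);  /* exit status carries the low 8 bits */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    assert(counter == 0);       /* the parent's copy is untouched */
    return WEXITSTATUS(status);
}

int parent_counter(void) { return counter; }
```

The child reports 41 while the parent still sees 0, because each process writes to its own copy of the data even though both run the same code.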
3. Multi-Threaded Processes
The need for multiple independent tasks that run concurrently while sharing global data led to threads. A thread is one of potentially many instances of execution that operate within the context of a single process. All threads that belong to a process share the same address space and therefore the same global variables, the same set of open files, the same memory mappings, and so on.
Multi-threaded programming enables simpler modeling of applications with multiple partially independent activities, but it comes at a cost: access to global variables must be guarded by synchronization primitives that guarantee consistency when multiple threads attempt to access those variables simultaneously. Synchronization is not only relatively expensive compared with unsynchronized access, but also sometimes difficult to implement correctly.
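A minimal C sketch of this synchronization cost, assuming POSIX threads: every increment of the shared global takes and releases a mutex. The names `shared_total` and `run_guarded_counter`, and the thread and iteration counts, are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Global variable shared by every thread in the process. */
static long shared_total = 0;
/* Synchronization primitive guarding shared_total. */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_many(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&total_lock);   /* serialize the read-modify-write */
        shared_total++;
        pthread_mutex_unlock(&total_lock);
    }
    return NULL;
}

/* Runs NUM_THREADS threads that each increment the shared global and
 * returns the final total. Without the mutex, updates could be lost. */
long run_guarded_counter(void) {
    pthread_t threads[NUM_THREADS];
    shared_total = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, add_many, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_total;
}
```

The lock/unlock pair around a single increment is exactly the overhead the text describes: the result is correct, but each access pays for synchronization.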
Using thread-local (TL) variables avoids the costs and pitfalls of synchronization. TL variables can be used when sharing is not desirable. In other words, TL variables bring the separation of data between processes into the multi-threaded programming model. When a variable is marked as thread-local, its value is not shared with other threads; instead, a distinct copy of the variable is created for each thread. This technique isolates, for example, error conditions that have traditionally been stored in global variables (e.g., errno). If all threads shared such a variable as an ordinary global variable, one thread might report an error condition that was actually encountered by another thread.
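The errno example can be illustrated with a small C sketch, assuming a C11 compiler (for `_Thread_local`) and POSIX threads with barrier support (as on Linux). The variable `last_error` is a hypothetical stand-in for errno, and the function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical errno-like variable. With _Thread_local, every thread has its
 * own copy; without it, the threads would share one copy, and after the
 * barrier both would read whichever code was written last. */
static _Thread_local int last_error = 0;

/* Ensures both worker threads write before either one reads back. */
static pthread_barrier_t both_written;

static void *record_error(void *arg) {
    int *slot = arg;
    last_error = *slot;                   /* writes this thread's copy only */
    pthread_barrier_wait(&both_written);  /* the other thread has written too */
    *slot = last_error;                   /* read back this thread's copy */
    return NULL;
}

/* Returns 1 if each thread read back its own code and the calling thread's
 * copy is still 0, 0 otherwise, or -1 if the barrier cannot be created. */
int thread_local_errors_are_isolated(void) {
    int codes[2] = {11, 22};
    pthread_t workers[2];
    if (pthread_barrier_init(&both_written, NULL, 2) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&workers[i], NULL, record_error, &codes[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(workers[i], NULL);
    pthread_barrier_destroy(&both_written);
    return codes[0] == 11 && codes[1] == 22 && last_error == 0;
}
```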
4. Thread-Local Storage
Each module has a “thread-local storage” (TLS) section that contains the TL variables defined in that module. When an executable (as opposed to a dynamic library) is linked, the relative location of each TL variable is assumed to be a constant value, and that value is stored in an entry of a global offset table (GOT) because the exact constant is known only at run time. Here, the term “run time” means the period during which a process runs, or during which the CPU executes the instructions of the process. During execution, the main executable accesses TL variables by loading these constant values from the GOT.
For dynamic libraries, however, the relative location of a TL variable may vary across threads. Particularly for dynamic libraries loaded while the program is already running, the TLS may have to be allocated dynamically, possibly even on demand. Traditionally, a dynamic library calls a library function named _tls_get_addr( ) to obtain the location of its TL variables. The computations performed by this function are potentially time-consuming, at least compared with loading a constant from a table such as the GOT. Moreover, the presence of an explicit function call may require callers to save any registers that hold values needed after the call, if the Application Binary Interface (ABI) specification does not require those registers to be preserved across function calls.
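The difference between the two access models can be sketched as a toy simulation in C. The structures and names below (`thread_state`, `static_tls_addr`, `tls_get_addr_sketch`, the module sizes) are hypothetical and greatly simplified. The sketch is only loosely modeled on the ELF TLS design, in which the function the text calls _tls_get_addr( ) is named `__tls_get_addr` and receives a module identifier and an offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_MODULES 8

/* Hypothetical per-thread state: a static TLS block for the executable and
 * a vector of lazily allocated blocks for dynamically loaded modules. */
struct thread_state {
    unsigned char static_block[64];             /* executable's TLS, fixed layout */
    unsigned char *dynamic_blocks[MAX_MODULES]; /* NULL until first use */
};

/* Hypothetical TLS block size of each module; module 0 is unused here. */
static const size_t module_tls_size[MAX_MODULES] = {0, 32, 128};

/* Executable model: the offset is a constant (e.g. loaded from a GOT
 * entry), so access is a single address computation. */
void *static_tls_addr(struct thread_state *t, size_t offset) {
    assert(offset < sizeof t->static_block);
    return t->static_block + offset;
}

/* Dynamic-library model: a function call that may allocate the module's
 * block for this thread on demand before returning the address. */
void *tls_get_addr_sketch(struct thread_state *t, size_t module, size_t offset) {
    assert(module < MAX_MODULES && offset < module_tls_size[module]);
    if (t->dynamic_blocks[module] == NULL) {
        /* First access by this thread. Zero-filled for simplicity; a real
         * loader would also copy the module's initialization image. */
        t->dynamic_blocks[module] = calloc(1, module_tls_size[module]);
        if (t->dynamic_blocks[module] == NULL)
            return NULL;
    }
    return t->dynamic_blocks[module] + offset;
}

void free_thread_state(struct thread_state *t) {
    for (size_t i = 0; i < MAX_MODULES; i++) {
        free(t->dynamic_blocks[i]);
        t->dynamic_blocks[i] = NULL;
    }
}
```

In the executable model the offset never changes, so the compiler can emit a plain load; in the dynamic-library model every access goes through a call whose result depends on the thread and may trigger an allocation.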
Link editors commonly attempt to remove such calls by converting them into the more efficient load-constant-from-table access model when possible. Unfortunately, this does not recover all of the lost performance, because the compiler has already made decisions based on the assumption that the call could modify the values of certain registers. Moreover, the load-constant-from-table access model can only be used in main executables or in dynamic libraries that give up the ability to be loaded into a process after the process has started running.
5. Lazy Relocation
Most of the start-up time of a dynamically linked program is spent by the dynamic loader applying relocations. The dynamic loader is the module that loads all other modules into memory and relocates them. Relocating a module means, for each relocation (i.e., a reference to a symbol that needs to be resolved) in the relocation table of a loaded module: a) determining which module defines the referenced symbol; b) computing a value based on the type of the relocation and the location of the symbol; and c) storing the value in the memory location specified by the relocation table entry.
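The three steps a) to c) can be sketched in C as follows. The structures are hypothetical simplifications (real ELF relocation entries are `Elf64_Rela` records that refer to symbols by symbol-table index rather than by name), and only two relocation types are modeled: an absolute value S + A and a PC-relative value S + A - P, where S is the symbol address, A the addend, and P the patched location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct symbol { const char *name; uintptr_t address; };

struct module {
    const char *name;
    const struct symbol *defs;  /* symbols this module defines */
    size_t ndefs;
};

enum reloc_type { RELOC_ABSOLUTE, RELOC_PC_RELATIVE };

struct reloc {
    uintptr_t *where;      /* memory location to patch */
    const char *symbol;    /* name of the referenced symbol */
    enum reloc_type type;
    intptr_t addend;
};

/* Step a): search modules in load order; the first definition wins. */
static const struct symbol *lookup(const struct module *mods, size_t nmods,
                                   const char *name) {
    for (size_t m = 0; m < nmods; m++)
        for (size_t i = 0; i < mods[m].ndefs; i++)
            if (strcmp(mods[m].defs[i].name, name) == 0)
                return &mods[m].defs[i];
    return NULL;
}

/* Applies every relocation; returns how many could not be resolved. */
size_t apply_relocations(const struct reloc *relocs, size_t n,
                         const struct module *mods, size_t nmods) {
    size_t unresolved = 0;
    for (size_t r = 0; r < n; r++) {
        assert(relocs[r].where != NULL);
        const struct symbol *sym = lookup(mods, nmods, relocs[r].symbol);
        if (sym == NULL) {
            unresolved++;
            continue;
        }
        /* Step b): compute the value according to the relocation type. */
        uintptr_t value = sym->address + (uintptr_t)relocs[r].addend;
        if (relocs[r].type == RELOC_PC_RELATIVE)
            value -= (uintptr_t)relocs[r].where;
        /* Step c): store it at the location named by the entry. */
        *relocs[r].where = value;
    }
    return unresolved;
}
```

Because every relocation is processed eagerly, the loop's cost grows with the size of the relocation tables, which is why the techniques listed next aim to reduce or defer this work.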
Several techniques are known for avoiding dynamic relocations and for reducing the cost of performing them. Conventional techniques include: a) the use of “COPY” relocations in executables; b) the use of relative addressing modes; c) the use of “RELATIVE” relocations for local symbols; d) forcing references to symbols to be resolved locally instead of allowing them to be overridden by the dynamic loader; and e) “lazy” binding of function addresses (also referred to as “lazy” relocation).
In one lazy relocation technique, the dynamic loader places the address of a “resolver” function into a GOT entry intended to hold the address of another function that the program may call during its execution. The address of the resolver is readily available to the dynamic loader, but the address of the function would be computationally expensive to determine. When the program first attempts to call the function using the address in the GOT entry, it will call the resolver instead. The resolver will only then proceed to determine the actual address of the function and store it in the corresponding GOT entry, such that subsequent calls go straight to the intended function.
In most implementations, the address initially stored in the GOT entry is not that of the resolver but that of a PLT entry. The PLT entry calls the resolver and passes it the additional information it needs to determine which function to resolve, usually the address of the relocation table entry that specifies how to compute the value to be stored in the GOT entry.
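The resolver-and-PLT scheme can be simulated in portable C with function pointers. This is a hypothetical sketch: real PLT entries are short assembly stubs generated by the link editor, and a real resolver performs a symbol lookup rather than indexing a table. Here each stub passes its relocation index to the resolver, which patches the GOT entry so that later calls go straight to the target. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fn_t)(int);

/* The "real" functions whose addresses are expensive to determine. */
static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

/* Hypothetical relocation table: entry i names the target of GOT[i]. */
static fn_t const reloc_targets[] = {square, negate};

static fn_t got[2];            /* GOT entries the program calls through */
static int resolver_calls = 0; /* number of times the resolver has run */

/* Resolver: determines the target for relocation `index`, patches the GOT
 * so later calls bypass it, then forwards the current call. */
static int resolve_and_call(size_t index, int arg) {
    assert(index < sizeof got / sizeof got[0]);
    resolver_calls++;
    got[index] = reloc_targets[index];  /* stands in for a symbol lookup */
    return got[index](arg);
}

/* PLT-like stubs: each passes its relocation index to the resolver. */
static int plt_stub0(int arg) { return resolve_and_call(0, arg); }
static int plt_stub1(int arg) { return resolve_and_call(1, arg); }

/* Loader start-up: point each GOT entry at its stub, resolving nothing. */
void lazy_bind_init(void) {
    got[0] = plt_stub0;
    got[1] = plt_stub1;
    resolver_calls = 0;
}

int call_via_got(size_t index, int arg) {
    assert(index < sizeof got / sizeof got[0]);
    return got[index](arg);
}

int resolver_call_count(void) { return resolver_calls; }
```

After `lazy_bind_init()`, the first call through `got[0]` runs the resolver once and later calls reach `square` directly; no resolution work is done for an entry that is never called.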
Conventionally, although calls to _tls_get_addr( ) may take advantage of the lazy binding technique, the information passed to it is typically obtained from dynamic relocations that cannot be applied lazily. Embodiments of the present invention enable, among other features, such dynamic relocations to be applied using the lazy binding technique, thereby avoiding the cost of applying dynamic relocations for TL variables that are never referenced at run time.