1. Field of Invention
This invention relates to linkers and, more particularly, to methods and apparatus for resolving symbol references in multi-core architectures.
2. Background of the Invention
Modern software is often developed in a modular fashion, using a combination of custom code written for the particular application and generic code that may be used in many applications. Reusable modules are often packaged in libraries and distributed either in source code or object code format. In the source code, software in one module calls components of another module through symbolic references. For example, an application that performs digital signal processing might call a standard Fast Fourier Transform component of a standard module by calling the component by its function name in the source code, e.g., fft().
The process of building a final executable application from individual source code files involves several steps, which are usually performed by a set of programmer tools designed for that purpose. Source code files are typically compiled into object files individually, and then combined by a linker to make a single executable binary. The linker performs at least two separate functions. First, the linker must satisfy references that are undefined within a source code module. In the example above, if the source code to the digital signal processing application calls the fft() function, the linker must satisfy that symbolic reference by locating a suitable definition of the function in one of the other modules involved in the linking process. In effect, the linker must match the definition of a symbol to all the uses of that symbol elsewhere in the source code. If a symbol is referenced but is not defined anywhere else in the source code, the linker may signal the user with a warning or error message that it was unable to resolve the symbolic reference.
Second, the linker must resolve symbols to memory addresses. After identifying and resolving all of the required components, the linker must arrange those components within the memory space of the application. Each component is given a memory address. As in the example above, the fft() function could be given a memory address of 0x1000. Once all of the components are given memory addresses, the linker converts all of the symbolic references within the application into those memory addresses so the application can be executed by a CPU. In the fft() example, each symbolic reference to the function fft() could be resolved to reference the memory address 0x1000. FIG. 1 illustrates the process of compiling source code, resolving symbolic references, and linking into an executable image file.
Linking can be static or dynamic. A static linker bundles a component together with the components it references, as well as all components referenced by those components, until all of the necessary modules are contained in a single executable. Static linking allows the developer to distribute a single binary without needing to ensure that other dependencies already exist on a target system. Static linking, in some cases, also results in performance gains. On the other hand, static linking may require more memory and disk space than dynamic linking. Dynamic linking means that the data in a library is not copied into a new executable or library at compile time, but remains in a separate file on disk. In this case, the linker only records what libraries are required when the application is compiled, and the tasks of satisfying undefined references and resolving symbols to memory addresses are performed when the application is executed (i.e., at runtime). Dynamic linking allows the same library to be used by multiple applications, thereby conserving disk space and potentially memory.
A computer processor typically has some memory on-chip with the processor and other memory off-chip. The on-chip memory is generally faster but more expensive, while the off-chip memory is cheaper, slower, and can be very large in size. These memory stores can be divided further. For example, it is common to have two levels of on-chip memory. Some models of Analog Devices, Inc.'s Blackfin processors have a memory hierarchy as depicted in FIG. 2, where L1 and L2 memory are physically located on the chip with the CPU, while L3 memory is external to the CPU. For optimal performance, code and data that are most often used by the CPU would ideally be stored in the fastest L1 memory, closest to the CPU; code and data used less often would be stored in L2 memory; and code and data used the least for a given application would be stored in L3 memory. By locating code and data at various memory locations, a linker can assist in optimizing performance and resource use according to these parameters.
As computers have developed to become faster and more efficient, various technologies have been developed to execute separate instructions on more than one processor simultaneously. One common approach is to connect two or more separate CPUs on a single computer motherboard, often referred to as “symmetric multiprocessing,” or SMP. Another approach is known as “multi-core,” in which two or more independent processors are combined in a single package, often on the same integrated circuit. A multi-core approach can be particularly advantageous over a multiprocessor approach where physical space is more limited, for example, in an embedded device such as a cell phone or a digital video recorder. Some Blackfin processors incorporate multiple cores in a single unit. Other chip manufacturers such as Intel and AMD also make multi-core CPUs.
In a multi-core architecture, there are often memory areas that are private to each core as well as other memory areas that are shared between the cores but still within the processor unit and not part of main memory. By keeping data in a cache close to the processor that is using it, a multi-core system can achieve better performance and more efficient use of resources. Both the private and shared memory can be accessed using a single, unified address-space. For example, a dual core system with cores A and B could have private memory space A and B respectively, as well as a shared memory space C, as illustrated in the following table:
Memory Space    Address Range    Private/Shared
A               0x001-0x100      Private to Core A
B               0x101-0x200      Private to Core B
C               0x201-0x300      Accessible to both Core A and Core B
A graphical depiction of the relationship between the two cores A and B and three memory spaces A, B, and C is shown in FIG. 3. These memory areas could correspond to the L1, L2, and L3 memory areas discussed above. For example, Core A and Core B might each have its own L1 memory cache. The L2 memory cache could be shared between the two cores, and the L3 memory area could be outside the CPU, connected by a bus.
All three memory spaces, A, B, and C, occupy different, nonoverlapping address ranges, so any single given address may be part of only one of the three possible memory spaces. In other words, endA&lt;startB and endB&lt;startC.
Since memory spaces A and B are private to each respective core, it is possible for both memory space A and memory space B to contain objects which have the same symbol name. For example, there may be two functions y and z, which are mapped to shared and private memory respectively. Only one instance of function y is needed, since the instance in shared memory space C is accessible from both Core A and Core B. Two instances of function z are needed for the function to be accessed from both cores, because Core A cannot access any object code stored in memory space B, and likewise Core B cannot access any object code stored in memory space A.
If a function in shared memory space includes an undefined reference that can be satisfied by a function in more than one private space, the linker may not have the information necessary to resolve that undefined reference to a memory address, since the same symbol appears in two private spaces. This situation is depicted in FIG. 4.
Because the address ranges for memory space A and memory space B are nonoverlapping, the address of symbol z in this example will be different depending on whether the definition of symbol z in memory space A is used or the definition of symbol z in memory space B is used. The linker must resolve the symbolic reference to a single address to successfully build the application.
There are several options for the linker to resolve symbol z. The linker could resolve symbol z to the definition in Memory Space A. If a process is running in Core B, however, it will not be able to access the memory because all memory addresses in Memory Space A are available only to Core A. Similarly, if the linker resolves symbol z to the definition in Memory Space B, the reference will be unavailable to a process running in Core A. The linker does not have the option of resolving the reference to both addresses, because the relocation must provide only a single address to be functional. Thus, there is a need for a system to resolve references so that an application can take advantage of the performance efficiencies of a multi-core architecture where some memory is private to each core.
3. Discussion of Related Art
One potential solution to this problem is known as a trampoline function. In this approach, the linker replaces the reference to the symbol that appears in both private memories with a placeholder function that selects the proper reference at runtime. Instead of referencing symbol z directly, the linker inserts a reference to a trampoline function. The trampoline determines, at the time it is called, which core is running the process and implements a second jump to symbol z in that core's private memory. This solution does not work in all circumstances, however. It is inappropriate for references to data (rather than code), since the data is read rather than executed by the processor. Moreover, it can result in decreased performance for the final application due to the additional steps involved.
Another possible solution is a run-time context switch in which the state of a CPU is saved and replaced with another state. A context switch is often used to allow multiple processes to share a single CPU resource. This solution is also inappropriate, however. In addition to being computationally intensive, it does not solve the linking problem when the two cores are executing in parallel.
Another approach is simply to have the programmer manually resolve the problem in the code. There are two common manual approaches. The first is to map the shared reference (symbol y in FIG. 4) into each private memory. Thus, the shared symbol essentially becomes a private symbol to each core. If Core A calls symbol y, the linker will resolve that symbol to the definition in Memory Space A, and in turn resolve the reference in symbol y to symbol z in Memory Space A. Likewise, a reference from Core B to symbol y will stay within private Memory Space B. This approach is undesirable, however, because it requires additional work by the programmer and consumes extra private memory space. Often, the memory private to each core is the smallest and most expensive; forcing the programmer to use that memory when it is not necessary is undesirable, and duplicating the shared reference into two private memories increases overall memory usage.
Another manual approach that can be used is to map the two private symbols into a single symbol in shared memory. This approach also fails, however, if the private symbols are supposed to behave differently in each core and thus cannot be mapped to a single symbol. It is also undesirable if the private symbols were mapped to private memory for performance reasons.