The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Most software applications are written by multiple developers in a modular manner. Typically, different teams of software developers write the source code of different modules of a software application. The source code from individual modules is then compiled by a compiler into object code modules. A linker then links all object code modules into a binary file. As referred to herein, a binary (or executable) file is a file that includes the set of executable instructions for a computer program, such as, for example, a software application.
Typically, an object code module includes a data section for storing the data variables that are accessed by the instructions in that particular module. A data variable as referred to herein is any data that may be referred to or accessed by an instruction. While some compilers may order the data variables within an object code module in an optimal way, any such optimization has limited effect because each module holds only a fraction of all data variables used by the computer program. Furthermore, since the different source code modules are compiled separately, a more efficient ordering of data variables, such as an ordering that accounts for data variables outside of a single module, cannot be performed at the compilation step.
At the linking step, the linker typically has only a high-level view of the object code modules being linked. The linker simply concatenates the data sections of all object code modules and assigns virtual addresses to all instructions and data variables that are stored in the resulting binary file. While the linker recognizes the different sections in each object code module, the linker usually does not go further than determining which sections from which object code modules to concatenate. Specifically, linkers typically are not capable of performing fine-grained optimizations at the level of individual data variables and instructions.
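For illustration only, the concatenation behavior described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular linker: the names `DataVariable`, `ObjectModule`, and `link_data_sections`, the base address `0x1000`, and the variable names and sizes are all hypothetical, and alignment padding is ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class DataVariable:
    name: str
    size: int  # size in bytes


@dataclass
class ObjectModule:
    name: str
    data_section: list  # DataVariable entries, in the order the compiler emitted them


def link_data_sections(modules, base_address=0x1000):
    """Concatenate data sections in module order and assign virtual addresses.

    Hypothetical model: no alignment, no reordering across modules.
    """
    layout = {}
    address = base_address
    for module in modules:
        for var in module.data_section:
            layout[var.name] = address
            address += var.size
    return layout


modules = [
    ObjectModule("a.o", [DataVariable("counter", 8), DataVariable("table", 64)]),
    ObjectModule("b.o", [DataVariable("flags", 4), DataVariable("buffer", 128)]),
]
layout = link_data_sections(modules)
for name, address in layout.items():
    print(f"{name}: {address:#x}")
```

In this sketch, the address of each data variable depends only on the order in which the modules are supplied and on the order chosen within each module; no information about how the variables are accessed at run time is used.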
The arrangement of data variables in the data section of a computer program, however, may have a significant impact on the run-time performance of the program.
For example, most modern processors (central processing units, or CPUs) provide processor caches. A processor cache, such as, for example, a Level 1 (L1) or Level 2 (L2) cache, is a storage area configured for fast access by a processor where frequently executed instructions and/or frequently accessed data are kept. Typically, a processor cache is logically divided into sections, or cache lines, where each cache line corresponds to a range of cache addresses. In order to provide for more efficient operation, a processor typically loads information into the processor cache an entire cache line at a time. Instead of loading individual instructions or data variables, a processor fetches from memory and stores in the processor cache the contents of a range of memory addresses, where the contents of a particular range of memory addresses are always loaded into a particular cache line of the processor cache. When the processor needs to fetch an instruction or a data variable from memory, the processor first checks the processor cache to see if the instruction or data variable is stored there. When the desired instruction or data variable is found in the cache, a cache hit is said to occur; when the desired instruction or data variable is not found in the cache, a cache miss is said to occur. During the execution of a computer program, a high processor cache hit ratio for instructions and/or data variables results in faster and more efficient execution; a high processor cache miss ratio, on the other hand, results in slower or sub-optimal execution.
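The hit and miss behavior described above may be illustrated with a small simulation. The sketch below assumes a direct-mapped cache, in which each range of memory addresses maps to exactly one cache line; the line size of 64 bytes, the cache of 4 lines, the function name `simulate`, and the address trace are hypothetical values chosen for illustration. Actual processor caches are frequently set-associative and considerably larger.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)
NUM_LINES = 4   # number of cache lines (assumed, unrealistically small)


def simulate(addresses, line_size=LINE_SIZE, num_lines=NUM_LINES):
    """Count hits and misses for a direct-mapped cache model."""
    cache = [None] * num_lines  # memory block currently held by each cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // line_size  # range of memory addresses containing addr
        index = block % num_lines  # the one cache line this block can occupy
        if cache[index] == block:
            hits += 1
        else:
            misses += 1
            cache[index] = block   # load the entire line, evicting the old block
    return hits, misses


# Addresses 0, 8, and 16 share one block; 0 and 256 fall in different
# blocks that compete for the same cache line.
trace = [0, 8, 16, 64, 0, 256, 0]
hits, misses = simulate(trace)
print(hits, misses)
```

In this trace, the accesses to addresses 8 and 16 hit because the first access to address 0 loaded the whole line, while the final access to address 0 misses because address 256 evicted that line in the meantime.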
Because of the fixed relationship that exists between virtual addresses, physical addresses, and cache addresses, the virtual addresses assigned to data variables in the data section of a computer program determine to a large extent whether or not particular data variables are going to be found in the processor cache during the execution of the computer program. However, a linker that links the object modules of the computer program typically does not assign virtual addresses to the data variables in a manner that results in an optimal placement of the data variables. When the computer program is executed, such non-optimal placement of data variables usually results in processor cache misses, which impede the run-time performance of the computer program.
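As a rough illustration of how address assignment affects cache behavior, the sketch below compares two hypothetical placements of the same data variables. It assumes a direct-mapped cache with 64-byte lines and 4 lines, and it treats the virtual address as the cache address directly; the function `conflicts`, the variable names, and the addresses are invented for this example.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)
NUM_LINES = 4   # number of cache lines (assumed)


def conflicts(layout, hot_variables):
    """Return pairs of frequently accessed variables that lie in different
    memory blocks but map to the same cache line, and so evict each other."""
    pairs = []
    names = sorted(hot_variables)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            block_a = layout[a] // LINE_SIZE
            block_b = layout[b] // LINE_SIZE
            if block_a != block_b and block_a % NUM_LINES == block_b % NUM_LINES:
                pairs.append((a, b))
    return pairs


hot = {"x", "y"}

# Placement by simple concatenation: a large, rarely used variable
# separates the two frequently accessed variables.
linker_order = {"x": 0x1000, "cold_blob": 0x1008, "y": 0x1100}

# Alternative placement: the frequently accessed variables are adjacent.
reordered = {"x": 0x1000, "y": 0x1008, "cold_blob": 0x1010}

print(conflicts(linker_order, hot))  # x and y compete for one cache line
print(conflicts(reordered, hot))     # x and y share a single cache line
```

Under these assumptions, the first placement causes `x` and `y` to evict each other whenever they are accessed alternately, whereas the second placement allows both to reside in the same cache line.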
One approach to address this issue is to provide linkers with features for ordering data sections based on user input (typically represented as a mapfile). In this approach, a compiler is required to fragment data into individual data sections when the compiler compiles a source code module into an object code module. The user can then instruct the linker to place these fragmented data sections in a particular manner that is defined in a mapfile. One disadvantage of this approach is that the creation of a reasonably good mapfile is an extremely tedious process. Further, it is extremely unlikely that a user would know which data variables cause processor cache misses. In addition, this approach cannot take into account the interactions between instructions and data variables, and how such interactions may affect cache misses in processor caches that allow caching of both instructions and data variables.
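The mapfile-based approach may be sketched as follows. This is an illustration under stated assumptions only: the mapfile is modeled as a plain list of section names, which is not the syntax of any actual linker, and the section names, sizes, base address, and the functions `order_sections` and `assign_addresses` are hypothetical.

```python
def order_sections(sections, mapfile_order):
    """Place sections named in the mapfile first, in mapfile order, then
    append all remaining sections in their original order."""
    sizes = dict(sections)
    listed = [name for name in mapfile_order if name in sizes]
    listed_set = set(listed)
    rest = [name for name, _ in sections if name not in listed_set]
    return [(name, sizes[name]) for name in listed + rest]


def assign_addresses(ordered_sections, base_address=0x1000):
    """Assign consecutive virtual addresses (alignment ignored)."""
    layout = {}
    address = base_address
    for name, size in ordered_sections:
        layout[name] = address
        address += size
    return layout


# One fragmented data section per data variable, as emitted by the compiler.
sections = [
    (".data.counter", 8),
    (".data.log_buffer", 248),
    (".data.flags", 8),
]

# User-written ordering; the user must already know which variables matter.
mapfile_order = [".data.counter", ".data.flags"]

layout = assign_addresses(order_sections(sections, mapfile_order))
for name, address in layout.items():
    print(f"{name}: {address:#x}")
```

The sketch makes the limitation noted above visible: the quality of the resulting placement depends entirely on the contents of `mapfile_order`, which the user must determine by hand.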