1. Field of the Invention
This invention relates to computing systems and more particularly to addressing poor cache utilization by applying optimizations to program code.
2. Description of the Related Art
Performance of computing systems depends on both hardware and software. For example, the speed of the processor (e.g., number of instructions executed per second), number of cores, cache size, and other hardware related aspects of the computing system affect performance. Software efficiency in performing a particular task also impacts performance. Often, the interaction of hardware and software can affect performance. One aspect where software and hardware interact is in accesses to cache memory. Cache memory stores copies of data that are otherwise stored in main memory. Cache memory is much smaller than main memory, but stores those locations that are being frequently used by the processor. Thus, cache memory allows the processor to access those frequently accessed locations more quickly than if the processor had to go to main memory.
Data for cache memory is typically retrieved in cache lines of, e.g., 64 bytes of data at a time. However, not all of the 64 bytes may actually be needed. As entries in the cache become stale from non-use, they may be replaced by other memory locations that are currently being used by the processor. When bytes that are not needed are retrieved from main memory and stored in the cache, poor cache utilization can occur.
For example, when an application spends a lot of its execution time accessing only certain (not all) fields of structures, poor utilization of the data cache can frequently occur. A structure is a software construct having multiple fields, which can be of different types. An example would be a structure in which the fields of the structure represent information related to a person, such as name, age, address, and favorite websites. Poor cache utilization associated with structures can be understood by referring to the code segment shown in FIG. 1, where hot_field represents a field in a structure that is needed by the processor, but other fields in the structure, e.g., field_1 and field_2, are not needed. The poor cache utilization comes from the fact that the cache line that contains array[i].hot_field will likely contain many other fields of the structure array[i], such as field_1 and field_2, which will be brought into the cache, along with hot_field, but unlike hot_field, these other fields will eventually be evicted from the cache unused. For some applications, that can severely degrade execution time performance.
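The access pattern described above can be sketched in C as follows. This is only an illustrative sketch and is not the code of FIG. 1: the structure name record, the field types and sizes, and the helper functions sum_hot and useful_percent are assumptions introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; field types and sizes are assumptions. */
struct record {
    long   hot_field;    /* read on every loop iteration */
    char   field_1[40];  /* cold: never read by the loop */
    double field_2;      /* cold: never read by the loop */
};

/* Sums hot_field over n elements. Each access pulls in a cache line
 * that also holds field_1 and field_2 bytes the loop never uses. */
long sum_hot(const struct record *array, size_t n)
{
    assert(array != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += array[i].hot_field;
    return total;
}

/* Percentage of each element's bytes that the loop actually reads. */
unsigned useful_percent(void)
{
    return (unsigned)(100 * sizeof(long) / sizeof(struct record));
}
```

Under these assumed field sizes, useful_percent() is well below 100, which gives a rough measure of how much of each fetched cache line is wasted on cold fields.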
To address this poor data cache utilization problem, compilers have applied a variety of structure layout optimizations. Compilers take programs written in a high level language such as C, C++, Fortran, and the like, and translate the high level code to machine level code that is suitable for execution on a processor. Compilers may translate the high level code to an intermediate representation and then to machine code suitable for a particular instruction set architecture. Currently, compiler structure layout optimizations include “structure splitting,” which breaks up the original structure into multiple sub-structures, and places new pointers in the new parent structure as a way to access the new child structures. A common application of that optimization is to separate the hot and cold fields as shown in FIG. 2, with the hot fields kept together in the parent structure and the cold fields moved to a child structure that is reached through a new pointer in the parent structure. Thus, the structure 201 becomes the structure 203 with the hot and cold fields separated.
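Structure splitting of the kind described above can be sketched by hand in C. This is a minimal sketch under stated assumptions, not the layout of FIG. 2: the structure names person, person_cold, and person_split, the choice of which fields are hot, and the helpers split_array_new, split_array_free, and sum_ages are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Original layout: hot and cold fields share one structure. */
struct person {
    int  age;            /* assumed hot */
    int  visits;         /* assumed hot */
    char name[64];       /* assumed cold */
    char address[128];   /* assumed cold */
};

/* After splitting: cold fields move to a child structure. */
struct person_cold {
    char name[64];
    char address[128];
};

/* The parent keeps the hot fields plus a new pointer to the child. */
struct person_split {
    int  age;
    int  visits;
    struct person_cold *cold;
};

/* Allocates n parents and n children and links each parent to its child. */
struct person_split *split_array_new(size_t n)
{
    assert(n > 0);
    struct person_split *hot  = calloc(n, sizeof *hot);
    struct person_cold  *cold = calloc(n, sizeof *cold);
    if (hot == NULL || cold == NULL) {
        free(hot);
        free(cold);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        hot[i].cold = &cold[i];
    return hot;
}

/* Frees both blocks; the child block starts at element 0's child. */
void split_array_free(struct person_split *hot)
{
    if (hot != NULL) {
        free(hot[0].cold);
        free(hot);
    }
}

/* Hot loop: touches only the small parent structures. */
long sum_ages(const struct person_split *p, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i].age;
    return total;
}
```

Because the parent structure is much smaller than the original, more parent elements fit in each cache line during the hot loop, at the cost of an extra pointer per element and an extra indirection on every cold access.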
“Structure peeling” is similar to structure splitting, with the only exception that no new pointers are placed in the parent structure; hence, accesses to the child structures are made explicitly and directly through the new child structures. “Structure field reordering” reorders the fields inside the structure in a way the compiler deems beneficial, most often by grouping frequently accessed fields close together. “Structure instance interleaving” groups together corresponding fields in various instances of the structure. For example, in an array of structures, each array element, a[i], itself a structure, is an instance. As shown in code segment 301 in FIG. 3, each array element a[0], a[1] is a structure that includes field_1, field_2, and field_3. To interleave all these instances is to group their corresponding fields together. Segment 303 illustrates the transformation that occurs to interleave the fields together. The field_1 fields of all the instances are grouped together. Similarly, the field_2 fields of all the instances are grouped together, as are the field_3 fields of all the instances, and so on.
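Structure instance interleaving can be illustrated with a hand-written C sketch. The sketch is not the code of FIG. 3: the field types, the instance count NUM_INSTANCES, the structure names elem and elem_interleaved, and the helper interleave are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_INSTANCES 4  /* hypothetical number of array elements */

/* Original form: an array of instances, each holding all three fields. */
struct elem {
    int    field_1;
    double field_2;
    char   field_3;
};

/* Interleaved form: corresponding fields of all instances are grouped. */
struct elem_interleaved {
    int    field_1[NUM_INSTANCES];
    double field_2[NUM_INSTANCES];
    char   field_3[NUM_INSTANCES];
};

/* Copies a[0..NUM_INSTANCES-1] into the interleaved layout. */
void interleave(const struct elem *a, struct elem_interleaved *out)
{
    assert(a != NULL && out != NULL);
    for (size_t i = 0; i < NUM_INSTANCES; i++) {
        out->field_1[i] = a[i].field_1;
        out->field_2[i] = a[i].field_2;
        out->field_3[i] = a[i].field_3;
    }
}
```

In the interleaved form, a loop that reads only field_1 walks a contiguous run of field_1 values, so each fetched cache line carries values that the loop actually uses.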
While these cache optimizations have improved cache utilization in certain cases, they physically change the structures. Further improvements in compiler optimizations to improve data cache utilization associated with structures are desirable.