Computer technology continues to advance at a remarkable pace, with numerous improvements being made to the performance of both microprocessors—the “brains” of a computer—and the memory that stores the information processed by a computer.
In general, a microprocessor operates by executing a sequence of instructions that form a computer program. The instructions are typically stored in a memory system having a plurality of storage locations identified by unique memory addresses. The memory addresses collectively define a “memory address space,” representing the addressable range of memory addresses that can be accessed by a microprocessor.
Both the instructions forming a computer program and the data operated upon by those instructions are often stored in a memory system and retrieved as necessary by the microprocessor when executing the computer program. The speed of microprocessors, however, has increased relative to that of memory devices to the extent that retrieving instructions and data from a memory can often become a significant bottleneck on performance. To reduce this bottleneck, it is desirable to use the fastest memory devices available. However, both memory speed and memory capacity are typically directly related to cost, and as a result, many computer designs must balance memory speed and capacity against cost.
A predominant manner of obtaining such a balance is to use multiple “levels” of memories in a memory architecture to attempt to decrease costs with minimal impact on system performance. Often, a computer relies on a relatively large, slow and inexpensive mass storage system such as a hard disk drive or other external storage device, an intermediate main memory that uses dynamic random access memory devices (DRAMs) or other volatile memory storage devices, and one or more high speed, limited capacity cache memories, or caches, implemented with static random access memory devices (SRAMs) or the like. In some instances, instructions and data are stored in separate cache memories to permit instructions and data to be accessed in parallel. One or more cache memory controllers are then used to swap the information from segments of memory addresses, often known as “cache lines,” between the various memory levels to attempt to maximize the frequency with which requested memory addresses are found in the fastest cache memory accessible by the microprocessor.
Whenever a memory access request attempts to access a memory address that is not cached in a cache memory, a “cache miss” occurs. As a result of a cache miss, the cache line for a memory address typically must be retrieved from a relatively slow, lower level memory, often with a significant performance penalty. In addition, any existing data in the cache that “maps” to the same area of the cache (typically referred to as a “cache entry”) generally must be written back to a lower level memory or discarded. If the data that has been removed from the cache is later needed, another cache miss will occur, and that data will in turn replace whatever data is then stored in the same cache entry.
As a result, it is often desirable for performance reasons to minimize the frequency of cache misses that occur during operation of a computer, also referred to as maximizing the “hit rate” of a cache.
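The mapping of memory addresses to cache entries described above can be illustrated with a minimal sketch. It assumes a direct-mapped cache; the 64-byte line size, the 512-entry capacity, and the function names are hypothetical values chosen only for illustration and do not come from any particular design.

```python
# Illustrative model of a direct-mapped cache (all parameters are assumptions).
LINE_SIZE = 64      # bytes per cache line
NUM_ENTRIES = 512   # number of cache entries

def cache_entry(address: int) -> int:
    """Return the index of the cache entry that a byte address maps to."""
    return (address // LINE_SIZE) % NUM_ENTRIES

def cache_tag(address: int) -> int:
    """Return the tag that distinguishes lines sharing the same entry."""
    return address // (LINE_SIZE * NUM_ENTRIES)

# Two addresses exactly one cache capacity apart share an entry but not a tag,
# so only one of their cache lines can be resident at a time.
print(cache_entry(0x0000), cache_tag(0x0000))  # 0 0
print(cache_entry(0x8000), cache_tag(0x8000))  # 0 1
print(cache_entry(0x0040), cache_tag(0x0040))  # 1 0
```

Under these assumptions, addresses 0x0000 and 0x8000 compete for cache entry 0, whereas address 0x0040 occupies entry 1 and does not disturb either of them.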
From the perspective of program instructions, the hit rate of a cache can be affected by the organization of such program instructions in a memory address space. In particular, if different segments of a computer program are arranged in particular segments of the memory address space that map to the same entry in a cache, and those different segments are frequently executed by a computer (perhaps in parallel, or in an alternating fashion), the execution of instructions may result in a substantial number of cache misses, e.g., due to instructions from each segment having to be repeatedly swapped in and out of the cache as they are needed for execution. The resulting conflict is often referred to as a “hot spot” in the cache.
If the conflicting segments of the computer program were mapped to different cache entries, there would be a greater probability that the frequently needed instructions from both segments could reside in the cache at the same time, thus avoiding many of the cache misses that otherwise would have occurred.
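The effect of such a conflict can be sketched with a small simulation. The model below is an assumption made for illustration only: a direct-mapped cache with 64-byte lines and 512 entries, and two hypothetical code segments whose execution alternates. The function names and addresses are likewise hypothetical.

```python
def count_misses(addresses, line_size=64, num_entries=512):
    """Count misses for a direct-mapped cache over a sequence of byte addresses."""
    cache = {}   # entry index -> line number currently held in that entry
    misses = 0
    for addr in addresses:
        line = addr // line_size
        entry = line % num_entries
        if cache.get(entry) != line:
            misses += 1            # line not resident: fetch it, evicting the old one
            cache[entry] = line
    return misses

def alternating_trace(base_a, base_b, rounds):
    """Build a trace that alternates between two segment addresses."""
    trace = []
    for _ in range(rounds):
        trace.append(base_a)
        trace.append(base_b)
    return trace

# Both segments map to entry 0: every access evicts the other segment.
conflicting = alternating_trace(0x0000, 0x8000, 1000)
# The second segment moved by one line maps to entry 1: no conflict.
separated = alternating_trace(0x0000, 0x8040, 1000)

print(count_misses(conflicting))  # 2000
print(count_misses(separated))    # 2
```

In this simplified model, shifting one segment by a single cache line reduces the miss count from one miss per access to two initial misses in total.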
The organization of program code segments in the memory address space, taken alone, can therefore have a significant impact on system performance, particularly for frequently-executed program code. Of note, in many instances, the precise ordering of segments of program code in a memory address space is not relevant to the functions provided by the program code. Put another way, the program code segments in many types of computer programs would operate in the same manner from a functional standpoint regardless of their locations in memory relative to one another.
Developers have sought to take advantage of this flexibility by organizing program code segments in memory in such a manner that hot spots are minimized wherever possible. Empirical testing or simulation may be performed on different arrangements of program code to attempt to find an optimal solution having the highest hit rate, and thus the fewest cache misses, during operation of a computer upon which the program code is installed.
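For a very small number of modules, testing every arrangement by simulation is straightforward. The following is a minimal sketch under stated assumptions: a hypothetical direct-mapped cache with four 64-byte entries, modules placed back to back in the chosen order, and an execution trace in which running a module touches each of its cache lines once. The module names, sizes, and trace are invented for illustration.

```python
from itertools import permutations

LINE_SIZE = 64    # bytes per cache line (assumed)
NUM_ENTRIES = 4   # deliberately tiny cache so conflicts are easy to see

def layout(order, sizes):
    """Place modules back to back in the given order; return start addresses."""
    start, addr = {}, 0
    for name in order:
        start[name] = addr
        addr += sizes[name]
    return start

def misses_for(order, sizes, trace):
    """Simulate the trace against one ordering and count cache misses."""
    start = layout(order, sizes)
    cache, misses = {}, 0
    for name in trace:
        first = start[name] // LINE_SIZE
        last = (start[name] + sizes[name] - 1) // LINE_SIZE
        for line in range(first, last + 1):   # touch every line of the module
            entry = line % NUM_ENTRIES
            if cache.get(entry) != line:
                misses += 1
                cache[entry] = line
    return misses

def best_order(sizes, trace):
    """Exhaustively try every ordering and return one with the fewest misses."""
    return min(permutations(sizes), key=lambda o: misses_for(o, sizes, trace))

sizes = {"hot1": 64, "cold": 192, "hot2": 64}   # module sizes in bytes
trace = ["hot1", "hot2"] * 100                  # two hot modules alternate

print(misses_for(("hot1", "cold", "hot2"), sizes, trace))  # 200
best = best_order(sizes, trace)
print(misses_for(best, sizes, trace))                      # 2
```

With three modules there are only six orderings to simulate, so an exhaustive search of this kind completes immediately; it is intended only to show the shape of the approach, not a practical tool.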
One drawback to conventional testing and simulation, however, is that in many instances the potential arrangements of program code are too numerous for all of them to be analyzed effectively. For example, for some computer program code, such as the program code in a complex operating system, the number of segments, or modules, of program code that it may be desirable to arrange so as to address memory access-related performance degradation can be in the hundreds. Even 100 modules provide 100!, or approximately 9.3×10^157, possible orderings of modules, so as the number of modules increases, the number of possible orderings quickly becomes impractical to analyze, even with the fastest of computers.
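The growth in the number of orderings can be checked directly. The sketch below computes the factorial exactly and, as a rough illustration, estimates how long exhaustive evaluation would take; the rate of one billion orderings evaluated per second is an assumed figure, not a measurement.

```python
import math

def orderings(num_modules: int) -> int:
    """Number of distinct orderings of num_modules modules (n factorial)."""
    return math.factorial(num_modules)

def years_to_enumerate(num_modules: int, per_second: float = 1e9) -> float:
    """Rough time to evaluate every ordering at an assumed evaluation rate."""
    seconds = orderings(num_modules) / per_second
    return seconds / (365.25 * 24 * 3600)

print(f"{orderings(100):.1e}")      # 9.3e+157
print(len(str(orderings(100))))     # 158 (decimal digits)
print(f"{years_to_enumerate(20):.0f} years for only 20 modules")
```

Even at the assumed rate, 20 modules would already require decades of computation, and each additional module multiplies the total by the new module count.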
Therefore, a significant need exists in the art for an improved manner of testing and selecting optimal orderings of program code segments to minimize memory access-related performance issues due to cache hot spots and the like resulting from execution of such program code segments in a computer implementing a multi-level memory architecture.