Dynamic translation systems are a relatively new field of endeavor in the computer and software art. A technique used in dynamic translation systems is the analysis of traces.
A trace is a sequence of "blocks" or "basic blocks" of machine code. A block, or basic block, is a sequence of non-branching machine instructions ending with an exit to another block. The exit instruction may be a branch, or a conditional branch, or a fall-through to the next basic block. A program consists of multiple blocks.
During program execution, a program executes a series of basic blocks in paths. A trace records the path followed through the series. For example, with reference to FIG. 1, a program fragment is illustrated in which the program may execute along different paths. In FIG. 1, the program may execute either block 3 or block 4, depending on a conditional test in block 2. Two possible traces in the fragment illustrated in FIG. 1 are blocks "1-2-3-5" and blocks "1-2-4-5."
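The path-recording behavior described above can be sketched as follows. This is a minimal illustration, not part of the invention; the block numbering follows the hypothetical FIG. 1 fragment, and the function names are assumptions for illustration only.

```python
# Sketch of trace recording through the FIG. 1 fragment: block 2 ends
# with a conditional branch to block 3 or block 4; both paths rejoin
# at block 5. Names and structure here are purely illustrative.

def next_block(block_id, condition):
    """Return the id of the next basic block, or None at program end."""
    successors = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: []}
    if block_id == 2:
        # Conditional branch: the path taken depends on the test in block 2.
        return 3 if condition else 4
    nxt = successors[block_id]
    return nxt[0] if nxt else None

def record_trace(condition):
    """Execute from block 1 and record the sequence of blocks followed."""
    trace, block = [], 1
    while block is not None:
        trace.append(block)
        block = next_block(block, condition)
    return trace

print(record_trace(True))   # [1, 2, 3, 5]
print(record_trace(False))  # [1, 2, 4, 5]
```

The two return values correspond to the two traces "1-2-3-5" and "1-2-4-5" identified in the text.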
A dynamic translation system monitors the execution of an application to identify the traces that the application executes, and then translates those traces into more effective code.
Of course, the dynamic translation techniques described above must take place within a computer architecture whose resources, including memory, are of fixed size. Cache memory is always advantageous to expedite processing, but in such a fixed environment, cache devoted to storing traces must be managed as yet another limited and potentially scarce resource.
The preferred embodiment of the invention described herein is implemented using software cache (or "code cache") rather than hardware cache. It will be appreciated that a code cache is simply a section of memory allocated to the job of being a cache, and is typically bigger than a machine cache. However, for performance reasons, a code cache must still be limited in size and therefore managed like any other resource.
The code cache may be limited by the amount of physical memory in the system, or by the amount of hardware cache available. If a code cache is large compared to the physical memory, the system performance is likely to suffer since less memory would be available to run other applications. Depending on the system, for example, if the hardware cache is small and access to physical memory is slow, the code cache should not be too much larger than the hardware cache either.
Traces stored in cache will ideally execute faster than traces that are stored in main memory. Selection of the traces to store in cache as the program executes may be according to any of numerous memory protocols specific to various architectures and operating systems. Once placed in cache, however, a cache retention and discard protocol becomes highly advantageous in order to make room for new traces to be stored in the cache as program execution continues.
Ideally, traces will be retained in cache memory according to their retention value to the execution of the program. When the program executes a trace frequently, it is advantageous to keep it in cache memory. A trace executed infrequently need not be retained in cache, and in fact should be discarded to make room for incoming traces that may potentially be used more frequently.
Various methods are known in the art for selecting items to be retained in cache and items to be discarded. One fairly common method is simply to empty the entire cache when it is full and then start over. Although easy to implement, this method ignores the value of frequently-used items. After emptying the cache, processing must repopulate the cache with these items even though they were already in cache perhaps only moments before. This method is therefore like "throwing out the baby with the bath water."
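The "empty out and start over" method can be sketched as follows. This is a minimal illustration under assumed names (the class and its sizes are hypothetical): when an incoming trace does not fit, the entire cache is flushed, discarding frequently-used traces along with the rest.

```python
# Sketch of the "empty the entire cache when full" policy. All names
# and the byte-size API are illustrative assumptions.

class FlushOnFullCache:
    def __init__(self, capacity):
        self.capacity = capacity   # total bytes available for traces
        self.used = 0
        self.traces = {}           # trace id -> size in bytes

    def insert(self, trace_id, size):
        if self.used + size > self.capacity:
            # No room: discard everything, including "hot" traces
            # that will only have to be re-translated moments later.
            self.traces.clear()
            self.used = 0
        self.traces[trace_id] = size
        self.used += size

cache = FlushOnFullCache(capacity=100)
cache.insert("t1", 60)
cache.insert("t2", 30)
cache.insert("t3", 50)   # does not fit: t1 and t2 are thrown out too
print(cache.traces)      # {'t3': 50}
```

The example shows the drawback named in the text: "t1" is discarded even if it was the most frequently executed trace.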
A better system would rank items stored in cache to determine which ones to keep and which ones to discard.
Ranking is used with advantage in other related fields of endeavor. For example, Operating System (OS) kernels must decide which virtual pages to keep in physical memory, and which to throw out. It is known to rank virtual pages for this purpose, typically on a "Most Recently Used" (MRU) basis. Whenever there is an access to an address in a particular virtual page, that page's rank is promoted to the top. Frequently-accessed pages thus stay near the top, and lesser-accessed ones fall to the bottom. The top of the list is generally kept in physical memory.
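The promote-on-access ranking described above can be sketched as follows. This is an illustrative model only, not an actual OS kernel implementation; the class name and the two-slot "physical memory" cutoff are assumptions.

```python
# Sketch of the page-ranking scheme described above: on every access,
# the page is promoted to the top of the ranked list, so frequently
# accessed pages stay near the top and lesser-accessed ones sink.

from collections import deque

class PageRanking:
    def __init__(self):
        self.ranked = deque()          # front of deque = highest rank

    def access(self, page):
        if page in self.ranked:
            self.ranked.remove(page)
        self.ranked.appendleft(page)   # promote to the top

    def top(self, n):
        """The n highest-ranked pages, kept in physical memory."""
        return list(self.ranked)[:n]

r = PageRanking()
for p in ["a", "b", "c", "a", "d", "a"]:
    r.access(p)
print(r.top(2))   # ['a', 'd']
```

After the access sequence, frequently accessed "a" sits at the top of the list, while "b", accessed only once and long ago, has fallen to the bottom.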
A related ranking system uses a "Least Recently Used" (LRU) basis, where the time between accesses is monitored and forms the basis for a ranking. Items accessed less recently (i.e., the time since the last access is high) receive a lower rank. Although perhaps more accurately predictive, the LRU method is less favored than the MRU method because it requires higher processing overhead to maintain.
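The LRU variant can be sketched with a logical clock. This too is illustrative only; the variable names are assumptions, and a real implementation would use hardware timestamps or reference bits rather than a Python counter.

```python
# Sketch of LRU ranking: record the last-access time of each item;
# the item with the longest gap since its last access ranks lowest.

import itertools

clock = itertools.count()        # logical time, advanced on each access
last_access = {}

def access(item):
    last_access[item] = next(clock)

for it in ["x", "y", "x", "z"]:
    access(it)

# Rank most recently used first; "y" has the oldest access, so it
# ranks last and is the first candidate for discard.
ranking = sorted(last_access, key=last_access.get, reverse=True)
print(ranking)   # ['z', 'x', 'y']
```

Maintaining a timestamp per item on every access is the bookkeeping overhead the text refers to.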
A further ranking system known in the art simply counts the number of times an item is used. The higher the number, the higher the rank.
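The counting method reduces to a per-item counter. The following minimal sketch (with hypothetical trace names) shows it ranking traces purely by cumulative execution count:

```python
# Sketch of count-based ranking: each execution of a trace increments
# its counter, and rank follows the count. Trace ids are illustrative.

from collections import Counter

counts = Counter()

def executed(trace_id):
    counts[trace_id] += 1

for t in ["t1", "t2", "t1", "t3", "t1", "t2"]:
    executed(t)

# Highest cumulative count ranks first.
ranking = [t for t, _ in counts.most_common()]
print(ranking)   # ['t1', 't2', 't3']
```

Note that the counter only ever grows, which is the weakness discussed below: a trace that was "hot" long ago retains its high rank indefinitely.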
All of the foregoing methods of ranking are inappropriate for cache memory management in dynamic translation systems. By its nature, the behavior of a dynamic translation system may change over time. In particular, traces may vary in size, or go "dormant" for a period before becoming very active again. The "empty out and start over" method is simply too inefficient to account for this dynamic behavior: it has no mechanism to determine what is important and what is not. The MRU, LRU and "counting" methods are too inflexible to account for this dynamic behavior. For example, assume a trace in a dynamic system executes very frequently in "bursts." An MRU or LRU algorithm may assign a low rank to a trace when the trace is in its dormant phase, even though it is shortly to become "hot." A "counting" method may assign a falsely high ranking to a trace that is "hot" at the time but thereafter rarely or never executes again.
There is therefore a need in the dynamic translation systems art for a trace ranking system which, via accurate predictive ranking, enables cache management to determine which traces to keep in cache and which to discard.