1. Field of the Disclosure
The present disclosure is generally directed to a processor and, more particularly, to techniques for storing instructions and related information in a memory hierarchy.
2. Description of the Related Art
A known way to increase the performance of a computer system is to include a local, high-speed memory known as a cache memory (cache). A cache usually increases system performance because, once a central processing unit (CPU) has accessed information (either operand data, usually referred to simply as “data,” or an instruction) at a particular address, there is a high probability that it will access the same address again in the near future. When requested information is not present, the cache fetches it from a slower main memory or lower-level cache and stores it, together with information located adjacent to it, since information at nearby addresses is also likely to be accessed soon. In higher-performance computer systems, several caches may be placed in a memory hierarchy. The cache that is closest to the CPU, known as the upper-level or level 1 (L1) cache, is the highest-level cache in the hierarchy and is generally the fastest. Other, generally slower caches are then placed in descending order in the hierarchy, starting with a secondary cache known as a level 2 (L2) cache, and so on, down to the lowest-level cache, which is connected to main memory. One well-known processor architecture includes separate caches for instructions and data at the L1 level and a combined instruction and data cache at the L2 level.
Each cache line usually includes several bytes and other information about those bytes. For example, a field called a “tag” holds the upper bits of the memory address from which the cache line was loaded and is used to determine whether an access “hits” or “misses” in the cache. Other useful information that characterizes the instructions or data may be stored in the cache line as well, such as error correcting code (ECC) bits and, in the case of instructions, bits that characterize the instructions in the respective cache line.
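As an illustration of the tag comparison described above, the following Python sketch models a simple direct-mapped cache. The line size (64 bytes), the number of lines (256), and the names split_address and DirectMappedCache are hypothetical choices made only for illustration; real caches are commonly set-associative and also hold ECC and other characterizing bits, which this sketch omits.

```python
from __future__ import annotations

# Hypothetical parameters, chosen only for illustration.
LINE_BYTES = 64   # bytes per cache line
NUM_SETS = 256    # number of lines in this direct-mapped cache

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 6 bits select a byte within a line
INDEX_BITS = NUM_SETS.bit_length() - 1     # 8 bits select a line


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    def __init__(self) -> None:
        # Each entry is (stored tag, line bytes), or None while the line is invalid.
        self.lines: list[tuple[int, bytes] | None] = [None] * NUM_SETS

    def lookup(self, addr: int) -> bool:
        """Hit only if the indexed line is valid and its stored tag matches."""
        tag, index, _ = split_address(addr)
        entry = self.lines[index]
        return entry is not None and entry[0] == tag

    def fill(self, addr: int, data: bytes) -> None:
        """Install a whole line, as would happen after a miss."""
        tag, index, _ = split_address(addr)
        self.lines[index] = (tag, data)
```

In this sketch, after fill(0x12340, bytes(64)) a lookup of 0x12345 hits because the two addresses differ only in their offset bits, whereas an address exactly LINE_BYTES * NUM_SETS bytes higher maps to the same line but carries a different tag and therefore misses.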
These instruction characterizing bits may include predecode bits. For example, one popular class of microprocessors is based on the x86 instruction set, which is a variable-length instruction set because the length of an instruction can vary between one and fifteen bytes. In a superscalar implementation of an x86 microprocessor, it is desirable to determine where the instruction boundaries are in order to dispatch multiple instructions per clock cycle. However, determining the instruction boundaries within a group of bytes is usually a time-consuming sequential process: in general, the end of each instruction must be determined before the next instruction can be examined. To facilitate issuing multiple instructions without delay, this boundary information may be stored as predecode bits along with the instructions in the cache.
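The sequential nature of boundary determination can be sketched as follows. Real x86 length decoding depends on prefix, opcode, and addressing-mode bytes; the function toy_length below is a hypothetical stand-in rule, and predecode is likewise a hypothetical name. The sketch records one end bit per byte, which is one possible form of predecode information and not necessarily the form used in any particular processor.

```python
from __future__ import annotations

MAX_INSN_LEN = 15  # x86 instructions are 1 to 15 bytes long


def toy_length(first_byte: int) -> int:
    """HYPOTHETICAL length rule standing in for real x86 length decoding."""
    return (first_byte % MAX_INSN_LEN) + 1


def predecode(code: bytes, start: int = 0) -> list[int]:
    """Return one end bit per byte; 1 marks the last byte of an instruction.

    The scan is inherently sequential: where the next instruction starts is
    known only after the length of the current one has been determined.
    """
    end_bits = [0] * len(code)
    pos = start
    while pos < len(code):
        end = pos + toy_length(code[pos]) - 1
        if end >= len(code):
            break  # instruction runs past this block; leave its bits unset
        end_bits[end] = 1
        pos = end + 1
    return end_bits
```

Once such end bits are stored alongside the bytes, the boundaries in a block can be read directly, for example with [i for i, b in enumerate(bits) if b], without repeating the sequential scan.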
Another example of characterizing bits is branch prediction bits. Branch prediction bits are useful when performing speculative execution of instructions. Speculative execution involves guessing whether a conditional branch will be taken and processing instructions along the predicted path before the branch condition is resolved. The prediction may later prove to be correct or incorrect. If the prediction proves correct, performance is improved because instructions along the predicted path have already advanced through the pipeline. If the prediction proves incorrect, the pipeline must be flushed of the instructions in progress, and extra cycles are required to “catch up.” Thus, the improvement in efficiency depends on the prediction accuracy. Branch prediction bits characterize the existence of branch instructions in a group of instructions and the nature of each branch, such as unconditional (static) versus conditional (dynamic).
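The dependence on prediction accuracy can be made concrete with a simple, commonly used model in which each misprediction costs a fixed number of flush cycles. The function name mispredict_stall_cpi and the example figures (20% of instructions being branches, a 10-cycle flush penalty) are hypothetical; the model ignores other stall sources and any overlap between them.

```python
def mispredict_stall_cpi(branch_fraction: float, accuracy: float,
                         flush_penalty_cycles: float) -> float:
    """Average cycles lost per instruction to branch mispredictions.

    Simplified model: every mispredicted branch costs a fixed flush penalty,
    and no other stalls overlap with it.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    mispredict_rate = 1.0 - accuracy
    return branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 10-cycle flush penalty.
for acc in (0.90, 0.95):
    print(f"accuracy {acc:.0%}: "
          f"{mispredict_stall_cpi(0.20, acc, 10):.2f} stall cycles/instruction")
```

Under these assumed figures, raising accuracy from 90% to 95% halves the misprediction stall cycles per instruction, from about 0.2 to about 0.1.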
In general, there is a space penalty for storing characterizing bits in a multi-level cache hierarchy. It is usually desirable to make the L2 cache relatively large, such as 1 megabyte (MB), so the L2 cache alone can occupy a significant fraction of the die area of the microprocessor. Storing these additional characterizing bits in lower-level caches can therefore increase the total die size significantly.
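A rough estimate of this space penalty is sketched below. It assumes, hypothetically, one predecode bit per instruction byte plus a fixed number of branch bits per 64-byte line, and treats 1 MB as 2**20 bytes; actual bit counts vary by design, and the function names are illustrative only.

```python
L2_BYTES = 1 << 20  # 1 MB, taken as 2**20 bytes


def extra_storage_bytes(cache_bytes: int, bits_per_byte: float = 0.0,
                        bits_per_line: float = 0.0, line_bytes: int = 64) -> float:
    """Extra storage, in bytes, for characterizing bits kept with every line.

    bits_per_byte: bits kept per instruction byte (e.g. predecode bits).
    bits_per_line: bits kept once per cache line (e.g. branch markers).
    """
    num_lines = cache_bytes // line_bytes
    extra_bits = cache_bytes * bits_per_byte + num_lines * bits_per_line
    return extra_bits / 8


def overhead_fraction(cache_bytes: int, **kwargs) -> float:
    """Extra storage as a fraction of the cache's data capacity."""
    return extra_storage_bytes(cache_bytes, **kwargs) / cache_bytes
```

With one bit per byte and no per-line bits, a 1 MB cache needs an additional 128 KB, a 12.5% increase; adding 8 hypothetical branch bits per 64-byte line raises this to 144 KB, about 14.1%.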
What is needed is a processor that retains the benefit of storing characterizing bits while reducing the size of lower-level caches.
The use of the same reference symbols in different drawings indicates similar or identical items.