A known way to increase the performance of a computer system is to include a local, high-speed memory known as a cache. A cache increases system performance because once the central processing unit (CPU) has accessed a data element (either operand data, usually referred to simply as “data”, or an instruction) at a particular address, there is a high probability that it will access the same address again in the future. The cache fetches and stores data that is located adjacent to the requested piece of data from a slower main memory or lower-level cache. In very high performance computer systems, several caches may be placed in a hierarchy. The cache that is closest to the CPU, known as the upper-level or “L1” cache, is the highest-level cache in the hierarchy and is generally the fastest. Other, generally slower caches are then placed in descending order in the hierarchy, starting with a secondary cache known as the “L2” cache, and so on down to the lowest-level cache, which is connected to main memory. One well-known microprocessor architecture includes separate caches for instructions and data at the L1 level and a combined instruction and data cache at the L2 level.
Each cache line includes several bytes and other information about the bytes. For example, a field called a “tag” indicates the address at which the cache line is located in memory and is used to determine whether an access “hits” or “misses” in the cache. Other useful information that characterizes the instructions or data may be stored in the cache line as well, such as error correcting code (ECC) bits and, in the case of instructions, bits that characterize the instructions in the respective cache line.
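The role of the tag in deciding a hit or a miss can be illustrated with a minimal sketch of a direct-mapped cache. The line size, the number of lines, the direct-mapped organization, and all names (`CacheLine`, `split_address`, `DirectMappedCache`) are assumptions chosen for illustration only and are not taken from any particular microprocessor.

```python
from dataclasses import dataclass

LINE_BYTES = 64   # assumed cache line size in bytes (illustrative)
NUM_LINES = 256   # assumed number of lines in the cache (illustrative)


@dataclass
class CacheLine:
    valid: bool = False            # line holds meaningful contents
    tag: int = 0                   # upper address bits identifying the line
    data: bytes = bytes(LINE_BYTES)  # the cached bytes themselves


def split_address(addr: int):
    """Split an address into (tag, index, offset) for this toy cache."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_LINES
    tag = addr // (LINE_BYTES * NUM_LINES)
    return tag, index, offset


class DirectMappedCache:
    def __init__(self):
        self.lines = [CacheLine() for _ in range(NUM_LINES)]

    def access(self, addr: int) -> bool:
        """Return True on a hit. On a miss, install the new tag and return False."""
        tag, index, _ = split_address(addr)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            return True
        # Miss: a real cache would also fetch the line's bytes from the
        # slower memory here; this sketch only records the new tag.
        line.valid = True
        line.tag = tag
        return False
```

In this sketch, a second access to the same address, or to any other byte in the same line, hits because the stored tag matches. An access to a different address that maps to the same index misses and replaces the line.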
These instruction characterizing bits may include predecode bits. For example, one popular class of microprocessors is based on the so-called x86 instruction set first implemented by the Intel Corporation of Santa Clara, Calif. The x86 instruction set is a so-called variable length instruction set, because the length of an instruction can vary between one and fifteen bytes. In a superscalar implementation of an x86 microprocessor, it is necessary to predetermine where the instruction boundaries are in order to dispatch multiple instructions per clock cycle. However, the determination of the instruction boundaries within a group of bytes is a laborious sequential process, because the end of each instruction must be determined before the next instruction can be examined. To facilitate multiple instruction issue without delay, this boundary information is conveniently stored along with the instructions in the cache.
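The sequential nature of boundary determination, and the benefit of storing the result, can be sketched as follows. The encoding used here is a hypothetical toy instruction set in which the low two bits of the first byte give the instruction length (one to four bytes); it is not the x86 encoding, and the function names are illustrative assumptions.

```python
def toy_length(first_byte: int) -> int:
    """Length of a toy instruction, taken from the low two bits of its first byte.

    Hypothetical encoding for illustration only; real x86 length decoding
    is considerably more involved.
    """
    return (first_byte & 0x03) + 1


def predecode(code: bytes) -> list:
    """Walk the bytes sequentially and mark the last byte of each instruction.

    Returns one "end bit" per byte. Each instruction's length must be known
    before the start of the next instruction can be located.
    """
    end_bits = [0] * len(code)
    pos = 0
    while pos < len(code):
        last = pos + toy_length(code[pos]) - 1
        if last >= len(code):
            break  # instruction runs past this group of bytes; leave it unmarked
        end_bits[last] = 1
        pos = last + 1
    return end_bits


def instruction_starts(end_bits: list) -> list:
    """Recover instruction start positions from stored end bits.

    No length decoding is needed: a start is position 0 or any position
    that directly follows a marked end byte.
    """
    if not end_bits:
        return []
    return [0] + [i + 1 for i, bit in enumerate(end_bits)
                  if bit and i + 1 < len(end_bits)]
```

Here `predecode` corresponds to the slow sequential pass, performed once, while `instruction_starts` shows that every boundary can afterwards be read directly from the stored bits.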
Another example of characterizing bits is branch prediction bits. Branch prediction bits are useful when performing speculative execution of instructions. Speculative execution involves guessing whether a conditional branch will be taken. The prediction may later prove to be correct or incorrect. If the prediction proves to be correct, then performance is improved because instructions along the predicted path are processed through the pipeline immediately, before the condition is resolved. If the prediction proves to be incorrect, then the pipeline must be flushed of the instructions in progress, and extra cycles will be required to “catch up”. Thus, the improvement in efficiency depends on the prediction accuracy. Branch prediction bits characterize the existence of branch instructions in a group of instructions and the nature of the branch, such as unconditional versus conditional.
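The dependence of the benefit on prediction accuracy can be made concrete with a deliberately simplified cost model. The model, its parameters (`flush_penalty`, `resolve_latency`), and the function names are assumptions for illustration: it charges a fixed number of cycles for each mispredicted branch and compares that with stalling on every branch until its condition is resolved.

```python
def cycles_lost_with_prediction(branches: int, accuracy: float,
                                flush_penalty: int) -> float:
    """Cycles lost to pipeline flushes when branches are predicted.

    Assumes a correct prediction costs nothing extra and each incorrect
    prediction costs a fixed flush_penalty (a simplification).
    """
    mispredicted = branches * (1.0 - accuracy)
    return mispredicted * flush_penalty


def cycles_lost_with_stall(branches: int, resolve_latency: int) -> int:
    """Cycles lost if the pipeline instead waits for every branch to resolve."""
    return branches * resolve_latency


def prediction_pays_off(accuracy: float, flush_penalty: int,
                        resolve_latency: int) -> bool:
    """True when predicting loses fewer cycles per branch than stalling."""
    return (1.0 - accuracy) * flush_penalty < resolve_latency
```

Under this model, speculation helps only while the misprediction rate multiplied by the flush penalty stays below the cost of waiting, which is why the accuracy of the prediction bits matters.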
There is a significant space penalty when storing these extra bits in a two-level cache hierarchy. It is usually desirable to make the size of the L2 cache relatively large, such as 1 megabyte (Mbyte), and the size of the L2 cache alone can be a significant fraction of the die area of the microprocessor. Storing these additional characterizing bits in the L2 cache causes the total die size to increase significantly. What is needed is a new data processor that retains the benefit of storing the characterizing bits while reducing the size of the L2 cache.
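The size of the space penalty can be estimated with simple arithmetic. The figure of three characterizing bits per instruction byte used in the usage note is a hypothetical value chosen for illustration and is not stated above; the function names are likewise assumptions.

```python
def extra_storage_bytes(cache_bytes: int, extra_bits_per_byte: int) -> int:
    """Additional storage, in bytes, needed to hold the characterizing bits.

    extra_bits_per_byte is an assumed design parameter: the number of
    characterizing bits kept for every byte stored in the cache.
    """
    return cache_bytes * extra_bits_per_byte // 8


def overhead_fraction(extra_bits_per_byte: int) -> float:
    """Extra storage as a fraction of the cache's data capacity."""
    return extra_bits_per_byte / 8


L2_BYTES = 1 << 20  # 1 Mbyte L2 cache, the example size given above
```

With the assumed three bits per byte, `extra_storage_bytes(L2_BYTES, 3)` evaluates to 393216 bytes (384 Kbytes), an overhead of 37.5 percent on top of the 1 Mbyte of data storage.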