1. Field of the Invention
This invention is related to the field of processors, and more particularly, to probing a trace cache within a processor.
2. Description of the Related Art
Instructions processed in a processor are encoded as a sequence of ones and zeros. For some processor architectures, instructions may be encoded with a fixed length, such as a certain number of bytes. For other architectures, the length of instructions may vary. The x86 processor architecture, for example, specifies a variable-length instruction set (i.e., an instruction set in which various instructions are each specified by differing numbers of bytes): the 80386 and later versions of x86 processors employ between 1 and 15 bytes to specify a particular instruction. Each instruction has an opcode, which may be 1–2 bytes, and additional bytes may be added to specify addressing modes, operands, and additional details regarding the instruction to be executed.
In some processor architectures, each instruction may be decoded into one or more simpler operations prior to execution. Decoding an instruction may also involve accessing a register renaming map in order to determine the physical register to which each logical register in the instruction maps and/or to allocate a physical register to store the result of the instruction.
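By way of illustration only, the register renaming lookup described above may be sketched as follows. The class name RenameMap, the register counts, and the simple free list are assumptions made for this sketch; they are not taken from any particular processor, and the release of physical registers is not modeled.

```python
class RenameMap:
    """Hypothetical register renaming map (illustrative only)."""

    def __init__(self, num_logical, num_physical):
        # Assumption: logical register i initially maps to physical register i.
        self.map = list(range(num_logical))
        # Remaining physical registers are available for allocation.
        self.free = list(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        """Rename one operation; return (physical dest, physical sources)."""
        # Look up sources first so an operation that reads and writes the
        # same logical register sees the older mapping for its source.
        phys_srcs = [self.map[s] for s in srcs]
        # Allocate a new physical register to hold the result.
        phys_dest = self.free.pop(0)
        self.map[dest] = phys_dest
        return phys_dest, phys_srcs


rm = RenameMap(num_logical=4, num_physical=8)
first = rm.rename(dest=1, srcs=[1, 2])   # r1 = op(r1, r2)
second = rm.rename(dest=3, srcs=[1])     # r3 = op(r1), reads the renamed r1
```

In this sketch the second operation reads the physical register allocated by the first, which is how a renaming map preserves the dependency between the two operations.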
Instructions may be fetched into the decode portion of a processor based, in part, on branch predictions made within the processor. In general, the bandwidth of the instruction fetch and decode portions of a processor may determine whether the execution core is fully utilized during each execution cycle. Accordingly, it is desirable to provide enough bandwidth in the instruction fetch and decode portions of the processor to keep the execution core as fully supplied with work as possible.
Most processors employ one or more cache memories for storing frequently or recently used information. Typical caches, such as an L1 or L2 cache, for example, may be organized as a collection of blocks of memory that are referred to as cache lines. Cache lines may be easily stored and accessed since they are aligned, contiguous blocks of memory. Generally speaking, when a cache line must be invalidated, it may be a simple process of comparing a probe address to the physical address in the cache tags of all cache lines at indices that could be holding the probe's data. The list of cache indices simply comes from the probe address bits that correspond to the cache index bits.
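The probe comparison described above may be sketched, under stated assumptions, as follows. The line size, set count, associativity, and all names (Cache, split_address, probe_invalidate) are hypothetical. The sketch also assumes a physically indexed cache, so that a probe address selects exactly one index; a cache in which several indices could hold the probe's data would repeat the same tag comparison at each candidate index.

```python
from dataclasses import dataclass

LINE_BYTES = 64   # assumed cache line size
NUM_SETS = 128    # assumed number of indices (sets)
NUM_WAYS = 4      # assumed associativity

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # low bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1      # next bits select the index


@dataclass
class Line:
    valid: bool = False
    tag: int = 0


def split_address(addr):
    """Return (tag, index) taken from the bits of a physical address."""
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


class Cache:
    def __init__(self):
        self.sets = [[Line() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]

    def fill(self, addr):
        tag, index = split_address(addr)
        ways = self.sets[index]
        # Trivial replacement: first invalid way, otherwise way 0.
        victim = next((w for w in ways if not w.valid), ways[0])
        victim.valid, victim.tag = True, tag

    def contains(self, addr):
        tag, index = split_address(addr)
        return any(w.valid and w.tag == tag for w in self.sets[index])

    def probe_invalidate(self, probe_addr):
        """Invalidate any line matching the probe; return the number of hits."""
        tag, index = split_address(probe_addr)
        hits = 0
        for w in self.sets[index]:   # only the ways at the probe's index
            if w.valid and w.tag == tag:
                w.valid = False
                hits += 1
        return hits
```

Because each line is an aligned, contiguous block, any probe address falling within a line yields the same tag and index as the address used to fill it, so the comparison is confined to a single index.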
Later-generation processors typically use some form of trace cache for caching instructions that have been decoded into operations, commonly referred to as micro-ops. Trace caches may store streams of decoded instructions, or ‘traces’. There is generally no requirement that the instructions in a trace be sequential, and the first instruction in the trace is not necessarily aligned on any particular boundary. Thus, unlike the cache lines described above, the entries that may hold a given instruction cannot readily be located from the probe address bits alone, and it may be problematic to invalidate trace cache entries corresponding to a given probe address.
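The difficulty may be illustrated with a small sketch. It assumes, purely for illustration, a trace cache keyed by the address of the first instruction in each trace, a 64-byte probe granularity, and made-up instruction addresses; instruction lengths are ignored, so an instruction that straddles two lines is not modeled. The sketch shows only the problem, not any particular solution.

```python
LINE_BYTES = 64  # assumed probe granularity


def line_of(addr):
    """Return the number of the aligned line containing addr."""
    return addr // LINE_BYTES


# Hypothetical trace cache: start address -> addresses of the instructions
# whose decoded operations are stored in that trace.
trace_cache = {
    # Trace that follows a taken branch from 0x100C to 0x2400.
    0x1008: [0x1008, 0x100C, 0x2400, 0x2403],
    0x3010: [0x3010, 0x3013, 0x3018],
}


def traces_found_by_start_address(probe_addr):
    """Traces found when only each trace's start address is compared."""
    return [start for start in trace_cache
            if line_of(start) == line_of(probe_addr)]


def traces_actually_affected(probe_addr):
    """Traces holding at least one instruction from the probed line."""
    return [start for start, insns in trace_cache.items()
            if any(line_of(a) == line_of(probe_addr) for a in insns)]


missed = traces_found_by_start_address(0x2400)   # finds nothing
needed = traces_actually_affected(0x2400)        # trace starting at 0x1008
```

Here a probe to the line containing 0x2400 matches no trace by start address, yet the trace starting at 0x1008 contains instructions from that line and would need to be invalidated.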