The present invention relates generally to the field of processors and in particular to a processor having an instruction cache storing a fixed number of variable length instructions.
Microprocessors perform computational tasks in a wide variety of applications, including portable electronic devices. In many cases, maximizing processor performance is a major design goal, to permit additional functions and features to be implemented in portable electronic devices and other applications. Additionally, power consumption is of particular concern in portable electronic devices, which have limited battery capacity. Hence, processor designs that increase performance and reduce power consumption are desirable.
Most modern processors employ one or more instruction execution pipelines, wherein the execution of many multi-step sequential instructions is overlapped to improve overall processor performance. Capitalizing on the spatial and temporal locality properties of most programs, recently executed instructions are stored in a cache—a high-speed, usually on-chip memory—for ready access by the execution pipeline.
Many processor Instruction Set Architectures (ISAs) include variable length instructions. That is, the instruction op codes read from memory do not all occupy the same amount of space. This may result from the inclusion of operands with arithmetic or logical instructions, the amalgamation of multiple operations into a Very Long Instruction Word (VLIW), or other architectural features. One disadvantage of variable length instructions is that, upon fetching instructions from an instruction cache, the processor must ascertain the boundaries of each instruction, a computational task that consumes power and reduces performance.
One approach known in the art to improving instruction cache access in the presence of variable length instructions is to “pre-decode” the instructions prior to storing them in the cache, and additionally store some instruction boundary information in the cache line along with the instructions. This reduces, but does not eliminate, the additional computational burden of ascertaining instruction boundaries that is placed on the decode task.
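The pre-decode approach described above can be illustrated with a minimal sketch. It assumes a hypothetical toy encoding, not taken from any real ISA, in which the high bit of an instruction's first byte marks a two-byte (halfword) instruction and all other instructions occupy four bytes. The names `toy_length` and `predecode` are likewise illustrative only. The sketch walks one cache line at fill time and records a start marker per byte, which could then be stored alongside the line.

```python
def toy_length(first_byte):
    """Hypothetical length rule: high bit set means 2 bytes, otherwise 4 bytes."""
    return 2 if first_byte & 0x80 else 4


def predecode(line_bytes, length_of=toy_length):
    """Return one boolean per byte of the line, True where an instruction starts.

    Assumes the first byte of the line is the start of an instruction.
    """
    starts = [False] * len(line_bytes)
    offset = 0
    while offset < len(line_bytes):
        starts[offset] = True
        # Advance by the length implied by the instruction's first byte.
        offset += length_of(line_bytes[offset])
    return starts


# A 16-byte line holding word, word, halfword, word, then the first half of a word.
line = [0x00, 0, 0, 0,
        0x00, 0, 0, 0,
        0x80, 0,
        0x00, 0, 0, 0,
        0x00, 0]
start_offsets = [i for i, s in enumerate(predecode(line)) if s]
```

Even with such markers stored in the line, the fetch logic must still scan them to locate and extract each instruction, which is why the burden is reduced rather than eliminated.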
Also, because instructions are packed into the cache in the same compact form in which they are read from memory, instructions are occasionally misaligned, with part of an instruction being stored at the end of one cache line and the remainder stored at the beginning of a successive cache line. Fetching such an instruction requires two cache accesses, further reducing performance and increasing power consumption, particularly as the two accesses are required each time the instruction executes.
FIG. 1 depicts a representative diagram of two lines 100, 140 of a prior art instruction cache storing variable length instructions (I1-I9). In this representative example, each cache line comprises sixteen bytes, and a 32-bit word size is assumed. Most instructions are one word wide, or four bytes. Some instructions are of half-word width, comprising two bytes. A first cache line 100 and associated tag field 120 contain instructions I1 through I4, and half of instruction I5. A second cache line 140, with associated tag field 160, contains the second half of instruction I5, and instructions I6 through I9. The instruction lengths and their addresses are summarized in the following table:
TABLE 1
Variable Length Instructions in Prior Art Cache

Instruction  Size      Address  Alignment
I1           word      0x1A0    aligned on word boundary
I2           word      0x1A4    aligned on word boundary
I3           halfword  0x1A8    aligned on word boundary
I4           word      0x1AA    misaligned across word boundaries
I5           word      0x1AE    misaligned across cache lines
I6           word      0x1B2    misaligned across word boundaries
I7           word      0x1B6    misaligned across word boundaries
I8           halfword  0x1BA    not aligned on word boundary
I9           word      0x1BC    aligned on word boundary
To read these instructions from the cache lines 100, 140, the processor must expend additional computational effort—at the cost of power consumption and delay—to determine the instruction boundaries. While this task may be assisted by pre-decoding the instructions and storing boundary information in or associated with the cache lines 100, 140, the additional computation is not obviated. Additionally, a fetch of instruction I5 will require two cache accesses. This dual access to fetch a misaligned instruction from the cache causes additional power consumption and processor delay.
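The alignment entries of Table 1 and the dual access needed for instruction I5 follow from simple address arithmetic. The sketch below reproduces them under the stated parameters of the example (sixteen-byte cache lines, four-byte words). The function names `classify` and `cache_accesses` are illustrative and do not come from the disclosure.

```python
LINE_BYTES = 16  # cache line size in the FIG. 1 example
WORD_BYTES = 4   # 32-bit word size

# (name, size in bytes, starting address), as listed in Table 1
INSTRUCTIONS = [
    ("I1", 4, 0x1A0), ("I2", 4, 0x1A4), ("I3", 2, 0x1A8),
    ("I4", 4, 0x1AA), ("I5", 4, 0x1AE), ("I6", 4, 0x1B2),
    ("I7", 4, 0x1B6), ("I8", 2, 0x1BA), ("I9", 4, 0x1BC),
]


def cache_accesses(size, addr):
    """Number of cache lines touched by an instruction of `size` bytes at `addr`."""
    first_line = addr // LINE_BYTES
    last_line = (addr + size - 1) // LINE_BYTES
    return last_line - first_line + 1


def classify(size, addr):
    """Return the alignment description used in Table 1."""
    if cache_accesses(size, addr) > 1:
        return "misaligned across cache lines"
    if addr % WORD_BYTES == 0:
        return "aligned on word boundary"
    if size == WORD_BYTES:
        return "misaligned across word boundaries"
    return "not aligned on word boundary"


for name, size, addr in INSTRUCTIONS:
    print(f"{name}  {addr:#05x}  {classify(size, addr)}  "
          f"accesses={cache_accesses(size, addr)}")
```

Under these assumptions only I5, which occupies bytes 0x1AE through 0x1B1, touches two cache lines; every other instruction in the example is fetched with a single access.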