Processors are known which execute very long instruction word (VLIW) instructions. Generally, VLIW instructions are of variable length and are composed of syllables. Each instruction is termed a bundle, and in some examples a bundle can consist of one, two, three or four syllables. A VLIW processor executes a VLIW instruction (bundle) on every cycle in which it is not stalled. FIG. 1 illustrates a Prior Art layout of such instructions in a memory 2. It will be appreciated that FIG. 1 shows only a very small part of the memory 2; in particular, it shows only the rows within one sector of the memory. In this document, "rows" define regions of memory corresponding to the issue width of the processor. In the examples discussed herein, a row is a 128 bit aligned region of memory. In a VLIW memory as exemplified herein, bundles are aligned to 32 bit boundaries. Each row therefore offers four 32 bit aligned starting positions, only one of which is 128 bit aligned, so a maximum width (128 bit) bundle has only a one in four chance of being aligned to a 128 bit boundary. In most cases it will not be 128 bit aligned. FIG. 1 shows a case where an instruction I1 is 128 bit aligned, beginning at row address addr_i, and another case where an instruction I2 is misaligned, commencing at a row address addr[j+64] and with its last syllable at address addr[k+64]. In this case, the address addr_j would represent the 128 bit aligned address for the memory 2.
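The alignment arithmetic above can be sketched in Python. This is a minimal illustration, assuming byte addressing, 16-byte (128 bit) rows and 4-byte (32 bit) syllables; the function names are hypothetical and do not come from the source. The sketch maps each syllable of a bundle to a (row, slot) pair, tests for row alignment, and checks whether a bundle crosses a row boundary.

```python
# Hypothetical constants for the layout described above (byte addressing).
ROW_BYTES = 16                                # one row = 128 bits
SYLLABLE_BYTES = 4                            # one syllable = 32 bits
SLOTS_PER_ROW = ROW_BYTES // SYLLABLE_BYTES   # 4 syllable slots per row


def syllable_locations(start, n_syllables):
    """Return a (row, slot) pair for each syllable of a bundle starting at byte address `start`."""
    if start % SYLLABLE_BYTES:
        raise ValueError("bundles must start on a 32 bit boundary")
    if not 1 <= n_syllables <= SLOTS_PER_ROW:
        raise ValueError("a bundle holds one to four syllables")
    locations = []
    for n in range(n_syllables):
        addr = start + n * SYLLABLE_BYTES
        locations.append((addr // ROW_BYTES, (addr % ROW_BYTES) // SYLLABLE_BYTES))
    return locations


def is_row_aligned(start):
    """True if the bundle starts on a 128 bit (row) boundary."""
    return start % ROW_BYTES == 0


def spans_two_rows(start, n_syllables):
    """True if the bundle's syllables fall in more than one row."""
    return len({row for row, _ in syllable_locations(start, n_syllables)}) > 1


# An aligned bundle at 0x100 and a bundle starting 8 bytes (64 bits) into the row at 0x120.
print(syllable_locations(0x100, 4))  # [(16, 0), (16, 1), (16, 2), (16, 3)]
print(syllable_locations(0x128, 4))  # [(18, 2), (18, 3), (19, 0), (19, 1)]
# Of the four 32 bit aligned start positions in a row, only one is row aligned.
print(sum(is_row_aligned(a) for a in range(0, ROW_BYTES, SYLLABLE_BYTES)))  # 1
```

For the bundle starting 8 bytes into a row, the first two syllables fall in slots 2 and 3 of one row and the last two in slots 0 and 1 of the next, which is why such a bundle cannot be read by presenting a single row address to the whole row.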
To allow bundles to be assembled, therefore, existing instruction caches for use with such a memory 2 are constructed so that reads of four syllables can be made from arbitrary 32 bit aligned addresses. FIG. 2 shows a direct mapped cache that permits this: a Prior Art cache 4 having four banks B0, B1, B2, B3. In this example, each bank has a capacity of 8 kilobytes and is 32 bits wide. The cache 4 is connected to an execution unit 6 which comprises a plurality of execution pipelines or lanes L0, L1, L2, L3. Each lane accepts a 32 bit wide syllable from the respective bank of the cache.
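The per-bank addressing needed for such reads can be sketched as follows. This is a hypothetical model rather than the circuit of FIG. 2: it assumes that bank Bb holds the syllable at byte offset 4b of every 16-byte row, and that each 8 kilobyte bank is indexed by the row number modulo its depth of 2048 32 bit entries. Given a 32 bit aligned start address, it returns the row address to drive on each bank's address line.

```python
# Hypothetical geometry from FIG. 2: four 32 bit wide banks of 8 kilobytes each.
ROW_BYTES = 16
SYLLABLE_BYTES = 4
NUM_BANKS = 4
BANK_DEPTH = (8 * 1024) // SYLLABLE_BYTES    # 2048 32 bit entries per bank


def bank_row_addresses(start):
    """Row address for banks B0..B3 (lines ADDR1..ADDR4) for a four-syllable read at `start`.

    Assumes bank Bb holds the syllable at byte offset 4*b of each row.
    """
    if start % SYLLABLE_BYTES:
        raise ValueError("reads must start on a 32 bit boundary")
    row = start // ROW_BYTES
    first_bank = (start % ROW_BYTES) // SYLLABLE_BYTES   # bank holding syllable 0
    addresses = []
    for bank in range(NUM_BANKS):
        # Banks at or after the first syllable's slot read this row;
        # banks before it supply the trailing syllables from the next row.
        target_row = row if bank >= first_bank else row + 1
        addresses.append(target_row % BANK_DEPTH)        # wrap to the bank's depth
    return addresses


print(bank_row_addresses(0x100))   # [16, 16, 16, 16]  aligned: one address for all banks
print(bank_row_addresses(0x12C))   # [19, 19, 19, 18]  starts in slot 3 of row 18
```

Under these assumptions, an aligned start drives the same row address on all four lines, while an unaligned start drives the current row on the banks at and after the starting slot and the next row on the banks before it.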
To allow for non-aligned addresses, each bank comprises an individually addressable RAM. The RAMs are connected to address circuitry 8 via respective address lines ADDR1 . . . ADDR4, and each RAM is addressed by supplying a row address along its address line. For instruction I1, the row address for row i can be fed to all of the banks. For instruction I2, however, banks B0 and B1 need to be addressed by the address for row j, whereas banks B2 and B3 need to be addressed by the address for row k. This is shown in more detail in FIG. 3, which illustrates where the syllables (S11 . . . S42) of instructions I1 and I2 are stored in the cache 4.

As is well known, caches are arranged in lines. In a direct mapped cache, each bank has a plurality of addressable locations, each location constituting one cache line. When a cache miss occurs, a full line is fetched from memory. In principle a line can be any number of bytes, but in the examples discussed herein the cache line length is 64 bytes. Each line has a tag stored with it which records where in memory the cache line came from; the tag holds the upper bits of the address. The lower bits of the address (above those that select a byte within the line) are used to index the cache, that is, to look up the line within the cache and access its data. Thus, each line contains syllables from main memory together with the cache tag (the upper bits of the address). Each line is addressable by a cache line address constituted by a number of the least significant bits of the main memory address of the syllables stored at that row. Where reference is made here to a row address, therefore, it is identified by a number of least significant bits of the address in main memory. Consequently, a number of rows in the memory may share those least significant bits and so map onto the same line of the cache. In FIG. 1, one sector of the memory is shown, which has one row address addr_i mapping onto line i of the cache.
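The tag/index split described above can be illustrated with a minimal Python model of a direct mapped cache. It assumes the 32 kilobyte total capacity of FIG. 2 (four 8 kilobyte banks) and 64-byte lines, giving 512 lines, 6 offset bits and 9 index bits; the names are illustrative only. Only tags are modelled, since on a miss the whole line is fetched and its tag recorded.

```python
# Illustrative parameters: 32 KB total (four 8 KB banks) and 64-byte lines.
LINE_BYTES = 64
CACHE_BYTES = 4 * 8 * 1024
NUM_LINES = CACHE_BYTES // LINE_BYTES          # 512 lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # 6: selects a byte within the line
INDEX_BITS = NUM_LINES.bit_length() - 1        # 9: selects the line


def split_address(addr):
    """Split a byte address into (tag, index, offset) for a direct mapped cache."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Tag-only model: each line stores the upper address bits of the memory it holds."""

    def __init__(self):
        self.tags = [None] * NUM_LINES   # None marks an invalid (empty) line

    def access(self, addr):
        """Return True on a hit; on a miss, record the tag as if the full line were fetched."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
print(split_address(0x12345678))                     # (9320, 345, 56)
print(cache.access(0x1000), cache.access(0x103C))    # False True (same 64-byte line)
```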
Direct mapped caches, while simple to construct, have serious performance limitations in some contexts, because an instruction can only be written into the cache line to which its address maps. As a result, parts of the cache may go unused while other parts are constantly overwritten.
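This conflict behaviour can be demonstrated with a short simulation, again assuming 512 lines of 64 bytes. Two hypothetical routines placed exactly one cache size apart map onto the same line, so calling them alternately misses on every access while the other 511 lines stay empty; two routines in adjacent lines miss only once each.

```python
LINE_BYTES = 64
NUM_LINES = 512   # assumed: 32 KB cache divided into 64-byte lines


def line_index(addr):
    """Line of a direct mapped cache onto which a byte address maps."""
    return (addr // LINE_BYTES) % NUM_LINES


def simulate(addresses):
    """Return (misses, lines_used) for a sequence of fetch addresses."""
    tags = [None] * NUM_LINES
    misses = 0
    for addr in addresses:
        idx = line_index(addr)
        tag = addr // (LINE_BYTES * NUM_LINES)   # upper address bits
        if tags[idx] != tag:
            misses += 1        # the line is refilled, evicting whatever was there
            tags[idx] = tag
    return misses, sum(t is not None for t in tags)


a = 0x4000                              # hypothetical routine address
b = a + LINE_BYTES * NUM_LINES          # one cache size later: same line index
print(simulate([a, b] * 100))           # (200, 1): every access misses
print(simulate([a, a + LINE_BYTES] * 100))   # (2, 2): only compulsory misses
```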