The present invention is generally in the field of processors. More specifically, the invention is in the field of cache memories.
As is generally known, computer programs continue to increase in size. As computer programs grow in size, the memory requirements of the computer and various memory devices also increase. However, as the size of a program currently residing in the computer's main memory gets larger, the speed at which the processor executes tasks begins to decrease. This results from the constant fetching of instructions from the main memory of the computer into the processor (also referred to as a "Central Processing Unit" or "CPU"). The larger the program currently being used, the more often instructions must be fetched. This fetching process requires a certain number of clock phases. Therefore, the more often instructions have to be fetched from the main memory, the less time the processor has available to decode and execute those instructions and the slower the speed at which the processor can finish tasks.
Thus, it is desirable to set aside in a local memory, i.e. a memory requiring less access time than the main memory, a limited number of program instructions that the processor may want to fetch. An instruction cache is such a local memory. An instruction cache is a relatively small memory module where a limited number of program instructions may be stored.
The processor performs constant checks to determine whether instructions stored in the main memory required by the processor are already resident in the instruction cache. If they are already resident in the instruction cache, the instruction fetch step is performed by referring to the instruction cache, since there is no need to go to the main memory to find what is already in the instruction cache.
Thus, the processor must be able to determine if an instruction to be fetched from the main memory is already resident in the instruction cache. The processor's program counter contains the address of an instruction needed by the processor. One way to determine if an instruction is already resident in the instruction cache is to keep track of the addresses of the instructions when they are first brought into the instruction cache from the main memory. To do this, copies of certain upper bits of the main memory addresses are stored in a tag memory bank, where each entry in the tag memory bank is referred to as a "tag." As an example, the upper 23 bits of a 32-bit main memory address comprise the tag.
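As an illustration of the example above, the tag can be obtained by discarding the lower 9 bits of the 32-bit address. The following is a minimal sketch only; the function name `tag_of` and the constant names are hypothetical and do not appear in the description.

```python
ADDRESS_BITS = 32                     # width of a main memory address (from the example)
TAG_BITS = 23                         # width of a tag (from the example)
LOW_BITS = ADDRESS_BITS - TAG_BITS    # the 9 lower bits that are not part of the tag


def tag_of(address: int) -> int:
    """Return the upper 23 bits of a 32-bit main memory address."""
    return (address & 0xFFFFFFFF) >> LOW_BITS


# Two addresses that differ only in their lower 9 bits share one tag.
print(hex(tag_of(0x12345678)))                   # 0x91a2b
print(tag_of(0x12345600) == tag_of(0x123457FF))  # True
```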
When the processor wishes to determine whether a particular instruction is resident in the instruction cache, the address of the instruction is sent from the program counter across the address bus to the instruction cache and the tag memory bank. In the present example, the 23-bit tags within the tag memory bank and the 32-bit wide instructions in the instruction cache are read. The upper 23 bits of the address of the instruction contained in the program counter are then compared with a tag in the tag memory. If there is a match, also referred to as a "hit," the instruction is already resident in the instruction cache, and it is not necessary to fetch the instruction from the main memory. If there is no match, also referred to as a "miss," the instruction must be fetched from the main memory at the address contained in the program counter.
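The hit and miss behavior just described can be sketched for a single instruction cache and its tag memory bank. This is a toy model under stated assumptions: the number of rows (32), the use of address bits 4 through 8 to select a row, and the names `CacheBank`, `fetch`, `row_of` and `tag_of` are all hypothetical, and the main memory is modeled as a plain dictionary.

```python
TAG_SHIFT = 9    # 32-bit address minus the 23 tag bits of the example
NUM_ROWS = 32    # assumption: rows per bank
ROW_SHIFT = 4    # assumption: address bits 4..8 select the row


def tag_of(address: int) -> int:
    return (address & 0xFFFFFFFF) >> TAG_SHIFT


def row_of(address: int) -> int:
    return (address >> ROW_SHIFT) % NUM_ROWS


class CacheBank:
    """One instruction cache paired with its tag memory bank."""

    def __init__(self) -> None:
        self.tags = [None] * NUM_ROWS
        self.instructions = [None] * NUM_ROWS

    def fill(self, address: int, instruction: int) -> None:
        # Record the tag when the instruction is first brought in.
        row = row_of(address)
        self.tags[row] = tag_of(address)
        self.instructions[row] = instruction

    def lookup(self, address: int):
        # Both the tag and the instruction are read, then the tag is compared.
        row = row_of(address)
        stored_tag = self.tags[row]
        instruction = self.instructions[row]
        if stored_tag is not None and stored_tag == tag_of(address):
            return True, instruction      # hit
        return False, None                # miss


def fetch(bank: CacheBank, main_memory: dict, program_counter: int):
    """Return (hit, instruction), going to main memory only on a miss."""
    hit, instruction = bank.lookup(program_counter)
    if not hit:
        instruction = main_memory[program_counter]
        bank.fill(program_counter, instruction)
    return hit, instruction
```

In this sketch, the first fetch of an address misses and fills the bank, and a repeated fetch of the same address hits without consulting `main_memory`.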
A "set-associative" cache consists of multiple sets, each set consisting of an instruction cache and a tag memory bank. A set-associative cache decreases the number of instances where the program is required to return to the main memory. This is because a number of instruction caches hold instructions corresponding to a number of different segments of a computer program. Thus, the speed at which the processor executes a program increases since there is a greater chance that the processor can find a desired instruction in the set-associative cache as opposed to the main memory.
A set-associative cache also has disadvantages. Because there are multiple tag memory banks, each tag memory bank must be accessed to determine if a tag which is resident in that bank matches the corresponding upper bits contained in the program counter. In the present example, each tag memory bank must be accessed to determine whether it has a tag which matches the upper 23 bits in the program counter. Power is consumed each time a tag and an instruction are read from a tag memory bank and an instruction cache, respectively. For example, if the set-associative cache has four tag memory banks and four instruction caches, each time the processor accesses the set-associative cache, four instructions and four tags are read. Thereafter, at most a single tag is matched and an instruction corresponding to the matched tag is identified as the desired instruction.
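The access pattern described above, in which every set is read and at most one result is kept, can be sketched as follows. The row count, the row-selection bits, and the function and variable names are assumptions made for illustration; the counters simply tally how many tags and instructions are read per access.

```python
NUM_SETS = 4     # four tag memory banks and four instruction caches (from the example)
NUM_ROWS = 32    # assumption: rows per bank
TAG_SHIFT = 9    # 32-bit address minus the 23 tag bits
ROW_SHIFT = 4    # assumption: address bits 4..8 select the row


def tag_of(address: int) -> int:
    return (address & 0xFFFFFFFF) >> TAG_SHIFT


def row_of(address: int) -> int:
    return (address >> ROW_SHIFT) % NUM_ROWS


def conventional_read(tag_banks, instruction_caches, address):
    """Read one tag and one instruction from every set, keep at most one.

    Returns (instruction or None, tags_read, instructions_read).
    """
    row = row_of(address)
    wanted = tag_of(address)
    tags_read = 0
    instructions_read = 0
    result = None
    for s in range(NUM_SETS):
        tag = tag_banks[s][row]
        tags_read += 1
        instruction = instruction_caches[s][row]
        instructions_read += 1
        if tag is not None and tag == wanted:
            result = instruction          # at most one set matches
    return result, tags_read, instructions_read


tag_banks = [[None] * NUM_ROWS for _ in range(NUM_SETS)]
instruction_caches = [[None] * NUM_ROWS for _ in range(NUM_SETS)]
tag_banks[2][row_of(0x1000)] = tag_of(0x1000)
instruction_caches[2][row_of(0x1000)] = 42
print(conventional_read(tag_banks, instruction_caches, 0x1000))  # (42, 4, 4)
```

Four tags and four instructions are read whether the access hits or misses, which is the cost the passage identifies.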
In the set-associative cache discussed above, power consumed is proportional to the number of tags read, multiplied by the width of a tag in bits, plus the number of instructions read, multiplied by the width of an instruction in bits. The number of instructions read and the number of tags read are each equal to the number of sets of instruction caches and tag memory banks. In the above example, the width of a tag is 23 bits, the width of an instruction is 32 bits, and there are 4 sets of instruction caches and tag memory banks. As such, the power consumption for each set-associative cache read operation is proportional to:
(4 instructions × 32 bits) + (4 tags × 23 bits).
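The arithmetic behind this expression can be worked through in a few lines. The function name `conventional_read_cost` is hypothetical, and the result is a count of bits read per access, which the description treats as proportional to power rather than as a power figure in itself.

```python
def conventional_read_cost(num_sets: int, instruction_bits: int, tag_bits: int) -> int:
    """Bits read per access when every set supplies one instruction and one tag."""
    return num_sets * instruction_bits + num_sets * tag_bits


# Values from the example: 4 sets, 32-bit instructions, 23-bit tags.
print(conventional_read_cost(4, 32, 23))  # 220, i.e. 128 instruction bits + 92 tag bits
```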
Thus, although a set-associative cache increases the speed with which the processor executes tasks, there is a corresponding increase in power consumption resulting from the reading of the additional tags and instructions from the additional sets of instruction caches and tag memory banks. Using the example above, it can be seen that in addition to the power consumed from reading and comparing the four tags, power is consumed reading four instructions, although at most only one of the instructions will be the desired instruction.
Thus, it can be seen that there is a need in the art for a method to implement a set-associative cache which maintains the advantages discussed above, such as increased operating speed, while at the same time reducing the additional power consumption inherent in a set-associative cache.
The present invention is a low power instruction cache. According to the invention, there are a number of tag memory banks. Each tag memory bank is associated with a unique instruction cache. Each tag memory bank has a number of tag memory rows and each tag memory row has a number of tag memory cells. The invention compares certain upper bits of a program counter to a tag stored in one row of a tag memory bank. If there is a match between the certain upper bits of the program counter and the tag, a hit signal is generated. The hit signal indicates that the tag memory bank containing the matched row (i.e. the matched tag) is associated with the instruction cache having a desired instruction. The desired instruction is then read from the instruction cache associated with the tag memory bank corresponding to the generated hit signal.
Utilizing the present invention, instead of reading one instruction from each of the instruction caches and then eliminating all but one of the read instructions, only the desired instruction from a single instruction cache is read. As such, the power otherwise spent reading the unneeded instructions is saved. In one embodiment of the invention, there are four tag memory banks, each having 32 tag memory rows, and each row having 23 tag memory cells. There are also four instruction caches, each associated with one of the four tag memory banks. The upper 23 bits in the program counter are compared with the 23 bits in a particular tag memory row in each of the four tag memory banks. When there is a match between the upper 23 bits in the program counter and the 23 bits in a particular tag memory row, a hit signal is generated corresponding to the particular tag memory bank containing the matched tag memory row. Thereafter, a desired instruction is read only from the particular instruction cache associated with the tag memory bank corresponding to the generated hit signal.
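The read sequence of this embodiment can be modeled in software as a rough behavioral sketch. It does not represent the circuit: how the tag memory cells perform the comparison is not modeled, the choice of address bits 4 through 8 to select a row is an assumption, and the class and method names are hypothetical. Because the summary does not describe how a bank is chosen when an instruction is brought in, `fill` takes the bank as an explicit argument.

```python
NUM_BANKS = 4                 # four tag memory banks, four instruction caches
NUM_ROWS = 32                 # 32 tag memory rows per bank
TAG_BITS = 23                 # 23 tag memory cells per row
TAG_SHIFT = 32 - TAG_BITS
ROW_SHIFT = 4                 # assumption: address bits 4..8 select the row


def tag_of(address: int) -> int:
    return (address & 0xFFFFFFFF) >> TAG_SHIFT


def row_of(address: int) -> int:
    return (address >> ROW_SHIFT) % NUM_ROWS


class LowPowerCache:
    def __init__(self) -> None:
        self.tag_banks = [[None] * NUM_ROWS for _ in range(NUM_BANKS)]
        self.instruction_caches = [[None] * NUM_ROWS for _ in range(NUM_BANKS)]
        self.instruction_reads = 0    # counts instruction cache reads

    def fill(self, bank: int, address: int, instruction: int) -> None:
        row = row_of(address)
        self.tag_banks[bank][row] = tag_of(address)
        self.instruction_caches[bank][row] = instruction

    def hit_signals(self, address: int) -> list:
        """One hit signal per tag memory bank for the selected row."""
        row = row_of(address)
        wanted = tag_of(address)
        return [self.tag_banks[b][row] == wanted for b in range(NUM_BANKS)]

    def read(self, address: int):
        """Read only from the instruction cache whose bank raised a hit."""
        row = row_of(address)
        for bank, hit in enumerate(self.hit_signals(address)):
            if hit:
                self.instruction_reads += 1
                return self.instruction_caches[bank][row]
        return None                   # miss: no instruction cache is read


cache = LowPowerCache()
cache.fill(2, 0x1000, 42)
print(cache.hit_signals(0x1000))                    # [False, False, True, False]
print(cache.read(0x1000), cache.instruction_reads)  # 42 1
```

In this model a hit costs one instruction read rather than four, and a miss costs none.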