1. Field of the Invention
The present invention relates to a data processor, and specifically to a data processor such as a microprocessor or an image processor that has an instruction cache with plural WAYs, and to a read active control method for the plural WAYs that reduces the power consumption of the data processor.
2. Description of the Related Art
FIG. 1 is a block diagram of a configuration example of a conventional data processor system that has an instruction cache. In FIG. 1, a microprocessor 100 is connected to external RAM 101, which serves as external memory. The microprocessor 100 has an execution unit 102 for executing instructions; an instruction cache unit 103 for temporarily storing instruction data; a dynamic branch predictor 104 that, when an instruction to be executed is a branch instruction, outputs branch prediction data predicting whether the (conditional) branch is taken or not; and a selector 105 that selects either the instruction data from the external RAM 101 or the instruction data stored in the instruction cache unit 103 and provides the selected instruction data to the execution unit 102.
If the instruction data designated by an instruction address and requested by the execution unit 102 is not stored in the instruction cache unit 103, the instruction data is read from the external RAM 101, provided to the execution unit 102, and also stored in the instruction cache unit 103.
If the execution unit 102 requests the same instruction data again, the instruction data is read from the instruction cache unit 103 and provided to the execution unit 102 via the selector 105. Because the access time of the instruction cache unit 103 is generally shorter than that of the external RAM 101, the instruction cache unit reduces the time required from reading an instruction to executing it.
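The miss-then-fill behavior described above can be illustrated with a minimal Python sketch. It is not part of the described hardware: the function name `fetch` and the dictionaries standing in for the instruction cache unit 103 and the external RAM 101 are hypothetical, and the cache is assumed to be unbounded, with no WAYs and no replacement policy.

```python
from typing import Dict, Tuple


def fetch(address: int, icache: Dict[int, int],
          external_ram: Dict[int, int]) -> Tuple[int, bool]:
    """Return (instruction_word, hit) for a request from the execution unit.

    Hypothetical model: icache stands in for instruction cache unit 103 and
    external_ram for external RAM 101; both map address -> instruction word.
    """
    if address in icache:
        # Hit: the word is supplied from the cache (via selector 105).
        return icache[address], True
    # Miss: read the word from external RAM and also store it in the cache.
    word = external_ram[address]
    icache[address] = word
    return word, False


ram = {0x100: 0xDEADBEEF}
cache: Dict[int, int] = {}
print(fetch(0x100, cache, ram))  # first request misses
print(fetch(0x100, cache, ram))  # repeated request hits
```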
FIG. 2 is a diagram explaining a conventional example of dynamic branch prediction using the dynamic branch predictor 104 of FIG. 1. This scheme is generally referred to as a GShare predictor. As shown in FIG. 2, the dynamic branch predictor 104, which predicts whether a (conditional) branch is taken or not in response to a branch instruction, has a program counter (PC) 110, a branch history register (BHR) 111, an exclusive OR operator (XOR) 112, and a pattern history table (PHT) 113. The operation of the dynamic branch predictor 104 is described in Non-Patent Document 1, where the PHT 113 is referred to as a counter table.
In FIG. 2, the BHR 111 shifts in the execution results of past branch instructions one after another, regardless of the branch instruction addresses, and stores them as a global branch history. The BHR 111 outputs m-bit data, the XOR 112 XORs the m-bit data with the n-bit (n ≧ m) instruction address output from the PC 110, and the resulting n-bit value is used as an index to search the PHT 113. Using the XOR of the n-bit output of the PC 110 and the m-bit output of the BHR 111 as the index makes it possible to store the branch prediction data (prediction data) for branch instructions in the PHT 113 in an approximately one-to-one relation with the branch instruction addresses, without the prediction data concentrating in only some of the plural entries of the PHT 113.
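The index computation can be sketched as follows. Two details are assumptions not fixed by the text: the n-bit address is taken as the low-order n bits of the PC value (real designs often drop the byte-offset bits first), and the m history bits are XORed into the low-order end of that address. The function names `gshare_index` and `shift_history` are hypothetical.

```python
def gshare_index(pc: int, bhr: int, n: int, m: int) -> int:
    """PHT index = (low n bits of the instruction address) XOR (m-bit history)."""
    if m > n:
        raise ValueError("GShare as described requires n >= m")
    pc_bits = pc & ((1 << n) - 1)    # n-bit instruction address from the PC
    history = bhr & ((1 << m) - 1)   # m-bit global history from the BHR
    # Assumption: history bits are aligned with the low-order address bits.
    return pc_bits ^ history


def shift_history(bhr: int, taken: bool, m: int) -> int:
    """Shift the newest outcome (1 = Taken) into the m-bit global history."""
    return ((bhr << 1) | int(taken)) & ((1 << m) - 1)


bhr = 0
for outcome in (False, True, True, False):  # hypothetical branch outcomes
    bhr = shift_history(bhr, outcome, m=4)
print(bin(bhr))                               # 0b110
print(hex(gshare_index(0xB4, bhr, n=8, m=4)))  # 0xb2
```

Because the history is global, two different branches at the same address can map to different PHT entries depending on the path taken to reach them.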
Each entry in the PHT 113 holds 2 bits of prediction data. The value changes according to whether the (conditional) branch is taken or not, i.e. Taken/Not Taken, each time the branch instruction is executed. The value is equivalent to the count value of a counter to which “1” is added when the branch is taken and from which “1” is subtracted when the branch is not taken.
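A minimal sketch of one PHT entry's update follows. It assumes, as is usual for 2-bit counters though the text does not state it explicitly, that the count saturates at 0 and 3 rather than wrapping around, and that the upper bit gives the prediction. The 16-entry table size and the initial value are hypothetical.

```python
from typing import List

STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN = range(4)


def update_counter(value: int, taken: bool) -> int:
    """2-bit saturating counter: +1 on Taken, -1 on Not Taken, clamped to 0..3."""
    if taken:
        return min(value + 1, STRONGLY_TAKEN)
    return max(value - 1, STRONGLY_NOT_TAKEN)


def predict_taken(value: int) -> bool:
    """Predict Taken when the counter is in the upper half (10 or 11)."""
    return value >= WEAKLY_TAKEN


# Hypothetical 16-entry PHT, every entry initialised to Weakly Not Taken (01).
pht: List[int] = [WEAKLY_NOT_TAKEN] * 16
pht[5] = update_counter(pht[5], taken=True)
print(pht[5], predict_taken(pht[5]))  # 2 True
```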
The right half of FIG. 2 is a diagram explaining the state transitions of the branch prediction data (prediction data). In Non-Patent Document 1, the branch prediction data stored in each entry of the PHT 113 reflects not only the immediately preceding execution result of the corresponding branch instruction (whether the branch was taken or not) but also the success or failure of the previous branch prediction.
For example, the Strongly Taken state (entry data 11) indicates that the branch instruction is predicted to be taken and that the previous prediction, made in response to the execution result of the previous branch instruction, succeeded, whereas the Weakly Taken state (10) indicates that the branch instruction is predicted to be taken although the previous prediction failed. Likewise, Strongly Not Taken (00) and Weakly Not Taken (01) indicate the corresponding states for a prediction that the branch is not taken.
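The four states can be decoded from the 2-bit entry as sketched below. The upper bit gives the predicted direction, and whether the two bits agree distinguishes the Strongly states from the Weakly states. Under the reading of Non-Patent Document 1 given above, and assuming a saturating counter, that agreement also tells whether the previous prediction succeeded, for an entry that has been updated at least once. The helper `decode_entry` is hypothetical.

```python
from typing import Tuple

# Entry encodings from the state diagram in the right half of FIG. 2.
STATE_NAMES = {
    0b11: "Strongly Taken",
    0b10: "Weakly Taken",
    0b01: "Weakly Not Taken",
    0b00: "Strongly Not Taken",
}


def decode_entry(entry: int) -> Tuple[str, bool, bool]:
    """Return (state name, predicted Taken?, previous prediction succeeded?)."""
    entry &= 0b11
    predicted_taken = bool(entry >> 1)  # upper bit: predicted direction
    # Strongly states (11, 00) have equal bits; Weakly states (10, 01) do not.
    previous_succeeded = (entry >> 1) == (entry & 1)
    return STATE_NAMES[entry], predicted_taken, previous_succeeded


for e in (0b11, 0b10, 0b01, 0b00):
    print(format(e, "02b"), decode_entry(e))
```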
If the instruction cache unit 103 of FIG. 1 has plural cache WAYs, power consumption can be reduced by predicting the WAY that stores the instruction data requested by the execution unit 102, chip-enabling the predicted WAY, and chip-disabling the other WAYs. A data processor having an instruction cache unit that operates with low power consumption can thus be provided.
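In its simplest form, the chip-enable pattern for a predicted WAY is a one-hot vector, so only one WAY array is read per access instead of all of them. The sketch below assumes a four-WAY cache and a hypothetical helper `way_chip_enables`; how the WAY is predicted, and what happens on a misprediction, are outside its scope.

```python
from typing import List


def way_chip_enables(predicted_way: int, num_ways: int) -> List[bool]:
    """Chip-enable only the predicted WAY; all other WAYs are chip-disabled."""
    if not 0 <= predicted_way < num_ways:
        raise ValueError("predicted_way out of range")
    return [way == predicted_way for way in range(num_ways)]


enables = way_chip_enables(predicted_way=2, num_ways=4)
print(enables)                                           # [False, False, True, False]
print(sum(enables), "of", len(enables), "WAY arrays read")  # 1 of 4 WAY arrays read
```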
FIG. 3 is a configuration block diagram of a low-power instruction cache unit described as prior art in Patent Document 1. Its operation is explained using the example instruction sequence, which includes branch instructions, shown in FIG. 4. It is assumed here that four consecutive instructions form one block and that the instruction data for the instructions in one block is stored in only one of the plural cache WAYs.
In FIG. 3, the instruction cache unit has: an instruction address register 120 for storing instruction addresses; plural (in this case two) cache RAMs 1210 and 1211 corresponding to plural cache WAYs; tag RAMs 1220 and 1221 corresponding to the cache RAMs 1210 and 1211, respectively; two comparators 1230 and 1231 that compare the outputs of the two tag RAMs with the tag address output from the instruction address register 120; a block head detector 124 that detects the head instruction of a block using the block offset, which is part of the instruction address; a hit/miss determination logic circuit 125 that determines whether the requested instruction data is stored (hit) or not stored (miss) in either of the cache RAMs 1210 and 1211, based on the outputs of the two comparators 1230 and 1231, the output of the block head detector 124, and an entry valid signal output from each of the tag RAMs 1220 and 1221 indicating that valid data is stored in the designated entry of the corresponding cache RAM 1210 or 1211; two inverters 1260 and 1261 that invert the cache RAM ReadActive signals, which are output from the hit/miss determination logic circuit 125 and make the cache RAMs 1210 and 1211 read active (chip enabled), and provide the inverted signals to the negative-logic chip enable (CE) terminals of the cache RAMs (the open circles denoting negative logic are omitted in FIG. 3); and a selector 127 that selects and outputs the instruction data from whichever of the cache RAMs 1210 and 1211 stores the requested instruction data.
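The combinational roles of the comparators, the entry valid signal, the inverters, and the selector can be modelled roughly as below. This is a behavioral sketch under assumptions: signal levels are represented as Python booleans (True for “H”), and the names `TagEntry`, `find_hit_way`, `ce_pin_levels`, and `select_output` are hypothetical. It does not model the block head detector or timing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagEntry:
    tag: int     # stored tag address (tag RAM 1220 / 1221)
    valid: bool  # entry valid signal


def find_hit_way(entries: List[TagEntry], tag: int) -> Optional[int]:
    """Comparators 1230/1231 gated by the entry valid signal; None means a miss."""
    for way, entry in enumerate(entries):
        if entry.valid and entry.tag == tag:
            return way
    return None


def ce_pin_levels(read_active: List[bool]) -> List[bool]:
    """Inverters 1260/1261: ReadActive "H" (True) drives the active-low CE pin "L"."""
    return [not active for active in read_active]


def select_output(ram_outputs: List[int], hit_way: int) -> int:
    """Selector 127: pass through the data of the WAY holding the requested word."""
    return ram_outputs[hit_way]


entries = [TagEntry(tag=0x12, valid=True), TagEntry(tag=0x34, valid=True)]
print(find_hit_way(entries, tag=0x34))  # 1
print(ce_pin_levels([True, False]))     # [False, True]
```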
The power consumption reduction achieved by the instruction cache unit of FIG. 3 is explained further below.
In the cycle in which the address of an instruction detected by the block head detector 124 as the head of a block is provided, the hit/miss determination logic circuit 125 sets the cache RAM ReadActive signals of both cache RAMs 1210 and 1211 to “H”, chip-enabling both cache RAMs so that instruction data can be read from both.
In the next cycle (in which the next instruction address is provided), the hit/miss determination logic circuit 125 keeps the cache RAM ReadActive signal at “H” only for the cache RAM whose comparator outputs “H” and whose tag RAM outputs “H” as the entry valid signal, and changes the cache RAM ReadActive signal to “L” for the other cache RAMs. The same read active control state is maintained for the two cycles in which the two remaining instruction addresses of the block are provided, which reduces power consumption. If neither cache RAM 1210 nor 1211 stores the requested instruction data, the hit/miss determination logic circuit 125 outputs a cache miss signal, and the instruction data is read from the external RAM 101 as described above.
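For a block fetched sequentially from its head, the read active control described above can be sketched per cycle as follows. The sketch assumes that the requested data hits in the cache, that instructions are a fixed 4 bytes long, and that the block offset is derived from the address as shown; the function names are hypothetical. For two WAYs it yields 5 cache RAM activations over the four cycles, compared with 8 if both cache RAMs were enabled in every cycle.

```python
from typing import List

INSTRS_PER_BLOCK = 4  # four consecutive instructions per block (FIG. 3/4 example)
INSTR_BYTES = 4       # assumption: fixed 4-byte instructions


def is_block_head(address: int) -> bool:
    """Block head detector: the instruction's offset within its block is zero."""
    return (address // INSTR_BYTES) % INSTRS_PER_BLOCK == 0


def read_active_schedule(head: int, hit_way: int, num_ways: int = 2) -> List[List[bool]]:
    """ReadActive per cycle while the four instructions of one block are fetched in order."""
    if not is_block_head(head):
        raise ValueError("head must be the address of a block head")
    schedule = []
    for i in range(INSTRS_PER_BLOCK):
        address = head + i * INSTR_BYTES
        if is_block_head(address):
            # Head cycle: all WAYs are read; the tag compare finds the hit WAY.
            schedule.append([True] * num_ways)
        else:
            # Remaining cycles: only the WAY that hit stays read active.
            schedule.append([way == hit_way for way in range(num_ways)])
    return schedule


cycles = read_active_schedule(head=0x40, hit_way=1)
for row in cycles:
    print(row)
print(sum(map(sum, cycles)), "cache RAM reads instead of", 2 * INSTRS_PER_BLOCK)
```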
In Patent Document 1, however, when the instruction corresponding to the instruction address output from the execution unit is a branch instruction, both cache RAMs are placed in the chip enabled state, just as when a block head instruction is detected. Power consumption is therefore not reduced when a branch instruction is detected, and a sufficient power reduction cannot be obtained.
Patent Document 2 shows a conventional art for predicting, among the plural instruction cache WAYs, the WAY in which the requested instruction data is stored. It reduces power consumption by storing set prediction information about the accessed set of a set-associative memory in various locations, such as a branch target buffer, an instruction cache, and an operand history table, and by reducing the access delay of the set-associative instruction and data caches. However, this technology requires a large-capacity branch target buffer for storing tag addresses and target addresses, and adding such large-capacity storage itself increases power consumption, which is a problem.
Patent Document 3, which describes a conventional art of such instruction cache WAY prediction, discloses a technology for processing non-sequential instructions in which an additional cache WAY prediction memory is provided to facilitate the caching of non-sequential instructions. However, the technology in Patent Document 3 holds only the latest WAY hit/miss result for cache WAY prediction, so high accuracy in branch prediction and cache WAY prediction cannot be expected.