1. Field of the Invention
The present invention generally relates to a data processor, and specifically relates to a data processor, such as a microprocessor or an image processor, that includes an instruction cache.
2. Description of the Related Art
Conventionally, various processors fetch instructions from an external memory (RAM) and execute them in an execution unit.
FIG. 1 is a block diagram showing a microprocessor of this kind. A microprocessor 10 has an execution unit 11, which executes instructions stored in an external RAM 12 serving as an external memory, by the following procedure. First, the execution unit 11 outputs an instruction address to the external RAM 12 (step 1) and receives the corresponding instruction (step 2). The execution unit 11 then decodes and executes the instruction (step 3). To read or write data, the execution unit 11 outputs a data address to the external RAM 12 (step 4) and reads or writes the data (step 5). Steps 4 and 5 may be omitted, depending on the instruction.
With the configuration of FIG. 1, the external RAM 12 must be accessed every time an instruction is executed, so executing instructions takes a long time.
To solve this problem, it has been common practice to provide an instruction cache 13 in a microprocessor 10A, as shown in FIG. 2. When the instruction cache 13 does not contain a required instruction, the instruction is read from the external RAM 12 by steps 1 and 2, supplied to the execution unit 11, and stored in the instruction cache 13. When the execution unit 11 later requires the same instruction, the instruction cache 13 receives the instruction address, reads the corresponding instruction, and supplies it to the execution unit 11. Since the instruction cache 13 can generally be accessed faster than the external RAM 12, the time needed to read and execute an instruction is shortened.
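The benefit of the instruction cache can be illustrated by counting external RAM reads for a trace that repeats the same instructions. The sketch below is a hypothetical model, not the circuit of FIG. 2: it assumes one external read per fetched instruction and an unbounded cache with no eviction, and the function names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical contents of the external RAM 12: byte address -> instruction.
Memory = Dict[int, str]


def fetch_without_cache(trace: List[int], ram: Memory) -> Tuple[List[str], int]:
    """FIG. 1 style: every fetch goes to the external RAM (steps 1 and 2)."""
    fetched: List[str] = []
    external_reads = 0
    for addr in trace:
        fetched.append(ram[addr])
        external_reads += 1
    return fetched, external_reads


def fetch_with_cache(trace: List[int], ram: Memory) -> Tuple[List[str], int]:
    """FIG. 2 style: a miss reads the external RAM and fills the cache;
    a repeated address is then served from the cache."""
    cache: Memory = {}  # assumption: unbounded, no eviction
    fetched: List[str] = []
    external_reads = 0
    for addr in trace:
        if addr not in cache:          # cache miss
            cache[addr] = ram[addr]    # read from external RAM and keep a copy
            external_reads += 1
        fetched.append(cache[addr])    # supply the instruction to the execution unit
    return fetched, external_reads


if __name__ == "__main__":
    ram = {0x0: "add", 0x4: "subcc", 0x8: "or", 0xC: "set"}
    loop = [0x0, 0x4, 0x8, 0xC] * 3    # the same four instructions run three times
    print(fetch_without_cache(loop, ram)[1])  # 12 external reads
    print(fetch_with_cache(loop, ram)[1])     # 4 external reads
```

Both versions deliver the same instruction stream; only the number of slow external accesses differs.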
FIG. 3 is a block diagram showing the configuration of the instruction cache 13 shown in FIG. 2. The instruction cache 13 has an instruction address register 14, two tag RAMs 15 and 16, two cache RAMs 17 and 18, two comparators 19 and 20, a hit/miss checking logic circuit 21, and a selector 22. The tag RAM 15 and the cache RAM 17 operate as a pair (system #0), and so do the tag RAM 16 and the cache RAM 18 (system #1).
The instruction cache 13 receives an instruction address from the execution unit 11 shown in FIG. 2 and outputs the corresponding instruction through the selector 22. The instruction address is sent to the external RAM 12, and the corresponding block is received from the external RAM 12. A block is a group of instructions specified by consecutive addresses.
FIG. 4 shows instructions that are executed sequentially. In FIG. 4, the instructions are specified by consecutive instruction addresses, except at the branch instruction (branch). The instructions are executed in the order shown by the arrow on the right-hand side of FIG. 4. For example, four instructions specified by consecutive addresses form one block.
The instruction address register 14 of FIG. 3 is divided into a block offset field, a line address field, and a tag address field. The two cache RAMs 17 and 18 are accessed using the line address and the block offset, and output the specified instruction. The line address restricts the area of the cache RAMs 17 and 18 in which an instruction from the external RAM 12 may be stored. For example, instructions stored at the addresses xxxx and yyyy of the external RAM 12 are both stored at zzz of the cache RAM 17 or 18. If an instruction could be stored in an arbitrary area of the cache RAM 17 or 18, accessing the cache RAMs 17 and 18 would take longer.
Here, an instruction read from the external RAM 12 can be stored in either of the two cache RAMs 17 and 18. In this case, the degree of associativity is said to be 2 (a two-way set-associative cache). The cache RAMs 17 and 18 may be implemented as separate memory chips or by dividing the storage area of a single memory chip.
The block offset specifies an instruction within the block selected by the line address. For example, the block beginning with the “add” instruction in the first line of FIG. 4 is specified by the line address, and the instructions “add”, “subcc”, “or”, and “set” within it are specified by the block offsets “00”, “01”, “10”, and “11”, respectively.
The tag RAMs 15 and 16 each output a tag address in accordance with the line address. The comparators 19 and 20 compare the tag addresses read from the tag RAMs 15 and 16, respectively, with the tag address held in the instruction address register 14 to determine whether they match. When the instruction specified by the instruction address is stored in the cache RAM 17, the comparator 19 detects a match (cache hit). Conversely, when the instruction is stored in the cache RAM 18, the comparator 20 detects a match (cache hit).
The hit/miss checking logic circuit 21 controls the selector 22 according to an output of the comparators 19 and 20. If the comparator 19 outputs a match signal, the selector 22 will select the cache RAM 17, and if the comparator 20 outputs a match signal, the selector 22 will select the cache RAM 18. The selected instruction is supplied to the execution unit 11.
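The two-way lookup described above (tag RAMs read at the line address, comparators, and selector) can be modeled as follows. This is a hypothetical software sketch, not the hardware of FIG. 3: the class and function names are illustrative, and an empty tag entry is modeled as `None`, an assumption since the text does not describe valid bits.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class System:
    """One tag RAM paired with one cache RAM (e.g. tag RAM 15 + cache RAM 17)."""
    num_lines: int
    instrs_per_block: int
    tags: List[Optional[int]] = field(init=False)
    blocks: List[List[Optional[str]]] = field(init=False)

    def __post_init__(self) -> None:
        self.tags = [None] * self.num_lines  # None = nothing stored yet (assumption)
        self.blocks = [[None] * self.instrs_per_block for _ in range(self.num_lines)]


def lookup(systems: List[System], tag: int, line: int,
           offset: int) -> Optional[Tuple[int, Optional[str]]]:
    """Return (system number, instruction) on a cache hit, or None on a miss."""
    # Comparators: compare each tag RAM output for this line with the address tag.
    matches = [s.tags[line] == tag for s in systems]
    # Hit/miss logic: steer the selector to the cache RAM whose tag matched.
    for number, hit in enumerate(matches):
        if hit:
            return number, systems[number].blocks[line][offset]
    return None  # no comparator matched: cache miss


systems = [System(128, 4), System(128, 4)]
systems[0].tags[0] = 0
systems[0].blocks[0] = ["add", "subcc", "or", "set"]
print(lookup(systems, tag=0, line=0, offset=1))  # (0, 'subcc')
print(lookup(systems, tag=1, line=0, offset=0))  # None
```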
FIG. 5 shows the above-described process for the case where the tag address read from the tag RAM 15 matches the tag address held in the instruction address register 14. In the drawing, thick lines indicate the flows of the address, the instruction, the signals, and so on used in the read-out operation.
FIG. 6 shows a case where the comparison results of both comparators 19 and 20 are negative (cache miss). In the drawing, thick lines indicate the flows of the address, the instruction, and the signals used in the write-in operation. In this case, the instruction is read from the external RAM 12 and written into the cache RAM 17 or the cache RAM 18; FIG. 6 shows an example in which it is written into the cache RAM 17. In addition, the tag address of the missed instruction address is written into the tag RAM 15 corresponding to the cache RAM 17. The instruction stored in the cache RAM 17 is then read and supplied to the execution unit 11 through the selector 22.
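The miss sequence of FIG. 6 can be sketched as a read that falls back to the external RAM, writes the block and its tag into one system, and then returns the instruction. This is a hypothetical model: the dictionary layouts and names are illustrative, the external RAM is keyed by (tag, line) for brevity, and because the text does not describe a replacement policy, the caller simply chooses which system to fill (system #0 by default, matching the FIG. 6 example).

```python
from typing import Dict, List, Tuple

Block = List[str]
# Hypothetical model: each system maps a line address to (tag address, block).
System = Dict[int, Tuple[int, Block]]
# Hypothetical model of the external RAM 12: (tag, line) -> block of instructions.
ExternalRam = Dict[Tuple[int, int], Block]


def read_instruction(systems: List[System], ext_ram: ExternalRam,
                     tag: int, line: int, offset: int,
                     fill_system: int = 0) -> Tuple[str, bool]:
    """Return (instruction, hit). On a miss, fill `fill_system` first."""
    for system in systems:
        entry = system.get(line)
        if entry is not None and entry[0] == tag:
            return entry[1][offset], True            # cache hit
    # Cache miss: read the block from the external RAM, write it into the
    # chosen cache RAM, and write the missed tag into the paired tag RAM.
    block = list(ext_ram[(tag, line)])
    systems[fill_system][line] = (tag, block)
    # The newly stored instruction is then read out through the selector.
    return systems[fill_system][line][1][offset], False


systems: List[System] = [{}, {}]
ext_ram: ExternalRam = {(0, 0): ["add", "subcc", "or", "set"]}
print(read_instruction(systems, ext_ram, tag=0, line=0, offset=0))  # ('add', False)
print(read_instruction(systems, ext_ram, tag=0, line=0, offset=1))  # ('subcc', True)
```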
However, there is a problem in the conventional instruction cache described above.
FIG. 7 shows a sequence of instruction reads from the instruction cache 13 configured as shown in FIG. 3. To illustrate the flows of addresses and other signals clearly, some of the reference numerals given to the components in FIG. 3 are omitted. In FIG. 7, one instruction consists of 4 bytes and one block consists of four instructions (that is, one block is 16 bytes). The number of lines is 128. The read-out sequence starts at step (a) and ends at step (e).
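With the FIG. 7 parameters, the fields of the instruction address register 14 can be derived with shifts and masks. The sketch below assumes, which the text implies but does not state, that the two least significant address bits select a byte within a 4-byte instruction, so the block offset occupies the next two bits and the line address the seven bits after that; the constant and function names are hypothetical.

```python
from typing import Tuple

INSTR_BYTES = 4        # one instruction = 4 bytes (FIG. 7)
INSTRS_PER_BLOCK = 4   # one block = 4 instructions = 16 bytes (FIG. 7)
NUM_LINES = 128        # number of lines (FIG. 7)

# Field widths derived from the sizes above (all are powers of two).
BYTE_BITS = (INSTR_BYTES - 1).bit_length()         # 2: byte within an instruction
OFFSET_BITS = (INSTRS_PER_BLOCK - 1).bit_length()  # 2: block offset
LINE_BITS = (NUM_LINES - 1).bit_length()           # 7: line address


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split an instruction address into (tag address, line address, block offset)."""
    offset = (addr >> BYTE_BITS) & (INSTRS_PER_BLOCK - 1)
    line = (addr >> (BYTE_BITS + OFFSET_BITS)) & (NUM_LINES - 1)
    tag = addr >> (BYTE_BITS + OFFSET_BITS + LINE_BITS)
    return tag, line, offset


for addr in (0x00000000, 0x00000004, 0x00000008, 0x0000000C, 0x00000010):
    tag, line, offset = split_address(addr)
    print(f"0x{addr:08x}: tag={tag} line={line:07b} offset={offset:02b}")
```

For the five addresses used in steps (a) to (e), this yields block offsets 00, 01, 10, 11 with line address 0000000, and then line address 0000001 with block offset 00.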
Suppose that the instruction address “0x00000000” is supplied from the execution unit 11 and stored in the instruction address register 14. In this case, the line address is “0000000” and the block offset is “00.” In step (a), the tag address of the instruction address is assumed to match the tag address read from the tag RAM 15. The hit/miss checking logic circuit 21 therefore selects the cache RAM 17 by controlling the selector 22, and, for example, the addition instruction “add” of FIG. 4 is read from the cache RAM 17.
Next, in step (b), the instruction address “0x00000004” is stored in the instruction address register 14. The block offset is incremented by one, from “00” to “01.” Since the line address does not change, the cache RAM 17 remains selected and the instruction corresponding to the block offset “01” (the subtraction instruction “subcc” in FIG. 4) is read.
Similarly, the block offset becomes “10” and “11” for the instruction addresses “0x00000008” and “0x0000000c,” respectively, and the OR instruction “or” and the set instruction “set” are read from the cache RAM 17 (steps (c) and (d)). The line address does not change during these steps.
Next, when the instruction address changes to “0x00000010” in step (e), the line address is incremented by one to “0000001.” In step (e), the tag address of the instruction address is assumed to match the tag address read from the tag RAM 16. The hit/miss checking logic circuit 21 therefore selects the cache RAM 18 by controlling the selector 22.
Throughout steps (a) to (e), which are specified by the addresses of consecutive instructions, the cache RAM 18 performs read-out operations, even during steps (a) to (d) in which it is not selected. In the drawing, a cache RAM marked with a thick-lined circle is in an enabled state (also called an active state). The problem, therefore, is that power is consumed uselessly.
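The waste in FIG. 7 can be tallied by counting cache RAM reads per step. This sketch assumes, as a simplification of the conventional cache, that every cache RAM is enabled and read on every access while the selector uses only one output; the names and the one-read-per-enabled-RAM cost model are hypothetical.

```python
from typing import List, Tuple

CACHE_RAM = {0: 17, 1: 18}  # system #0 -> cache RAM 17, system #1 -> cache RAM 18

# FIG. 7: steps (a)-(d) hit in cache RAM 17, step (e) hits in cache RAM 18.
FIG7_STEPS = [("a", 0), ("b", 0), ("c", 0), ("d", 0), ("e", 1)]


def conventional_reads(steps: List[Tuple[str, int]]) -> Tuple[int, int]:
    """Count cache RAM reads when every cache RAM is enabled on every access.

    Returns (total reads, reads whose output the selector discards).
    """
    total = 0
    discarded = 0
    for step, hit_system in steps:
        for system, ram in CACHE_RAM.items():
            total += 1                      # this cache RAM is enabled and read
            if system != hit_system:
                discarded += 1              # read, but not chosen by selector 22
                print(f"step ({step}): cache RAM {ram} read but not selected")
    return total, discarded


print(conventional_reads(FIG7_STEPS))   # (10, 5)
```

Under this model, half of the ten cache RAM reads in steps (a) to (e) produce output that is never used.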
It is a general object of the present invention to provide an apparatus that substantially obviates one or more of the problems caused by the limitations and disadvantages of the related art.
Features and advantages of the present invention will be set forth in the description which follows, and in part will become apparent from the description and the accompanying drawings, or may be learned by practice of the invention according to the teachings provided in the description. Objects as well as other features and advantages of the present invention will be realized and attained by an apparatus particularly pointed out in the specification in such full, clear, concise, and exact terms as to enable a person having ordinary skill in the art to practice the invention.
To achieve these and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, the invention provides a data processing apparatus with an instruction cache that solves the above-mentioned problem of the conventional technology and operates with low power consumption by avoiding useless power consumption.
The above-mentioned objective can be attained by a data processor having a plurality of cache memory units, wherein only the cache memory unit that stores the demanded instructions is enabled, while the other cache memory units are disabled (also called being in an inactive state). Since a cache memory unit that does not store the demanded instructions is disabled, it does not consume power; only the cache memory unit that stores the demanded instructions consumes power. Useless power consumption by the other cache memory units is therefore avoided, and an instruction cache with low power consumption can be realized.
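The effect of enabling only the cache memory that holds the demanded instruction can be compared with the conventional scheme using a simple read count. This passage does not say how the demanded cache memory is identified before the read, so the sketch below simply assumes that the hit system is already known for each access; the function names and the one-read-per-enabled-RAM cost model are hypothetical.

```python
from typing import List


def enable_signals(hit_system: int, num_systems: int, selective: bool) -> List[bool]:
    """Enable signal for each cache RAM on one access."""
    if not selective:
        return [True] * num_systems                      # conventional: all enabled
    # Selective: only the cache RAM holding the demanded instruction is enabled
    # (assumes hit_system is known before the cache RAMs are read).
    return [s == hit_system for s in range(num_systems)]


def count_reads(hit_systems: List[int], num_systems: int, selective: bool) -> int:
    """Total number of cache RAM activations over an access sequence."""
    return sum(sum(enable_signals(h, num_systems, selective)) for h in hit_systems)


fig7 = [0, 0, 0, 0, 1]   # systems hit in steps (a)-(e) of FIG. 7
print(count_reads(fig7, 2, selective=False))  # 10
print(count_reads(fig7, 2, selective=True))   # 5
```

In this model, each hit costs one cache RAM read instead of one read per system, so the saving grows with the degree of associativity.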