1) Field of the Invention
The present invention relates to an arithmetic processor that has improved performance when a cache miss occurs.
2) Description of the Related Art
Generally, a central processing unit (CPU) is provided with a cache memory in which instructions or data from the main memory are stored. The cache memory operates at a higher speed than the main memory, which is generally provided externally. Providing the cache memory therefore increases the processing speed of the CPU. When a cache hit occurs, the instruction or the like is read from the cache memory and the program is executed without interruption. When a cache miss occurs, however, the instruction or the like is read from the main memory, so that the execution of the program is interrupted until the data is completely read from the main memory.
FIG. 1 explains the concept of the cache memory. A CPU 11 is provided with an instruction execution unit 12 and a cache unit 13. The cache unit 13 is connected to a memory 14 that is external to the CPU 11. When data is to be read, the CPU 11 provides a data read request 15 and a read address 16 to the cache unit 13. If the requested data is available in the cache unit 13 (i.e., when a cache hit occurs), the cache unit 13 supplies the requested data 17 to the instruction execution unit 12.
If the requested data is not available in the cache unit 13 (i.e., when a cache miss occurs), the cache unit 13 outputs a data read request 18 to the external memory 14 and reads the requested data 19 from the external memory 14. The data 19 read from the external memory 14 is stored in the cache memory and also fed to the instruction execution unit 12.
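The read path of FIG. 1 can be illustrated by the following minimal sketch. It is a toy model only: the class name CacheUnit, the dictionary standing in for the external memory 14, and the sample addresses and contents are assumptions introduced for illustration and do not appear in the figures.

```python
class CacheUnit:
    """Toy model of the cache unit 13 placed in front of the external memory 14."""

    def __init__(self, external_memory):
        self.external_memory = external_memory  # stands in for the memory 14 (assumed dict)
        self.lines = {}                         # address -> cached data
        self.external_reads = 0                 # number of data read requests 18 issued

    def read(self, address):
        """Handle a data read request 15 for the read address 16.

        Returns (data, hit) where hit is True on a cache hit.
        """
        if address in self.lines:               # cache hit: supply the data 17 directly
            return self.lines[address], True
        # Cache miss: read the data 19 from the external memory,
        # store a copy in the cache, and feed it to the requester.
        self.external_reads += 1
        data = self.external_memory[address]
        self.lines[address] = data
        return data, False


# Hypothetical memory contents, chosen only for the example.
memory = {0x100: "load r1", 0x104: "add r1, r2"}
cache = CacheUnit(memory)
first = cache.read(0x100)    # miss: served from the external memory
second = cache.read(0x100)   # hit: served from the cache
```

In this sketch only the first read of an address reaches the external memory; the second read of the same address is served from the cache.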
FIG. 2 shows a concrete configuration of a cache memory in the conventional arithmetic processor. A processor core 3 includes an instruction execution unit 31 and a cache unit 35. The instruction execution unit 31 includes an address generator 32 and an execution section 33. The cache unit 35 includes a cache unit control circuit 36, a tag random access memory (RAM) 37, a cache RAM 38, and a bypass data selector 39. A main memory 2 is connected to the cache unit 35 through a bus. Normally, the capacity (assumed to be “L-bytes”) of one line of the cache RAM 38 is larger than the bus width (assumed to be “B-bytes”) between the main memory 2 and the cache unit 35.
FIG. 3 is a time chart that shows operation timings when a cache miss occurs in the cache memory in the conventional arithmetic processor. For convenience of explanation, it is assumed herein that operations A, B, C and D have addresses in the same cache line and that the addresses are continuous in the order of the parenthesized letters shown in FIG. 3. It is also assumed that a cache miss occurs when data on the operation A is read. In a cycle T1, the address generator 32 feeds a data read request and a read target address for the operation A, to the cache unit control circuit 36 (see an arrow 41 in FIG. 2).
In a cycle T2, the cache unit control circuit 36 refers to an address tag in the tag RAM 37 so as to determine whether the data on the operation A exists in the cache RAM 38 (see an arrow 42 in FIG. 2). As a result, in the cycle T2, the cache unit control circuit 36 obtains a determination result that a cache miss occurs for the operation A (see an arrow 43 in FIG. 2). In the cycle T2, the address generator 32 feeds a data read request and a read target address for the operation B, to the cache unit control circuit 36 (see an arrow 41 in FIG. 2).
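The tag reference described above can be sketched as follows. The sketch assumes a direct-mapped organization with 32-byte lines and 128 lines, and models the tag RAM 37 as a dictionary; none of these details is stated for the conventional processor, and the names split_address and is_hit and the sample addresses are hypothetical.

```python
LINE_BYTES = 32   # assumed line capacity L in bytes
NUM_LINES = 128   # assumed number of lines in the cache RAM


def split_address(address):
    """Split a read target address into (tag, line index, byte offset)."""
    offset = address % LINE_BYTES
    index = (address // LINE_BYTES) % NUM_LINES
    tag = address // (LINE_BYTES * NUM_LINES)
    return tag, index, offset


def is_hit(tag_ram, address):
    """Cache hit if the tag stored for the indexed line matches the address tag."""
    tag, index, _ = split_address(address)
    return tag_ram.get(index) == tag


# Four consecutive 8-byte accesses in one line, standing in for operations A to D.
addresses = [0x2040, 0x2048, 0x2050, 0x2058]

tag_ram = {}                                             # empty tag RAM: nothing cached yet
hits_before = [is_hit(tag_ram, a) for a in addresses]    # all miss

tag, index, _ = split_address(addresses[0])
tag_ram[index] = tag                                     # line registered after the line fill
hits_after = [is_hit(tag_ram, a) for a in addresses]     # all hit
```

Because the four addresses share one tag and one line index, registering the line once is enough for all four accesses to hit, which corresponds to the assumption that the operations A to D lie in the same cache line.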
In a cycle T3, the cache unit control circuit 36 feeds the data read request and the read target address for the operation A, to the main memory 2 (see an arrow 44 in FIG. 2). In addition, in the cycle T3, the operation B that has been accepted waits for the completion of the operation A in the cache unit 35. In the cycle T3, the address generator 32 feeds a data read request and a read target address for the operation C, to the cache unit control circuit 36 (see an arrow 41 in FIG. 2). In a cycle T4, the operation C that has been accepted waits for the completion of the operation A in the cache unit 35. Thereafter, the operation D which follows the operation C is not accepted.
In the cycles T6, T8, T10 and T12, the data corresponding to the address requested to the main memory 2 is read from the main memory 2 in L/B separate transfers (where L is the capacity of one line of the cache RAM 38 and B is the bus width between the main memory 2 and the cache unit 35) (see an arrow 45 in FIG. 2). At this moment, the data on the operations A, B, C and D, which corresponds to the one-line capacity of the cache RAM 38, is stored in the cache RAM 38.
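The division of a line fill into L/B bus transfers can be illustrated with the short sketch below. The values L = 32 and B = 8 are assumptions chosen so that L/B = 4, matching the four transfers in FIG. 3; the first cycle (T6) and the two-cycle spacing are read off the cycle numbers given above, and the function name line_fill_cycles is hypothetical.

```python
def line_fill_cycles(line_bytes, bus_bytes, first_cycle=6, cycle_step=2):
    """Return the cycle numbers in which the L/B bus transfers of one line arrive."""
    if line_bytes % bus_bytes != 0:
        raise ValueError("line capacity must be a multiple of the bus width")
    transfers = line_bytes // bus_bytes          # L/B transfers per line fill
    return [first_cycle + i * cycle_step for i in range(transfers)]


# Assumed sizes: 32-byte line, 8-byte bus, hence four transfers.
fill = line_fill_cycles(line_bytes=32, bus_bytes=8)
```

With these assumed sizes, fill holds the cycle numbers 6, 8, 10 and 12, one transfer for each of the operations A to D.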
In the cycle T6 in which the data corresponding to the operation A is read, an effective signal indicating that the read data is effective is transmitted from the main memory 2 to the cache unit control circuit 36 (see an arrow 46 in FIG. 2). The cache unit control circuit 36 feeds a control signal for selecting a bypass side, to the bypass data selector 39 (see an arrow 47 in FIG. 2). As a result, the data on the operation A read from the main memory 2 is also fed directly to the execution section 33 (see an arrow 40 in FIG. 2). At the same time, the cache unit control circuit 36 transmits an effective signal indicating that the read data is effective, to the execution section 33 (see an arrow 49 in FIG. 2).
In each of the cycle T8 in which the data corresponding to the operation B is read, the cycle T10 in which the data corresponding to the operation C is read and the cycle T12 in which the data corresponding to the operation D is read, an effective signal indicating that the read data is effective is transmitted from the main memory 2 to the cache unit control circuit 36 (see an arrow 46 in FIG. 2).
In the cycle T12, when the data is completely stored in the cache RAM 38, the processing for the operation A is considered to be completed. The operation B that waits in the cache unit 35 is re-executed in the next cycle T13. In a cycle T14, the cache unit control circuit 36 refers to the address tag in the tag RAM 37 so as to determine whether the data on the operation B exists in the cache RAM 38 (see an arrow 42 in FIG. 2).
Following the reading of the data on the operation A from the main memory 2, the data on the operation B is stored in the cache RAM 38. Therefore, in the cycle T14, the cache unit control circuit 36 obtains a determination result that a cache hit occurs for the operation B (see an arrow 43 in FIG. 2). Further, in the cycle T14, the operation C is re-executed.
In a cycle T15, the cache unit control circuit 36 transmits a control signal for selecting the cache side, to the bypass data selector 39 (see an arrow 47 in FIG. 2). As a result, the data on the operation B is read from the cache RAM 38 (see an arrow 48 in FIG. 2) and fed to the execution section 33 (see an arrow 40 in FIG. 2). At the same time, the cache unit control circuit 36 transmits an effective signal indicating that the cache read data is effective, to the execution section 33 (see an arrow 49 in FIG. 2). Also in the cycle T15, the cache unit control circuit 36 refers to the address tag in the tag RAM 37 so as to determine whether the data on the operation C exists in the cache RAM 38 (see an arrow 42 in FIG. 2).
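The role of the bypass data selector 39 can be pictured as a two-input selector driven by the control signal from the cache unit control circuit 36. The following sketch is an assumed functional model only; the function name, the boolean encoding of the control signal, and the sample data strings are hypothetical.

```python
def bypass_data_selector(select_bypass, memory_data, cache_data):
    """Pass on the main-memory data (bypass side) or the cache RAM data (cache side)."""
    return memory_data if select_bypass else cache_data


# Cycle T6: the operation A missed, so its data is taken from the bypass side.
data_a = bypass_data_selector(True, memory_data="A from main memory", cache_data=None)

# Cycle T15: the operation B hits, so its data is taken from the cache side.
data_b = bypass_data_selector(False, memory_data=None, cache_data="B from cache RAM")
```

The selector itself performs no hit or miss determination; in this model it only forwards whichever input the control signal designates.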
The data on the operation C is already stored in the cache RAM 38 similarly to the data on the operation B. In the cycle T15, therefore, the cache unit control circuit 36 obtains a determination result that a cache hit occurs for the operation C (see an arrow 43 in FIG. 2). In addition, since the processing in the cache unit 35 is completed in the cycle T15, the operation D which follows the operation C is accepted.
In a cycle T16, the cache unit control circuit 36 transmits a control signal for selecting the cache side, to the bypass data selector 39 (see an arrow 47 in FIG. 2). As a result, the data on the operation C is read from the cache RAM 38 (see an arrow 48 in FIG. 2) and fed to the execution section 33 (see an arrow 40 in FIG. 2). At the same time, an effective signal indicating that the cache read data is effective is transmitted to the execution section 33 (see an arrow 49 in FIG. 2). Further, in the cycle T16, the address tag is referred to for the operation D (see an arrow 42 in FIG. 2). As a result, a cache hit occurs for the operation D (see an arrow 43 in FIG. 2).
In a cycle T17, the cache unit control circuit 36 transmits a control signal for selecting the cache side, to the bypass data selector 39 (see an arrow 47 in FIG. 2). As a result, the data on the operation D is read from the cache RAM 38 (see an arrow 48 in FIG. 2) and fed to the execution section 33 (see an arrow 40 in FIG. 2). At the same time, an effective signal indicating that the cache read data is effective is transmitted to the execution section 33 (see an arrow 49 in FIG. 2).
However, the conventional cache memory has a significant disadvantage when a cache miss occurs. When a cache miss occurs for the operation A in the above example, the operations B to D, which follow the operation A, have to wait until the operation A is completed.
Therefore, the read data on the operations B to D is not returned to the execution section 33 until a full cache line of data including the data on the operation A, i.e., all the data on the operations A to D, is read from the main memory 2, the operation A is re-executed and a cache hit/cache miss determination is made for the operation A. In the example shown in FIG. 3, the operation B is not re-executed until the cycle T13, which indicates that quite a heavy penalty is imposed.
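The size of this penalty can be made concrete with a small timing sketch that reproduces the cycle numbers of FIG. 3 as described above. It is a simplified model, not a description of the actual circuit: it assumes one bus transfer per operation, that the waiting operations restart one per cycle from the cycle after the last transfer, and that each restarted operation needs two further cycles (tag reference, then data return). The function name conventional_return_cycles is hypothetical.

```python
def conventional_return_cycles(fill_cycles, ops=("A", "B", "C", "D")):
    """Return {operation: cycle in which its data reaches the execution section}."""
    if len(fill_cycles) != len(ops):
        raise ValueError("expected one bus transfer per operation")
    # The missing operation is served through the bypass as soon as its data arrives.
    returns = {ops[0]: fill_cycles[0]}
    # Following operations restart only after the whole line has been stored.
    restart = fill_cycles[-1] + 1
    for position, op in enumerate(ops[1:]):
        # Re-execution (or acceptance), tag reference, then data return.
        returns[op] = restart + position + 2
    return returns


fill_cycles = [6, 8, 10, 12]                     # T6, T8, T10, T12 from FIG. 3
returns = conventional_return_cycles(fill_cycles)

# Cycles each operation waits between arrival of its data on the bus and its return.
delay = {op: returns[op] - arrived for op, arrived in zip("ABCD", fill_cycles)}
```

Under these assumptions the data on the operation B arrives from the main memory 2 in the cycle T8 but is not returned to the execution section 33 until the cycle T15, a delay of seven cycles, while the operations C and D are delayed by six and five cycles respectively.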