1. Field of the Invention
The present invention relates generally to a cache memory control apparatus for controlling hierarchical cache memories disposed between a main storage unit and a processing unit, which executes various processes using data stored in the main storage unit, and a computer system equipped with the cache memory control apparatus. More particularly, the invention is directed to a cache memory control apparatus for use in a computer system having a prefetch function of fetching speculative data, which is inclined to become necessary in the processing unit, from the main storage unit into the hierarchical cache memories in advance, and also directed to a computer system equipped with the cache memory control apparatus.
2. Description of the Related Art
A conventional computer system (data processing machine), as shown in FIG. 5 of the accompanying drawings, generally comprises a main storage unit (hereinafter called MSU) 12 storing programs and various data to be processed by the programs, and a central processing unit (hereinafter called CPU) 11 for executing various processes using the data stored in the MSU 12.
Recently, with the increasing throughput of the CPU 11 and the increasing capacity of the MSU 12, the data processing speed of the CPU 11 has become much faster than the speed of access to the MSU 12. If the CPU 11 and the MSU 12 are regarded as the data consumption side and the data supply side, respectively, the supply of data tends to fall short, so that the CPU 11 spends most of its processing time waiting for data from the MSU 12, lowering the effective throughput of the CPU 11 even though its processing speed has increased.
As a solution, it has been customary to minimize the apparent access time of the MSU 12, as viewed from the CPU 11, by placing a cache memory, which is smaller in capacity and higher in processing speed than the MSU 12, either inside or outside and operationally near the CPU 11 and by using the cache memory to adjust the access delay of the MSU 12 with respect to the cycle time of the CPU 11.
This cache memory usually assumes only a single level or class (of the hierarchy), in the form of a block of plural words, between the MSU 12 and a register 11b in the CPU 11. Alternatively, however, if the difference between the access time of the MSU 12 and the cycle time of the CPU 11 is considerably large, one or more additional levels or classes (blocks of plural words) are placed between the MSU 12 and the register 11b in the CPU 11. In the example shown in FIG. 5, a primary cache memory 13 and a secondary cache memory 14 are placed between the MSU 12 and the register 11b, which is coupled to an arithmetic unit 11a, in the CPU 11 to form a two-level or two-class cache memory hierarchy. Both the primary and secondary cache memories 13, 14 are disposed inside the CPU 11.
Specifically, the primary cache memory 13 is disposed hierarchically near the arithmetic unit 11a while the secondary cache memory 14 is disposed hierarchically near the MSU 12. Generally the secondary cache memory 14 is set to be larger in storage capacity than the primary cache memory 13; that is, in a multi-cache-memory hierarchy, the nearer a cache memory is disposed with respect to the arithmetic unit 11a, the smaller its storage capacity should be set.
In the computer system equipped with the foregoing cache memories 13, 14 with the two-level or two-class hierarchy, if the CPU 11 needs a certain kind of data D, first the CPU 11 discriminates whether the data D is stored in the primary cache memory 13. If the same data D is stored in the primary cache memory 13 (if a “cache hit” results with respect to the primary cache memory 13), the CPU 11 reads the data D from the primary cache memory 13 without having to access either the secondary cache memory 14 or the MSU 12.
On the contrary, if the data D is not stored in the primary cache memory 13 (if a “cache miss” results with respect to the primary cache memory 13), the CPU 11 discriminates whether the data D is stored in the secondary cache memory 14. As a result, if a cache hit then results with respect to the secondary cache memory 14 (if information retrieval has taken place successfully with respect to the secondary cache memory 14), the CPU 11 reads a data block containing the data D from the secondary cache memory 14 and then writes the data block into the primary cache memory 13, whereupon the CPU 11 reads the data D from the primary cache memory 13.
Further, if the data D is not stored even in the secondary cache memory 14 (if a cache miss results with respect to the secondary cache memory 14), the CPU 11 reads a data block containing the data D from the MSU 12 and writes the data block into the primary and secondary cache memories 13, 14, whereupon the CPU 11 reads the data D from the primary cache memory 13.
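By way of illustration only, the lookup sequence described above may be sketched as follows. The class name Cache, the function name read, the block size of four words, and the least-recently-used replacement rule are all assumptions introduced for this sketch; the description above does not specify a replacement policy.

```python
from collections import OrderedDict


class Cache:
    """Tiny cache of data blocks keyed by block number (illustrative only).

    Least-recently-used replacement is an assumption of this sketch.
    """

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # least recently used block comes first

    def lookup(self, block):
        """Return True on a cache hit and mark the block as recently used."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def fill(self, block):
        """Write a block into the cache, ejecting the oldest block if full."""
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


def read(address, primary, secondary, block_size=4):
    """Return which level supplied the data: 'primary', 'secondary' or 'MSU'."""
    block = address // block_size
    if primary.lookup(block):        # hit in the primary cache memory
        return "primary"
    if secondary.lookup(block):      # primary miss, secondary hit
        primary.fill(block)          # copy the block into the primary cache
        return "secondary"
    secondary.fill(block)            # miss in both: fetch the block from the MSU
    primary.fill(block)
    return "MSU"
```

For example, with a two-block primary cache and an eight-block secondary cache, reading addresses 0, 1, 4, 8 and then 0 again is supplied by the MSU, the primary cache, the MSU, the MSU and the secondary cache, respectively, because the block holding address 0 has by then been ejected from the primary cache but remains in the secondary cache.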
As mentioned above, if a cache miss has resulted with respect to the primary cache memory 13 or the secondary cache memory 14, the data D must be read from the secondary cache memory 14 or the MSU 12, respectively, which takes more time. Meanwhile, although recent computer systems have sharply increased the clock frequency of the CPU 11, the performance of the MSU 12, such as in the form of DRAM (dynamic random access memory), has not kept up with the increased throughput of the CPU 11. As a result, the MSU 12 is effectively located far from the CPU 11; since, as previously mentioned, the difference between the access time of the MSU 12 and the cycle time of the CPU 11 is considerably large, the throughput of the CPU 11 is increasingly impaired by the foregoing cache misses in the cache memories 13 and 14.
In order to avoid the penalty for unsuccessful access of the cache memories 13 and 14, it has been a common practice to fetch necessary data from the MSU 12 into the cache memories 13, 14 prior to the arithmetic processing.
For this purpose, the CPU 11 issues, in addition to a loading command to fetch data from the MSU 12 into the register 11b, a dedicated-to-loading command to fetch the data from the MSU 12 into the primary and secondary cache memories 13, 14, but not into the register 11b, whereupon the CPU 11 can execute a separate process (the arithmetic process in the illustrated example) without managing or monitoring the state of execution of the dedicated-to-loading command, thus leaving a process or processes associated with the dedicated-to-loading command to the primary and secondary cache memories 13, 14. This dedicated-to-loading command is also called a “prefetch” command because of its function.
Now assuming that the arithmetic unit 11a performs consecutive arithmetic processes in which data of the first to N-th items are substituted, one item per arithmetic process, into a predetermined equation, the CPU 11 issues a prefetch command to fetch the (i+k)th item data into the primary cache memory 13 prior to execution of the arithmetic process on the i-th item data, so that the arithmetic unit 11a can execute each arithmetic process without a cache miss.
As a result, the (i+k)th item data, which is inclined to become necessary in a forthcoming arithmetic process succeeding the arithmetic process of the i-th item data (by the arithmetic unit 11a) by k steps, is fetched into the primary and secondary cache memories 13, 14 in parallel with the arithmetic process of the i-th item data. Therefore, by the time the CPU 11 should fetch the (i+k)th item data from the primary cache memory 13 into the register 11b for the arithmetic process coming k steps later, the (i+k)th item data will already exist in the primary cache memory 13 so that a cache miss can be prevented, thus avoiding any penalty for the cache miss.
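The effect of the prefetch distance k may be illustrated by the following minimal sketch. It is a simplified model rather than a description of any actual hardware: the function name count_demand_misses and the parameter latency (the assumed number of arithmetic steps a fetch from the MSU takes) are hypothetical, items are numbered from zero, the cache is assumed never to eject data, and a prefetch that has not completed when its data is needed is counted as a cache miss.

```python
def count_demand_misses(n_items, k, latency):
    """Count cache misses when item i+k is prefetched before item i is processed.

    latency: assumed number of steps needed to bring an item from the MSU.
    The cache is assumed large enough that nothing is ever ejected.
    """
    ready_at = {}  # item -> step at which the item is present in the primary cache
    misses = 0
    for i in range(n_items):
        target = i + k
        if target < n_items and target not in ready_at:
            ready_at[target] = i + latency  # prefetch command issued at step i
        if ready_at.get(i, float("inf")) > i:
            misses += 1                     # item i is absent or still in flight
            ready_at[i] = i                 # demand fetch brings it in now
    return misses
```

In this model, with 100 items and a latency of 4 steps, a distance of k = 4 leaves only the first four items (which were never prefetched) as misses, whereas a distance of k = 2 is too short and every item is still in flight when it is needed.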
However, the following problems have been encountered with the conventional technology if such a prefetch command is issued repeatedly in order to surely avoid penalties for possible cache misses:
(1) If the throughput (frequency of issue) of prefetch commands rises, a particular prefetch command for the data of a certain item would be issued much earlier than necessary, so that the data of the certain item would be fetched into the primary cache memory 13 too early. Because the primary cache memory 13 usually has only a limited storage capacity, the existing data could be ejected from the primary cache memory 13 as additional data is stored into the primary cache memory 13 from the MSU 12 in response to the execution of another prefetch command issued later. In that event, the prefetched data does not exist in the primary cache memory 13 when it actually becomes necessary for the forthcoming arithmetic process to be performed by the arithmetic unit 11a, which would result in a cache miss with respect to the primary cache memory 13. Consequently that data must be fetched again from the secondary cache memory 14 or the MSU 12 into the primary cache memory 13, which would in turn reduce the throughput of the CPU 11.
(2) The primary cache memory 13 must be one to which the arithmetic unit 11a can have high-speed access. For this purpose, not only the storage capacity of the primary cache memory 13 but also the number of ports at which simultaneous accessing is allowable in the primary cache memory 13 is restricted, so that, if the throughput (frequency of issue) of prefetch commands were increased, the storing of data from the primary cache memory 13 into the register 11b and from the secondary cache memory 14 into the primary cache memory 13 would be delayed due to the execution of the prefetch commands. In other words, in the main pipeline of the CPU 11, the access to the primary cache memory 13 that becomes necessary in the ordinary process would collide with the access to the primary cache memory 13 for execution of the prefetch command, lowering the throughput of the CPU 11.
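Problem (1) above can be reproduced with a small simulation, given here as a sketch under stated assumptions only. The function name demand_misses is hypothetical, the primary cache is modeled with least-recently-used replacement (a policy the description above does not specify), each item occupies one cache entry, items are numbered from zero, and prefetches are assumed to complete instantly so that only the ejection effect is visible.

```python
from collections import OrderedDict


def demand_misses(n_items, k, capacity):
    """Count primary-cache misses when item i+k is prefetched before item i is used.

    capacity: number of items the primary cache can hold (assumed LRU replacement).
    """
    cache = OrderedDict()  # least recently used item comes first

    def touch(item):
        """Access an item, filling it on a miss; return True if it was a hit."""
        hit = item in cache
        cache[item] = True
        cache.move_to_end(item)
        if len(cache) > capacity:
            cache.popitem(last=False)  # eject the least recently used item
        return hit

    misses = 0
    for i in range(n_items):
        if i + k < n_items:
            touch(i + k)   # prefetch command, assumed to complete instantly
        if not touch(i):   # ordinary access by the arithmetic process
            misses += 1
    return misses
```

With 64 items and a primary cache holding 8 items, a prefetch distance of k = 2 misses only on the first two items, which were never prefetched. With k = 16 every one of the 64 accesses misses, because each prefetched item is ejected by later prefetches before the arithmetic process reaches it.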
With the foregoing problems in view, it is an object of the present invention to provide a cache memory control apparatus and a computer system which prevent ejection of necessary data from a cache memory and conflicts in the main pipeline of a processing unit, guaranteeing high-speed processing of the computer system even if the prefetch commands are issued at high frequency.
In order to accomplish the above-mentioned object, according to a first aspect of the present invention, there is provided a cache memory control apparatus for controlling a plurality of hierarchically arranged cache memories into which data of high-frequency of access by a processing unit, which executes various processes using data stored in a main storage unit, is to be fetched, the apparatus comprising: a command control section for issuing a prefetch command instruction to fetch speculative data, which is inclined to become necessary for near future use in the processing unit, from the main storage unit into the cache memories, prior to execution of the individual process by the processing unit; and a prefetch control section for controlling the hierarchically arranged cache memories, when the prefetch command issued by the command control section is executed, in such a manner that at least one of the hierarchically arranged cache memories, to which the speculative data is to be fetched, is changeably selected as one or more destination cache memories.
According to a second aspect of the present invention, a computer system comprises: a main storage unit, a processing unit for executing various processes using data stored in the main storage unit, a plurality of hierarchically arranged cache memories to which data of high-frequency of access by the processing unit is to be fetched from the main storage unit and a cache memory control apparatus for controlling the plurality of hierarchically arranged cache memories. The cache memory control apparatus includes a command control section for issuing a prefetch command instruction to fetch speculative data, which is inclined to become necessary for near future use in the processing unit, from the main storage unit into the cache memories prior to execution of the individual process by the processing unit, and a prefetch control section for controlling the cache memories, when the prefetch command issued by the command control section is executed, in such a manner that at least one of the cache memories to which the speculative data is to be fetched is changeably selected as one or more destination cache memories.
Preferably, the cache memory control apparatus also comprises a status information detecting means for detecting status information about one of the cache memories disposed operationally near the processing unit and outputting the detected status information to the prefetch control section so that the prefetch control section controls the cache memories so as to change over the one or more destination cache memories in accordance with the detected status information.
The command control section of the cache memory control apparatus preferably includes: a prefetch kind setting section for setting kind-of-prefetch information about the kind of prefetch as a prefetch-destination change-over control condition (i.e., the change-over control condition for prefetching data to the primary and/or secondary caches) for the prefetch command to be issued by the command control section; and a prefetch kind identifying section for identifying the kind of prefetch set for the prefetch command and outputting the result of the identification of the kind of prefetch to the prefetch control section so that the prefetch control section controls the changeover of the destination cache memory based on the kind of prefetch received from the prefetch kind identifying section.
In the above-mentioned cache memory control apparatus and computer system of the present invention, the prefetch control section changeably selects one or more destination cache memories among the plural hierarchical cache memories in accordance with the status information about one hierarchical cache memory hierarchically near the processing unit (state-of-use information about the storage area or state-of-contention information about the ports), when a prefetch command is executed.
Namely, when a prefetch command is executed, the data prefetched from the main storage unit is not always stored in all of the hierarchical cache memories but is copied into only appropriate cache memories in accordance with the status information about one hierarchical cache memory hierarchically near the processing unit.
Accordingly, it is possible to restrict access (prefetch) to the cache memory hierarchically near the processing unit (the primary cache memory) in accordance with the status of the primary cache memory. And particularly if the prefetch commands are issued at high frequency, it is possible to avoid ejecting or replacing (sweeping) necessary data from the cache memory and incurring a conflict in the main pipeline of the processing unit, realizing high-speed processing in the computer system.
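The selection of destination cache memories in accordance with status information may be pictured with the following minimal sketch, assuming a two-level hierarchy. The names PrimaryStatus, free_entries, ports_busy, select_destinations and min_free_entries, as well as the particular decision rule, are hypothetical and are introduced only for illustration; the actual change-over conditions are not given in the passages above.

```python
from dataclasses import dataclass


@dataclass
class PrimaryStatus:
    """Hypothetical status snapshot of the primary cache memory."""
    free_entries: int  # state of use of the storage area
    ports_busy: bool   # state of contention at the ports


def select_destinations(status, min_free_entries=1):
    """Return the cache levels into which prefetched data is to be written.

    Assumed rule: if the primary cache is contended or has too little free
    space, the data is prefetched into the secondary cache only.
    """
    if status.ports_busy or status.free_entries < min_free_entries:
        return ("secondary",)
    return ("primary", "secondary")
```

Under this assumed rule, a prefetch executed while the ports of the primary cache memory are busy, or while its storage area is full, is directed to the secondary cache memory alone, so that it neither ejects data from the primary cache memory nor competes with ordinary accesses in the main pipeline.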