1. Field of the Invention
The present invention relates to a multiport cache memory control unit in which the throughput of arithmetic processing is not lowered.
2. Description of the Background Art
FIG. 1 is a block diagram of a conventional multi processor system.
As shown in FIG. 1, a plurality of central processing units 1 (CPUs 1), each of which is arranged in a conventional control unit, are respectively operated in a conventional multi processor system. In addition, pieces of data are transmitted between a main memory unit 3 and each of the conventional control units through a system bus 2. Furthermore, communication is executed among the conventional control units through the system bus 2.
The main memory unit 3 has a large capacity to store a large amount of data, so the main memory unit 3 is formed of devices such as a dynamic random access memory (DRAM). However, read/write operations are executed slowly in the DRAM. Therefore, a conventional control unit provided with a cache memory, in which some of the pieces of data stored in the main memory unit 3 are stored, has recently been adopted to execute arithmetic processing in the conventional control unit. As a result, the arithmetic processing is executed at a high speed.
FIG. 2 is a block diagram of a conventional control unit having a cache memory.
As shown in FIG. 2, a conventional control unit is provided with the CPU 1 in which many types of arithmetic processing are executed, a cache memory 6 for storing, at addresses, some of the data stored in the main memory unit 3, and a tag memory 7 for storing addresses whose numbers agree with the numbers of the addresses of the cache memory 6 to function as an address book of the data stored in the cache memory 6. The conventional control unit is further provided with a selector 9 for selecting either an address signal provided from the CPU 1 or an address signal transmitted through the system bus 2, and a control circuit 10 for controlling operations executed in the conventional control unit.
In the above configuration, operations executed in the conventional control unit are explained.
When a piece of data DA1 stored at an address AD1 is required to execute arithmetic processing in the CPU 1, an address signal AD is transmitted from the CPU 1 to the tag memory 7 through the selector 9 to check whether or not the address AD1 is stored in the tag memory 7. In cases where the address AD1 is stored in the tag memory 7, an address hit occurs. The address hit means that the data DA1 is stored at the address AD1 of the cache memory 6. Thereafter, a data signal is transmitted from the CPU 1 to the cache memory 6 through a processor bus 4 and a processor bus interface 5 so that the data DA1 is read out from the cache memory 6 to the CPU 1. In addition, in cases where the data DA1 is replaced with a piece of new data DA2 in the CPU 1, the new data DA2 is stored at the address AD1 of the cache memory 6 in place of the data DA1, if necessary. The above operation is executed under control of the control circuit 10.
On the other hand, in cases where the address AD1 is not stored in the tag memory 7 when the address signal AD is provided to the tag memory 7, a cache miss occurs. The cache miss means that no data is stored at the address AD1 of the cache memory 6. In this case, the selection in the selector 9 is changed from the address signal AD provided from the CPU 1 to an address signal AD transmitted through the system bus 2 to fetch the data DA1 stored at the address AD1 under control of the control circuit 10. In other words, a traffic operation is executed. Thereafter, the data DA1 stored at the address AD1 of the main memory unit 3 is fetched into the cache memory 6 through the system bus 2 and a system bus interface 8. Also, the address AD1 is stored in the tag memory 7 by the CPU 1. Thereafter, the data DA1 fetched into the cache memory 6 is read out from the cache memory 6 to the CPU 1. In addition, in cases where the data DA1 is replaced with a piece of new data DA3 in the CPU 1, the new data DA3 is stored at the address AD1 of the cache memory 6 in place of the data DA1 by the CPU 1, if necessary. Also, the new data DA3 is stored at the address AD1 of the main memory unit 3 by the CPU 1 according to a protocol, if necessary.
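For illustration only, the address hit and cache miss handling described above can be sketched in Python as follows. The class name ControlUnit, the attribute bus_fetches, and the write_through flag are hypothetical names introduced for this sketch; the flag stands in for the unspecified protocol that decides whether the main memory unit 3 is also rewritten. The sketch models behavior only and is not a description of the actual circuitry.

```python
class ControlUnit:
    """Toy model of one conventional control unit (assumed structure)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict standing in for main memory unit 3
        self.tags = set()               # stands in for tag memory 7
        self.cache = {}                 # stands in for cache memory 6
        self.bus_fetches = 0            # number of traffic operations (hypothetical counter)

    def read(self, address):
        if address in self.tags:
            # Address hit: the data is read out of the cache memory.
            return self.cache[address]
        # Cache miss: a traffic operation fetches the data over the system bus,
        # and the address is then recorded in the tag memory.
        self.bus_fetches += 1
        self.cache[address] = self.main_memory[address]
        self.tags.add(address)
        return self.cache[address]

    def write(self, address, data, write_through=True):
        self.read(address)              # make sure the address is present in the cache
        self.cache[address] = data      # new data replaces the old data in the cache
        if write_through:
            # Assumed protocol choice: also rewrite the main memory.
            self.main_memory[address] = data


main_memory = {"AD1": "DA1"}
unit = ControlUnit(main_memory)
first = unit.read("AD1")    # cache miss, fetched from main memory
second = unit.read("AD1")   # address hit, served from the cache
unit.write("AD1", "DA2", write_through=False)
third = unit.read("AD1")    # the cache now holds the new data
```

In this sketch only the first read causes a traffic operation; the later accesses are served from the cache, which is the source of the speed-up described above.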
However, as shown in FIG. 1, because one conventional control unit shown in FIG. 2 is connected with the other conventional control units through the system bus 2, the data DA1 is still stored in the cache memories 6 of the other conventional control units even though the new data DA3 is stored at the address AD1 of the main memory unit 3 by the CPU 1 of one conventional control unit shown in FIG. 2 (first case). Therefore, though the data DA1 stored in the cache memories 6 of the other conventional control units is stale, the arithmetic processing is executed by utilizing the data DA1 in the other conventional control units. This means that the consistency of the data is not maintained because the data DA3 differs from the data DA1.
Also, even though the new data DA3 is stored at the address AD1 of the cache memory 6 in place of the data DA1 by the CPU 1 in one conventional control unit shown in FIG. 2, in cases where the data DA1 stored at the address AD1 of the main memory unit 3 is not rewritten to the new data DA3 according to a protocol (second case), the data DA1 is fetched from the main memory unit 3 into the cache memories 6 of the other conventional control units. Thereafter, the arithmetic processing is executed by utilizing the data DA1 in the other conventional control units though the data DA1 fetched into the cache memories 6 is stale. This means that the consistency of the data is not maintained.
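As a minimal sketch of the first and second cases, assuming two control units A and B whose private caches are modeled as Python dictionaries and which do not watch each other, the loss of consistency can be reproduced as follows. The helper name read and the variable names are hypothetical.

```python
def read(cache, main_memory, address):
    """Return the data for an address, filling the private cache on a miss."""
    if address not in cache:
        cache[address] = main_memory[address]
    return cache[address]


# First case: unit A rewrites both its cache and the main memory,
# but unit B already holds a copy of the old data.
main_memory = {"AD1": "DA1"}
cache_a, cache_b = {}, {}
read(cache_a, main_memory, "AD1")
read(cache_b, main_memory, "AD1")
cache_a["AD1"] = "DA3"
main_memory["AD1"] = "DA3"
first_case = read(cache_b, main_memory, "AD1")   # unit B still sees the stale data

# Second case: unit A rewrites only its own cache, so unit B
# fetches the old data from the main memory on a miss.
main_memory = {"AD1": "DA1"}
cache_a, cache_b = {}, {}
read(cache_a, main_memory, "AD1")
cache_a["AD1"] = "DA3"
second_case = read(cache_b, main_memory, "AD1")  # unit B fetches the stale data
```

In both cases unit B continues to compute with DA1 while unit A holds DA3.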
Therefore, a snoop operation is executed by each of the conventional control units to watch both the traffic operation executed by the conventional control units and the rewrite of the data in the main memory unit 3 so that the consistency of the data is maintained.
For example, in cases where the data DA1 stored at the address AD1 of the main memory unit 3 is rewritten to the new data DA3 after the data DA1 stored in the cache memory 6 is rewritten to the new data DA3 in one conventional control unit (the first case), the rewrite of the data in the main memory unit 3 is promptly detected by the snoop operation executed by the other conventional control units. Thereafter, in cases where the address AD1 is stored in the tag memories 7 of the other conventional control units, the new data DA3 stored in the main memory unit 3 is fetched at the address AD1 of the cache memories 6 of the other conventional control units.
Also, though the new data DA3 is stored at the address AD1 of the cache memory 6 in place of the data DA1 in one conventional control unit, in cases where the data DA1 stored at the address AD1 of the main memory unit 3 is not rewritten to the new data DA3 (the second case), the snoop operation is always executed by the one conventional control unit to watch the traffic operation executed by the other conventional control units. In cases where the traffic operation is executed by one of the other conventional control units to fetch the data DA1 stored at the address AD1 of the main memory unit 3, the other conventional control unit whose traffic operation has been detected is interrupted to halt the operations executed in that control unit. After the operations executed in the other conventional control unit are halted, the data DA1 stored at the address AD1 of the main memory unit 3 is rewritten to the new data DA3 by the one conventional control unit. Thereafter, the operations in the other conventional control unit whose traffic operation was detected are resumed. Therefore, the new data DA3 stored at the address AD1 of the main memory unit 3 is fetched into the other conventional control unit.
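The snoop operation for both cases can be sketched as follows, under simplifying assumptions. The class names Bus and Unit, the dirty set, and the method names snoop_write and snoop_fetch are hypothetical; the interrupt, halt, and resume sequence is modeled simply as a write-back that completes before the fetch returns.

```python
class Bus:
    """Toy system bus that lets every other unit snoop each transfer."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.units = []

    def write(self, source, address, data):
        self.main_memory[address] = data
        for unit in self.units:
            if unit is not source:
                unit.snoop_write(address)     # first case: rewrite is detected

    def fetch(self, source, address):
        for unit in self.units:
            if unit is not source:
                unit.snoop_fetch(address)     # second case: traffic is detected
        return self.main_memory[address]


class Unit:
    def __init__(self, bus):
        self.bus = bus
        self.cache = {}
        self.dirty = set()                    # addresses not yet rewritten in main memory
        bus.units.append(self)

    def read(self, address):
        if address not in self.cache:
            self.cache[address] = self.bus.fetch(self, address)
        return self.cache[address]

    def write(self, address, data, write_through):
        self.read(address)
        self.cache[address] = data
        if write_through:
            self.dirty.discard(address)
            self.bus.write(self, address, data)
        else:
            self.dirty.add(address)

    def snoop_write(self, address):
        # Refresh the local copy only if the address is held locally.
        if address in self.cache:
            self.cache[address] = self.bus.main_memory[address]

    def snoop_fetch(self, address):
        # Rewrite the main memory before the other unit's fetch completes.
        if address in self.dirty:
            self.bus.main_memory[address] = self.cache[address]
            self.dirty.discard(address)


# First case: unit A writes through, unit B's copy is refreshed.
bus1 = Bus({"AD1": "DA1"})
a1, b1 = Unit(bus1), Unit(bus1)
a1.read("AD1")
b1.read("AD1")
a1.write("AD1", "DA3", write_through=True)
first_case = b1.read("AD1")

# Second case: unit A keeps DA3 only in its cache until unit B fetches AD1.
bus2 = Bus({"AD1": "DA1"})
a2, b2 = Unit(bus2), Unit(bus2)
a2.write("AD1", "DA3", write_through=False)
second_case = b2.read("AD1")
```

With snooping in place, unit B obtains DA3 in both cases, and the main memory holds DA3 once unit B's fetch in the second case has completed.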
Accordingly, drawbacks resulting from the first and second cases can be resolved.
However, when the traffic operation is executed by one conventional control unit in the conventional multi processor system, because the traffic operation is detected by each of the other conventional control units in which the snoop operation is always executed, the selection in the selector 9 of each of the other conventional control units is necessarily changed from the address signal AD provided from the CPU 1 to the address signal AD transmitted through the system bus 2. Therefore, because the address signal AD generated in the CPU 1 cannot be transmitted to the tag memory 7, the arithmetic processing executed in each of the other conventional control units is necessarily halted even though the address AD1 is not stored in the tag memory 7 of the other conventional control units. In this case, it is not necessary to halt the arithmetic processing in some of the other conventional control units to maintain the consistency of the data in cases where the address AD1 is not stored in the tag memory 7 in some of the other conventional control units.
Accordingly, operating time is lost in each of the conventional control units so that the throughput of the arithmetic processing is unnecessarily lowered in the conventional multi processor system.
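The lost operating time can be illustrated with a small counting sketch. It assumes, purely for illustration, that every traffic operation on the system bus 2 costs each of the other control units one stalled tag lookup, because the selector 9 passes only one address signal at a time. The function name count_stalls and the example addresses are hypothetical.

```python
def count_stalls(tag_memories, bus_addresses):
    """Return (total_stalls, unnecessary_stalls) for the snooping units.

    tag_memories: one set of stored addresses per snooping control unit.
    bus_addresses: addresses of the traffic operations seen on the bus.
    """
    total = 0
    unnecessary = 0
    for address in bus_addresses:
        for tags in tag_memories:
            total += 1                  # selector switches to the bus address; the CPU lookup is blocked
            if address not in tags:
                unnecessary += 1        # the unit did not hold the address, so the halt was not needed
    return total, unnecessary


tag_memories = [{"AD1"}, {"AD2"}, set()]
bus_addresses = ["AD1", "AD1", "AD3"]
total, unnecessary = count_stalls(tag_memories, bus_addresses)
```

In this example nine stalls occur in total, and seven of them fall on units whose tag memory does not contain the snooped address.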
Also, in cases where the CPU 1 is of a reduced instruction set computer (RISC) type in which one instruction is executed in a clock cycle, a single data bus arranged in the processor bus 4 is enough to execute the arithmetic processing in the CPU 1. However, in cases where the CPU 1 has a plurality of arithmetic units in which many instructions are executed in a clock cycle at the same time, another drawback is generated in the conventional multi processor system.
That is, when a plurality of read requests are provided from a plurality of arithmetic units accommodated in the CPU 1 to the cache memory 6 at the same time, an arbitration operation is executed under control of the control circuit 10 so that the plurality of the read requests are processed one after another. In this case, one arithmetic unit selected by the control circuit 10 can access the cache memory 6 while the other arithmetic units are left in a waiting condition. This means that the arithmetic processing in the other arithmetic units is temporarily halted. Therefore, the throughput of the arithmetic processing deteriorates in the conventional multi processor system.
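The effect of the arbitration operation can be sketched as follows, assuming one read request is served per clock cycle and assuming the requests are served in list order. The source does not state the selection rule of the control circuit 10, so the ordering, the function name arbitrate, and the unit labels are hypothetical.

```python
def arbitrate(requests):
    """Serve simultaneous read requests one per cycle through a single port.

    requests: list of (arithmetic_unit, address) issued in the same cycle.
    Returns a list of (arithmetic_unit, address, wait_cycles).
    """
    schedule = []
    for cycle, (unit, address) in enumerate(requests):
        # The unit served in cycle N has been waiting for N cycles.
        schedule.append((unit, address, cycle))
    return schedule


requests = [("ALU0", "AD1"), ("ALU1", "AD2"), ("ALU2", "AD3")]
schedule = arbitrate(requests)
total_wait = sum(wait for _, _, wait in schedule)
```

Three simultaneous requests therefore occupy three cycles, and the arithmetic units spend three cycles in total in the waiting condition.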