This invention relates generally to a data processor and more particularly to a data processor having a logical cache memory suitable for a multiprocessor system in which a large number of arithmetic units share one main storage.
Recent computer systems are generally of the type wherein a large number of users share one computer and each user executes simultaneously a large number of processes. To comply with such a trend, a multiprocessor system in which a large number of arithmetic units share one main storage and an arithmetic unit is allotted to each process has become a common approach in place of a system in which one arithmetic unit executes a large number of processes on a time division basis.
In a multiprocessor system having such a structure, each arithmetic unit effects memory access through a common bus. Therefore, unless certain measures are taken, bus competition will occur and each arithmetic unit will not be able to exhibit its full performance. This problem is solved by providing a local cache memory for each arithmetic unit. This cache memory is a small capacity memory that holds part of the content of the main memory, and its access time is generally about 1/5 to 1/10 of that of the main memory. If the data requested by the arithmetic unit exists inside the cache memory (this is called a "cache hit"), the memory access is completed within a short time and, in this case, access to the common memory is not made. If the requested data does not exist inside the cache memory (this is called a "cache miss"), block data of a predetermined size is transferred from the common memory to the local cache memory and the data is supplied to the requesting arithmetic unit. If such a cache memory is provided, a considerably high cache hit ratio can be obtained even from a cache memory having a small capacity, owing to the locality of memory access; hence, almost all memory accesses are completed between the arithmetic unit and the cache memory, and access to the common bus can be reduced drastically. In other words, even when a large number of processors are connected to the common bus, the frequency of bus competition is low and the benefit of providing plural processors can be obtained fully.
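The hit and miss behavior described above can be illustrated with a minimal sketch. The class name `LocalCache`, the direct-mapped organization, and the sizes used here are assumptions chosen only for illustration; the text itself does not specify a cache organization.

```python
class LocalCache:
    """Tiny direct-mapped cache holding copies of main-memory blocks (illustrative only)."""

    def __init__(self, main_memory, num_lines=4, block_size=4):
        self.main_memory = main_memory      # shared list of words (the common memory)
        self.num_lines = num_lines          # assumed number of cache lines
        self.block_size = block_size        # assumed block size in words
        self.lines = {}                     # line index -> (block number, list of words)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block = address // self.block_size
        index = block % self.num_lines
        entry = self.lines.get(index)
        if entry is not None and entry[0] == block:
            self.hits += 1                  # cache hit: no access to the common bus
        else:
            self.misses += 1                # cache miss: block transfer from common memory
            start = block * self.block_size
            entry = (block, self.main_memory[start:start + self.block_size])
            self.lines[index] = entry
        return entry[1][address % self.block_size]


memory = list(range(100, 164))              # 64 words of main memory
cache = LocalCache(memory)
values = [cache.read(a) for a in range(16)] # sequential reads exhibit locality
```

With sequential reads, only the first access to each block goes to the common memory; the remaining accesses to that block are served locally, which is the locality effect the paragraph relies on.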
In the multiprocessor system, another important objective is to utilize a multiple virtual storage system. As described in Japanese Patent Laid-Open No. 79446/1985, for example, the multiple virtual storage system is a system in which a logical address space starting from address 0 (zero) is allotted to each process. The logical address space is given a greater size than the physical address space, which represents physical positions in the main memory. Therefore, the logical address space of each process is allotted to both the main memory and a secondary storage, and only the necessary data is placed in the main memory. When the requested data does not exist in the main memory, part of the data in the main memory is moved to the secondary storage (hereinafter called "swap-out") and the necessary data is loaded from the secondary storage into the main memory (hereinafter called "swap-in"). According to this technique, each process can access the main memory and the secondary storage as one address space.
To accomplish such a technique, the logical address space and the physical address space are managed in units of block data of a predetermined size called a "page", and swap-in and swap-out between the main memory and the secondary storage are effected in units of this page. Correspondence between the logical address page and the physical address page is managed by a page table. When the arithmetic unit makes a memory access, the page table is accessed using the logical address, which is thereby converted to the physical address. In the multiple virtual storage system, such a page table is provided for each process, and when the process is switched, the page table to be referred to is switched, too. In this manner, it becomes possible to allot a logical address space starting from address 0 to each process. Since the page table is placed in the main memory, the overhead of memory access is great if the page table in the main memory is referred to whenever a memory access is made. Therefore, pairs consisting of a recently converted logical address page and its corresponding physical address page are stored in a buffer memory called a "TLB" (Translation Look-aside Buffer), and when the TLB hits, address conversion can be made at high speed without accessing the page table in the main memory.
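The translation path described above (per-process page table, with a TLB holding recently converted pages) might be sketched as follows. The page size of 4096 bytes, the class name `AddressTranslator`, and the choice to clear the TLB on a process switch are assumptions made for this sketch and are not stated in the text.

```python
PAGE_SIZE = 4096  # assumed page size


class AddressTranslator:
    """Per-process page tables fronted by a small TLB (illustrative only)."""

    def __init__(self, page_tables):
        self.page_tables = page_tables    # process id -> {logical page: physical page}
        self.current = None               # process whose page table is in use
        self.tlb = {}                     # logical page -> physical page
        self.page_table_walks = 0         # accesses to the page table in main memory

    def switch_process(self, pid):
        self.current = pid                # the page table to be referred to is switched
        self.tlb.clear()                  # assumption: stale translations are discarded

    def translate(self, logical_address):
        page, offset = divmod(logical_address, PAGE_SIZE)
        if page not in self.tlb:          # TLB miss: refer to the page table
            self.page_table_walks += 1
            # A missing entry raises KeyError here; a real system would swap the page in.
            self.tlb[page] = self.page_tables[self.current][page]
        return self.tlb[page] * PAGE_SIZE + offset


translator = AddressTranslator({1: {0: 7}, 2: {0: 3}})
translator.switch_process(1)
first = translator.translate(0x10)        # TLB miss, page table consulted
second = translator.translate(0x20)       # TLB hit, same page
translator.switch_process(2)
third = translator.translate(0x10)        # same logical address, different physical page
```

Both processes use logical address 0x10, yet each is mapped to its own physical page, which is what allows every process to start its logical address space at address 0.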
The above is a summary of a multiprocessor system. Next, the use of a cache memory in such a system will be explained. Two systems are available for access to the cache memory. One is a logical cache memory, in which the cache memory is accessed using the logical address before address conversion by the TLB, and the other is a physical cache memory, in which the cache memory is accessed using the physical address after address conversion by the TLB. In the physical cache memory, address conversion by the TLB is necessary whenever a memory access is to be made, and this invites an increase in the memory access time. In the logical cache memory, on the other hand, address conversion by the TLB is not necessary so long as the cache memory hits; the address conversion must be made only at the time of block transfer from the common memory when the cache memory misses. Accordingly, the memory access time can be shortened drastically.
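The difference in the number of address conversions can be made concrete with a rough count. The function below is a hypothetical illustration: it assumes an initially empty cache of unlimited capacity, a block size of 16 bytes, and a one-to-one mapping of blocks so that the same block index can stand for both the logical and the physical case.

```python
def count_translations(addresses, cache_kind, block_size=16):
    """Count TLB conversions for an access trace (illustrative assumptions only)."""
    cached_blocks = set()
    translations = 0
    for address in addresses:
        if cache_kind == "physical":
            translations += 1             # convert first, then access the cache
        block = address // block_size
        if block not in cached_blocks:    # cache miss: block transfer needed
            if cache_kind == "logical":
                translations += 1         # convert only for the block transfer
            cached_blocks.add(block)
    return translations


trace = [0, 4, 8, 16, 20, 0]              # six accesses touching two blocks
logical_count = count_translations(trace, "logical")
physical_count = count_translations(trace, "physical")
```

For this trace the logical cache converts an address once per missed block, while the physical cache converts on every access.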
In the arithmetic unit, the cache memory is often divided into areas for instructions and for data in order to improve memory through-put. When the physical cache memory is employed in such a structure, TLBs must be disposed for instructions and for data, respectively. In accordance with the logical cache memory, it is possible to share the TLB between the instruction cache and the data cache, and a reduction of hardware quantity can be achieved.
On the other hand, when the logical cache memory is employed, two structures are possible: one in which the bus shared by a large number of processors is used as a logical address bus (hereinafter called the "logical common bus") and one in which it is used as a physical address bus (hereinafter called the "physical common bus"). In the logical common bus structure, an address convertor is disposed on the side of the common memory and is shared by the arithmetic units. In the physical common bus structure, on the other hand, an address convertor is provided for each arithmetic unit. The logical common bus structure has the advantage that the hardware quantity can be reduced because the TLB can be shared. When the number of processors connected to the bus increases, however, accesses concentrate on the TLB as the page table is changed, and this might become a bottleneck in the system. In addition, main memory access from the I/O processors for swap-in and swap-out of pages is made by using the physical address. Therefore, in the logical common bus structure, a conversion table is necessary in order to convert the physical address from the I/O processors to the logical address.
For the reasons described above, the physical common bus structure by use of the logical cache memory is believed suitable for the memory system of a multiprocessor system.
Another problem of the cache memory is coincidence assurance of the memory content of each cache memory. In a multiprocessor system wherein a local cache memory is provided for each processor, it is of utmost importance to assure data coincidence among the various cache memories. Consider the case, for example, where certain data in the common memory is shared by the local cache memories of the arithmetic units A and B. If the arithmetic unit A updates the shared data but does not report the updating to the cache memory of the arithmetic unit B, and if the arithmetic unit B then makes access to this data, the cache memory in unit B hits and the wrong data is read. To solve this problem, each unit must report updating to all the other arithmetic units when it updates the data. Each arithmetic unit must have means for monitoring the common bus to detect the report of data updating and to manage coincidence assurance of each cache memory.
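The example of units A and B can be reproduced in a small sketch. The names `Bus` and `SnoopingCache` are hypothetical, and for simplicity the sketch assumes that every write also updates the common memory and that a reported update nullifies the other unit's copy.

```python
class Bus:
    """Common bus that delivers update reports to every other cache (illustrative)."""

    def __init__(self, memory):
        self.memory = memory              # common memory: address -> value
        self.caches = []

    def broadcast_update(self, sender, address):
        for cache in self.caches:
            if cache is not sender:
                cache.on_update_report(address)


class SnoopingCache:
    def __init__(self, bus, monitor_enabled=True):
        self.bus = bus
        self.monitor_enabled = monitor_enabled  # False models a unit that ignores reports
        self.data = {}                           # address -> cached value
        bus.caches.append(self)

    def read(self, address):
        if address not in self.data:             # miss: fetch from the common memory
            self.data[address] = self.bus.memory[address]
        return self.data[address]

    def write(self, address, value):
        self.data[address] = value
        self.bus.memory[address] = value         # assumption: memory updated on every write
        self.bus.broadcast_update(self, address)

    def on_update_report(self, address):
        if self.monitor_enabled:
            self.data.pop(address, None)         # nullify the stale copy


def read_by_b_after_write_by_a(monitor_enabled):
    bus = Bus({0x40: 1})
    a = SnoopingCache(bus)
    b = SnoopingCache(bus, monitor_enabled)
    a.read(0x40)
    b.read(0x40)                                 # both units now hold the shared data
    a.write(0x40, 2)                             # unit A updates the shared data
    return b.read(0x40)
```

Calling `read_by_b_after_write_by_a(False)` returns the old value from unit B's own cache, while `read_by_b_after_write_by_a(True)` returns the updated value, because the monitored report forces unit B to miss and fetch the data again.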
The protocol of cache memory coincidence assurance varies with the write access processing system of the cache memory. The write access processing system of the cache memory may be a store-through system or a store-swap system. The former is a system which updates the contents of the cache memory and of the main memory whenever write access from the arithmetic unit occurs. Therefore, the contents of the cache memory and of the main memory are always coincident. In the store-swap system, on the other hand, if the write access hits the cache memory, processing is completed only by updating the data in the cache memory. Updating of the main memory is effected for the first time when the block containing the updated data is put out from the cache memory. For this reason, the contents of the cache memory and of the main memory are not always coincident.
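The two write access processing systems can be contrasted in a short sketch. The class `WriteCache` is hypothetical; as a simplification it allocates an entry on any write (the text describes only the write-hit case for the store-swap system) and treats each address as its own block.

```python
class WriteCache:
    """Cache supporting either write policy described above (illustrative only)."""

    def __init__(self, memory, policy):
        if policy not in ("store-through", "store-swap"):
            raise ValueError("unknown policy")
        self.memory = memory              # main memory: address -> value
        self.policy = policy
        self.blocks = {}                  # address -> [value, updated flag]

    def write(self, address, value):
        if self.policy == "store-through":
            self.blocks[address] = [value, False]
            self.memory[address] = value  # cache and main memory updated together
        else:
            self.blocks[address] = [value, True]   # only the cache is updated

    def put_out(self, address):
        value, updated = self.blocks.pop(address)
        if updated:
            self.memory[address] = value  # main memory updated when the block leaves


through_memory = {0: 0}
through = WriteCache(through_memory, "store-through")
through.write(0, 5)

swap_memory = {0: 0}
swap = WriteCache(swap_memory, "store-swap")
swap.write(0, 5)
memory_before_put_out = swap_memory[0]    # still the old value
swap.put_out(0)
```

In the store-swap case the main memory keeps the old value until `put_out` is called, which is why the two contents are not always coincident.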
An example of the coincidence assurance protocol of the cache memory in each write access processing system will be given next. Since the contents of the cache memory and the main memory are always coincident in the store-through system, every writing of data from one processor to the main memory is reported to all the other processors. Each processor monitors the transactions on the common bus and, when it detects a writing to the main memory, checks whether or not the data corresponding to that address exists in its local cache memory. If the corresponding data exists, the block containing it is nullified. On the other hand, the coincidence assurance protocol in the store-swap system is more complicated than that in the store-through system. Though various coincidence assurance protocols have been proposed, only one example will be given below.
The following three states can exist in a block of the cache memory.
(1) Shared . . . The block is coincident with the memory content of the main memory and is shared by a plurality of cache memories.
(2) Exclusive . . . The block is coincident with the content of the main memory and exists in only one cache memory.
(3) Owned . . . The block is not coincident with the content of the main memory and exists in only one cache memory.
Next, processing when the arithmetic unit makes access to the block under each of the states described above will be explained.
(i) Read hit:
The data is read out irrespective of the state of the block.
(ii) Read miss:
Block transfer is requested to the common memory. At this time, whether or not the block under transfer exists in other cache memories is inspected and if it does, it is written as a Shared block into its own cache memory and if it does not, it is written as an Exclusive block. Thereafter, read access that has been suspended is started again.
(iii) Write hit:
In the write operation into the Shared block, writing is made to the corresponding cache memory and at the same time, data updating is reported to the other cache memories through the common bus. Thereafter, the state of the block becomes "Owned". In the write operation to the Exclusive and Owned blocks, it is not necessary to report updating to the other cache memories.
(iv) Write miss:
Block transfer is requested to the common memory. At this time, whether or not the block under transfer exists in the other cache memories is inspected and if it exists, it is written as the Shared block and if it does not, as the Exclusive block, into its own cache memory. Thereafter, the write access that has been suspended is started again.
In the coincidence assurance protocol of the store-swap system described above, each arithmetic unit has a monitor for checking the transaction on the common bus and executes the following processing.
(i) When a block transfer transaction is detected on the common bus:
If the corresponding block exists in its own cache memory and if the state of the block is "Owned", the block transfer request on the common bus is aborted and the Owned block in its own cache is written back to the common memory so as to change it to "Exclusive". Thereafter, the block transfer request that has been suspended is started again. If the state of the corresponding block is either Shared or Exclusive, existence of that block inside its own cache memory is reported through the common bus. Thereafter, the state of the block becomes Shared.
(ii) When a data updating report is detected on the common bus:
If the corresponding block exists in its own cache memory, that block is nullified.
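The three-state protocol above can be sketched as a small simulation. The class names are hypothetical, each address is treated as its own block, and two simplifications are assumed: a write to an Exclusive block is taken to make it Owned (since it then differs from the main memory), and the abort, write-back, and retry sequence for an Owned block is collapsed into a single step performed before the requester receives the data.

```python
SHARED, EXCLUSIVE, OWNED = "Shared", "Exclusive", "Owned"


class CommonBus:
    def __init__(self, memory):
        self.memory = memory              # common memory: address -> value
        self.caches = []

    def block_transfer(self, requester, address):
        """Let every other cache monitor the request, then supply the block."""
        shared = False
        for cache in self.caches:
            if cache is not requester:
                shared |= cache.snoop_block_transfer(address)
        return self.memory[address], shared

    def report_update(self, requester, address):
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_update(address)


class StoreSwapCache:
    def __init__(self, bus):
        self.bus = bus
        self.blocks = {}                  # address -> [value, state]
        bus.caches.append(self)

    def _fill(self, address):
        value, shared = self.bus.block_transfer(self, address)
        self.blocks[address] = [value, SHARED if shared else EXCLUSIVE]

    def read(self, address):
        if address not in self.blocks:    # read miss
            self._fill(address)
        return self.blocks[address][0]    # read hit: any state

    def write(self, address, value):
        if address not in self.blocks:    # write miss: fill, then continue as a hit
            self._fill(address)
        block = self.blocks[address]
        if block[1] == SHARED:            # only Shared blocks need an update report
            self.bus.report_update(self, address)
        block[0], block[1] = value, OWNED

    def snoop_block_transfer(self, address):
        block = self.blocks.get(address)
        if block is None:
            return False
        if block[1] == OWNED:             # write back before the transfer proceeds
            self.bus.memory[address] = block[0]
        block[1] = SHARED                 # existence reported; block becomes Shared
        return True

    def snoop_update(self, address):
        self.blocks.pop(address, None)    # nullify on a data updating report


memory = {0x10: 1}
bus = CommonBus(memory)
a, b = StoreSwapCache(bus), StoreSwapCache(bus)
a.read(0x10)
state_after_first_read = a.blocks[0x10][1]
b.read(0x10)                              # both copies become Shared
a.write(0x10, 9)                          # a becomes Owned, b's copy is nullified
memory_before_write_back = memory[0x10]
value_seen_by_b = b.read(0x10)            # a writes back, then b receives the block
```

The scenario walks one block through Exclusive, Shared, and Owned, and shows that the main memory is brought up to date only when another unit requests the Owned block.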
In accordance with the foregoing protocol, coincidence of the cache memories can be assured when the store-swap system is employed. This cache memory coincidence assurance is necessary not only for the multiprocessor system, but also for the uniprocessor system. In other words, if the arithmetic units and the I/O processors are connected to the physical bus, coincidence assurance is necessary for the cache memories that the arithmetic units have.
The description given above states that the structure suitable for a multiprocessor system is the logical cache memory and the physical common bus and coincidence assurance of the dispersed cache memories is also described. However, the prior art technique described above involves several other problems.
First of all, the problem that occurs in the case of a single processor will be described. In a computer system employing a multiple virtual storage system, a logical address space starting from address zero (0) is allotted to each process and, for this reason, the same logical address sometimes refers to different physical data. In conventional logical cache systems, therefore, all the logical cache contents are purged whenever processes are switched, in order to prevent the process under execution from accessing the data held at the same logical address by another process. If task switches occur frequently, the benefit of the cache memory is reduced. When a cache memory of the store-swap system is employed, it is also necessary to copy back the blocks containing updated data to the main memory before the purge processing of the cache memory.
This problem can be solved by providing a process identifier in the comparison address of the logical cache memory as described in Japanese Patent Laid-Open No. 79446/1985. In other words, when the logical cache memories are accessed, comparison is made on the basis of not only the logical address but also the process identifier. This method eliminates the necessity of purging the content of the cache memory for each task switch.
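A minimal sketch of comparing on both the process identifier and the logical address is shown below. The class name `TaggedLogicalCache` and the use of a dictionary in place of a hardware directory are assumptions for illustration only.

```python
class TaggedLogicalCache:
    """Logical cache whose comparison address includes a process identifier."""

    def __init__(self):
        self.entries = {}                 # (process id, logical address) -> data

    def fill(self, pid, logical_address, data):
        self.entries[(pid, logical_address)] = data

    def lookup(self, pid, logical_address):
        # Hit only when both the process identifier and the logical address match.
        return self.entries.get((pid, logical_address))


tagged = TaggedLogicalCache()
tagged.fill(1, 0x0, "data of process 1")
tagged.fill(2, 0x0, "data of process 2")  # same logical address, no purge needed
hit_for_process_1 = tagged.lookup(1, 0x0)
hit_for_process_2 = tagged.lookup(2, 0x0)
miss_for_process_3 = tagged.lookup(3, 0x0)
```

Entries of different processes coexist at the same logical address, so a task switch only changes the identifier used for comparison.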
When this system is employed, however, a new problem develops, namely the problem of "Address Synonym". This problem can occur when processes share data. When processes share data between them, different logical address pages are mapped to the common physical address page under the management of the page table of each process. Accordingly, the same data can exist at different entry positions in the logical cache memory. If a certain process updates the common data under such a situation, the copies of the common data belonging to the other processes remain unchanged in the cache memory, and if the other processes then access the common data, the wrong data is supplied from the cache memory. Means for coping with this "Address Synonym" has conventionally been discussed in the reference "Computing Surveys", Vol. 14, No. 3, September 1982, pp. 510-511. This method includes a table for inversely converting the physical address page to the logical address page. For each physical address page of data existing inside the logical cache, the inverse conversion table holds all the logical address pages mapped to that physical address page. When a certain process updates the common data in this system, the logical address is converted to a physical address by the TLB and then the inverse conversion table is accessed by the physical address. As a result, if the common data of other processes exists inside the cache memory, that data is updated or nullified. According to this prior art technique, however, a table for inversely converting the physical address to the logical address is necessary, so that the hardware quantity increases and management of the inverse conversion table is complicated.
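The synonym problem and the inverse conversion table approach can be sketched as follows. The class `SynonymAwareCache` is hypothetical, it tracks one datum per page for brevity, and it chooses nullification (rather than updating) of the other copies.

```python
class SynonymAwareCache:
    """Logical cache with an optional inverse conversion table (illustrative only)."""

    def __init__(self, page_tables, use_inverse_table):
        self.page_tables = page_tables    # process id -> {logical page: physical page}
        self.use_inverse_table = use_inverse_table
        self.entries = {}                 # (process id, logical page) -> data
        self.inverse = {}                 # physical page -> set of (process id, logical page)

    def _register(self, pid, logical_page):
        physical_page = self.page_tables[pid][logical_page]
        self.inverse.setdefault(physical_page, set()).add((pid, logical_page))
        return physical_page

    def fill(self, pid, logical_page, data):
        self.entries[(pid, logical_page)] = data
        self._register(pid, logical_page)

    def write(self, pid, logical_page, data):
        self.entries[(pid, logical_page)] = data
        physical_page = self._register(pid, logical_page)
        if self.use_inverse_table:
            for key in self.inverse[physical_page]:
                if key != (pid, logical_page):
                    self.entries.pop(key, None)      # nullify the other copies
            self.inverse[physical_page] = {(pid, logical_page)}

    def read(self, pid, logical_page):
        return self.entries.get((pid, logical_page))  # None models a cache miss


def other_copy_after_update(use_inverse_table):
    tables = {1: {2: 9}, 2: {5: 9}}       # both processes map to physical page 9
    cache = SynonymAwareCache(tables, use_inverse_table)
    cache.fill(1, 2, "old")
    cache.fill(2, 5, "old")
    cache.write(1, 2, "new")              # process 1 updates the common data
    return cache.read(2, 5)
```

Without the table, process 2 still reads the old data from the cache; with the table, its copy has been nullified, so the next access misses and the current data would be fetched again.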
Another method of solving the problem is discussed in Japanese Patent Laid-Open No. 246850/1986. This prior art reference includes a first directory for storing the logical address information as the address information stored in the logical cache memory and a second directory for storing the actual address information and part of the logical address information. The logical address information and the actual address information are registered to different column addresses of each directory and the relationship between them is determined on the basis of link information stored in the second directory. This system (which will be hereinafter called the "link bit system") will be explained briefly with reference to FIG. 10.
In FIG. 10, reference numerals 901 and 902 represent registers to which the logical address and the physical address are set, respectively, and an address conversion unit 900 converts bits 0-19 of the logical address to bits 0-19 of the physical address. The cache memory consists of a logical directory 903 for storing the logical address information, a data unit 904 for storing data and a physical directory 905 for storing the physical address information. A tag portion of the logical address (bits 0-17) 910 (LAR) corresponding to the data stored in the data unit 904 and the effective bit 911 (V) are registered in the logical directory 903. In this example, the logical directory 903 and the data unit 904 use bits 18-31 of the logical address 901 as the column address. On the other hand, a tag portion of the physical address (bits 0-17) 912 (PAR) corresponding to the data, the effective bit 914 (V), and the column address 913 (hereinafter called the "link bit (LNK)") at which the logical address tag corresponding to the physical address tag is registered, are registered in the physical directory 905. In this example, the physical directory 905 uses bits 18-31 of the physical address 902 as the column address. Therefore, the logical address tag 910 and the corresponding physical address tag 912 are not always registered in the same column of each directory.
When the link bit system is employed, the problem of "Address Synonym" can be solved in the following way. If the data shared by the processes exists in the cache and a write request is issued by a processor to one of the logical addresses, access is first made to the logical directory 903 and the data is written into the corresponding entry. Then, the logical address 901 is converted to the physical address 902 by the address conversion unit 900. Next, access is made to the physical directory 905 using the physical address 902, and the column address of the corresponding data portion 904 is generated from the link information 913 of the hit column. Writing or nullification is then made to this column. The operation described above assures coincidence of the common data. This link bit system is also discussed in Japanese Patent Laid-Open No. 25457/1978.
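The lookup through the physical directory and its link information might be sketched as below. This is a simplified model, not the circuit of FIG. 10: it assumes 32-bit addresses in which bit 0 is the most significant bit (so bits 18-31 are the low 14 bits), uses dictionaries in place of the directories, holds a single link per physical entry, and does not handle replacement of an occupied column. The class name `LinkBitCache` and the example addresses are hypothetical.

```python
COLUMN_BITS = 14                          # bits 18-31 under the assumed bit numbering


def split(address):
    """Return (tag, column) for a 32-bit address."""
    return address >> COLUMN_BITS, address & ((1 << COLUMN_BITS) - 1)


class LinkBitCache:
    def __init__(self):
        self.logical_dir = {}             # logical column -> logical tag (LAR); presence = V
        self.data = {}                    # logical column -> data
        self.physical_dir = {}            # physical column -> (physical tag (PAR), link (LNK))

    def register(self, logical_address, physical_address, value):
        logical_tag, logical_column = split(logical_address)
        physical_tag, physical_column = split(physical_address)
        self.logical_dir[logical_column] = logical_tag
        self.data[logical_column] = value
        # The link records which logical column holds the data for this physical entry.
        self.physical_dir[physical_column] = (physical_tag, logical_column)

    def read(self, logical_address):
        logical_tag, logical_column = split(logical_address)
        if self.logical_dir.get(logical_column) == logical_tag:
            return self.data[logical_column]
        return None                       # cache miss

    def nullify_by_physical(self, physical_address):
        physical_tag, physical_column = split(physical_address)
        entry = self.physical_dir.get(physical_column)
        if entry is None or entry[0] != physical_tag:
            return False                  # no corresponding entry
        linked_column = entry[1]          # column address generated from the link
        self.logical_dir.pop(linked_column, None)
        self.data.pop(linked_column, None)
        del self.physical_dir[physical_column]
        return True


LOGICAL = 0x00012345                      # hypothetical addresses sharing the low 12 bits
PHYSICAL = 0x00ABF345
link_cache = LinkBitCache()
link_cache.register(LOGICAL, PHYSICAL, "shared data")
before = link_cache.read(LOGICAL)
found = link_cache.nullify_by_physical(PHYSICAL)
after = link_cache.read(LOGICAL)
```

The two addresses fall in different columns of their respective directories, and the link is what allows a physical address to reach the entry registered under the logical column. Because only one link is kept per physical entry, the sketch also reflects the limitation that arises when several logical addresses share one physical block.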
However, this link bit system involves the problem that when the data is shared by a plurality of processes, a plurality of items of link information are necessary for one entry of the physical directory and management becomes complicated.
Next, the problem that occurs in the case of the multiprocessor system will be described. The problem is that a function of converting the physical address to a logical address is necessary in order to assure the coincidence of all of the cache memories. As described already, each processor must have a function of monitoring the common bus in order to assure the coincidence of the various cache memories. This monitor function, as described above, must detect a transaction on the bus, search its own cache memory, and either nullify the corresponding block if it exists or abort the transaction on the common bus and execute copy-back processing. In the physical common bus structure using logical cache memories, however, the transaction on the common bus is effected using the physical address, so the monitor must take in the address on the common bus, convert it first to the logical address and then access the logical cache. Therefore, an inverse conversion table becomes necessary for converting the physical address to the logical address, so that the hardware quantity increases and management of the inverse conversion table becomes complicated.
On the other hand, the inverse conversion table becomes unnecessary if the afore-mentioned link bit system is employed. In FIG. 10, the physical address on the common bus is taken into the physical address register 902 by the monitor. Access is made to the physical directory 905 using this physical address and, if the corresponding entry exists, the column address of the corresponding data and of the logical directory is generated from the link information 913 and nullification is executed. However, as described already, when a plurality of processes share the data, a plurality of items of link information become necessary and management becomes complicated.