The present invention relates to a technique effectively applicable to a method of configuring a multiprocessor system that carries out consistency control of cache memories, and to a method of guaranteeing cache consistency, in a multiprocessor system having a plurality of processors and a cache memory for every one or more processors, and more particularly in a shared memory type multiprocessor system having a plurality of nodes which have respective processors and share a memory through a network.
Conventionally, a symmetrical multiprocessor (hereinafter referred to as an SMP), in which a plurality of processors share a memory space, is often used as a computer for simultaneously processing a plurality of requests for a shared resource, as in transaction processing or large scale database processing. On the other hand, the operating frequencies of recent processors have become high. In order to solve the problem of a deterioration in performance due to the access time of a main storage (hereinafter referred to as a memory) constituted by a DRAM, which is an element having a large capacity and a low speed, an increasing number of processors have a cache memory with a small capacity and a high speed. In an SMP constituted by a plurality of processors having such cache memories, consistency between the cache memories must be guaranteed. In a bus coupling type SMP, for example, a method is used in which a memory reference request sent from each processor is monitored by all the other processors, whereby consistency between the cache memories is guaranteed. This method is referred to as a “snoop bus method” (cited reference 1: see “Parallel Computer Architecture”, ISBN 1-55860-343-3, pp. 277 to 301).
In such a snoop bus method, memory reference requests are transmitted from all processors through a snoop bus to a memory. Therefore, the snoop bus becomes a bottleneck of the system. As a method for decreasing the number of requests that each processor issues to the snoop bus for memory accesses, a “write back method” is generally used. However, when the number of processors is increased to enhance the performance of an SMP of the snoop bus method, the electrical load applied to the single bus also increases. Therefore, the maximum number of processors is limited. As a method of further increasing the number of processors, a “switch coupling type SMP” is often used, in which the processors are coupled by means of a crossbar switch or the like in place of the bus. In such a switch coupling type SMP, a “switch broadcasting method” is used, in which a memory reference request sent from a certain processor is broadcast through the crossbar switch to all processors in order to take over the feature of the snoop bus that “all processors monitor a memory reference request sent to a bus” (cited reference 2: see “Parallel Computer Architecture”, ISBN 1-55860-343-3, pp. 555 to 556).
On the other hand, an I/O device, such as a disk device or a network interface, and a processor exchange data by sharing a memory. For example, in the case in which a file is to be read from the disk device, the processor specifies the address of a memory region (referred to as a buffer) for storing the data read out and activates a DMA write for the disk device. The disk device reads the file recorded on a disk and writes the data to the specified buffer. At this time, if the consistency of the processor cache is not guaranteed for the data write from the disk device, the processor refers to old data in the cache memory even though the contents of the memory have been updated by the disk device. As a method for solving this problem, for example, there is used a “snoop type coherent I/O method”, which applies the above-mentioned “snoop bus method” to a memory access sent from the I/O device, or an “explicit flush method”, which explicitly flushes the contents of the processor cache before the processor carries out DMA activation for the I/O device (cited reference 3: see U.S. Pat. No. 4,713,755, “Cache Memory Consistency Control with Explicit Software Instructions”).
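The stale-data problem described above can be illustrated with a minimal sketch. The names `memory`, `cache`, `cpu_read` and `dma_write`, the address and the `invalidate` flag are all hypothetical and chosen only for illustration; the sketch models one processor cache and one I/O device, not any particular hardware.

```python
# Toy model (all names hypothetical): one memory, one processor cache.
memory = {0x100: "old"}
cache = {}

def cpu_read(addr):
    """Return the cached copy if present; otherwise fill the cache from memory."""
    if addr not in cache:
        cache[addr] = memory[addr]
    return cache[addr]

def dma_write(addr, value, invalidate=False):
    """Write to memory on behalf of an I/O device.

    When invalidate is False, no consistency action is taken for the cache.
    """
    memory[addr] = value
    if invalidate:
        cache.pop(addr, None)

first = cpu_read(0x100)                      # the processor caches "old"
dma_write(0x100, "new")                      # device updates memory only
stale = cpu_read(0x100)                      # the cache still returns "old"
dma_write(0x100, "newer", invalidate=True)   # consistency is guaranteed
fresh = cpu_read(0x100)                      # the cache is refilled from memory
```

Without the invalidation, the second read returns the old cached value even though the memory already holds the data written by the device.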
In the SMP using the switch broadcasting method described above, however, the following problems arise when the snoop type coherent I/O method is applied. A memory reference request sent from the I/O device must be broadcast to all processors by means of the switch in order to guarantee the cache consistency of all the processors. However, the broadcast of memory reference requests from the I/O device interferes with the memory reference requests of the processors. Therefore, the memory references of the processors are delayed, with the drawback that overall performance decreases. Moreover, each cache becomes busy with the consistency check that the broadcast causes in the caches of all the processors. Consequently, cache accesses from each processor are blocked, with the drawback that cache access latency increases.
Furthermore, in the case in which the “explicit flush method” is applied, the following problems are considered to arise. The explicit flush method utilizes the feature that “a buffer region which an I/O device accesses is defined before DMA activation is carried out in a processor” and, in order to guarantee in advance that no cache holds a copy of the buffer region, broadcasts a flush request for this buffer region only to all processors through the switch. In a processor receiving the flush request, if the state of the cache is “updated”, the contents of the cache are the newest, so they are written back to the memory and the cache is set to “invalid”. If the state of the cache is not “updated”, the cache is simply invalidated. Consequently, for the DMA access sent from the I/O device, it is not necessary to carry out a broadcast for the consistency guarantee of the cache. In this method, however, the explicit flush and the memory access by the I/O device must be executed one after the other. For this reason, there is the drawback that, for example, file access time is prolonged and system performance decreases.
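The handling of a flush request in each cache, as described above, can be sketched as follows. This is a minimal model under stated assumptions: the class `Cache`, the functions `handle_flush` and `explicit_flush`, and the state name `SHARED` (standing for any valid copy that is not “updated”) are hypothetical and do not come from the cited reference.

```python
from enum import Enum

class State(Enum):
    INVALID = "invalid"
    SHARED = "shared"    # hypothetical name for a valid copy that is not "updated"
    UPDATED = "updated"

class Cache:
    def __init__(self):
        self.lines = {}  # address -> (State, data)

    def handle_flush(self, addr, memory):
        """Process one flush request; return True if a write-back occurred."""
        state, data = self.lines.get(addr, (State.INVALID, None))
        wrote_back = False
        if state is State.UPDATED:
            memory[addr] = data          # the cached contents are the newest
            wrote_back = True
        if state is not State.INVALID:
            self.lines[addr] = (State.INVALID, None)
        return wrote_back

def explicit_flush(caches, buffer_addrs, memory):
    """Broadcast a flush for each buffer address; return the write-back count."""
    return sum(c.handle_flush(a, memory) for a in buffer_addrs for c in caches)

memory = {0: "m0", 1: "m1"}
c0, c1 = Cache(), Cache()
c0.lines[0] = (State.UPDATED, "dirty0")
c1.lines[1] = (State.SHARED, "m1")
write_backs = explicit_flush([c0, c1], [0, 1], memory)
```

After the call, no cache holds a copy of the buffer region, so a subsequent DMA access in this model needs no broadcast. The cost is that the flush must complete before the DMA access can start.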
Therefore, an object of the present invention is to provide a multiprocessor system capable of reducing broadcasts for cache consistency control for memory accesses sent from an I/O device and of implementing high-speed I/O processing. In order to achieve this object, a first object of the present invention is to reduce broadcasts for cache consistency control related to a memory read request sent from an I/O device. Furthermore, a second object of the present invention is to reduce broadcasts for cache consistency guarantee related to a memory write request sent from the I/O device.
The above and other objects and novel features of the present invention will be apparent from the description and accompanying drawings in this specification.
A summary of a typical aspect of the invention disclosed in the present application will be briefly described below.
In order to attain the first object, a multiprocessor of the present invention comprises a first means for recording either an identifier of a cache memory, if that cache memory has an exclusive copy of a memory location capable of being cached, or otherwise a report that no cache memory has the exclusive copy, wherein, when one of the processor and the I/O device issues a read request for the memory location capable of being cached, the first means carries out one of: a first step of, if the identifier is recorded, transmitting, only to the cache memory with the exclusive copy, a message for determining whether or not that cache memory has an “updated” copy, and, when the cache memory with the exclusive copy has an “updated” copy, supplying data from the cache memory with the exclusive copy and, otherwise, reading data from the memory; a second step of, if the report is recorded, reading data directly from the memory; and a third step of, if the identifier is recorded and a cache memory other than the cache memory with the exclusive copy has an “updated” copy, transmitting, to all of the cache memories, a message for determining whether or not they have “updated” copies, and, when at least one of the cache memories has an “updated” copy, supplying data from that cache memory and, otherwise, supplying data from the memory.
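The three steps above can be sketched as a single decision function. This is a simplified sketch, not the claimed implementation: the names `read_request`, `NO_EXCLUSIVE` and `query_all` are hypothetical, the caches are modeled as plain dictionaries, and the condition that selects the third step is passed in by the caller as a flag rather than derived from hardware state.

```python
NO_EXCLUSIVE = None  # hypothetical marker: "no cache memory has the exclusive copy"

def read_request(addr, record, caches, memory, query_all=False):
    """Return (data, number_of_query_messages) for a read of addr.

    record is a cache identifier or NO_EXCLUSIVE.
    caches maps identifier -> {addr: (state, data)}.
    """
    if query_all:
        # Third step: ask every cache whether it holds an "updated" copy.
        for lines in caches.values():
            state, data = lines.get(addr, ("invalid", None))
            if state == "updated":
                return data, len(caches)
        return memory[addr], len(caches)
    if record is NO_EXCLUSIVE:
        # Second step: read directly from the memory, with no message at all.
        return memory[addr], 0
    # First step: query only the cache named by the recorded identifier.
    state, data = caches[record].get(addr, ("invalid", None))
    if state == "updated":
        return data, 1
    return memory[addr], 1

memory = {0x40: "mem"}
caches = {0: {}, 1: {0x40: ("updated", "dirty")}, 2: {}}
r1 = read_request(0x40, 1, caches, memory)
r2 = read_request(0x40, NO_EXCLUSIVE, caches, memory)
r3 = read_request(0x40, 0, caches, memory)
r4 = read_request(0x40, 1, caches, memory, query_all=True)
```

In this model the first and second steps send at most one message per read, whereas the third step sends one message to every cache, which is the broadcast that the recording means is intended to avoid in the common case.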
In order to attain the second object, the multiprocessor of the present invention comprises a first means for recording, for each I/O device, a unit of write to the memory; and a second means for, when an I/O device carries out a memory write to a memory block containing a plurality of cache lines, examining whether or not the memory write unit of that I/O device is recorded in the first means, wherein, if the memory write unit is recorded, the second means carries out the steps of: broadcasting, to all the caches, a request for invalidating the continuous region that extends from the starting address of the memory block over the write unit recorded in the first means; invalidating, in each cache memory receiving the invalidation request, any copy corresponding to the continuous region; and writing the data for the memory block directly to the memory after all the cache memories have been completely invalidated.
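The write path above can be sketched as follows. Several points are assumptions of this sketch and are not stated in the text: the names `io_write`, `invalidate_region` and `LINE`, the 64-byte line size, the modeling of each cache as a set of line addresses, the requirement that the written length does not exceed the recorded write unit, and the fallback of one broadcast per cache line when no write unit is recorded.

```python
LINE = 64  # hypothetical cache line size in bytes

def invalidate_region(start, length, caches):
    """Drop every cached line whose address falls in [start, start + length)."""
    for lines in caches:
        lines.difference_update({a for a in lines if start <= a < start + length})

def io_write(device, start, length, write_units, caches, memory, value):
    """Write `length` bytes from `start`; return the number of broadcasts issued."""
    unit = write_units.get(device)
    line_addrs = range(start, start + length, LINE)
    if unit is not None:
        # Recorded write unit: a single broadcast covers the continuous region.
        invalidate_region(start, unit, caches)
        broadcasts = 1
    else:
        # Assumption of this sketch: without a record, broadcast once per line.
        for addr in line_addrs:
            invalidate_region(addr, LINE, caches)
        broadcasts = len(line_addrs)
    for addr in line_addrs:
        memory[addr] = value  # direct write after the invalidation completes
    return broadcasts

write_units = {"disk0": 4096}          # hypothetical device name and unit
caches_a = [{0, 64, 8192}, {128}]
memory_a = {}
n_recorded = io_write("disk0", 0, 256, write_units, caches_a, memory_a, "blk")

caches_b = [{0, 64}, {128}]
n_unrecorded = io_write("nic0", 0, 256, write_units, caches_b, {}, "blk")
```

With a recorded write unit, four cache lines are written after a single invalidation broadcast, while the unrecorded device in this model needs four broadcasts for the same block.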
Effects obtained by a typical aspect of the invention disclosed in the present application will be briefly described below.
According to the multiprocessor system of the present invention, it is possible to reduce broadcasts for cache consistency control related to a memory read request sent from the I/O device and, furthermore, to reduce broadcasts for cache consistency guarantee related to a memory write request sent from the I/O device. As a result, it is possible to reduce broadcasts for cache consistency control related to memory accesses sent from the I/O device, thereby implementing high-speed I/O processing. Moreover, by reducing broadcasts to all the nodes, it is possible to reduce the memory reference latency of the processors and to improve the performance of the whole system.