1) Field of the Invention
The present invention relates to a multiprocessor in which a plurality of processors are connected via a system bus. Particularly, the invention relates to a multiprocessor and a method of accessing the same, each applied to a distributed shared memory system employing the directory scheme.
Furthermore, the present invention relates to a data transfer system that exchanges data via a bus, a transmitter and receiver in the data transfer system, and a bus control method for the data transfer system.
2) Description of the Related Art
FIG. 20 is a block diagram illustrating a multiprocessor. In the multiprocessor 100 shown in FIG. 20, a plurality of processor modules 101-1 to 101-n (n is an integer of 2 or more) are mutually connected via a system bus 102.
For instance, as shown in FIG. 21, each of the processor modules 101-1 to 101-n includes at least one processor element 103, a memory module 104 and a system bus interface module 105.
The respective processor elements 103 are connected to the memory module 104 via a cache snoop bus 106. The memory module 104 is connected to the system bus interface module 105 via the internal bus 107.
Each of the processor elements 103 consists of a processor 103a, a cache memory 103b and a cache control module 103c.
The processor 103a performs a desired information process by gaining a read/write access to data with the cache memory 103b or the memory module 104. The cache memory 103b is arranged corresponding to the processor 103a to execute a high-speed read/write access between the processor 103a and the memory module 104.
The processor 103a includes a timer 108 that counts an elapsed time (response wait time) till a response signal is received after an access signal is transmitted to the memory module 104. When receiving no response signal even after an elapsed time counted by the timer 108 has exceeded a predetermined time-out time, the processor 103a outputs a time-out signal.
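The response-wait behavior of the timer 108 can be illustrated with a minimal sketch. The class name `ResponseTimer`, the tick-based counting and the method names are assumptions introduced here for illustration and do not appear in the related art.

```python
class ResponseTimer:
    """Hypothetical model of the timer 108 (names are illustrative only)."""

    def __init__(self, timeout):
        self.timeout = timeout   # predetermined time-out time, in ticks
        self.elapsed = None      # None means no access is outstanding

    def start(self):
        """Called when an access signal is transmitted."""
        self.elapsed = 0

    def stop(self):
        """Called when the response signal is received."""
        self.elapsed = None

    def tick(self, n=1):
        """Advance the counter; return True when a time-out signal is due."""
        if self.elapsed is None:
            return False
        self.elapsed += n
        # The text says the time-out fires once the elapsed time
        # has exceeded the predetermined time-out time.
        return self.elapsed > self.timeout
```

With a time-out of 3 ticks, the fourth tick after `start()` reports a time-out, and no time-out is reported once `stop()` has been called.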
The cache memory 103b stores data in cache lines of, e.g. 256 bytes. The cache line is formed of four sublines (64 bytes).
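As a small worked example of this layout, the following sketch splits a byte address into a cache line index, a subline serial number (0 to 3) and a byte offset. The function name `split_address` and the assumption that lines are aligned on 256-byte boundaries are introduced here for illustration.

```python
from typing import Tuple

LINE_BYTES = 256                                   # cache line size given in the text
SUBLINE_BYTES = 64                                 # subline size given in the text
SUBLINES_PER_LINE = LINE_BYTES // SUBLINE_BYTES    # four sublines per line


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (line index, subline serial number, byte offset in the subline).

    Assumes lines are aligned on LINE_BYTES boundaries (an assumption).
    """
    line = addr // LINE_BYTES
    subline = (addr % LINE_BYTES) // SUBLINE_BYTES
    offset = addr % SUBLINE_BYTES
    return line, subline, offset
```

For instance, `split_address(0x1C4)` yields `(1, 3, 4)`: the address lies in line 1, subline 3, at byte 4 of that subline.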
The cache control module 103c controls data read/write operation between the processor 103a and the cache memory 103b in the copy back mode.
Further, the memory module 104 is shared by respective processor modules 101 via the cache snoop bus 106. The memory module 104 consists of a main storage element (main storage memory) 104a that stores various kinds of data, a directory memory 104b and an access control module 104c.
The directory memory 104b stores the registration status of the main storage data in the cache memory 103b arranged in each of the processor modules 101-1 to 101-n within the multiprocessor 100.
The access control module 104c controls the storing, into the directory memory 104b, of the registration status of the main storage data in the cache memory 103b, based on a value in the cache memory 103b and a value in the main storage element 104a.
For example, under control of the access control module 104c, the status of the cache memory 103b is stored into the directory memory 104b in units of cache lines, each formed of four sublines, in the shared state where the value in the cache memory 103b and the value in the main storage element 104a are the same. Further, the value in the main storage element 104a is not updated when the processor 103a rewrites the value in the cache memory 103b. In the resulting dirty state, where the value in the cache memory 103b is the latest value and the value in the main storage element 104a is an old value, the status of the cache memory 103b and the holder of the latest value are stored into the directory memory 104b in units of sublines.
Further, in the case of a single write access or cache-off access, the access control module 104c issues cache invalidation commands for the sublines forming the cache line in the corresponding cache memory 103b.
The system bus interface module 105 interfaces the memory module 104 with the system bus 102.
FIG. 24 is a block diagram illustrating the function of the above-mentioned multiprocessor 100. Each of the processor modules 101-1 to 101-n of the multiprocessor 100 shown in FIG. 24 includes a processor (CPU) 103a, a cache unit (CS) 103A, a directory unit (DIR) 104A, a main storage element (MS) 104a acting as a main storage memory and a system bus interface module 105 acting as a system bus control mechanism (in FIG. 24, a plurality of processor elements 103, a cache snoop bus 106 and an internal bus 107 are not illustrated).
The cache unit 103A consists of the cache memory 103b and the cache control module 103c, shown in FIG. 21. The directory unit 104A consists of the directory memory 104b and the access control module 104c, shown in FIG. 21.
In the memory module 104 with such an arrangement in the multiprocessor 100 shown in FIGS. 20 and 21, the following cache invalidating (purging) process is executed when the cache line is in a shared state in one write access or cache-off access operation.
That is, when the subline which includes the access address to the memory module 104 and which must be invalidated in the cache memory 103b is the subline with the serial number 0, the access control module 104c refers to the directory memory 104b for the directory entry including the subline with the serial number 0.
In that case, when the corresponding cache line is in a shared state, the access control module 104c issues a cache invalidation command, together with the access address, to the cache memories 103b corresponding to all reference destinations stored in the directory memory 104b.
That is, as shown in FIG. 22, for example, of the four sublines (with serial numbers 0 to 3) forming the cache line in the cache memory 103b to be referred to, the subline with serial number 0 receives the cache invalidation command CI (see the time t1 to t2). In other words, the cache invalidation command CI transmitted from the access control module 104c is issued on the cache snoop bus 106 only to the subline with serial number 0.
In this case, since the shared state management on the directory memory 104b, that is, the storage of the destination where a value stored in the main storage memory 104a is referred to is carried out in cache lines, the shared state of the directory memory cannot be deleted according to only the success or failure of one cache invalidation to one subline. In consequence, the cache status notification command ASK is issued to the remaining sublines (with serial Nos. 1 to 3) forming the cache line (refer to the time t2 to t5).
The cache memory 103b that receives the cache invalidation command CI and the cache status notification command ASK responds to the access control module 104c with the success or failure of invalidation after constant cycles (refer to the time t5 and t6) and then provides the current cache status as the cache status notification CST in response to the cache status notification command ASK (refer to the time t6 to t9).
In the access control module 104c, until all the shared states on the directory memory 104b are deleted based on the cache status notification CST, cache invalidation commands are sequentially issued to the sublines forming the cache line, starting from the subline with serial No. 0.
Thereafter, when the cache invalidation is established and none of the sublines forming the cache line remains in a shared state, the access control module 104c deletes the cache memory 103b from the reference destinations stored in the directory memory 104b.
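The command traffic implied by this purging process can be sketched as follows. This is a simplified model under stated assumptions: the function name `purge_shared_line`, the state letters "S" (shared) and "I" (invalid), and the premise that every invalidation succeeds are all introduced here for illustration.

```python
def purge_shared_line(cache_state, target):
    """Return the (command, subline) log needed to purge one shared cache line.

    cache_state: list of four subline states, "S" or "I", in one remote cache.
    target:      serial number of the subline containing the access address.
    Assumes every cache invalidation succeeds (an assumption of this sketch).
    """
    log = []
    # First pass: CI for the accessed subline, ASK for the remaining sublines.
    log.append(("CI", target))
    cache_state[target] = "I"
    reported = {}
    for s in range(len(cache_state)):
        if s != target:
            log.append(("ASK", s))
            reported[s] = cache_state[s]   # returned as the CST notification
    # Follow-up: one CI per subline still reported as shared, from serial No. 0.
    for s in range(len(cache_state)):
        if reported.get(s) == "S":
            log.append(("CI", s))
            cache_state[s] = "I"
    return log
```

With sublines 0, 1 and 3 shared, a single write to subline 0 produces one CI, three ASK commands and two further CI commands, which shows how the command count grows with the number of shared sublines.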
Furthermore, for example, when the read access performed from the processor 103a to the cache memory 103b results in a cache mishit, the memory error process shown with the flowchart in FIG. 23 is performed.
First, when the read access from the processor 103a results in a cache mishit, the cache control module 103c issues a main storage memory read access to the memory module 104 via the cache snoop bus 106 (step A1).
In the access control module 104c within the memory module 104, the directory of a cache line including the access address is referred to from the directory memory 104b (step A2).
If the cache memory 103b is in a dirty state, the reading operation of the main storage element 104a starts after the cache coherent process (steps A3 and A4) while the status of the directory memory 104b is updated (step A5). Then transferring data to be read onto the cache snoop bus 106 starts (step A6).
If the cache memory 103b is in a shared or invalid state (where the cache line is not in a dirty state or in a shared state), reading the main storage element 104a starts while the status of the directory memory 104b is updated (steps A3 and A5). Then transferring data to be read onto the cache snoop bus 106 is set in motion (step A6).
When transferring data to be read onto the cache snoop bus 106 starts, the cache control module 103c starts registering the readout data to the cache memory 103b (step A7).
If any multibit error is not detected in the readout data, the reading process is normally ended (from step A8 to step A9 via NO route).
However, if a multibit error is detected in the readout data, the access control module 104c in the memory module 104 halts transferring the readout data onto the cache snoop bus 106 (from step A8 to step A10 via YES route) and then sends back a failure occurrence response to the cache control module 103c (step A12).
Furthermore, in the cache control module 103c, the registration operation to the cache memory 103b is invalidated in response to the cancellation of transferring the readout data onto the cache snoop bus 106 (step A11) while an access error response is sent back to the processor 103a in response to the failure occurrence response from the access control module 104c (step A13).
Thereafter, the access control module 104c rewrites the status of the directory entry in the directory memory 104b into that in the initial state before accessing (step A14).
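The branching of the flow in FIG. 23 can be summarized by the following sketch, which lists the steps visited for a given cache state and error condition. The function name `read_miss_steps` and the state strings are assumptions introduced here; the error steps are listed in numerical order, although steps A10 and A12 occur in the access control module 104c and steps A11 and A13 in the cache control module 103c.

```python
def read_miss_steps(cache_state, multibit_error):
    """Return the step labels of FIG. 23 visited on a cache mishit.

    cache_state:    "dirty", "shared" or "invalid" (illustrative labels).
    multibit_error: True when a multibit error is found in the readout data.
    """
    steps = ["A1", "A2", "A3"]       # read access issued, directory referred to, state checked
    if cache_state == "dirty":
        steps.append("A4")           # cache coherent process before reading main storage
    steps += ["A5", "A6", "A7", "A8"]  # directory update, transfer, registration, error check
    if not multibit_error:
        steps.append("A9")           # normal end
    else:
        # halt transfer, invalidate registration, failure response,
        # access error response, restore the directory entry
        steps += ["A10", "A11", "A12", "A13", "A14"]
    return steps
```

The error path adds five steps to every access, which is the per-access cost discussed for the case where a hard error persists in the main storage element 104a.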
Further, the memory access from the processor 103a to the main storage memory 104a in the memory module 104 is made according to the command sequence, for example, shown in FIG. 25. FIG. 25 shows the case where memory access is made from the processor 103a in the processor module 101-1 to the main storage element 104a in the processor module 101-2. This situation is applicable to memory access operations between other processor modules or in a processor module.
For the sake of convenience in the following explanation, the processor module (PM) 101-1 as an access source to the main storage element 104a is called a local PM; the processor module 101-2 in which the main storage element 104a to be accessed exists is called a home PM; and the processor module 101-n caching the data of the main storage element 104a is called a remote PM.
Here the local PM 101-1 tries to have non-cache-store access to a memory address "A". However, the entity of the memory address "A" exists in the home PM 101-2 while it is in a reference state in the cache memory 103b in the remote PM 101-n.
In that case, first, the local PM 101-1 issues non-cache memory-store command M-ST to the directory unit 104A in the home PM 101-2 (refer to (1) in FIG. 25) while the timer 108 starts counting the elapsed time.
Successively, in the directory unit 104A within the home PM 101-2 which has received the M-ST command, the access control module 104c retrieves the directory memory 104b.
In this case, since the memory address "A" is cached by the remote PM 101-n, the access control module 104c in the directory unit 104A issues the cache invalidation command (cache purge command) C-PG to the cache unit 103A in the remote PM 101-n (refer to (2) in FIG. 25).
Thereafter, after the cache unit 103A in the remote PM 101-n which has received the cache purge command C-PG executes a cache purging process, it issues a cache purge completion response command PG-CMP to the home PM 101-2 being a C-PG command issuing source (refer to (3) in FIG. 25).
The home PM 101-2 which has received the PG-CMP command writes data designated by the M-ST command from the local PM 101-1 to the memory address "A" of the main storage element 104a.
After the access control module 104c in the home PM 101-2 changes the content of the directory memory 104b from the reference state to the non-cache state in the remote PM 101-n, it issues the memory store completion response command ST-CMP to the local PM 101-1 being a M-ST command issuing source (refer to (4) in FIG. 25).
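The four-command sequence (1) to (4) of FIG. 25 can be traced with the following sketch. The function name `non_cache_store`, the dictionary-based directory and the PM labels are assumptions introduced here for illustration; failures and time-outs are not modeled.

```python
def non_cache_store(directory, memory, caches, addr, value, local="local"):
    """Trace a non-cache store handled by the home PM.

    directory: dict mapping an address to the set of PMs caching it.
    memory:    dict standing in for the main storage element of the home PM.
    caches:    dict mapping a PM name to the set of addresses it caches.
    Returns a list of (command, sender, receiver) tuples.
    """
    trace = [("M-ST", local, "home")]                 # (1) store command
    for pm in sorted(directory.get(addr, set())):
        trace.append(("C-PG", "home", pm))            # (2) cache purge command
        caches[pm].discard(addr)                      # purge in the remote PM
        trace.append(("PG-CMP", pm, "home"))          # (3) purge completion
    memory[addr] = value                              # write the store data
    directory[addr] = set()                           # reference state -> non-cache state
    trace.append(("ST-CMP", "home", local))           # (4) store completion
    return trace
```

When one remote PM holds address "A", the trace contains exactly the commands M-ST, C-PG, PG-CMP and ST-CMP in that order.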
If not receiving the response signal even when the elapsed time counted by the timer 108 exceeds the time-out time, the processor 103a in the local PM 101-1 notifies of an access time-out.
The predetermined time-out time T counted by the timer 108 is set based on the time taken in the command sequence.
Concretely, it is assumed that the maximum period of time between the time the processor 103a in the local PM 101-1 issues the M-ST command and the time it receives the ST-CMP command is T1 and that the maximum period of time between the time the processor 103a issues the C-PG command and the time it receives the PG-CMP command is T2. In such a case, a time T satisfying the condition expressed by the following formula (1) is set as the time-out time.
T > T1 + T2 (1)
FIG. 26 is a block diagram illustrating a data transfer system embodying a split bus which can be used as the system bus 102, the cache snoop bus 106 or the internal bus 107 in the above-mentioned multiprocessor 100. In the data transfer system 200 shown in FIG. 26, the transmitter 210 is connected to the receiver 220 via the split bus 201.
Here, the split bus 201 permits a plurality of concurrent processes by transferring the bus use right to another processor until the transmitter 210 receives a response after sending a request to the receiver 220, thus intending its high bus-use efficiency.
Further, the transmitter 210 consists of a CPU such as the processor 103a in the multiprocessor 100 described before and an input/output device (I/O device). The transmitter 210 includes a send queue control unit (send queue control) 211, a bus control unit (bus control-1) 212, an output selector (out selector) 214, and four send command/data queues (squeue1 to squeue4) 213-1 to 213-4 each formed of flip-flops (FF) and random access memories (RAMs).
In response to a data transfer request generated inside the transmitter 210, the send queue control unit 211 stores the corresponding command and data into a free queue among the send command/data queues 213-1 to 213-4 and requests the bus use right from the bus control unit 212.
Further, based on the bus use request from the send queue control unit 211, the bus control unit 212 issues a bus-use-right capture request to an arbitrating device (not shown) that controls the use status of the split bus 201.
When the bus use right is captured, the bus control unit 212 outputs to the output selector 214 a select signal which selects any one of the command/data packets (bus commands with data) stored in the four send command/data queues 213-1 to 213-4.
The output selector 214 selects any one of the command/data packets stored in four command/data queues 213-1 to 213-4 by means of a predetermined algorithm and then sequentially sends out the command and data forming the selected packet to the split bus 201.
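The interplay of the four send command/data queues and the output selector can be sketched as below. The text only speaks of a predetermined algorithm; round-robin selection is an assumption made here, as are the class name `SendQueues` and its methods.

```python
class SendQueues:
    """Hypothetical model of the send command/data queues 213-1 to 213-4."""

    def __init__(self, count=4):
        self.slots = [None] * count   # None marks a free queue
        self.next = 0                 # round-robin pointer (assumed algorithm)

    def store(self, packet):
        """Store a packet in the first free queue; return its index or None."""
        for idx, slot in enumerate(self.slots):
            if slot is None:
                self.slots[idx] = packet
                return idx
        return None                   # all queues are occupied

    def select(self):
        """Pick the next occupied queue in round-robin order and free it."""
        count = len(self.slots)
        for i in range(count):
            idx = (self.next + i) % count
            if self.slots[idx] is not None:
                packet = self.slots[idx]
                self.slots[idx] = None
                self.next = (idx + 1) % count
                return packet
        return None                   # nothing to send
```

A fifth request arriving while all four queues are occupied cannot be stored, which corresponds to the multiplexing level of four on the transmitter side.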
A data transfer request which includes a memory write command to the receiver 220 acting, for example, as a main memory can be handled as a data packet with a command from the transmitter 210 acting as the CPU. The command which forms a data packet with a command such as the memory write command is formed of a format, for example, as shown in FIG. 27.
That is, the memory write command 300 shown in FIG. 27 is formed of a command field (Command) 301, a command ID field (Command-ID) 302, a transmitter address field (Source-Address) 303, a remote device address field (Destination-Address) 304, a size field (Size) 305, and a memory address field (Mem-Address) 306.
The command field 301 designates the operation code of a command. The command ID field 302 designates the identification number used to distinguish multiplexed commands. In the transmitter 210 shown in FIG. 26, the level of multiplexing is "4".
Further, the transmitter address field 303 designates a transmitter address. The remote device address field 304 designates a remote device address. The size field 305 designates the data size of a data transfer command. The memory address field 306 designates the memory address in memory access.
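The field layout of the memory write command 300 can be represented as a simple record. The text does not give field widths or encodings, so none are assumed; the class name `MemoryWriteCommand`, the attribute names and the numbering of command IDs from 0 to 3 are assumptions introduced here for illustration.

```python
from dataclasses import dataclass

MULTIPLEX_LEVEL = 4   # level of multiplexing stated in the text


@dataclass(frozen=True)
class MemoryWriteCommand:
    command: int               # field 301: operation code
    command_id: int            # field 302: identifies one of the multiplexed commands
    source_address: int        # field 303: transmitter address
    destination_address: int   # field 304: remote device address
    size: int                  # field 305: data size of the transfer
    mem_address: int           # field 306: memory address to access

    def __post_init__(self):
        # IDs are assumed to run from 0 to MULTIPLEX_LEVEL - 1.
        if not 0 <= self.command_id < MULTIPLEX_LEVEL:
            raise ValueError("command_id out of range for the multiplexing level")
```

A command with ID 3 is accepted, while an ID of 4 is rejected because only four commands can be outstanding at once under this assumption.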
The receiver 220 is formed of, for example, a memory device such as the memory module 104 in the multiprocessor 100. The receiver 220 includes a command decoder (dec) 221, a receive queue (rqueues 1 to 4) 222-1 to 222-4, a bus control unit (bus control-2) 223, a receive queue control unit (rcev queue control) 224, a memory selector (mem-selector) 225, and a memory control (mem control) 226.
The command decoder 221 checks a command packet received via the split bus 201 for the remote device address in the remote device address field 304 and checks the receive queues 222-1 to 222-4 for free queues.
Concretely, when judging that the received command packet is a command for the system including the decoder 221 itself and that any one of the receive queues 222-1 to 222-4 is free, the command decoder 221 notifies the bus control unit 223 of the event. If the command packet is not a command for the system including the decoder 221 itself, the command decoder 221 ignores the packet. If all the receive queues 222-1 to 222-4 are occupied, the command decoder 221 notifies the transmitter 210 of the event.
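The three outcomes of this check can be written as a small decision function. The function name `decode` and the returned labels are assumptions introduced here for illustration; a free receive queue is represented by `None`.

```python
def decode(destination_address, own_address, receive_queues):
    """Return the action taken by the command decoder for one command packet.

    receive_queues: list of four entries; None marks a free receive queue.
    """
    if destination_address != own_address:
        return "ignore"                     # packet is meant for another device
    if any(entry is None for entry in receive_queues):
        return "notify_bus_control"         # accept into a free receive queue
    return "notify_transmitter_busy"        # all receive queues are occupied
```

The third outcome is the one that limits bus use efficiency: when no receive queue is free, the transmitter is told so and the transfer cannot be accepted.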
In response to the notification from the command decoder 221, the bus control unit 223 stores controllably the command and the successive data received in a free queue among the receive queues 222-1 to 222-4. When the command and the successive data have been completely stored into the free queue, the bus control unit 223 asks the receive queue control unit 224 to subject the command stored in the free queue to a memory access control.
In response to a request for memory access control from the bus control unit 223, the receive queue control unit 224 produces a select signal which selects to the memory selector 225 any one of memory access commands stored in the receive queues 222-1 to 222-4.
When receiving the select signal from the receive queue control unit 224, the memory selector 225 selects any one of memory access commands stored in the receive queues 222-1 to 222-4, thus outputting it to the memory control unit 226. In addition, the memory selector 225 outputs the successive data to a memory such as the main storage element 104a shown in FIG. 21. The memory selector 225 is formed of, for example, FIFO (First In First Out) memory.
In such an arrangement, the data transfer system 200 shown in FIG. 26 transmits a command packet with data including a memory write bus command from the transmitter 210 to the receiver 220 via the split bus 201.
When the command packet with data is transmitted from the transmitter 210, data packets are transferred by one operation, in succession to packets in which bus command information including the transfer operation code and the transfer data size are defined.
Concretely, when a data transfer request occurs inside the transmitter 210, the send queue control unit 211 searches the four send command/data queues 213-1 to 213-4 for a free send command/data queue and stores the command and data into that queue.
At the time the use of bus is prepared by storing the command and data into the free send command/data queue, the send queue control unit 211 hands the request to the bus control unit 212. The bus control unit 212 selects a specific bus request among the four bus requests stored in the send command/data queues 213-1 to 213-4 according to a predetermined algorithm and then outputs the request, req, to the bus arbitrating device (not shown) to obtain the use right of the split bus 201 (refer to the time u1 to u3 in FIG. 28).
When obtaining the use right of the split bus 201 in response to the bus use permission signal, grt, from the bus arbitrating device (refer to the time u2 and u3 in FIG. 28), the bus control unit 212 sequentially transmits the selected command/data packet to the split bus 201 via the output selector 214 (refer to the time u3 to u10 in FIG. 28).
In response to the leading command packet forming a command/data packet from the split bus 201 (refer to the time u3 and u4 in FIG. 28), the receiver 220 checks the command decoder 221 for the device address and the like.
When the command decoder 221 judges that the received command packet is the command for the system including the command decoder 221 itself and that any one of the receive queues 222-1 to 222-4 is free, it informs the bus control unit 223 of the event, thus storing the received command and the successive data into the free receive queue (refer to the time u4 to u11 in FIG. 28).
A command packet that is not addressed to the system including the command decoder 221 itself is ignored. When none of the receive queues 222-1 to 222-4 is free, the command decoder 221 informs the transmitter 210 being the transmitting source of the event.
On completion of the bus receiving operation, the bus control unit 223 asks the receive queue control unit 224 for the memory access control of commands stored in the receive queues 222-1 to 222-4.
The receive queue control unit 224 outputs a select signal to the memory selector 225 and then selects the memory access commands stored in the four receive queues 222-1 to 222-4 via the memory selector 225 according to a predetermined algorithm, thus instructing the memory control unit 226 to perform the actual memory access.
As for the data transfer system 200 shown in FIG. 26, there has been a tendency to improve the data transfer rate by shortening the bus operational clock cycle and by adopting a wide bus width of 64 or 128 bits.
In order to improve the data transfer rate in the data transfer system, it is necessary to increase the bus use efficiency of the split bus. For this purpose, it is effective to provide as many receive buffers for bus commands and data as possible.
However, when write access or cache-off access is sequentially executed to one cache line, a plurality of sublines forming the cache line may be shared by another processor. In such a case, the multiprocessor 100 shown in FIGS. 20 and 21 issues cache invalidation commands for the sublines forming the cache line to perform the capture of write right or the cache-off by the completion of the process to the cache line.
That is, as shown in FIG. 22, when the access control module 104c handles one write access or cache-off access, it must issue cache invalidation commands to the corresponding cache memory 103b as many times as there are sublines forming the cache line.
Because the invalidation commands to the cache memory 103b use a plurality of buses forming the memory system, the bus use right capture and the invalidation command process are needed for every access. Hence, there is the problem that the process time is prolonged.
In addition, as to the multiprocessor shown in FIGS. 20 and 21, since the memory space of the main storage element 104a is shared by the plurality of processors 103a, another processor may access the address space of the main storage element 104a even if a hard error of the main storage element 104a has been detected through access by a processor.
In that case, the problem is that the process shown in FIG. 23 must be performed for each access to the main storage element 104a and that the process cycle in which an error response is sent back to the access issuing source processor is undesirably prolonged.
Particularly, there is the problem that since the inevitable access to the main storage element 104a with a relatively long response cycle causes an access conflict in the memory module 104, the access performance to a normal main storage element 104a is deteriorated.
Further, when the cache coherence between the main storage element 104a and the cache memory 103b must be held after an error occurrence, it is needed to regain the original state by canceling the status write of the directory memory 104b and the registration of readout data to the cache memory 103b at the error occurrence time. Hence, there is the problem that the process becomes complicated and that a new storage element is needed to hold the initial state until the access completion time.
Further, in the multiprocessor shown in FIGS. 20 and 21, since the local PM 101-1 gains a memory access to the home PM 101-2 as shown in FIG. 25, the timer 108 in the local PM 101-1 as the access source counts a single response wait period of time. Hence, there is the problem that when a plurality of command errors occur simultaneously during one memory access wait time, as shown in FIG. 29, it is difficult to accurately specify the failure occurrence spots.
Particularly, it is essential to specify the failure occurrence spot accurately because a high-reliability system such as a fault tolerant system requires a reliable recovery even at a failure occurrence time.
That is, as shown in FIG. 29, the local PM 101-1 issues non cache memory store command M-ST for the memory address "A" to the home PM 101-2 (refer to (1) in FIG. 29) while the timer 108 starts counting the access timer value (time-out time) T (refer to the time v1 in FIG. 30).
The access control module 104c in the home PM 101-2 which has received the M-ST command from the local PM 101-1 retrieves the directory memory 104b, and recognizes the memory address "A" cached by means of the remote PM 101-n, and issues the cache purge command C-PG to the remote PM 101-n (refer to (3) in FIG. 29).
In the processor 103a within the home PM 101-2, as soon as the cache purge command C-PG is transmitted, the timer 108 starts counting the access timer value T2 (refer to the time v3 in FIG. 30). At the same time, the directory memory 104b in the home PM 101-2 enters a busy state for the line including the memory address "A".
Further, the remote PM 101-n which is to receive the C-PG command from the home PM 101-2 starts a write access to the cached memory address "A" before the cache purge command C-PG is transmitted. The remote PM 101-n also issues, to the access control module 104c, the cache update notification command M-EX by which the directory memory 104b in the home PM 101-2 is updated (refer to (2) in FIG. 29).
In the processor 103a arranged in the remote PM 101-n, as soon as the M-EX command is transmitted, the timer 108 starts counting the access timer value T (refer to the time v2 in FIG. 30).
The remote PM 101-n transmits the M-EX command and then receives the C-PG command from the home PM 101-2. However, since the remote PM 101-n is executing a caching operation on the same line, the purge process is held pending within the remote PM 101-n.
It is now assumed that the home PM 101-2 which has received the M-EX command retrieves the directory memory 104b and then tries to report that the line related to the memory address "A" is in a busy state, but some factor prevents the response from reaching the remote PM 101-n being the access source (Disappear; refer to (4) in FIG. 29).
In that case, since the purge process is held in the remote PM 101-n, the home PM 101-2 cannot respond to the M-ST command.
That is, because the response to the M-EX command disappears, this status continues until the response wait time which is counted by the timer 108 in the remote PM 101-n (the time T elapsed from transmission of the M-EX command) becomes time-out.
Since T > T2 holds as expressed by the formula (1), the access timer value T2, corresponding to the response wait time for the C-PG command counted by the timer 108 in the home PM 101-2, times out prior to the M-EX response wait time T in the remote PM 101-n (refer to the time v4 in FIG. 30).
The home PM 101-2 which has detected the time-out completes the M-ST command process by reporting the time-out detection in response to the M-ST command from the local PM 101-1, which caused the C-PG command to be generated (refer to time v5 in FIG. 30).
When detecting the time-out of the M-EX command (refer to the time v6 in FIG. 30), the remote PM 101-n informs the processor 103a being an access source therein of the event.
It can be specified that the time-out of the M-EX command detected in the remote PM 101-n was caused by some factor lying between the remote PM 101-n and the home PM 101-2. However, the cause of the time-out of the C-PG command detected in the home PM 101-2 cannot be specified.
That is, although no troubles exist between the home PM 101-2 and the remote PM 101-n, the time-out of the C-PG command has occurred due to no response to the preceding M-EX command. However, the home PM 101-2 cannot specify that event as the cause.
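The order of the two time-out detections in this scenario can be checked with a small calculation. The function name `timeout_order` and the numeric values in the usage note are assumptions introduced here; the sketch also assumes that the gap between the issue times v2 and v3 is smaller than T minus T2, which the text implies but does not state.

```python
def timeout_order(v2, v3, T, T2):
    """Return the time-out detections in the order in which they occur.

    v2: time the remote PM sends M-EX and starts its timer (time-out T).
    v3: time the home PM sends C-PG and starts its timer (time-out T2).
    """
    events = [
        (v3 + T2, "home PM: C-PG time-out"),
        (v2 + T, "remote PM: M-EX time-out"),
    ]
    return [name for _, name in sorted(events)]
```

For example, with v2 = 2, v3 = 3, T = 100 and T2 = 40, the home PM detects the C-PG time-out at time 43, well before the remote PM detects the M-EX time-out at time 102, so the secondary failure is reported before its actual cause.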
As described above, in the data transfer system 200 shown in FIG. 26, the bus use efficiency of the split bus must be improved to increase the data transfer rate. As a countermeasure, it is effective to provide as many receive buffers for bus commands and data as possible.
However, in the data transfer system 200, it is difficult to sufficiently increase the capacity of the receive-queue acting as a receive buffer to improve the bus use efficiency. As a result, there is the problem that a sufficient number of receive buffers cannot be prepared and that the advantage of the split bus cannot be utilized sufficiently.