The present invention relates to processor and multiprocessor technology and, more particularly, to a technology that is advantageously applied to error handling and related processing in a tightly-coupled multiprocessor system in which a plurality of processors share main memory.
In the field of information processing technology, multiprocessor technology is known that divides the processing load among a plurality of processors, both to increase information processing capability per unit time and to achieve fault tolerance through processor multiplexing.
When an error occurs in a processor during multiprocessor system operation and the processor can no longer continue execution, an alternate processor, if the processor is duplexed, takes over and re-executes the processing, reducing the possibility of a system failure. However, in a tightly-coupled multiprocessor system such as the one shown in FIG. 6, each processor executes its own program. In this case, and especially when the cache is controlled in the write-back caching mode, disconnecting from the system a processor in which an unrecoverable error has occurred prevents updated (dirty) data in that processor's cache from being reflected in main memory. The resulting data inconsistency makes it necessary, in most cases, to bring the system down.
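The data-loss problem described above can be illustrated with a minimal Python sketch, assuming a simplified per-processor write-back cache modeled as a dictionary. The class `WriteBackCache` and its methods are hypothetical names for illustration only; the sketch contrasts a processor that is disconnected without writing back its dirty line with one whose dirty line is written back first.

```python
class WriteBackCache:
    """Hypothetical per-processor write-back cache over a shared main memory."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory: address -> value
        self.lines = {}        # address -> (value, dirty)

    def write(self, addr, value):
        # Write-back policy: update only the cache line and mark it dirty;
        # main memory is not updated yet.
        self.lines[addr] = (value, True)

    def flush(self):
        # Write every dirty line back to main memory, then mark all lines clean.
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                self.memory[addr] = value
        self.lines = {addr: (value, False) for addr, (value, _) in self.lines.items()}


# Case 1: the failed processor is disconnected without a write-back.
memory_a = {0x100: 0}
failed = WriteBackCache(memory_a)
failed.write(0x100, 42)       # the newest value exists only in this cache
failed = None                 # disconnection discards the cache contents
lost_value = memory_a[0x100]  # still 0: main memory is stale

# Case 2: the same update, but the dirty line is written back first.
memory_b = {0x100: 0}
healthy = WriteBackCache(memory_b)
healthy.write(0x100, 42)
healthy.flush()
flushed_value = memory_b[0x100]  # 42
```

In the first case, any other processor that later reads address `0x100` from main memory sees the stale value 0, which is the inconsistency that forces the system down.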
To avoid this problem, Japanese Patent Laid-Open Publication JP-A-10-105527, for example, discloses a method that prevents the system from going down. In that method, an external cache and a special controller are provided on the processor bus; the external cache includes all the data stored in the internal cache memories of the processors, so that the most recent data is always visible to all the processors. However, this method requires extra components, such as the external cache and the controller, which add cost, and it does not scale as the number of processors increases.
Because multiprocessor systems will include more and more processors in the future to improve system performance, a low-cost method of preventing system failure is needed.
A tightly-coupled multiprocessor system must be stopped when an error occurs in one of the processors and that processor can no longer continue instruction execution. One reason is that, when the processors use the write-back caching mode and the cache of the failed processor contains the most recent data not yet reflected in main memory, disconnecting that processor from the system prevents the most recent data from reaching main memory and the other processors.
Another reason is that, if the failed processor cannot respond to a request from a normal processor, the normal processor may also be stopped.
It is an object of the present invention to provide, for use in a tightly-coupled multiprocessor system composed of a plurality of processors each containing a write-back cache memory, a low-cost processor that requires no additional functions and that prevents an error in one of the processors from bringing down the system.
It is another object of the present invention to provide, for use in a tightly-coupled multiprocessor system, a processor that requires no additional functions and that minimizes the effect of snoop processing.
The present invention provides an error processing technology for use in a multiprocessor system. When a fatal error occurs in one of the processors and instruction execution cannot continue, that processor checks the error level. Even though it can no longer process instructions, the processor remains in operation as long as it can still execute snoop response processing, such as invalidating (purging) cache lines or writing dirty lines back to main memory. This allows the other, normal processors to continue operating and prevents the system from stopping.
That is, according to the present invention, each processor of a tightly-coupled multiprocessor system comprises means for checking the location and the level of an error that has occurred in the processor, and means for determining whether snoop processing, such as purging or writing dirty lines back to main memory, can be continued even though instruction execution cannot.
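The two means described above can be sketched as a single decision function. This is a minimal Python illustration, assuming a hypothetical set of processor units and two error levels; the names `Unit`, `Severity`, `Action`, `SNOOP_CRITICAL`, and `classify` are invented for the example, and treating an error in the write-back cache or bus interface as fatal to the system is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Unit(Enum):
    # Hypothetical error locations inside one processor.
    INSTRUCTION_UNIT = auto()
    L1_CACHE = auto()       # write-through: not needed for coherence
    L2_CACHE = auto()       # write-back: holds dirty lines
    BUS_INTERFACE = auto()  # path used to receive and answer snoops


class Severity(Enum):
    CORRECTABLE = auto()
    FATAL = auto()          # instruction execution cannot continue


class Action(Enum):
    CONTINUE = auto()       # keep executing instructions
    SNOOP_ONLY = auto()     # stop instructions, keep answering snoops
    STOP_SYSTEM = auto()    # coherence can no longer be maintained


# Assumption: a fatal error in these units makes snoop response impossible.
SNOOP_CRITICAL = {Unit.L2_CACHE, Unit.BUS_INTERFACE}


@dataclass(frozen=True)
class ErrorReport:
    unit: Unit          # location of the error
    severity: Severity  # level of the error


def classify(err: ErrorReport) -> Action:
    """Decide how the processor proceeds after an error is detected."""
    if err.severity is Severity.CORRECTABLE:
        return Action.CONTINUE
    if err.unit in SNOOP_CRITICAL:
        return Action.STOP_SYSTEM
    # Instruction execution cannot continue, but snoop processing can.
    return Action.SNOOP_ONLY
```

For example, `classify(ErrorReport(Unit.INSTRUCTION_UNIT, Severity.FATAL))` yields `Action.SNOOP_ONLY`: the processor stops executing instructions but stays on the bus.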
More specifically, when an error occurs in a processor whose cache is composed of a plurality of hierarchical levels, the processor according to the present invention checks whether the cache at the hierarchical level at which data coherence is maintained (that is, the level that holds dirty lines) is working properly. In most cases, the cache at the lowest level uses the write-back caching mode and the caches at the higher levels use the write-through caching mode. In this configuration, means is provided for disconnecting a cache at any other hierarchical level from the snoop operation when an error occurs in it, so that the processor can still respond to snoop requests sent via the bus.
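A two-level hierarchy of this kind can be sketched as follows, assuming a write-through L1 above a write-back L2 in which every L1 store is also written to L2. The class `TwoLevelCache` and its methods are hypothetical. Because L1 never holds data that L2 lacks, disconnecting a faulty L1 from snooping loses nothing, and the healthy L2 still performs the purge and dirty-line write-back.

```python
class TwoLevelCache:
    """Hypothetical processor cache: write-through L1 above a write-back L2."""

    def __init__(self, memory):
        self.memory = memory        # shared main memory: address -> value
        self.l1 = {}                # addr -> value (write-through, never dirty)
        self.l2 = {}                # addr -> [value, dirty] (write-back)
        self.l1_snoop_enabled = True

    def store(self, addr, value):
        # Write-through L1: the store is forwarded to L2 immediately.
        self.l1[addr] = value
        # Write-back L2: main memory is updated only on eviction or snoop.
        self.l2[addr] = [value, True]

    def disconnect_l1_from_snoop(self):
        # Called when an error is detected in L1. L2 already holds every
        # value written through L1, so no dirty data is lost.
        self.l1_snoop_enabled = False

    def snoop_purge(self, addr):
        """Respond to a bus snoop that invalidates addr in this processor."""
        if self.l1_snoop_enabled:
            self.l1.pop(addr, None)
        entry = self.l2.pop(addr, None)
        if entry is not None and entry[1]:
            # Write the dirty line back to main memory before purging it.
            self.memory[addr] = entry[0]


memory = {0x40: 0}
cpu = TwoLevelCache(memory)
cpu.store(0x40, 7)
cpu.disconnect_l1_from_snoop()  # L1 has failed; L2 is still healthy
cpu.snoop_purge(0x40)           # another processor requests the line
```

After the purge, main memory holds 7 at address `0x40` and the line is gone from L2. The faulty L1 is left untouched, which is acceptable in this sketch because the processor no longer executes instructions that could read it.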
When an error occurs in the processor, the means described above checks the location and the level of the error. If instruction processing cannot continue, the means checks whether the snoop response processing described above can be performed. If it can, the processor continues to respond to snoop requests sent from the bus even though it can no longer execute instructions, thus preventing the system from going down.
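The system-level effect can be sketched with two processors on a shared bus. This minimal Python model assumes that a bus read snoops every other processor and that a processor unable to answer a snoop halts the bus, modeled here as a raised exception; the classes `Processor` and `Bus` and the method `fatal_error` are hypothetical names, not part of the described invention.

```python
class Processor:
    """Hypothetical processor with a write-back cache attached to a bus."""

    def __init__(self, name):
        self.name = name
        self.cache = {}            # addr -> [value, dirty] (write-back)
        self.executing = True      # False once instruction execution stops
        self.snoop_capable = True  # False if the snoop path itself has failed

    def fatal_error(self, snoop_path_ok):
        # Instruction execution can no longer continue...
        self.executing = False
        # ...but the processor stays on the bus if it can still answer snoops.
        self.snoop_capable = snoop_path_ok

    def snoop_read(self, addr, memory):
        """Answer another processor's read by writing back a dirty line."""
        entry = self.cache.get(addr)
        if entry is not None and entry[1]:
            memory[addr] = entry[0]
            entry[1] = False


class Bus:
    def __init__(self, memory, processors):
        self.memory = memory
        self.processors = processors

    def read(self, requester, addr):
        for p in self.processors:
            if p is requester:
                continue
            if not p.snoop_capable:
                # Assumption: an unanswered snoop stops the whole system.
                raise RuntimeError(f"{p.name} cannot respond to snoop: system stop")
            p.snoop_read(addr, self.memory)
        return self.memory[addr]


memory = {0x80: 1}
cpu0, cpu1 = Processor("cpu0"), Processor("cpu1")
bus = Bus(memory, [cpu0, cpu1])
cpu0.cache[0x80] = [99, True]         # cpu0 holds the only up-to-date copy
cpu0.fatal_error(snoop_path_ok=True)  # cpu0 stops executing instructions
value = bus.read(cpu1, 0x80)          # cpu1 still obtains the latest value
```

Here `cpu1` reads 99 even though `cpu0` has stopped executing instructions. If `fatal_error` were called with `snoop_path_ok=False`, the same read would raise, corresponding to the case in which the system must be stopped.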