To achieve high performance, modern data processing systems use architectures in which several processors are interconnected with one another and with a plurality of shared resources, such as memory modules and data input/output control units (I/O control units), via a common multipoint communication channel referred to as the system bus, which is itself a shared resource.
Each unit (processor or input/output control unit) which requires access to another resource (a memory module, another processor, another I/O control unit) in order to receive or send information, first requires access to the system bus, thereby putting itself in competition with other units which also wish to obtain access to the system bus.
Conflicts of access to the system bus are resolved in a well-known manner by an arbitration unit which, when several units issue competing requests for access to the system bus, settles the conflict by granting access to a single competing unit at a time, according to a preset priority criterion (e.g. fixed priority, or circular priority, also known as round robin).
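The two priority criteria named above can be illustrated with a minimal sketch. It is not taken from any particular arbitration unit: the function and class names are hypothetical, unit 0 is assumed to have the highest fixed priority, and each call stands for one arbitration among the units currently requesting the bus.

```python
from typing import List, Optional


def fixed_priority(requests: List[bool]) -> Optional[int]:
    """Grant the lowest-numbered requesting unit (assumption: unit 0 ranks highest)."""
    for unit, wants_bus in enumerate(requests):
        if wants_bus:
            return unit
    return None  # nobody is requesting the bus


class RoundRobinArbiter:
    """Circular priority: the search starts just after the last granted unit."""

    def __init__(self, n_units: int) -> None:
        self.n_units = n_units
        self.last_granted = n_units - 1  # so that unit 0 is examined first

    def grant(self, requests: List[bool]) -> Optional[int]:
        for offset in range(1, self.n_units + 1):
            unit = (self.last_granted + offset) % self.n_units
            if requests[unit]:
                self.last_granted = unit
                return unit
        return None  # nobody is requesting the bus


if __name__ == "__main__":
    requests = [True, False, True]  # units 0 and 2 compete, unit 1 is idle
    print(fixed_priority(requests))                      # 0
    arbiter = RoundRobinArbiter(3)
    print([arbiter.grant(requests) for _ in range(4)])   # [0, 2, 0, 2]
```

With fixed priority, unit 0 wins every arbitration for as long as it keeps requesting; with circular priority, the two competing units alternate.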
Having obtained access to the system bus, the requesting unit can place on the system bus the information required to identify the destination resource to which it needs access, generally an address, together with the type of operation requested, for example a read or a write.
In the case of operations for writing to memory, this information is accompanied by the data item to be written.
For reasons which will be explained later, the resource required in order to execute the requested operation may be temporarily unavailable.
In this case, rather than keeping the system bus busy until the resource becomes free, it is preferable to send a RETRY signal to the requesting unit, thereby freeing the system bus and making it available for other transactions.
This signal can be generated by the busy resource, in response to the access request, or by the arbitration unit which acts as a collector of signals indicative of the state of the various resources, as a monitor of their state, and a dispatcher of a RETRY signal to the various requesting units.
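The first of these two options, in which the busy resource itself answers with RETRY, can be sketched as follows. This is a simplified model under stated assumptions, not a description of any real bus: the names Response, Resource, request, tick and service_cycles are hypothetical, and time is counted in abstract cycles with one access attempt per cycle.

```python
from enum import Enum


class Response(Enum):
    ACCEPTED = "accepted"
    RETRY = "retry"


class Resource:
    """A destination resource that needs several cycles to execute one operation."""

    def __init__(self, name: str, service_cycles: int) -> None:
        self.name = name
        self.service_cycles = service_cycles  # assumed fixed execution time
        self.busy_for = 0                     # cycles left on the current operation

    def request(self) -> Response:
        # A busy resource answers at once, so the bus is not held while it finishes.
        if self.busy_for > 0:
            return Response.RETRY
        self.busy_for = self.service_cycles
        return Response.ACCEPTED

    def tick(self) -> None:
        """Advance the resource by one cycle."""
        if self.busy_for > 0:
            self.busy_for -= 1


if __name__ == "__main__":
    memory = Resource("memory", service_cycles=3)
    trace = []
    for _cycle in range(5):  # one access attempt per cycle
        trace.append(memory.request().value)
        memory.tick()
    print(trace)  # ['accepted', 'retry', 'retry', 'accepted', 'retry']
```

Each RETRY costs the requester a further arbitration, but the bus is released immediately instead of being held for the remaining execution time of the earlier operation.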
Among the causes which may make a RETRY operation necessary are the following:
1. In order to minimize the time for which the system bus is busy, access operations directed to a resource are posted to an input buffer of that resource without waiting for their completion. In particular, it is known that read operations can be executed in two distinct bus busy phases, one for memory addressing and the other for retrieving the data item read.
Since, in general, the execution time for the requested operations (in particular memory read and write operations) is greater than the period between two successive arbitrations, it may happen that a second unit requests access to a resource while the latter is still engaged in executing a previous operation.
The presence of an input buffer only partly avoids this drawback: if the buffer has free space, the requested operation can be posted even if the resource is busy; however, if the buffer, which may have several storage levels, is full, posting is impossible.
Hence, there is a conflict of access to the destination resource.
2. Again in order to minimize the time for which the system bus is busy and to improve performance, the processors are generally provided with a fast internal associative memory, or cache, of limited capacity, which contains the most recently used information (or information which the processor is expected to use imminently). If the data requested by the processor are contained in the cache, access to the system bus, and via the latter to the memory modules, is avoided entirely.
As a result, the same data can be held in one or more caches and in memory at the same time.
It is therefore necessary to ensure the coherence, or consistency, of the replicated data.
Each processor must therefore be able to monitor, via the system bus, the exchange of data between other units and the memory modules (by means of the addresses and commands present on the system bus) in order to keep the information contained in its cache consistent with the information contained in the caches of the other processors and in memory.
Consistency is ensured by suitable protocols, for example the protocol known by the acronym MESI, via operations for monitoring the system bus known as "snooping". As a result of, and in response to, these operations, the various caches dispatch to one another, over the system bus, response signals which make it possible to ensure the consistency of the data.
Included among these response signals is a RETRY signal, required in many protocols when, for example, a processor requests a memory read of an information item which has been modified, is present in one or more caches and has not been updated in memory.
In this case, the cache which "holds" the information item has to write the information item to memory, in its updated form, before the requesting processor makes a subsequent attempt to read it from memory.
Several protocols provide for the direct intervention of the cache which holds the data item, in place of the memory, so that the request can be satisfied. However, in this case too, conflicts of access to resources may arise if the read access request is of the RMW type (a read and modify followed, in an uninterruptible manner, by a write) and the intervention by the cache in turn requires memory to be updated, and hence written.
Finally, if the caches are not provided with standalone snooping tables, the snoop operation may be possible only when the cache is free, that is, not engaged in the execution of other operations.
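The RETRY response described under the second cause can be sketched as follows. This is an illustration under stated assumptions rather than the behaviour of a particular MESI implementation: the names State, Snoop, Cache, snoop_read and bus_read are hypothetical, the write-back is modelled as instantaneous (on a real bus it is itself a transaction which has to win arbitration), and the line is assumed to become Exclusive once memory has been updated.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Snoop(Enum):
    NONE = "none"      # the line is not held by this cache
    SHARED = "shared"  # a clean copy is held here
    RETRY = "retry"    # memory was stale; the requester must try again


class Cache:
    def __init__(self) -> None:
        self.lines: Dict[int, Tuple[State, int]] = {}  # address -> (state, value)

    def snoop_read(self, address: int, memory: Dict[int, int]) -> Snoop:
        """React to a read issued on the bus by another unit."""
        state, value = self.lines.get(address, (State.INVALID, 0))
        if state is State.INVALID:
            return Snoop.NONE
        if state is State.MODIFIED:
            # Simplification: the write-back completes at once.
            memory[address] = value
            self.lines[address] = (State.EXCLUSIVE, value)  # assumed clean state
            return Snoop.RETRY
        self.lines[address] = (State.SHARED, value)
        return Snoop.SHARED


def bus_read(address: int, memory: Dict[int, int],
             other_caches: List[Cache]) -> Optional[int]:
    """Return the data item, or None if any snooping cache answered RETRY."""
    responses = [cache.snoop_read(address, memory) for cache in other_caches]
    if Snoop.RETRY in responses:
        return None  # a new bus request and a new arbitration are needed
    return memory[address]


if __name__ == "__main__":
    memory = {0x40: 1}                         # stale copy in memory
    holder = Cache()
    holder.lines[0x40] = (State.MODIFIED, 2)   # updated copy in a cache
    first = bus_read(0x40, memory, [holder])
    second = bus_read(0x40, memory, [holder])
    print(first, second)  # None 2
```

The first read attempt is refused because memory is stale; the second, made after the write-back, returns the updated value.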
In short, in multiprocessor systems, certain operations for which a unit has obtained access to the system bus might not be immediately executable, whether through the unavailability of a destination resource required for execution of the operation, through the unavailability of other resources, even if these do not participate directly and actively in the execution of the operation, or for reasons of consistency which make it essential to postpone the operation and have other operations precede it.
In these cases, execution of the operation has to be reattempted upon presentation of a new request for access to the system bus and a new arbitration.
A first inevitable consequence is that every system bus access operation which does not conclude successfully occupies the system bus to no purpose, denying other units access for the time interval in which the bus is busy, and thus lessening the availability of the bus and the performance of the other units.
A second consequence is that, statistically, requests which are directed to a single resource, or which indirectly require the collaboration of other resources for snooping operations, may accumulate over time as a result of repeated RETRY attempts, thereby creating what is virtually a lockup condition. This condition is termed "livelock" because, unlike in the "deadlock" condition, the processors involved continue to repeat and reattempt the same operations in competition with one another, each in turn occupying the system bus to no purpose.
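One way in which repeated RETRY attempts can degenerate into livelock is sketched below. It is a deliberately simplified, deterministic illustration, whereas the condition described above arises statistically. The scenario and all names are assumptions: unit 0 repeatedly attempts a read which is refused with RETRY until unit 1 has been granted the bus once to write the modified item back to memory, and one bus grant is made per cycle.

```python
from typing import Optional, Tuple


def simulate(arbitration: str, max_cycles: int = 20) -> Tuple[Optional[int], int]:
    """Return (cycle in which the read succeeded, or None; number of RETRYs)."""
    written_back = False
    retries = 0
    last_granted = 1  # round robin only: unit 0 is examined first
    for cycle in range(max_cycles):
        # Unit 0 always wants the bus; unit 1 wants it until it has written back.
        requests = [True, not written_back]
        if arbitration == "fixed":
            granted = 0  # assumption: unit 0 always outranks unit 1
        else:
            candidate = (last_granted + 1) % 2
            granted = candidate if requests[candidate] else 1 - candidate
            last_granted = granted
        if granted == 1:
            written_back = True    # the holder finally updates memory
        elif written_back:
            return cycle, retries  # the read now succeeds
        else:
            retries += 1           # RETRY: the bus was occupied to no purpose
    return None, retries


if __name__ == "__main__":
    print(simulate("fixed"))        # (None, 20)
    print(simulate("round_robin"))  # (2, 1)
```

In the fixed-priority run both units remain active throughout, unlike in a deadlock, yet no useful work is done: every grant goes to a read that cannot succeed, and the write-back that would unblock it never reaches the bus.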