In data processing systems that make use of multiple processors it is often desirable to permit more than one processor to share access to a resource such as a memory location. The shared memory locations can then be used as a mechanism for communicating information between the processors, for example.
It is usually desirable to share access in such a way as to avoid corrupting the contents of the memory location when two or more processors attempt to write to it at the same time. Therefore, most multiprocessor systems include some type of mechanism to avoid these write conflicts.
One technique to avoid interference among processors is to control the exact order in which each processor may issue instructions that access the shared location. However, present high-speed processors typically use instruction scheduling techniques which may reorder, on-the-fly, an originally programmed instruction sequence. By allowing instruction reordering, a processor can make use of sophisticated multibank cache memories, bypassed write buffers, write merging, and pipeline processing techniques. In such a system, a sequence of reads or writes issued by one agent, as viewed by another agent, may be arbitrarily reordered in a way which cannot be predicted in advance. Therefore, the observance of strict ordering rules by the program itself can be impossible in such systems.
Because of the possibility of instruction reordering on-the-fly, if strict ordering of memory accesses must be maintained between multiple processors, explicit memory barrier instructions must typically be included within the instruction set of the processors. These instructions are used to avoid a situation where two or more processors are attempting to obtain write access to the same location in memory at the same time.
One interlocking primitive used for this purpose is a reduced instruction set computing (RISC) style load-locked, modify, store-conditional sequence. The semantics of these instructions are such that the load-locked instruction first obtains a read-only copy of the block. If no other agent has written to the block between the time of the load-locked and the store-conditional, the store-conditional instruction is allowed to update the block. Otherwise, the store-conditional instruction fails. In either case, the store-conditional instruction returns a status flag indicating whether or not it succeeded. If the store-conditional instruction fails, the program must eventually branch back and retry the sequence. This style of interlocking primitive has been found to scale very well with the speed of a cache, and therefore is presently an attractive approach for implementing shared memory access in multiple processor systems.
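The load-locked, modify, store-conditional retry loop described above can be sketched in software. This is only an illustrative model, not a hardware implementation: the names `SharedBlock`, `atomic_increment`, and the `interfere` hook are hypothetical, and a write counter stands in for whatever mechanism the hardware uses to detect an intervening write.

```python
class SharedBlock:
    """Toy model of one shared memory block (hypothetical, for illustration)."""

    def __init__(self, value=0):
        self.value = value
        self._write_count = 0   # bumped on every successful write (assumption)
        self._snapshots = {}    # agent id -> write count seen at load-locked

    def load_locked(self, agent):
        # Remember how many writes had happened when this agent read the block.
        self._snapshots[agent] = self._write_count
        return self.value

    def store_conditional(self, agent, new_value):
        # Succeed only if no write has occurred since this agent's load-locked.
        seen = self._snapshots.pop(agent, None)
        if seen is None or seen != self._write_count:
            return False        # status flag: failed
        self.value = new_value
        self._write_count += 1
        return True             # status flag: succeeded


def atomic_increment(block, agent, interfere=None):
    """Retry load-locked / modify / store-conditional until it succeeds.

    `interfere` is a test hook that lets another agent write during the
    first attempt only. Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        old = block.load_locked(agent)              # load-locked
        if interfere is not None and attempts == 1:
            interfere(block)                        # another agent writes
        if block.store_conditional(agent, old + 1):  # modify + store-conditional
            return attempts
        # Store-conditional failed: branch back and retry the sequence.
```

With no interference the loop completes in one attempt. If another agent writes between the load-locked and the store-conditional, the first store-conditional fails and the sequence is retried against the new value.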
In the past, these interlocking primitives have typically been implemented by using a lock register consisting of a lock valid flag and a lock address. Upon receiving a load-locked instruction, the lock valid flag is set and the lock address register is loaded with an address indicating the range of locations corresponding to the locked block. Upon the receipt of a store-conditional instruction, the lock register is checked. If the lock valid flag is still set, then the store-conditional instruction is allowed to succeed. Otherwise, the store-conditional instruction fails.
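A minimal sketch of such a lock register follows. The class name `LockRegister`, the 64-byte `BLOCK_SIZE`, and the dictionary used as memory are assumptions made for illustration. Clearing the flag after each store-conditional is likewise an assumption; the text above only states that the flag is checked.

```python
BLOCK_SIZE = 64  # assumed size in bytes of a lockable block (illustrative)


class LockRegister:
    """Lock valid flag plus lock address, as described above (hypothetical model)."""

    def __init__(self):
        self.valid = False    # lock valid flag
        self.address = None   # lock address: base address of the locked block

    def load_locked(self, memory, addr):
        # Set the flag and record the block that contains addr.
        self.valid = True
        self.address = addr - (addr % BLOCK_SIZE)
        return memory.get(addr, 0)

    def store_conditional(self, memory, addr, value):
        # The store succeeds only if the lock valid flag is still set.
        ok = self.valid
        self.valid = False    # assumption: the lock is consumed either way
        if ok:
            memory[addr] = value
        return ok
```

For example, a load-locked at address `0x104` records block base `0x100`; an immediately following store-conditional succeeds, while a second store-conditional without a new load-locked fails because the flag is no longer set.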
The lock valid flag may be controlled by using any number of techniques. For example, so-called invalidating probe command logic is commonly implemented in multiprocessor system hardware. If an invalidating probe command is available in the processors, it can be used to manipulate the lock valid flag. For example, each processor can simply include logic or a microprogram which clears the lock valid flag whenever an invalidating probe command issued by another agent matches the address stored in the corresponding lock register. This is typically done when another agent has stored data at the locked address during the pendency of a load-locked instruction.
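The probe-driven clearing of the lock valid flag might look roughly like the sketch below. All names (`Processor`, `on_invalidating_probe`, `store`) and the 64-byte block size are hypothetical, and the broadcast loop is a stand-in for real probe hardware rather than a description of it.

```python
BLOCK_SIZE = 64  # assumed block size in bytes (illustrative)


def block_base(addr):
    """Base address of the block that contains addr."""
    return addr - (addr % BLOCK_SIZE)


class Processor:
    """Holds one lock register: a lock valid flag and a lock address."""

    def __init__(self, name):
        self.name = name
        self.lock_valid = False
        self.lock_address = None

    def load_locked(self, addr):
        self.lock_valid = True
        self.lock_address = block_base(addr)

    def on_invalidating_probe(self, addr):
        # Clear the flag only when the probe matches the locked block.
        if self.lock_valid and self.lock_address == block_base(addr):
            self.lock_valid = False


def store(writer, processors, addr):
    """Model a store by `writer`: every other agent receives an invalidating probe."""
    for p in processors:
        if p is not writer:
            p.on_invalidating_probe(addr)
```

In this model, a store by another agent to an unrelated block leaves the lock intact, while a store anywhere inside the locked block clears the flag, so a later store-conditional by the locking processor would fail.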
Multiprocessor systems which use the load-locked, store-conditional instruction primitive therefore typically need to maintain copies of lock valid flags and lock address registers for each memory block for which locking is desired.