The present invention relates generally to systems and methods for executing processor instructions, and more particularly to systems and methods for executing locked memory updates using two or more processor instructions.
Most processor instruction-set architectures have been optimized for uniprocessor applications. Uniprocessors are computer systems using a single processor, but also containing main memory and input/output (I/O) devices. Uniprocessors often execute multiple concurrent processes by repeatedly switching among them, and synchronization between concurrent processes accessing shared data can be handled by the software, usually involving calls to the operating system.
Multiprocessor computer systems have more than one processor operating at the same time. Multiprocessors are increasingly being used to overcome the speed limitations of single processors executing instructions one at a time. The processors share one or more main memories. A bus interconnects the processors and the shared memories. The bus can usually perform only one data transfer at a time, but interconnections capable of performing more than one data transfer at a time are known. Even if the interconnection of the processors and shared memories may perform more than one data transfer at a time, accessing memory imposes delays and each shared memory can only process one processor request at a time, imposing additional delays when concurrent memory accesses are attempted.
The delays may be partly averted by providing the processors with local memories that can be accessed by only one processor and whose access does not involve the bus. Any data that is accessed by only one processor may be held in that processor's local memory instead of the shared memory. Accesses to this data do not slow down other accesses that are made to the shared memory.
Additional reductions in shared memory access delays may be accomplished by providing each processor with a cache and a cache controller. Each cache is used to hold cache lines (copies of sets of contiguous shared memory locations) containing shared memory data that was recently accessed by its associated processor. The next time the processor attempts to access data, a copy of which is in the cache, the access is made to the cache instead of the main memory, allowing other accesses to the shared memory by means of the bus. The cache needs to store both the data and its memory location.
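The requirement that a cache store both the data and its memory location can be sketched in C as follows. This is a minimal illustration only: the direct-mapped organization, the line size, the number of lines, and the names CacheLine, Cache, and cache_read are assumptions made for the example and are not taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 16u  /* assumed cache-line size in bytes */
#define NUM_LINES   4u  /* assumed number of lines in the cache */

/* One cache line: a copy of contiguous memory plus the tag that
   records which memory location the copy came from. */
typedef struct {
    bool     valid;
    uint32_t tag;                 /* line address = byte address / LINE_BYTES */
    uint8_t  data[LINE_BYTES];
} CacheLine;

typedef struct {
    CacheLine lines[NUM_LINES];
} Cache;

/* Reads one byte through the cache. Returns true on a hit; on a miss the
   whole line is first copied from memory and false is returned. */
static bool cache_read(Cache *c, const uint8_t *memory, uint32_t addr, uint8_t *out)
{
    assert(c != NULL && memory != NULL && out != NULL);

    uint32_t line_addr = addr / LINE_BYTES;
    CacheLine *line = &c->lines[line_addr % NUM_LINES];
    bool hit = line->valid && line->tag == line_addr;

    if (!hit) {
        /* Miss: fetch the line from memory and remember its location. */
        memcpy(line->data, memory + line_addr * LINE_BYTES, LINE_BYTES);
        line->tag = line_addr;
        line->valid = true;
    }
    *out = line->data[addr % LINE_BYTES];
    return hit;
}
```

In this sketch a second read of any byte in the same line is served from the cache without touching memory, which is the effect described above; a read of a different line that maps to the same slot replaces the earlier copy.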
As long as the memory data is only being read, or is accessed by only one processor, caching does not present any problems. A problem does arise, however, when a processor writes to a memory location that is also accessed by other processors. When this happens, other cache copies of the same data must be updated or invalidated. The shared memory copy of the data must also be updated, or marked as invalid and (if required by the coherence protocols) provided with a reference to the cache holding the up-to-date version of the data.
This problem, known as the cache-coherence problem, may be dealt with using cache-coherence protocols. For example, the number of processors able to read cached data may be left unlimited (they are said to have a "shared" cache state), but only one processor may be allowed to write data to the cache at any given time. The processor which may write data to the cache is said to "own" the cache. When a processor writes the data, it must acquire "exclusive" access to the cache. Converting to the exclusive state requires the invalidation of all other caches providing read access to the data, since they become stale (outdated).
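The example protocol above, with any number of shared readers and a single exclusive writer, can be sketched as a small state model for one memory line. The names LineState, LineDirectory, coherent_read, and coherent_write are hypothetical, as is the number of caches. Demoting an exclusive holder to shared when another processor reads is an assumption of the sketch; the text above specifies only the invalidation on a write.

```c
#include <assert.h>

typedef enum { INVALID, SHARED, EXCLUSIVE } LineState;

#define NUM_CACHES 3  /* assumed number of processors, for illustration */

/* State of one memory line in each processor's cache. */
typedef struct {
    LineState state[NUM_CACHES];
} LineDirectory;

/* Processor cpu reads the line. Assumption: an exclusive copy held by
   another cache is demoted to shared so that both copies stay readable. */
static void coherent_read(LineDirectory *d, int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CACHES);
    for (int i = 0; i < NUM_CACHES; i++)
        if (i != cpu && d->state[i] == EXCLUSIVE)
            d->state[i] = SHARED;
    if (d->state[cpu] == INVALID)
        d->state[cpu] = SHARED;
}

/* Processor cpu writes the line: every other copy becomes stale and is
   invalidated, and the writer holds the line exclusively. */
static void coherent_write(LineDirectory *d, int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CACHES);
    for (int i = 0; i < NUM_CACHES; i++)
        if (i != cpu)
            d->state[i] = INVALID;
    d->state[cpu] = EXCLUSIVE;
}
```

After two processors read the line both hold it shared; a write by either one leaves the writer exclusive and the other invalid.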
In some coherence protocols, initially shared cache copies must be converted to owned before they may be written to and converted to the exclusive state. This conversion may involve changing the shared line to invalid and fetching the line in an owned state. Once a line is owned, it can be written, but once written, copies in other caches must be invalidated.
Caches allow data to be efficiently accessed by multiprocessors. However, to correctly utilize this sharing capability, multiprocessors require synchronization primitives (synchronization operations that are not implemented using simpler synchronization operations) to control access to shared data. For example, consider two processes concurrently accessing two bank accounts, as illustrated in Table 1 in the C language, in which -> is used to indicate a member of a structure referenced through a pointer (a variable whose value is a memory address).
Table 1: Inconsistent concurrent accesses
Process1:
    previous1 = account1->balance;
    previous2 = account2->balance;
    account1->balance = previous1 + transfer;
    account2->balance = previous2 - transfer;
Process2:
    temp1 = account1->balance;
    temp2 = account2->balance;
    combinedBalance = temp1 + temp2;
If the second process reads one balance before it is updated and the other balance after it is updated, combinedBalance will be assigned an incorrect value. For example, if temp1 is read before account1->balance is updated and temp2 is read after account2->balance is updated, combinedBalance is assigned the incorrect value of (previous1+previous2-transfer) instead of the correct value of (previous1+previous2).
Incremental upgrades of instruction sets for synchronization within multiprocessor systems have typically included a single multiprocessor synchronization primitive. The test&set instruction, supported by Motorola, Inc., Schaumburg, Ill. on its MC68040 microprocessor instruction set, tests a memory value and sets it to a predetermined value; the load and clear instruction, supported by Hewlett Packard, Palo Alto, Calif. on its Precision Architecture RISC (PA-RISC) instruction set, clears a memory value and returns its previous value. Both have no arguments, but return a value. The compare&swap instruction, supported by SPARC International, Inc., on its 64-bit SPARC architecture instruction set, compares a memory location with a first argument, and if they are equal, swaps the content of the same memory location with the contents of a second argument. Also known are fetch&add, which returns the value of a memory location and updates it in memory by adding to it an argument, and mask&swap, which unconditionally swaps a selected set of bits of a memory location (as specified by the first argument) with the corresponding bits of a second argument.
Mask&swap, fetch&add, and compare&swap form a useful basic set; the capabilities of one of these instructions cannot be easily emulated by the others. These basic operations can be performed on uncached as well as coherently-cached data.
Any of the above instructions can be used to implement a data structure called a "semaphore" that can be used to ensure the integrity of shared data structures, as illustrated in Table 2. In an "indivisible" (or atomic) operation (an operation during which no concurrently executing process may modify the accessed data or read the data being updated), the SetLock() call sets a semaphore to a locked value and returns its previous (unmodified) value. The first process to execute this instruction sets the semaphore to a locked value; the second process is blocked until the first process restores an unlocked semaphore value.
Table 2: Serializing conflicting accesses
Process1:
    while (SetLock(&semaphore) != UNLOCKED);
    previous1 = account1->balance;
    previous2 = account2->balance;
    account1->balance = previous1 + transfer;
    account2->balance = previous2 - transfer;
    semaphore = UNLOCKED;
Process2:
    while (SetLock(&semaphore) != UNLOCKED);
    temp1 = account1->balance;
    temp2 = account2->balance;
    combinedBalance = temp1 + temp2;
    semaphore = UNLOCKED;
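The SetLock() call used in Table 2 could be realized on a current compiler roughly as follows. This is a sketch under stated assumptions, not the implementation contemplated by the description: it uses the C11 atomic_exchange operation in the role of a test&set style primitive, and the Account type and the helper names Unlock, transfer_locked, and combined_locked are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

enum { UNLOCKED = 0, LOCKED = 1 };

typedef struct { long balance; } Account;  /* hypothetical account record */

/* Atomically sets the semaphore to LOCKED and returns its previous value. */
static int SetLock(atomic_int *semaphore)
{
    return atomic_exchange(semaphore, LOCKED);
}

/* Restores the unlocked value; only the current holder should call this. */
static void Unlock(atomic_int *semaphore)
{
    assert(atomic_load(semaphore) == LOCKED);
    atomic_store(semaphore, UNLOCKED);
}

/* Process1 of Table 2: both balances change while the lock is held. */
static void transfer_locked(atomic_int *sem, Account *a1, Account *a2, long transfer)
{
    while (SetLock(sem) != UNLOCKED)
        ;  /* spin until the previous value was UNLOCKED */
    a1->balance += transfer;
    a2->balance -= transfer;
    Unlock(sem);
}

/* Process2 of Table 2: both balances are read while the lock is held. */
static long combined_locked(atomic_int *sem, const Account *a1, const Account *a2)
{
    while (SetLock(sem) != UNLOCKED)
        ;
    long combined = a1->balance + a2->balance;
    Unlock(sem);
    return combined;
}
```

Because the reader cannot acquire the semaphore between the two balance updates, it can observe the balances only before the transfer or after it, never in between.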
Explicit semaphore locking is sufficient for many applications. However, semaphores have limitations when used to lock more general database structures. Semaphores require explicit user program support. When data dependencies are poorly understood, global locks are typically used; these inhibit concurrent multiprocessor accesses. If a process fails or is rescheduled while it holds a lock, the data structures it was updating may be left partially updated. Thus, semaphore locking is unsafe.
A popular set of synchronization primitive instructions, supported by MIPS Technologies, Inc., Mountain View, Calif. on its MIPS RISC Instruction Set Architecture, by Motorola, Inc., Schaumburg, Ill. on its PowerPC 601 microprocessor, and by Digital Equipment Corporation, Maynard, Mass. on its Alpha architecture, is safer when single cache lines are being updated. An initial LoadReserved instruction loads the value of a shared variable and places a reservation on its cache line, signifying an intention to later modify it if the reservation is not lost. Intermediate instructions (such as add) compute a new data value. A final StoreConditional instruction saves the new data value in memory.
The operation of the StoreConditional instruction depends on the cache line's reservation state. If the reservation is still set, the StoreConditional updates the memory (with the new data value) and a successful status code is returned. Otherwise, the reservation has been lost, the memory update is nullified and an unsuccessful status code is returned. In typical uses, the StoreConditional status is checked and, if an unsuccessful status code has been returned, the initial LoadReserved and intermediate instructions are repeated until a successful status is returned.
Reservations are lost when data is written into the reserved address by another processor, when the cache line is deleted from the cache, or when the executing process is context switched (there is a break in its execution during which another concurrent process is executed by the same processor).
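The reservation rules just described can be illustrated with a small single-threaded software model. It is a sketch only: real reservations are kept by the cache hardware for a whole cache line, whereas this model tracks a single address per processor, and the Processor type and the helper names remote_write, evict_line, and context_switch are hypothetical. Here StoreConditional returns true when the store is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-processor reservation state: at most one reserved address. */
typedef struct {
    const long *reserved;
} Processor;

/* Loads the value and places a reservation on its address. */
static long LoadReserved(Processor *p, const long *addr)
{
    assert(p != NULL && addr != NULL);
    p->reserved = addr;
    return *addr;
}

/* Stores only if the reservation is still held; the reservation is
   consumed either way. Returns true if memory was updated. */
static bool StoreConditional(Processor *p, long *addr, long value)
{
    bool held = (p->reserved == addr);
    p->reserved = NULL;
    if (held)
        *addr = value;
    return held;
}

/* A write by another processor clears matching reservations elsewhere. */
static void remote_write(Processor *others, size_t count, long *addr, long value)
{
    *addr = value;
    for (size_t i = 0; i < count; i++)
        if (others[i].reserved == addr)
            others[i].reserved = NULL;
}

/* Deleting the line from the cache loses a reservation on it. */
static void evict_line(Processor *p, const long *addr)
{
    if (p->reserved == addr)
        p->reserved = NULL;
}

/* A context switch loses whatever reservation the processor held. */
static void context_switch(Processor *p)
{
    p->reserved = NULL;
}
```

In the model, a StoreConditional that follows any of the three events fails and leaves memory unchanged, so the caller must repeat the sequence from LoadReserved.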
By placing the appropriate computation instructions between the LoadReserved and StoreConditional instructions, cache-line reservations can be used to serialize accesses to a single variable. For example, code for a Process3 accessing only one variable is illustrated in Table 3. When access to more than one variable must be serialized, a semaphore must be implemented and locking must be used as described above.
On line 3 of Table 3, the LoadReserved instruction reads the value stored at totalAddress, writes the value into previous, and places a reservation on the cache line containing totalAddress. The updated value, sum, is calculated on lines 4-7. On line 8, the value of sum is written to the location at totalAddress if the reservation on that cache line has not been lost as a result of a write by another process, deletion from the cache, or a context switch of the executing process. This is done by the StoreConditional instruction, which also sets a variable lost to a status which is LOST if and only if the reservation has been lost.
TABLE 3
______________________________________
Process3, using cache reservations
______________________________________
totalAddress = &(account1->balance);
do {
    previous = LoadReserved(totalAddress);
    if (previous < minimum)
        deduction = debit + serviceCharge;
    else deduction = debit;
    sum = previous - deduction;
    lost = StoreConditional(totalAddress, sum);
} while (lost == LOST);
______________________________________
Cache-line reservations are attractive because a wide variety of synchronization operations can be created by placing the appropriate computation instructions between the LoadReserved and StoreConditional instructions. However, this may be done only when a single cache line is to be accessed. Also, forward progress (eventual successful execution and conclusion of the shared memory access section) is guaranteed only if caches transition directly between the shared and exclusive states. As mentioned in the earlier discussion of cache coherence, some protocols require an intermediate invalid state between the shared and the exclusive states, which is incompatible with the forward progress requirement. Another disadvantage is the loss of reservations during a process context switch.
Accordingly, an object of the present invention is to provide a processor and method for process synchronization without locking of arbitrary duration of the entire shared data structure.
Another object of the present invention is to provide a system and method for process synchronization permitting a data structure update started by one process to be finished by another process.
A further object of the present invention is to provide a system and method for process synchronization permitting the calculation and update of a data structure to continue after a context switch.
Yet another object of the present invention is to provide a system and method for process synchronization compatible with a wide range of cache-coherence protocols.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the claims.