Conventional computer systems and computerized devices include one or more central processing units (CPUs) or processors that can operate (e.g., execute) software programs that are encoded as a series of logic instructions within a memory system accessible to the processor(s). Such computer systems also typically include an operating system program encoded within the memory system. The operating system operates as a control program that controls or schedules when the processor(s) is/are able to execute the entire collection of programs that are waiting to operate, such as user processes, operating system processes and the like. Multitasking operating systems allow a single processor within a conventional computer system to execute multiple processes or threads in a back-to-back or time-sliced manner such that each process is able to move forward and make progress in its execution by utilizing a portion or “slice” of processor cycles for execution. The terms process and thread will be used throughout this description interchangeably to denote a related set of logic instructions in a program or process that a processor can perform (e.g., execute, interpret, run, etc.).
Some conventional computer systems include multiple processors that can operate under the control of a multiprocessing operating system. Such a multiprocessing operating system controls the execution of multiple processes across the range of available processors in the computerized device. Most common multiprocessor computer systems operate as “symmetric” multiprocessors (SMP) where all memory in the computer system is shared and any processor may have access to any portion of memory. In other words, all processors have a symmetric “view” of physical memory. As an example of a multiprocessing computer system in operation, an operating system may begin to execute a user process on a first processor for a period of time until an interrupt of some sort occurs to that user process. Perhaps the interrupt is caused when the processor executes an instruction in the user process that requires that user process to access data stored within a disk drive or other storage device coupled to the computer system. As a result of such an input/output (I/O) request, the operating system in that computer system suspends execution of the user process on the first processor while other software (e.g., an I/O process) and/or circuitry within the computer system handles any required processing associated with the I/O interrupt. When the operating system later detects that handling of the interrupt is complete and the requested data is now available for the user process, the operating system then reschedules execution of the user process on the same processor, or possibly on a second, third or other processor since the first processor may have already been rescheduled and may be currently executing another process.
In this manner, a multiprocessing operating system can “migrate” execution of processes from one processor to another to achieve greater overall processing throughput. While one process is waiting for completion of an interrupt (e.g., to obtain data from a disk), the operating system can de-schedule this process (i.e., block the process in a wait state until completion of the interrupt) and can schedule another process to operate on the processor in place of the blocked process so as to optimally utilize processing cycles of that processor.
Certain software programs that execute as processes within conventional computer systems sometimes include a requirement that portions of software code within the process be executed in an “atomic” or uninterrupted manner. These portions of code in such a process or program are often referred to as “critical code,” “critical code sections” or “atomic” code. Generally, critical or atomic code is a series of one or more software or other logic instructions associated with a process, thread or program, such as microcode, machine language instructions, or even high-level language instructions (e.g., a series of C or Java code statements), that a processor in the computer system must execute from start to finish without any interference. Typical sources of interference are interruptions and actions performed by other processes such as remote actions. A common example of interference would be multiple threads writing to shared memory variables. Interference may also occur when a thread issues an instruction that generates an interrupt that the operating system must handle, such as by issuing a system call to obtain data from a storage device in the computer system. To handle this type of call, the operating system must access the storage device, obtain the requested data, and return the data to the process that made this system call. In relative terms, such a system call might take a large amount of processing time since the storage device is comparatively slow to access data in relation to the number of instructions that the processor may perform in the same amount of time. Accordingly, during the time the storage device is obtaining the requested data, the operating system may cause the processor to operate (e.g., execute) another process.
This other process that executes in the meantime (i.e., while the process that issued the system call causing the interrupt waits for completion of the access to the requested data in the storage device) may modify data associated or shared with the waiting process, thus causing interference.
Another common type of interference is a “clock” interrupt used by an operating system's (e.g., kernel's) scheduler to implement preemptive multitasking. When the scheduler activates a thread, the scheduler programs a hardware clock in the processor to expire at the end of that thread's quantum (i.e., a time period assigned to that thread for execution). When the quantum expires, the clock generates a hardware interrupt causing the scheduler to gain control. The scheduler then switches to another thread for execution. Such interference or preemption is sometimes referred to as an involuntary context switch. While other threads operate during their respective quantums, they may modify memory locations used by the previously executing thread, thus causing preemptive interference. Another source of interference is thread migration, where a kernel executing a thread on one CPU migrates (e.g., for load balancing purposes) the thread to execute on another CPU. Interference is thus generally defined as an external modification or change made (i.e., by code other than the critical code or the process containing the critical code) to data, memory contents, register contents, flags, or other information that is related to (e.g., referenced by) the critical code.
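The preemptive interference described above can be illustrated with a small, deterministic simulation. The following is only a sketch under stated assumptions: the names `increment`, `run` and `demo` are hypothetical, Python generators stand in for threads, and each `yield` marks a point at which the simulated clock interrupt may preempt the thread.

```python
from collections import deque

def increment(memory, times):
    """Thread body: a non-atomic read-modify-write of M, one step per yield."""
    for _ in range(times):
        tmp = memory["M"]        # load M into a private temporary
        yield                    # preemption may occur between load and store
        memory["M"] = tmp + 1    # store a value that may by now be stale
        yield

def run(threads, quantum):
    """Round-robin scheduler: run each thread for `quantum` steps, then preempt."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            for _ in range(quantum):
                next(thread)
        except StopIteration:
            continue             # thread finished; do not requeue it
        ready.append(thread)     # quantum expired: involuntary context switch

def demo(quantum, times=100):
    """Run two incrementing threads against one shared location and return M."""
    memory = {"M": 0}
    run([increment(memory, times), increment(memory, times)], quantum)
    return memory["M"]
```

With a quantum of two steps, each load/store pair completes before the thread is preempted and `demo(2)` returns 200. With a quantum of one step, every thread is preempted between its load and its store, so each store overwrites the other thread's update and `demo(1)` returns only 100.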
There are a number of reasons why a process may contain a series of instructions (i.e., critical code) that must be executed atomically (i.e., without interference). As an example, some conventional computer systems include memory systems that operate as shared memory. Shared memory may be, for example, a section of main memory that allows two or more software processes to access the same set of memory locations during their execution. Processes can use shared memory for such functions as interprocess communication, process synchronization and for other reasons. When a process contains a series of instructions that operate on shared memory locations, it is often preferable to execute those instructions atomically as critical code in order to ensure that the content of the shared memory is accurately maintained (i.e., to ensure that no other process or program could have manipulated the shared memory accessed by the critical code during its atomic operation). If a conventional operating system interrupts a sequence of critical code instructions that access the shared memory before the critical code sequence completes full execution (i.e., before the sequence completes execution from start to end), the state or contents of the shared memory might be unreliable upon return to execution of the critical code at the point of interruption since other processes or code that may have executed during the interruption may have caused interference to the shared memory. This is one example of interference caused by an interruption.
Software and computer system developers have created a number of conventional techniques to allow a sequence of critical code instructions in a process to execute in an atomic manner to ensure that interference caused by interruptions is avoided. One such conventional technique is an atomic instruction used within software code called a “compare and swap” (CAS) instruction. Generally, a CAS instruction provides a technique for depositing a value into a memory location while guaranteeing that processing leading up to the CAS instruction is not interrupted.
In operation, prior to execution of the CAS instruction, a processor executes a load instruction to fetch a value from a known memory location M. This memory location M is typically the target memory location to which data must be written in an atomic manner (i.e., without interference). Then, a processor executes one or more critical code instructions in the process or thread to perform any required critical code processing. Finally, the processor executes the CAS instruction typically as the last instruction at the end of the critical section of code. The CAS instruction receives a set of parameters including an old value, a new value, and an address of the memory location M. The CAS instruction obtains ownership of the shared memory or cache at the location M specified by the address parameter and then obtains the value of data stored at this location. The CAS instruction then compares the value obtained from location M with the old value parameter provided to the CAS instruction. If the old value (i.e., the parameter) equals the value obtained from the location of the address M (i.e., the value fetched at the beginning of the critical code section), then the CAS instruction can assume that no interference has taken place to this memory location and the CAS instruction proceeds to store the new value at that location M. The CAS instruction also returns the new value as output. In the alternative, if the old value parameter does not equal the value that the CAS instruction retrieves from the location of the address M, then the CAS instruction can infer that some processing has disturbed or caused interference to the original value at the memory location M. In such cases, the CAS instruction does not write to memory, but does return the value fetched from location M.
Upon such an indication, the processor can re-execute the critical code by jumping to the start of the critical code (i.e., by jumping back to the initial load instruction) to make another attempt to execute the critical code from start to end without interference.
A typical conventional process uses the CAS instruction at the end of critical code to form a loop that continually attempts to successfully execute the critical code ending with the CAS instruction each time this instruction fails. In this manner, a process operating the CAS instruction will not continue execution beyond the critical code section until the CAS instruction is successfully completed one time, thus guaranteeing that the thread has completely performed all of the critical code and the new value is placed into the memory location specified by the address parameter without interference from any interruptions that may have occurred during execution of all critical code preceding the CAS instruction (beginning with the original or old value being loaded from the memory location that the CAS instruction eventually checks).
An example of the CAS instruction is shown in the following code segment:
RETRY:
    LD M → TMP;
    . . .
    TMP+1 → TMP2;    (interruption causing interference might occur here)
    CAS M,TMP,TMP2;
    IF TMP != TMP2 GOTO RETRY;
As shown in the example CAS above, a processor executes the LD instruction to load the contents of memory location M into the TMP variable. Next, a sequence of one or more instructions (shown by the “ . . . ”) is executed to manipulate the fetched copy or version of the data. In this example the variable TMP2 is set to the value of TMP+1. During this processing, an interruption causing interference might occur, thus causing a change to the memory location M by some processing other than the instruction TMP+1 → TMP2. After all instructions that require atomic execution have been processed, the CAS instruction stores the contents of TMP2 into memory location M if and only if TMP and M are the same. After the CAS instruction, a test is done to determine if TMP and TMP2 are the same. If they are, the CAS instruction executed successfully and atomically. If not, then this processing repeats until the CAS instruction is successfully completed. The CAS instruction might fail, for instance, if another processor operates a process which accesses data at the memory location M, thus causing interference. It might also fail if an interrupt occurred between the LD and the CAS and another thread executed on the processor in the interim; that thread may have modified location M, rendering the values in the TMP and TMP2 registers “stale” (i.e., out of date with respect to memory).
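The retry loop above can be modeled in a high-level language. The following Python sketch is illustrative only and rests on several assumptions: Python exposes no CAS instruction, so a hypothetical `Word` class models one, with a `threading.Lock` standing in for the atomicity that hardware would provide. The model also assumes that the CAS returns the value fetched from location M in both the success and the failure case, which is what the TMP/TMP2 comparison in the pseudocode relies upon. The names `Word`, `atomic_increment` and `disturb_once` are not taken from any library.

```python
import threading

class Word:
    """One shared memory word with a modeled compare-and-swap."""
    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()   # stands in for hardware atomicity

    def load(self):
        return self._value

    def cas(self, old, new):
        """Store `new` only if the word still holds `old`; return the value fetched."""
        with self._guard:
            fetched = self._value
            if fetched == old:
                self._value = new
            return fetched

def atomic_increment(word, interfere=None):
    """Retry loop mirroring RETRY / LD / CAS; returns the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        tmp = word.load()                # LD M -> TMP
        tmp2 = tmp + 1                   # TMP+1 -> TMP2
        if interfere is not None:
            interfere(word, attempts)    # window in which interference may occur
        if word.cas(tmp, tmp2) == tmp:   # fetched value matched TMP: commit succeeded
            return attempts

def disturb_once(word, attempt):
    """Simulated interference: another thread overwrites M during the first attempt."""
    if attempt == 1:
        word.cas(word.load(), 100)
```

For example, `atomic_increment(Word(5), disturb_once)` takes two attempts: the first CAS finds 100 in place of the expected 5 and writes nothing, and the second attempt reloads 100 and commits 101.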
Another conventional technique that provides for atomic execution of critical code sections is called a “load linked store conditional” or LL/SC technique. Generally, the load linked store conditional technique involves the use of two processor instructions: a load linked (LL) instruction followed by a store conditional (SC) instruction. The two instructions operate much like conventional load and store instructions except that the LL instruction, in addition to doing a simple load, has a side effect of setting a user transparent bit called a load link bit. The load link bit forms a “breakable link” between the LL instruction and a subsequently executed SC instruction. The SC instruction performs a simple store to memory if and only if the load link bit is set when the SC instruction is executed. If the load link bit is not set, then the store will fail to execute. The success or failure of the SC instruction is indicated in a register after the execution of the SC instruction. For example, the processor may load such a register with a “1” in case of a successful store or may load the register with a “0” if the store was unsuccessful due to the load link bit being reset. The load link bit may be reset by hardware or software (i.e., changed from the state induced from the original LL instruction) upon occurrence of events that have the potential to modify the memory location from which the LL originally loaded data, and that occur during execution of the sequence of code between the LL instruction and the SC. In other words, a section of critical code that must be executed atomically can be inserted between the LL and SC instructions and the SC instruction will only store data to a specified memory location (i.e., the data being modified by the atomic code instructions) if the load link bit is not reset.
An example of where a link can be broken between an LL and SC instruction on a multiprocessor system is when an “invalidate” occurs to a cache line of shared memory which is the subject of the LL. In other words, the link might be broken between the LL and the SC instructions if the processor that executes the LL observes an external update to the cache line, or if an intervention or snoop operation invalidates the line associated with the bit. The link may also be broken by the completion of a return from an exception (i.e., interrupt). It may be the case, for example, that an interrupt to the critical code occurs after execution of the LL instruction but before the SC instruction. During the interrupt, some other thread may have successfully completed a store operation to that same shared data which causes the load link bit to be reset. To avoid interference, the software or hardware will explicitly break the link when returning from the operating system back into the interrupted critical code. This will result in the subsequent SC failing.
Typically, on SMP systems, the kernel of the operating system saves the LL address in a hidden register. That address is “snooped” by the cache coherence subsystem in normal operation. A CPU can detect external modifications to the LL address by monitoring (snooping) bus transactions and checking those addresses against the contents of the LL address register. The cache coherence protocol normally snoops in this manner to maintain coherency, so snooping the LL address is effectively “free” (it imposes no additional burden beyond normal coherence snooping). Each CPU has a private LL address register. If a CPU observes an external write to the address contained in its LL address register it “breaks the link” so the subsequent SC instruction will fail.
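The snooping behavior described above can be sketched as a toy model. This is not a description of any particular processor: the `Bus` and `CPU` classes and their methods are hypothetical, and the model assumes that every write is broadcast to all other CPUs, each of which compares the written address against its private LL address register.

```python
class Bus:
    """Shared bus that broadcasts every write to all other attached CPUs."""
    def __init__(self):
        self.memory = {}
        self.cpus = []

    def write(self, writer, address, value):
        self.memory[address] = value
        for cpu in self.cpus:
            if cpu is not writer:
                cpu.snoop(address)       # other CPUs observe the transaction

class CPU:
    def __init__(self, bus):
        self.bus = bus
        self.ll_address = None           # private LL address register; None = no link
        bus.cpus.append(self)

    def load_linked(self, address):
        self.ll_address = address        # remember which address is linked
        return self.bus.memory.get(address, 0)

    def snoop(self, address):
        if address == self.ll_address:
            self.ll_address = None       # external write seen: break the link

    def store(self, address, value):
        self.bus.write(self, address, value)

    def store_conditional(self, address, value):
        if self.ll_address != address:
            return False                 # link was broken; nothing is stored
        self.ll_address = None
        self.bus.write(self, address, value)
        return True
```

In this model, if CPU A executes `load_linked("M")` and CPU B then executes `store("M", 50)`, the subsequent `store_conditional` on CPU A returns `False` and leaves the value 50 in place. A write by CPU B to an unrelated address leaves the link on CPU A intact.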
An example of pseudocode that illustrates the use of the load linked store conditional technique is as follows (with the text in parentheses indicating the nature of the processing performed):
RETRY:
    LL M → TMP;      (load link bit set)
    TMP+1 → TMP2;    (interruption causing interference and resetting the load link bit might occur here)
    SC TMP2, M;      (only store if load link bit still set)
    IF FAILED_BIT = 1 GOTO RETRY;
As shown in the example above, the processor executes the LL instruction that operates to load the contents of memory location M into the TMP variable. The LL instruction further sets the load link bit. Next, a sequence of one or more instructions is executed to manipulate data. In this example the variable TMP2 is set to the value of TMP+1. During this processing, an interruption causing interference might occur that causes the load link bit to be reset (i.e., during the interruption). After all instructions that require atomic execution have been processed, the SC instruction stores the contents of TMP2 into memory location M if and only if the load link bit set by the LL instruction is still set (i.e., is not reset). After the SC instruction, a test is done to check a failure bit (FAILED_BIT) in a processor status register associated with the processor executing this critical code to determine if the SC instruction executed successfully. If the FAILED_BIT equals 1, processing returns to the RETRY location in order to again attempt to execute this section of critical code. This processing repeats until the SC is successfully completed. The SC instruction might fail, for instance, if another processor operates a process which accesses data at the memory location M, thus causing the load link bit to be reset (i.e., thus causing interference).
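The retry loop above, together with the breaking of the link on return from an exception, can be modeled as follows. This is a sketch only: the `Processor` class, its `interrupt` method and the `increment` function are hypothetical, and the return value of `sc` follows the FAILED_BIT convention of the pseudocode (1 on failure, 0 on success).

```python
class Processor:
    """Single CPU with a load link bit (hypothetical model)."""
    def __init__(self):
        self.memory = {"M": 0}
        self.link_bit = False

    def ll(self, address):
        self.link_bit = True             # LL sets the load link bit
        return self.memory[address]

    def sc(self, address, value):
        """Return FAILED_BIT: 0 if the store happened, 1 if the link was broken."""
        if not self.link_bit:
            return 1
        self.memory[address] = value
        self.link_bit = False
        return 0

    def interrupt(self, handler):
        handler(self.memory)             # other code runs during the interruption
        self.link_bit = False            # link is broken on return from the exception

def increment(cpu, interruptions):
    """RETRY loop; `interruptions` maps an attempt number to a handler."""
    attempt = 0
    while True:
        attempt += 1
        tmp = cpu.ll("M")                # LL M -> TMP
        tmp2 = tmp + 1                   # TMP+1 -> TMP2
        if attempt in interruptions:
            cpu.interrupt(interruptions[attempt])
        failed_bit = cpu.sc("M", tmp2)   # SC TMP2, M
        if failed_bit == 0:
            return attempt               # committed without interference
```

If a handler that writes 40 to M runs during the first attempt, the first SC fails because the link bit was reset, and the second attempt reloads 40 and stores 41.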
It is noted that the CAS and LL/SC mechanisms are optimistic in that they are written so that they assume the critical code transactions will complete. Such mechanisms thus check for interference at a commit point towards or at the end of the critical code.
Another conventional technique used to ensure atomic execution of critical code instructions is referred to as a lock/unlock mutual exclusion technique. The lock/unlock technique can be used, for example, in situations where a portion of shared user level code must be executed atomically. When a user level process enters a section of shared critical code, the first instruction that is executed is a lock directive that attempts to gain ownership of and set a flag indicating a user level process is in the process of executing this section of critical code. When the user level process succeeds in owning and setting this flag, the process can then execute the remainder of the critical code with or without interruption(s). When the process has completed execution of the critical code instructions, the final instruction the process executes to complete the critical code is an unlock instruction which clears the lock flag thus allowing another user level process to gain ownership of the lock flag and to execute this section of shared critical code. No process is allowed to execute this section of shared critical code until it owns the lock flag. Only one thread may hold the lock and proceed into the critical section at any one time. The kernel excludes or “blocks” other threads from the critical code. If a user level process is interrupted during execution of a critical section of code, that user level process continues to “own” the lock on that section of code and other user level processes (as well as the interrupted process) are blocked from executing that section of code until the interrupt has been handled and processing returns to complete execution of the shared critical code by the user level process that owns the lock on the critical code. That process then completes execution of that section of critical code after the interrupt and performs the unlock instruction to free that critical section of code for ownership and execution by another user level process. 
Since no other processes could execute the critical code section during the interrupt, it is assumed that interference did not occur.
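By way of illustration, the lock/unlock technique can be sketched with the `threading.Lock` class of the Python standard library playing the role of the lock flag. The `SharedCounter` class and the `run` function are hypothetical names introduced for this example, and threads of one Python process stand in for the user level processes of the description.

```python
import threading

class SharedCounter:
    """Shared data guarded by a lock flag."""
    def __init__(self):
        self.value = 0
        self.lock_flag = threading.Lock()

    def increment(self):
        self.lock_flag.acquire()         # lock: blocks until this thread owns the flag
        try:
            tmp = self.value             # critical code: the thread may be preempted
            self.value = tmp + 1         # here, but no other thread can enter meanwhile
        finally:
            self.lock_flag.release()     # unlock: frees the section for another thread

def run(num_threads, iterations):
    """Have several threads increment one counter and return the final value."""
    counter = SharedCounter()

    def worker():
        for _ in range(iterations):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because only the thread that owns the lock flag can be inside the critical code, no update is lost regardless of where preemption occurs, and `run(4, 2000)` returns 8000. Unlike the optimistic CAS and LL/SC loops, nothing is retried; a contending thread simply waits for the flag.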