In a data processing device configured to execute a plurality of threads in parallel, execution of one thread may be interrupted at some point in time by another thread. No problem arises when the processes executed by these threads are unrelated to each other, since the results obtained do not change even if such an interrupt occurs.
However, if a thread in execution is interrupted by another thread executing a process related to the process of the interrupted thread, the results obtained may differ from those obtained when the thread is not interrupted. Thus, countermeasures against this problem need to be taken.
As an example, consider a process in which two threads each add “1” to the same variable, that is, a process of reading the variable, adding “1” to the value read, and overwriting the variable with the result of the addition. A problem occurs in a case where a thread process that has read the variable but has not yet overwritten it with the result of adding “1” is interrupted by another thread process (a process that also adds “1” to the variable).
If such an interrupt occurs, the process that was executed first overwrites the variable with the value obtained by adding “1” to the original value, without detecting that the variable has been updated by the interrupting second process.
When no interrupt of a thread process occurs, each of the two threads performs the operation of adding “1” to the variable, and as a result the variable is increased by “2”. That is, the correct processing result is the value obtained by adding “2” to the original value of the variable.
However, if processing progresses in an order such that one thread process interrupts another as described above, the variable is increased by only “1” although two threads have each performed the operation of adding “1” to the variable. Thus, the correct result is not obtained.
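The lost update described above can be reproduced deterministically. The following is a minimal sketch in C and is not taken from the related art: the names shared_value, increment_t, step_read, step_add, step_write, run_without_interrupt, and run_with_interrupt are hypothetical, and the two thread processes are simulated by calling the three steps in a fixed order within a single real thread so that the interleaving is repeatable.

```c
/* Shared variable that both simulated thread processes increment. */
static int shared_value;

/* Private working copy held by one thread process (hypothetical type). */
typedef struct {
    int local;
} increment_t;

/* The three steps of "add 1" named in the text. */
static void step_read(increment_t *t)  { t->local = shared_value; }  /* read the variable       */
static void step_add(increment_t *t)   { t->local += 1; }            /* add 1 to the value read  */
static void step_write(increment_t *t) { shared_value = t->local; }  /* overwrite the variable   */

/* No interrupt: process A finishes all three steps before process B starts. */
int run_without_interrupt(int initial)
{
    increment_t a, b;
    shared_value = initial;
    step_read(&a); step_add(&a); step_write(&a);
    step_read(&b); step_add(&b); step_write(&b);
    return shared_value;
}

/* Interrupt: process B runs completely after A has read but before A has written. */
int run_with_interrupt(int initial)
{
    increment_t a, b;
    shared_value = initial;
    step_read(&a);                                 /* A reads the original value     */
    step_read(&b); step_add(&b); step_write(&b);   /* interrupting process B         */
    step_add(&a); step_write(&a);                  /* A overwrites with a stale sum  */
    return shared_value;
}
```

Under these assumptions, starting from an initial value of 10, run_without_interrupt returns 12, whereas run_with_interrupt returns 11, which is the increase by only “1” described above.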
As described above, a processing section in which a problem occurs if another process interrupts during execution of a process (in the above example, the section after reading the data and before overwriting it with the processed result) is referred to as an exclusive control section or the like, and in this section, control for preventing interruption by another thread process is performed explicitly. In this specification, this section will be referred to as an exclusive control section.
If there is just one processor unit responsible for executing a program, it is possible to guarantee that no other process interrupts in the exclusive control section by inhibiting switching to another process at the point in time when a process enters the section and allowing switching again at the point in time when the process exits the section.
When there is just one processor unit, execution of one program (thread) interrupts execution of another program executed as a thread because the operating system switches threads upon the occurrence of an event that triggers thread switching.
Thus, when the operating system is instructed to inhibit switching to another process (thread), even if an event that triggers thread switching occurs while switching is inhibited, the switch is not performed at that point in time but is deferred until the point in time when the first program allows switching to another process (thread).
In contrast, in a multi-processor system, correct processing results cannot be guaranteed just by inhibiting switching to another process. This is because inhibiting switching is effective only on the processor unit that is executing the program, and does not affect execution of a program on another processor unit.
As a method of preventing a program executing on another processor unit from entering an exclusive control section, a countermeasure of providing a flag (hereinafter referred to as a lock word) that indicates whether any thread is currently executing in the exclusive control section is generally adopted.
A thread checks the lock word at the point in time when it is about to enter an exclusive control section. 1) When the lock word has a value indicating a non-use state (hereinafter referred to as “unlocked”), the thread changes the lock word to a value indicating a use state (hereinafter referred to as “locked”) and executes the processes in the exclusive control section. 2) When the lock word is “locked,” the thread waits until the lock word becomes “unlocked,” changes the lock word to “locked,” and then executes the processes in the exclusive control section.
Moreover, the lock word is changed back to “unlocked” at the point in time when execution in the exclusive control section ends. By performing the above control, it is possible to prevent a process executed by another processor unit and a process executed by the subject processor unit from racing with each other in the exclusive control section. While the exclusive control section ensures the correctness of processes performed by a plurality of threads, it may also become a bottleneck that determines the upper limit of the performance of a data processing device.
This is because, in a data processing device having a multi-processor configuration, while a certain thread is executing processes in an exclusive control section (hereinafter referred to as “using” the exclusive control section, so that it can be treated like other resources), another thread that needs to use the exclusive control section may need to wait until the thread using the section exits the section.
This means that a wait queue is formed for the exclusive control section, as for physical resources such as a processor unit or a disk. Thus, when the utilization of the exclusive control section approaches 100% earlier than that of the other resources as the load increases, the exclusive control section becomes the bottleneck that determines the upper limit of system performance.
The utilization of the exclusive control section is the product of the number of uses per unit time and the length of one use period. Thus, in a situation where the throughput of the data processing device is saturated and the exclusive control section is the bottleneck, that is, its utilization is 100%, these two factors are inversely proportional to each other.
Here, when the exclusive control section is the bottleneck, the number of uses per unit time is considered to correspond to the throughput of the data processing device. Therefore, in order to raise the upper limit of the throughput of the data processing device in such a situation, it is necessary to shorten one use period of the exclusive control section.
One use period of the exclusive control section is the program execution period from entry into the exclusive control section to exit from it, and is the product of the following three factors: (1) the number of instructions executed during that period; (2) the number of clock cycles per instruction (CPI); and (3) the period of one clock cycle.
Among these, factors (1) and (3) are not easy to decrease, and in many cases they are treated as fixed values. This is because factor (1) is determined by the content of the process that is protected and performed in the exclusive control section, that is, by the algorithm implemented in the program, and factor (3) is determined by the hardware of the data processing device.
On the other hand, factor (2) depends on various elements such as the instruction execution architecture of the processor unit and the architecture of the cache memory, and there is plenty of room for tuning.
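The relationships among utilization, one use period, and factors (1) to (3) can be illustrated with a short calculation. The following C sketch is for illustration only: the function names use_period_seconds, utilization, and max_uses_per_second are hypothetical, and the figures used in the note after it (200 instructions, a CPI of 2.0, and a 0.5 ns clock cycle) are assumed values, not measurements.

```c
/* One use period [s] = (1) instructions executed * (2) CPI * (3) clock cycle period [s]. */
double use_period_seconds(double instructions, double cpi, double clock_period_s)
{
    return instructions * cpi * clock_period_s;
}

/* Utilization = number of uses per unit time [1/s] * one use period [s]. */
double utilization(double uses_per_second, double use_period_s)
{
    return uses_per_second * use_period_s;
}

/* Upper limit of uses per second, reached when the utilization is 100% (1.0). */
double max_uses_per_second(double use_period_s)
{
    return 1.0 / use_period_s;
}
```

With the assumed values, use_period_seconds(200, 2.0, 0.5e-9) is about 200 ns, so the upper limit is about 5 million uses per second. If tuning lowered the CPI to 1.0 with factors (1) and (3) unchanged, one use period would be halved and the upper limit would double, which is why factor (2) is the natural target for tuning.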
Related art concerning implementation of the exclusive control section will be described below. An important point is that the two operations of checking (reading) the value of the lock word when a thread enters an exclusive control section and changing (writing) the lock word to “locked” when its value was “unlocked” must themselves be treated like an exclusive control section, that is, no other thread may intervene between them.
For this reason, a processor unit having a multi-processor function is provided with an instruction for performing these operations. For example, the Intel (registered trademark) x86 processor unit is provided with a cmpxchg instruction (see Intel (registered trademark) 64 and IA-32 Architectures Software Developer's Manual Volume 2A: Instruction Set Reference, A-M, http://www.intel.com/Assets/PDF/manual/253666.pdf).
This cmpxchg instruction uses three operands: a register reserved by the instruction (the eax register), a register operand, and a memory operand. The instruction atomically performs the following series of operations: (1) reading the value of the memory operand into the processor unit; (2-1) writing the value of the register operand to external memory when the read value equals the value of the eax register; and (2-2) writing the read value to the eax register when the read value does not equal the value of the eax register.
“Atomic” as used herein means that the hardware guarantees that no other processor unit accesses the external memory from the memory read operation of (1) through the memory write operation of (2-1). The operation that the cmpxchg instruction performs is often called “Compare And Swap” (hereinafter, such an instruction is referred to as a CAS instruction).
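Operations (1), (2-1), and (2-2) can be written out as a small reference model. The following C sketch is an illustration under stated assumptions: cmpxchg_model and cmpxchg_c11 are hypothetical names, a 32-bit operand is assumed, and cmpxchg_model is an ordinary function that is NOT atomic and only documents the order of the steps. The second function uses the standard C11 call atomic_compare_exchange_strong, which has the same contract and which compilers for x86 typically implement with a locked cmpxchg.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Non-atomic reference model of the steps (illustration only).
 * mem : memory operand, eax : eax register, reg : register operand.
 * Returns 1 when the write of (2-1) was performed, 0 otherwise.
 */
int cmpxchg_model(uint32_t *mem, uint32_t *eax, uint32_t reg)
{
    uint32_t read_value = *mem;   /* (1)   read the memory operand               */
    if (read_value == *eax) {
        *mem = reg;               /* (2-1) equal: write register operand to memory */
        return 1;
    }
    *eax = read_value;            /* (2-2) not equal: write read value to eax    */
    return 0;
}

/*
 * The same contract using C11 atomics: on failure, *expected (the role of
 * eax) receives the value that was actually read from memory.
 */
int cmpxchg_c11(_Atomic uint32_t *mem, uint32_t *expected, uint32_t desired)
{
    return atomic_compare_exchange_strong(mem, expected, desired);
}
```

In both functions the caller learns the outcome either from the return value or, as with the eax register, by inspecting the value left in the second argument.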
When a lock operation is performed using the CAS instruction, the instruction is executed with the lock word as the memory operand, “unlocked” set in the eax register, and “locked” set in the register operand.
Since operation (2-1) is executed when the lock word is “unlocked,” the lock word is updated to “locked,” and the value of the eax register remains unchanged. On the other hand, since operation (2-2) is executed when the lock word is “locked,” no write to the lock word is performed, and “locked” is set in the eax register.
A thread that has executed the CAS instruction can check whether the lock operation succeeded or failed by checking the value of the eax register afterward. That is, the thread can determine whether to execute the exclusive control section or to enter a state of waiting until the lock word is set to “unlocked.”
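The lock operation described above can be sketched with C11 atomics. This is a minimal illustration and not the implementation of any particular related art: the names UNLOCKED, LOCKED, lock_try, lock_acquire, and lock_release are hypothetical, the encodings 0 and 1 are assumed, and waiting is done by a simple busy loop. The local variable expected plays the role of the eax register.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encodings of the lock word. */
enum { UNLOCKED = 0, LOCKED = 1 };

/* One CAS attempt: returns true when this caller changed the lock word to LOCKED. */
bool lock_try(atomic_int *lock_word)
{
    int expected = UNLOCKED;   /* "unlocked" is set in the eax register */
    atomic_compare_exchange_strong(lock_word, &expected, LOCKED);
    /* After the CAS, 'expected' still holds UNLOCKED on success and
       holds LOCKED when another thread already owns the lock. */
    return expected == UNLOCKED;
}

/* Enter the exclusive control section: retry until the CAS succeeds. */
void lock_acquire(atomic_int *lock_word)
{
    while (!lock_try(lock_word)) {
        /* busy-wait until the lock word becomes UNLOCKED */
    }
}

/* Exit the exclusive control section: change the lock word back to UNLOCKED. */
void lock_release(atomic_int *lock_word)
{
    atomic_store(lock_word, UNLOCKED);
}
```

A thread would call lock_acquire before the read, add, and overwrite steps on a shared variable and lock_release afterward. A practical lock would usually add back-off or yield to the operating system while waiting, which this sketch omits.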
Presently, various examples of the data processing device described above have been proposed (see Patent Documents 1 to 4).