1. Field of the Invention
The present invention relates to information processing technology, and more particularly to instruction processing technology for a system with shared memory, such as a symmetric multi-processor (SMP) in which a plurality of processors are combined, a cache-coherent non-uniform memory architecture (cc-NUMA) system, and the like.
2. Description of the Related Art
In a shared-memory multiprocessor system, lock methods such as a mutex lock are known as ways of securing an exclusive right. A spin loop is one typical method of obtaining a lock. In this method, a “lock variable” is provided in main memory, and each processor repeatedly attempts to reference and update the lock variable, spinning (busy-waiting) until it secures the lock. A processor that obtains the lock marks the lock variable as locked for as long as it holds the lock, and marks it as released when it releases the lock. In this way, an exclusive right can be secured among a plurality of processors.
An example of a conventional spin loop instruction string is described below. The 4-byte address block in main memory indicated by “lock” corresponds to the above-mentioned lock variable. If its contents are 0, the lock is in a released state. If its contents are a process's own ID (the ID assigned to each process), the lock is held by that process.
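The lock variable convention described above can be sketched in C11 as follows. This is a minimal illustration, not the instruction string of the figures: the names lock_word_t, LOCK_FREE, lock_try_acquire, lock_release and lock_holder are hypothetical, and the use of C11 atomics in place of SPARC V9 instructions is an assumption made for portability.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: the source only describes a 4-byte word called "lock". */
#define LOCK_FREE 0u                      /* contents 0: lock is released */

typedef _Atomic uint32_t lock_word_t;     /* 4-byte lock variable in shared memory */

/* One acquisition attempt: succeeds only if the word is 0, and then
 * stores the caller's ID into it in the same atomic step. */
static inline bool lock_try_acquire(lock_word_t *lock, uint32_t my_id)
{
    uint32_t expected = LOCK_FREE;
    return atomic_compare_exchange_strong(lock, &expected, my_id);
}

/* Release: write 0 back so that other processes see the lock as free. */
static inline void lock_release(lock_word_t *lock)
{
    atomic_store(lock, LOCK_FREE);
}

/* ID of the current holder, or LOCK_FREE if nobody holds the lock. */
static inline uint32_t lock_holder(lock_word_t *lock)
{
    return atomic_load(lock);
}
```

A single call to lock_try_acquire corresponds to one reference/update trial; a spin loop repeats such trials until one succeeds.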
Although the following example is described using the SPARC V9 instruction set, the description is not specific to any particular instruction set and applies to instruction sets in general.
As the simplest spin loop, the instruction string shown in FIG. 1 can be used.
In FIG. 1, the function of each instruction is described to its right as a comment. In an instruction string structured as shown in FIG. 1, a cas instruction may always result in a store to main memory, so the processor always operates to obtain the exclusive right to the cache block. When a plurality of processors try to obtain the same lock variable, contention for that cache block among the caches of the processors becomes severe, which is a problem.
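As a rough C11 analogue of a spin loop built on compare-and-swap alone, the following sketch may help; it is not the SPARC V9 listing of FIG. 1. The function names spin_lock_cas and spin_unlock are hypothetical, and the attempt counter is added only for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: spin using compare-and-swap alone.
 * Returns the number of CAS attempts made, purely for illustration. */
static inline unsigned long spin_lock_cas(_Atomic uint32_t *lock, uint32_t my_id)
{
    unsigned long attempts = 0;
    for (;;) {
        uint32_t expected = 0;            /* 0 means "released" */
        attempts++;
        /* Every iteration is a read-modify-write. On typical hardware the
         * core therefore requests the cache block exclusively even when
         * the comparison fails and nothing is stored. */
        if (atomic_compare_exchange_strong(lock, &expected, my_id))
            return attempts;
    }
}

/* Release the lock by writing 0 back to the lock variable. */
static inline void spin_unlock(_Atomic uint32_t *lock)
{
    atomic_store(lock, 0);
}
```

With several processors running this loop on the same lock variable, each failed attempt can pull the cache block away from the others, which is the contention described above.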
FIG. 2 shows an example of an instruction string that solves this problem. In FIG. 2, the cache block remains in a shared state as long as the lock variable is not rewritten (that is, as long as the processor that obtained the lock holds it). Therefore, the above-mentioned cache contention does not occur.
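One common way to keep the cache block shared while waiting is to spin on a plain load and attempt the compare-and-swap only after the lock variable has been observed as 0 (often called test-and-test-and-set). The C11 sketch below assumes that this is the kind of loop meant here; it is not the listing of FIG. 2, and the names spin_lock_ttas and spin_unlock_ttas are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: read-only spinning followed by a CAS attempt. */
static inline void spin_lock_ttas(_Atomic uint32_t *lock, uint32_t my_id)
{
    for (;;) {
        /* Inner loop issues loads only, so the cache block can stay in a
         * shared state in every waiting processor's cache. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            ;                             /* idle spinning */

        /* The lock looked free: now try the atomic update. */
        uint32_t expected = 0;
        if (atomic_compare_exchange_strong(lock, &expected, my_id))
            return;
        /* Another processor won the race; resume read-only spinning. */
    }
}

/* Release the lock by writing 0 back to the lock variable. */
static inline void spin_unlock_ttas(_Atomic uint32_t *lock)
{
    atomic_store(lock, 0);
}
```

The inner loop removes the exclusive-ownership traffic, but the processor still executes instructions continuously while it waits.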
Even in such a configuration, the loop must be executed continuously in order to check the lock variable. Moreover, since processor speed has recently been improving faster than memory system speed, the difference in speed between processors and memory systems is widening.
In such a situation, a spin loop causes a great number of instructions to be interpreted and executed while idling, yet substantially no work is done and power is consumed uselessly, which is a problem. In particular, in a large-scale SMP system, many processors often contend for one specific lock variable at the same time. In this case, the processors other than the specific CPU that holds the lock do no useful work, and the power cost of system operation increases, which is a problem.
In a processor core that adopts a multi-thread processing method, if such a spin loop occurs in a specific thread, the idle spinning, which does no substantial work, hinders the progress of the other threads on that processor core, which is also a problem.
The same problem occurs in other processes that use a lock variable, such as barrier synchronization, processor synchronization (synchronization waiting) in general, I/O synchronization, idle loops and the like.
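To illustrate how barrier synchronization leads to the same idle spinning, the following C11 sketch shows a one-shot barrier built on a shared counter. It is an assumption-laden illustration rather than anything taken from the source: the type spin_barrier_t and the functions spin_barrier_init, spin_barrier_arrive and spin_barrier_wait are hypothetical names.

```c
#include <stdatomic.h>

/* Hypothetical one-shot barrier: a shared arrival counter plus the
 * number of participants that must arrive before anyone proceeds. */
typedef struct {
    atomic_uint arrived;
    unsigned total;
} spin_barrier_t;

static inline void spin_barrier_init(spin_barrier_t *b, unsigned total)
{
    atomic_init(&b->arrived, 0);
    b->total = total;
}

/* Announce that the caller has reached the barrier. */
static inline void spin_barrier_arrive(spin_barrier_t *b)
{
    atomic_fetch_add(&b->arrived, 1);
}

/* Spin until every participant has arrived. As with a lock variable,
 * the waiting processor executes loads continuously but does no work. */
static inline void spin_barrier_wait(spin_barrier_t *b)
{
    while (atomic_load(&b->arrived) < b->total)
        ;                                 /* idle spinning */
}
```

Each participant would call spin_barrier_arrive and then spin_barrier_wait; the barrier is not reusable as written.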
Patent References 1, 2 and 3 are known as conventional exclusive control and synchronous control technologies for a multi-processor system.
Specifically, Patent Reference 1 discloses a mechanism that realizes exclusive control by storing a shared variable in main memory and collectively monitoring the processors at the main memory. However, in a recent processor with cache memory, a rewrite in the cache is not promptly reflected in main memory. In particular, with a write-back cache, it usually takes a fairly long time for a rewrite to be reflected. Even in a recent processor with a write-through cache, the loss incurred in reflecting a rewrite in main memory is large, and performance degrades accordingly.
Therefore, the above-mentioned spin loop problems cannot be solved simply by collectively monitoring main memory as in Patent Reference 1. A method that solves them within cache memory, without affecting memory latency, is desired.
Patent Reference 2 discloses a technology that realizes exclusive control of shared memory among CPUs by providing, in addition to a system bus shared by a plurality of the CPUs, an access control signal wire (pin) for exclusive control among the CPUs. Recently, however, connections between processors (for example, the input/output pins of an LSI) have become costly, so using a pin as a data line improves performance more than dedicating it to exclusive control. Alternatively, eliminating even one pin contributes to reducing CPU manufacturing cost. Therefore, a method of realizing exclusive control among CPUs without increasing the number of pins is demanded.
Patent Reference 3 discloses a synchronous control circuit used to control synchronization between a processor and a co-processor that are in a master-slave relationship. However, it is difficult to apply this circuit to a system in which each processor handles shared memory on an equal footing.
Specifically, a processor can grasp the operation status of a co-processor on its own initiative, since the processor is in a position to issue instructions to the co-processor. In an SMP system, however, each processor logically holds no information about the operation statuses of the other processors, so it is difficult to apply the technology of Patent Reference 3 to solve the above-mentioned problems.
The present invention is made to solve such problems. The present invention can also be applied to a co-processor system.
Patent Reference 1: Japanese Patent Laid-open Application No. 3-164964
Patent Reference 2: Japanese Patent Laid-open Application No. 61-229150
Patent Reference 3: Japanese Patent Laid-open Application No. 2002-41489