A multi-core CPU, in which two or more independent cores are integrated into a single package containing a single integrated circuit, is also known as a chip-level multiprocessor (CMP). Such a multi-core CPU may reduce power demand, avoid an inefficient increase in hardware, and lower maintenance cost.
Parallel computing is a method of performing a large number of calculations simultaneously. With this method, a large and complicated problem is divided into small parts that are calculated in parallel.
In a parallel computing environment, a single task is divided among and performed by multiple CPUs, which enhances CPU utilization and task efficiency, increases processing speed, and reduces power consumption.
The parallel computing environment may be applied to fields such as sensor network-based monitoring and reconnaissance, application programs for smart devices, and service applications based on cloud computing.
In order to provide high-performance services to users or to solve problems within a short time using the parallel computing environment, recent technologies either convert sequential code into parallel code or construct systems with parallel code from the outset.
However, parallel programs written with, for example, Open Multi-Processing (OpenMP, a shared-memory multiprocessing programming API that supports the C, C++, and Fortran languages on platforms including UNIX and Microsoft Windows), Open Computing Language (OpenCL, an open general-purpose parallel computing framework), Pthread, or C++0x, may be executed in an unintended order in a parallel computing environment due to physical or logical concurrency, causing errors not intended by programmers.
These problems may generate unintended energy bugs in programs, and thus a technique for effectively detecting errors generated in programs needs to be developed.
A race is one of the main causes of errors that occur in a parallel computing environment.
A race condition occurs when parallel threads in a multi-programming or multi-processor system access a shared variable without appropriate synchronization and at least one of the accesses is a write event. This may produce a result unintended by a user.
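The effect of an unsynchronized read and write on a shared variable can be illustrated with a minimal sketch that replays fixed interleavings deterministically. The initial value, the two operations, and the names `run`, `T1`, and `T2` are assumptions chosen for illustration and are not taken from any figure of this document.

```python
# Hypothetical illustration: the values and operations below are assumptions.
def run(schedule, initial=100):
    """Replay a fixed interleaving of read/write steps on one shared variable."""
    account = initial                  # the shared variable
    local = {}                         # value each thread saw at its read
    delta = {"T1": +50, "T2": -25}     # T1 deposits 50, T2 withdraws 25
    for thread, op in schedule:
        if op == "R":                  # read access event
            local[thread] = account
        else:                          # "W": write access event
            account = local[thread] + delta[thread]
    return account

# Each thread finishes its read and write before the other starts.
serial = [("T1", "R"), ("T1", "W"), ("T2", "R"), ("T2", "W")]
# Both threads read before either writes, so one update is lost.
racy_a = [("T1", "R"), ("T2", "R"), ("T1", "W"), ("T2", "W")]
racy_b = [("T1", "R"), ("T2", "R"), ("T2", "W"), ("T1", "W")]

print(run(serial), run(racy_a), run(racy_b))
```

In this sketch the serial schedule applies both updates, whereas each racy schedule keeps only the update of whichever thread writes last, so the final value depends on the interleaving.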
An error caused by a race is difficult to predict. Namely, a race may not occur even when a program is executed thousands or tens of thousands of times, yet may occur at a critical moment and cause an execution aspect that does not correspond to the user's intention.
A typical execution aspect that does not correspond to the user's intention is a program that loops endlessly or the execution of an unintended routine. When a race produces an unexpected result value, or when its effect is severe, the overall system may become unresponsive. For example, a case has been reported in which a United States Ship (USS) was stopped for a few hours in the heart of the Pacific Ocean due to an overflow that inadvertently occurred in a program. Also, a particular application may loop endlessly due to a malfunction of a smart device, causing a huge amount of power consumption. Races cause these phenomena and thus need to be detected in any event. Among races, a race that occurs first in temporal or logical order and is not affected by other races is known as an initial race.
FIG. 1 is a view illustrating an example of a general race. Referring to FIG. 1, two threads are executed in parallel, and the account value may be 50 or 75 depending on the program execution aspect.
However, due to a race between R (a read access event) of thread 1 (Thread1) and W (a write access event) of thread 2 (Thread2), a value of 150 unintended by the programmer may be produced, and thus the program may operate abnormally.
Typical race detection techniques include static analysis, post-mortem detection, and on-the-fly detection.
Static analysis is a technique of analyzing the source code of a program and detecting every latent race that may occur. Post-mortem detection is a technique of analyzing a trace file generated while a particular program is executed. On-the-fly detection is a technique of analyzing a program while it is executing in order to detect a race.
In on-the-fly detection, each access event with respect to a particular shared variable is inspected and compared with previous access events retained in an access history.
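The comparison of each access event against an access history can be sketched as follows. This is a simplified illustration only: the class `AccessHistory`, its method `check_and_record`, and the conflict rule (different threads, at least one write, no lock held in common) are assumptions, and the sketch ignores ordering imposed by other synchronization.

```python
from collections import defaultdict

class AccessHistory:
    """Hypothetical per-variable record of earlier access events."""

    def __init__(self):
        # variable name -> list of (thread, kind, locks held)
        self._history = defaultdict(list)

    def check_and_record(self, var, thread, kind, locks=()):
        """Compare a new access with earlier ones, then store it.

        Returns the earlier (thread, kind) accesses that conflict with it.
        """
        held = frozenset(locks)
        conflicts = [
            (t, k)
            for (t, k, l) in self._history[var]
            # Assumed rule: another thread, at least one write, no common lock.
            if t != thread and "W" in (k, kind) and not (l & held)
        ]
        self._history[var].append((thread, kind, held))
        return conflicts

history = AccessHistory()
print(history.check_and_record("x", "T1", "R"))            # nothing to compare yet
print(history.check_and_record("x", "T2", "W"))            # conflicts with T1's read
print(history.check_and_record("y", "T1", "W", ["L1"]))    # first access to y
print(history.check_and_record("y", "T2", "W", ["L1"]))    # same lock held: no conflict
```

Because every access to a shared variable passes through this one structure, the sketch also makes it visible why the access history tends to become a point of contention when many threads are monitored.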
However, in on-the-fly detection, access to the access history, which is a shared data structure, causes a severe bottleneck and degrades performance. Thus, in related art access event selection techniques, only access events that are likely to race are allowed to access the access history in order to reduce the bottleneck. To address this problem of on-the-fly detection, a scalable monitoring technique for race detection has been proposed.
FIG. 2 is a conceptual view illustrating a scalable monitoring technique for race detection. Referring to FIG. 2, all the access events that occur in program (A) of FIG. 2 access the access history sequentially, causing a bottleneck that degrades race detection performance.
In contrast, among the access events occurring in program (B) of FIG. 2, only those that are likely to race access the access history, after passing through the access filtering process of an access filter, which increases race detection performance.
FIG. 3 is a view illustrating an example of a recent scalable monitoring technique within the scalable monitoring technology of FIG. 2, in which L1 is a lock variable, R and W are read and write access events with respect to shared variables, respectively, and the numbers following the access events indicate an arbitrary order of occurrence.
Referring to (A) of FIG. 3, in order to detect a race in a program model with locks, only seven of a total of nine access events are monitored and allowed to access the access history, a shared data structure, for race detection. As a result, the bottleneck that occurs in the shared data structure may be reduced, and the power consumed to detect a race may also be reduced.
A key principle of this technique is selecting at least one pair of read/write access events each time a lock occurs. In this case, the number of access events monitored in each thread is 2(U+1), where U is the number of unlocks that have occurred in the thread.
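One way to picture such a selection is a per-thread filter that forwards only the first read and the first write of each region, where a region ends at an unlock. The sketch below is an assumption-laden simplification, not the technique of FIG. 3 itself: the function `filter_accesses`, the event strings, and the sample trace are all hypothetical. Under this simplification a thread with U unlocks has U+1 regions, so at most 2(U+1) access events are selected.

```python
def filter_accesses(events):
    """Keep the first read and the first write of each region (assumed rule).

    Events are strings such as "R1", "W3", "lock(L1)", "unlock(L1)".
    """
    selected = []
    seen = set()                       # access kinds already kept in this region
    for ev in events:
        if ev.startswith("unlock"):
            seen = set()               # an unlock ends the current region
        elif ev[0] in ("R", "W"):
            kind = ev[0]
            if kind not in seen:       # first access of this kind in the region
                seen.add(kind)
                selected.append(ev)
    return selected

# Hypothetical single-thread trace with one unlock (U = 1).
trace = ["R1", "R2", "lock(L1)", "W3", "R4", "unlock(L1)", "R5", "W6", "W7"]
selected = filter_accesses(trace)
unlocks = sum(ev.startswith("unlock") for ev in trace)
print(selected, len(selected), 2 * (unlocks + 1))
```

Only the selected events would go on to access the shared access history; the remaining events are dropped inside the thread without touching the shared data structure.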
Referring to (B) of FIG. 3, only six of a total of nine access events are monitored for race detection in a program model including locks and synchronization commands (post/wait).
This is similar to the technique illustrated in (A) of FIG. 3, but is advantageous in that fewer access events are selected.
However, in the technique of (B) of FIG. 3, unnecessary access events are deleted only after all nine access events have accessed the shared data structure, so it is less effective at resolving the bottleneck that occurs in the shared data structure than the technique illustrated in (A) of FIG. 3.
A key principle of the technique of (B) of FIG. 3 is not to repeatedly monitor access events in lock regions having the same lock variable within the synchronization commands post/wait. In this case, a region between post( ) and wait( ) of a certain thread is defined as a block.
Thus, the number of access events monitored in each thread is
$$\sum_{i=1}^{\mathrm{eBlock}} B_i,$$
where $B_i$ is $2(L_i+1)$ and $L_i$ is a lock variable of the $i$-th block.
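The per-thread total, the sum of B_i = 2(L_i+1) over the blocks of a thread, can be evaluated with a short sketch. The function name `monitored_per_thread` and the sample values are hypothetical, and the sketch assumes that each L_i can be treated as a non-negative count associated with block i.

```python
def monitored_per_thread(lock_counts):
    """Sum B_i = 2 * (L_i + 1) over the blocks of one thread.

    lock_counts holds one assumed value L_i per block.
    """
    return sum(2 * (l_i + 1) for l_i in lock_counts)

# Hypothetical thread with three blocks whose L_i values are 1, 0 and 2:
# B = 4 + 2 + 6
print(monitored_per_thread([1, 0, 2]))
```

For a thread with a single block, the sum reduces to 2(L_1+1), which has the same form as the 2(U+1) count given for the technique of (A) of FIG. 3.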
The significance of the foregoing related art scalable monitoring technique is that, when detecting races in a parallel program, it detects a race that occurs in the program while minimizing the number of monitored access events, which is a main cause of degraded race detection performance.
In the examples of (A) and (B) of FIG. 3, only two or three access events (R2 and R6, or R2, R4, and R6) are excluded from the monitoring targets. However, these examples show only part of a parallel program. In actuality, numerous threads and access events may occur during execution of a program, and thus a considerable number of access events are expected to be excluded from monitoring during race detection.
However, the foregoing related art scalable monitoring technique fails to resolve a bottleneck phenomenon that occurs during race detection.