Microprocessor technology is evolving from a single-core era into a multi-core era. The multi-core processor has become mainstream and is still evolving quickly. In a multi-core processor, each CPU core can support a plurality of threads. As the demand for computational power grows, parallel programming that effectively exploits hardware parallelism is the most logical way to meet that demand. In parallel computing, computational speed is increased by programming a plurality of CPU cores (processing units) in the multi-core processor to solve a single problem cooperatively. To take full advantage of the multi-core processor, a parallel program, i.e. an application program including parallel processing, is generally used. In the parallel program, the processing of a task is broken up into a plurality of parts, i.e. threads. These threads can be executed concurrently and can cooperate correctly by accessing shared data structures and applying proper synchronization methods.
When the parallel program is executed on the multi-core processor, the multiple threads in the parallel program can access a shared data structure to perform operations on it, such as removing or adding an element. When the multiple threads access the shared data structure, a synchronization mechanism should be used to ensure that only one thread can operate on the shared data structure at a given time. This can be achieved by granting a lock on the shared data structure to one thread at a time. While one thread holds the lock, no other thread can acquire it. The thread holding the lock can operate on the shared data structure and releases the lock after completing the operation, after which another thread can acquire the lock and perform its own operation.
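The locking discipline described above can be sketched in Java as follows. This is a minimal illustration rather than any particular implementation: the class names `LockedList` and `LockedListDemo` are hypothetical, and a single `ReentrantLock` from the standard library is assumed as the lock of the shared data structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical shared list guarded by one lock (names are illustrative only).
class LockedList {
    private final List<Character> items = new ArrayList<>();
    private final ReentrantLock lock = new ReentrantLock();

    void add(int index, char value) {
        lock.lock();               // blocks while another thread holds the lock
        try {
            items.add(index, value);
        } finally {
            lock.unlock();         // release so that another thread can acquire it
        }
    }

    char remove(int index) {
        lock.lock();
        try {
            return items.remove(index);
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return items.size();
        } finally {
            lock.unlock();
        }
    }
}

class LockedListDemo {
    // Starts `threads` workers that each insert `perThread` elements at
    // position 0, waits for them, and returns the final list size.
    static int run(int threads, int perThread) {
        LockedList list = new LockedList();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(0, 'X');
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 4000: no insertion is lost
    }
}
```

Because every operation runs between `lock()` and `unlock()`, the threads modify the list strictly one at a time, which is the behavior the paragraph above describes.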
In the parallel program, an array-based data structure, in which an array is used to store the data, is widely used.
Next, an existing solution for accessing a shared data structure by multiple threads is illustrated by way of an example. FIG. 1 shows a process in which three threads access an array-based list in parallel. As shown in FIG. 1(a), the elements in positions 0-3 of the list are “A”, “B”, “C” and “D.” The operation of the thread 1 is “add(0, ‘Y’)”, i.e. to add the element “Y” at the position 0. The operation of the thread 2 is “add(1, ‘X’)”, i.e. to add the element “X” at the position 1. The operation of the thread 3 is “remove(1)”, i.e. to remove the element at the position 1. First, the thread 1 acquires the lock of the list and adds the element “Y” at the position 0 of the list. Accordingly, the elements “A”, “B”, “C” and “D” originally at the positions 0-3 are shifted to the positions 1-4, as shown in FIG. 1(b). Then, the thread 1 releases the lock and the thread 2 acquires the lock. The thread 2 adds the element “X” at the position 1 of the list. Accordingly, the elements “A”, “B”, “C” and “D” originally at the positions 1-4 are shifted to the positions 2-5, as shown in FIG. 1(c). Finally, the thread 2 releases the lock and the thread 3 acquires the lock. The thread 3 removes the element “X” at the position 1 of the list, and the elements “A”, “B”, “C” and “D” originally at the positions 2-5 are shifted to the positions 1-4, as shown in FIG. 1(d). It can be seen from the above process that each time a thread operates on the list, every element that follows the point of insertion or removal is shifted. In a real application program, when multiple threads modify the data structure frequently, such element shifts occur many times. This element shift overhead can degrade the performance of the whole multi-core processor.
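The shift overhead in the FIG. 1 example can be made concrete with a short Java sketch. The classes `ShiftCountingList` and `Fig1Walkthrough` are hypothetical and exist only for illustration. The sketch assumes a `char` array as backing storage, uses `synchronized` methods in place of an explicit lock, and applies the three operations sequentially in the order in which the threads acquire the lock in FIG. 1.

```java
import java.util.Arrays;

// Hypothetical array-based list that counts how many elements it shifts.
class ShiftCountingList {
    private char[] data = new char[8];
    private int size = 0;
    private long shifts = 0;

    synchronized void add(int index, char value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow the backing array
        }
        int moved = size - index;                 // elements at or after index
        System.arraycopy(data, index, data, index + 1, moved);
        data[index] = value;
        size++;
        shifts += moved;
    }

    synchronized char remove(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        char removed = data[index];
        int moved = size - index - 1;             // elements after index
        System.arraycopy(data, index + 1, data, index, moved);
        size--;
        shifts += moved;
        return removed;
    }

    synchronized long shifts() {
        return shifts;
    }

    synchronized String contents() {
        return new String(data, 0, size);
    }
}

class Fig1Walkthrough {
    // Replays the operations of FIG. 1 in lock-acquisition order.
    static ShiftCountingList run() {
        ShiftCountingList list = new ShiftCountingList();
        String initial = "ABCD";
        for (int i = 0; i < initial.length(); i++) {
            list.add(i, initial.charAt(i));       // appends shift nothing
        }
        list.add(0, 'Y');   // thread 1: A-D move to positions 1-4
        list.add(1, 'X');   // thread 2: A-D move to positions 2-5
        list.remove(1);     // thread 3: A-D move back to positions 1-4
        return list;
    }

    public static void main(String[] args) {
        ShiftCountingList list = run();
        System.out.println(list.contents()); // prints YABCD
        System.out.println(list.shifts());   // prints 12
    }
}
```

In this sketch, three operations on a list of four to six elements move twelve elements in total, four per operation, and every move happens while the lock is held, so the other threads wait for the duration of each shift.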