A transaction is a concept widely used in the computer field. A transaction generally refers to the execution of a plurality of instructions in an atomic-like manner, with no other operations intervening during the execution. For example, if a transaction accesses data at a certain memory address, then the data at the address should not be modified by an operation outside the transaction until the transaction completes.
A transaction can be implemented directly at the hardware level, such as by modifying the processor architecture. The hardware component that supports transactions at the architecture level is called a Transactional Memory (TM) system. Using a TM system can improve software productivity because programmers may not need to use locks when writing a concurrent program.
The following example illustrates what a transaction is from the programmer's viewpoint. FIG. 1 shows a dynamic balanced binary tree. The operations performed on the tree include read, write, deletion and insertion. If a plurality of threads access the tree concurrently, programmers usually use a global lock to protect the whole tree. This coarse-grained method is simple, but it forces all accesses to the tree to be serialized, so it cannot achieve good performance. Fine-grained locks can solve this problem. For example, a lock can be assigned to each node in the tree. However, the program then becomes hard to write. When a node is inserted or deleted, the neighboring nodes sometimes have to be rotated to keep the tree balanced. For correctness, multiple locks have to be acquired, which brings new problems such as deadlock. Programmers need strong parallel programming skills to write programs with fine-grained locks, so productivity is low.
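The coarse-grained approach described above can be sketched in C with POSIX threads. This is only a minimal illustration under stated assumptions, not code from the source: the names node_t, tree_lock, tree_insert and tree_contains are hypothetical, and the rebalancing rotations are omitted so that the locking pattern stays visible.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical node type; balancing information is omitted for brevity. */
typedef struct node {
    int key;
    struct node *left, *right;
} node_t;

static node_t *root = NULL;

/* One global lock protects the whole tree (coarse-grained locking). */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert x. Returns true if a node was added, false if x was already
 * present or memory could not be allocated. */
bool tree_insert(int x)
{
    bool inserted = false;

    pthread_mutex_lock(&tree_lock);        /* every access is serialized here */
    node_t **link = &root;
    while (*link != NULL && (*link)->key != x)
        link = (x < (*link)->key) ? &(*link)->left : &(*link)->right;
    if (*link == NULL) {
        node_t *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = x;
            n->left = n->right = NULL;
            *link = n;
            inserted = true;
        }
    }
    pthread_mutex_unlock(&tree_lock);
    return inserted;
}

/* Look up x. Even read-only accesses must take the global lock. */
bool tree_contains(int x)
{
    pthread_mutex_lock(&tree_lock);
    const node_t *p = root;
    while (p != NULL && p->key != x)
        p = (x < p->key) ? p->left : p->right;
    pthread_mutex_unlock(&tree_lock);
    return p != NULL;
}
```

Because every call takes tree_lock, two threads that work on unrelated subtrees still wait for each other. That serialization is the performance cost noted above.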
With the help of TM, the dilemma disappears. Programmers simply mark the boundaries of a transaction in the code with the newly defined transaction_start and transaction_end. Inside the transaction, the code is written in the traditional way, without any consideration of locks. The hardware guarantees that the transaction is executed just like an atomic operation, without any intervening operations. The following exemplary code shows operations, such as insertion or deletion of a node, performed on the dynamic balanced binary tree using TM.
transaction_start {
    p = root;
    while (TRUE) {
        if (x < p->key) {
            p = p->left;
        } else if (x > p->key) {
            p = p->right;
        } else {
            break;
        }
    }
    do read/write/deletion/insertion here;
} transaction_end;
FIG. 2 shows a common current TM system. As shown in the figure, at the architecture level, all the data accessed by a transaction (speculative data) are stored temporarily in a transaction buffer instead of being written into the memory. If two transactions access the same address and at least one of them modifies the data at that address, then one of them has to roll back and re-execute, while the other one continues. This situation is called a conflict. If there is no conflict, the temporarily stored data are written to the memory at the end of the transaction. This action is called a commit.
In the binary tree example above, if the tree is large, the probability that two threads modify the same node is quite low, so it is likely to be safe to run multiple transactions in parallel. Thus, although a coarse-grained programming style is used with the TM system, the performance of program execution is comparable to that achieved with fine-grained locks.
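The buffering, conflict and commit behavior described for FIG. 2 can be sketched in software as follows. This is a simplified model under stated assumptions, not the hardware design of the source: the types and functions (tx_buffer_t, tx_read, tx_write, tx_conflict, tx_commit), the buffer capacity and the toy memory size are all hypothetical, and a full buffer is only guarded by an assertion here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_MAX_ENTRIES 8   /* assumed buffer capacity, for illustration only */
#define MEM_WORDS 16       /* size of the toy "memory", in words */

typedef struct {
    size_t addr;      /* word index into memory */
    uint32_t value;   /* value as seen by the transaction */
    bool written;     /* true if the transaction modified this address */
} tx_entry_t;

typedef struct {
    tx_entry_t entries[TX_MAX_ENTRIES];
    size_t count;
} tx_buffer_t;

static tx_entry_t *tx_find(tx_buffer_t *tx, size_t addr)
{
    for (size_t i = 0; i < tx->count; i++)
        if (tx->entries[i].addr == addr)
            return &tx->entries[i];
    return NULL;
}

/* Speculative read: the access is recorded in the buffer. */
uint32_t tx_read(tx_buffer_t *tx, const uint32_t *mem, size_t addr)
{
    assert(addr < MEM_WORDS);
    tx_entry_t *e = tx_find(tx, addr);
    if (e == NULL) {
        assert(tx->count < TX_MAX_ENTRIES);   /* overflow is not handled here */
        e = &tx->entries[tx->count++];
        e->addr = addr;
        e->value = mem[addr];
        e->written = false;
    }
    return e->value;
}

/* Speculative write: the value stays in the buffer, memory is untouched. */
void tx_write(tx_buffer_t *tx, size_t addr, uint32_t value)
{
    assert(addr < MEM_WORDS);
    tx_entry_t *e = tx_find(tx, addr);
    if (e == NULL) {
        assert(tx->count < TX_MAX_ENTRIES);   /* overflow is not handled here */
        e = &tx->entries[tx->count++];
        e->addr = addr;
    }
    e->value = value;
    e->written = true;
}

/* Conflict: both transactions access the same address and at least one
 * of them has modified it. */
bool tx_conflict(const tx_buffer_t *a, const tx_buffer_t *b)
{
    for (size_t i = 0; i < a->count; i++)
        for (size_t j = 0; j < b->count; j++)
            if (a->entries[i].addr == b->entries[j].addr &&
                (a->entries[i].written || b->entries[j].written))
                return true;
    return false;
}

/* Commit: write the buffered modifications to memory and clear the buffer. */
void tx_commit(tx_buffer_t *tx, uint32_t *mem)
{
    for (size_t i = 0; i < tx->count; i++)
        if (tx->entries[i].written)
            mem[tx->entries[i].addr] = tx->entries[i].value;
    tx->count = 0;
}
```

In this model, two transactions that only read the same address do not conflict. A rollback would correspond to discarding the buffer (setting count to zero) without writing anything to memory.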
As mentioned above, in order to implement a TM system, an on-chip buffer for temporary storage is required. However, the hardware buffer can only have a limited size. For example, Power4/Power5 has a 32 KB L1 data cache for each processor core. The temporary buffer is on the critical path, so it can hardly be larger than L1 cache (actually, because of area limitation, it should be much smaller than L1 cache). On the other hand, it is difficult for programmers to figure out precisely how much storage space will be used by a transaction. So a possible situation is that the storage space consumed by a transaction is larger than the hardware buffer size. This situation is called overflow.
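The overflow condition can be illustrated with a small software model that tracks which buffer lines a transaction has touched. This is a sketch under assumptions that are not taken from the source: the line size LINE_BYTES, the deliberately tiny capacity BUF_LINES, and the names tx_lines_t, tx_status_t and tx_touch are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u   /* assumed buffer line size in bytes */
#define BUF_LINES  4u    /* assumed (deliberately tiny) buffer capacity */

typedef enum { TX_OK, TX_OVERFLOW } tx_status_t;

typedef struct {
    uintptr_t lines[BUF_LINES];   /* line numbers held in the buffer */
    size_t used;                  /* number of occupied lines */
} tx_lines_t;

/* Record that the transaction accesses addr. Returns TX_OVERFLOW when the
 * access needs a new line but the buffer is already full. */
tx_status_t tx_touch(tx_lines_t *tx, uintptr_t addr)
{
    uintptr_t line = addr / LINE_BYTES;

    for (size_t i = 0; i < tx->used; i++)
        if (tx->lines[i] == line)
            return TX_OK;             /* line is already buffered */

    if (tx->used == BUF_LINES)
        return TX_OVERFLOW;           /* footprint exceeds the buffer */

    tx->lines[tx->used++] = line;
    return TX_OK;
}
```

With these assumed values, a transaction that touches a fifth distinct line receives TX_OVERFLOW. The footprint depends on run-time addresses, which is why it is hard for a programmer to bound it in advance.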
In order to guarantee the correctness of a program, overflow must be handled. Since overflow is a rare event, the speed of the overflow handling method is not critical for overall performance, while the hardware complexity of its implementation should be kept to a minimum.
A solution for handling overflow through hardware is disclosed in Rajwar R, Herlihy M, Lai K. Virtualizing transactional memory, Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA), Madison, Wis.: IEEE Computer Society, 2005, 494-505. The solution is provided by Intel. It avoids overflow by storing the speculative data into the memory. However, this method needs new components to support automatic storage, and involves complex modifications to hardware. The IBM Power architecture adopts the RISC architecture, which requires the hardware to be simple. So, the above solution is not suitable for IBM products or for other chips that adopt the RISC architecture.
Another method for avoiding overflow by writing speculative data into the memory is disclosed in Moore K E, Bobba J, Moravan M J, Hill M D, Wood D A. LogTM: Log-based transactional memory, the 12th International Symposium on High-Performance Computer Architecture (HPCA), 2006, 254-265. However, conflict detection in a cache or a hardware buffer is much faster than in the disclosed log-based method. So the log-based method disclosed in the document is not ideal with respect to conflict detection.
A method for handling transaction buffer overflow is also disclosed in Colin Blundell, Joe Devietti, E Christopher Lewis et al., Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory, Proceedings of Annual International Symposium on Computer Architecture (ISCA), 2007. However, this method still needs complex hardware modifications, such as modifications to the storage controller.
Therefore, a simple and efficient solution for handling buffer overflow, with minimal modification to the existing hardware architecture, is needed.