1. Field of the Invention
The present invention relates to a method and an apparatus for executing a long transaction in a system with limited transactional hardware resources.
2. Related Art
Synchronization mechanisms facilitate preventing, avoiding, or recovering from inopportune interleavings of concurrent operations, which are referred to as “races.” One such synchronization mechanism is mutual-exclusion locking, wherein at most one thread at a time is permitted access to protected code or data (e.g., critical sections).
In the Java programming language, the Java Virtual Machine (JVM) provides monitors by which threads running application code can ensure that certain operations are performed atomically with respect to the execution of other threads. (Note that Java is a trademark or registered trademark of Sun Microsystems, Inc. in the United States and other countries.) Typical JVMs implement monitors with mutual-exclusion locking mechanisms, wherein the monitor is either locked or unlocked, and wherein only one thread can own the monitor at any given time. A thread can enter a critical section protected by a monitor only after acquiring ownership of the monitor. If a thread attempts to lock a monitor that is in an unlocked state, the thread gains ownership of the monitor. However, if a thread attempts to acquire ownership of a monitor that has been locked by another thread, the thread is not permitted to enter into a critical section until the owner of the lock releases the lock and the thread gains exclusive ownership of the lock.
Unfortunately, mutual exclusion can cause problems. For example, under mutual exclusion, threads can be vulnerable to deadlock. In addition, a thread that is stalled or preempted while executing a critical section can impede other threads trying to acquire the lock, which can cause excessive latency in acquiring the lock, priority inversion, and excessive context switching.
One solution to this problem is to use a “lock-free” synchronization mechanism. A synchronization mechanism is considered lock-free if a system that is executing a group of threads that communicate with each other is guaranteed to make useful forward progress (e.g., completing an atomic update) in a finite number of processor cycles. Lock-freedom does not guarantee that a specific thread makes progress in a finite number of program steps, but rather that at least one of the threads being executed by the system is guaranteed to make progress after a finite number of program steps.
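The lock-free property described above can be illustrated with a minimal Java sketch of an atomic update built on a compare-and-swap retry loop. The class name LockFreeCounter and the demo( ) helper are assumptions made for illustration only and are not part of the described system. A failed compareAndSet( ) implies that another thread's update succeeded, so the system as a whole makes progress even though a specific thread may have to retry.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a lock-free atomic update.
class LockFreeCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Retry until the compare-and-set succeeds. No thread ever holds a lock,
    // so a stalled thread cannot impede the others.
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // CAS failed: another thread updated the value first; retry.
        }
    }

    int get() {
        return value.get();
    }

    // Runs the given number of threads, each incrementing perThread times,
    // and returns the final counter value.
    static int demo(int threads, int perThread) {
        LockFreeCounter counter = new LockFreeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Note that this sketch is lock-free but not wait-free: an individual thread can, in principle, retry repeatedly while other threads succeed.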
The Java synchronization mechanism is the synchronized( ) construct. A JVM can implement the synchronized( ) construct in a number of ways:
(1) through mutual-exclusion locking (used by many JVM implementations);
(2) through lock-free updates; or
(3) through a hardware transaction (H-transaction).
Using H-transactions involves commuting a synchronized block to use transactional-memory mechanisms. If the H-transaction fails because the requested operation is not feasible (e.g., it exceeds the hardware resources available to the processor), the operation reverts to mutual-exclusion locking.
Processors that support hardware transactional memory typically have limited hardware resources. The number of tracked loads and conditionally-deferred stores is finite in these processors. Some of these processors do not support large transactions that load from or store to a large number of disparate cache lines. Furthermore, some of these processors do not allow SAVE and RESTORE instructions in transactional mode, thereby precluding procedure calls to non-leaf routines while in transactional mode.
Commuting a synchronized block to an H-transaction affords two benefits:
(1) lock-freedom: other threads are not impeded if a thread in the midst of a transaction stalls; and
(2) increased parallelism: non-conflicting operations can proceed in parallel, increasing overall system throughput.
Using hardware transactions provides the effect of using optimal or ideal fine-grained locking without the effort and risk associated with explicitly coded fine-grained locks. Pure readers (i.e., synchronized blocks that only read shared variables) are a degenerate case and never conflict with each other.
For example, when commuted to H-transactions, the following operations can proceed in parallel even though they synchronize on the same object and are executed by different threads, because their operands do not conflict:
    TXN1: synchronized(o) { a++; b++; }
    TXN2: synchronized(o) { c++; d++; }
However, if the implementation of synchronized( ) uses mutual-exclusion locking, TXN1 and TXN2 cannot be executed in parallel and are instead executed in series.
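The serialization of TXN1 and TXN2 under mutual-exclusion locking can be observed with the following Java sketch. The class SharedMonitorDemo and its occupancy counters are assumptions added for illustration; the counters exist only to record how many threads are inside a block guarded by o at the same time. Under a conventional monitor implementation that number never exceeds one, even though the two blocks touch disjoint variables.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: TXN1 and TXN2 share monitor o but touch
// disjoint fields (a, b versus c, d).
class SharedMonitorDemo {
    private final Object o = new Object();
    private int a, b, c, d;

    // Instrumentation only: threads currently inside a block guarded by o,
    // and the largest such count ever observed.
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    private void enter() {
        maxInside.accumulateAndGet(inside.incrementAndGet(), Math::max);
    }

    private void leave() {
        inside.decrementAndGet();
    }

    void txn1() { synchronized (o) { enter(); a++; b++; leave(); } }
    void txn2() { synchronized (o) { enter(); c++; d++; leave(); } }

    // Runs TXN1 and TXN2 on two threads; returns {a, b, c, d, maxInside}.
    static int[] run(int iterations) {
        SharedMonitorDemo s = new SharedMonitorDemo();
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) s.txn1(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) s.txn2(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { s.a, s.b, s.c, s.d, s.maxInside.get() };
    }
}
```

The last element of the returned array is the maximum observed occupancy. The instrumentation counters are themselves shared, so this sketch demonstrates only the locking case and is not a model of how the blocks would behave as H-transactions.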
The following example illustrates the process of commuting synchronized( ) blocks to H-Transactions. Consider the following Java code fragment:
    synchronized(o) { a++; if (a & 1) b++; }
A typical JVM will transform the above synchronized( ) block into code that is equivalent to the following code:
    // Lock( ) and Unlock( ) are runtime
    // infrastructure support routines.
    Lock(o); a++; if (a & 1) b++; Unlock(o);
More accurately, the code has the form:
    try {
      Lock(o); a++; if (a & 1) b++;
    } finally {
      Unlock(o);
    }
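For readers who want a compilable counterpart to the Lock( )/Unlock( ) form, the following Java sketch expresses the same critical section with an explicit lock. It is an analogy only: the class ExplicitLockForm is hypothetical, ReentrantLock stands in for the monitor of o, and the test written above as (a & 1) is spelled (a & 1) != 0 because Java requires a boolean condition. The sketch acquires the lock before entering the try block, which is the customary Java idiom, so that the finally clause never releases a lock that was not acquired.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the try/finally locking form.
class ExplicitLockForm {
    // Stands in for the monitor associated with object o.
    private final ReentrantLock lock = new ReentrantLock();
    int a;
    int b;

    void update() {
        lock.lock();
        try {
            a++;
            if ((a & 1) != 0) { // increment b only when a is odd
                b++;
            }
        } finally {
            lock.unlock(); // released even if the body throws
        }
    }
}
```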
If transactional memory is available, the JVM can emit the following code:
    // Prefer transactional mode over locks
    int txn = 1; // auto or thread-local variable
    if (CHKPT( ) == 0) {
      if (!isLocked(o)) goto Enter;
      COMMIT( ); // Either COMMIT( ) or ABORT( ) suffices
    }
    txn = 0;
    Lock(o);
    Enter:
    // The same code in both transactional and
    // mutual-exclusion modes.
    a++; if (a & 1) b++;
    if (txn) { COMMIT( ); } else { Unlock(o); }
The above code attempts to use an H-transaction. If the object is locked at the start of execution of the critical section, mutual-exclusion locking is used. Otherwise, a hardware transaction is used. Note that isLocked( ) loads and tests the lockword for an object, which places the lockword in the transaction's read set. Consequently, if a second thread acquires the lock while a first thread is executing a transaction, the lockword is modified and the first thread's transaction aborts immediately because of the interference by the second thread.
COMMIT, CHKPT and ABORT can be defined as inline or leaf routines. For example:
    COMMIT( )
      retl
      commit
    CHKPT( )  !! usage is similar to setjmp( )
      mov %g0, %o0
      chkpt 1f
      nop
      retl
      nop
    1: retl
      rd %cps, %o0 !! read cps - will be non-zero
    ABORT( )
      ta 0
      retl
      nop
Note that if hardware transactional memory is not available, mutual-exclusion locking is used (not shown in the above code). In practice, the CHKPT( ) routine returns a specified non-zero value which indicates whether hardware transactional memory is supported by the processor.
Also note that a more sophisticated transaction-triage-failure policy can be used. The code above checks the return value from CHKPT( ). If the transaction fails because of interference or contention, as evidenced by the return code from CHKPT( ), a more sophisticated failure policy can retry the operation using hardware transactions a specified number of times before attempting to acquire a mutual-exclusion lock. Typically, once mutual-exclusion locking is used on an object, the object tends to remain in mutual-exclusion mode until the contention abates.
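One possible shape for such a failure policy is sketched below in Java. Java exposes no hardware-transaction primitives, so the transaction attempt is modeled as a caller-supplied function; the names RetryThenLock, Outcome, Mode, attemptTxn, and maxAttempts are all assumptions introduced for illustration. The sketch retries only when an attempt reports a conflict, falls back to the lock at once when the attempt is reported as infeasible, and does not model the tendency of an object to remain in mutual-exclusion mode.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical illustration of a retry-then-lock failure policy.
class RetryThenLock {
    // Stand-in for the information carried by the CHKPT( ) return code.
    enum Outcome { COMMITTED, CONFLICT, INFEASIBLE }

    // Reports which mechanism finally executed the critical section.
    enum Mode { TRANSACTION, LOCK }

    // attemptTxn models "CHKPT, critical section, COMMIT" as one step and
    // reports how that step ended. criticalSection is the same body, run
    // under the lock when the transactional path is abandoned.
    static Mode run(Supplier<Outcome> attemptTxn, Runnable criticalSection,
                    ReentrantLock lock, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Outcome outcome = attemptTxn.get();
            if (outcome == Outcome.COMMITTED) {
                return Mode.TRANSACTION;
            }
            if (outcome == Outcome.INFEASIBLE) {
                break; // retrying cannot help; go straight to the lock
            }
            // CONFLICT: transient interference, so try again.
        }
        lock.lock();
        try {
            criticalSection.run();
        } finally {
            lock.unlock();
        }
        return Mode.LOCK;
    }
}
```

A caller would pass a small maxAttempts value so that sustained contention quickly degrades to mutual-exclusion locking rather than spinning on doomed transactions.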
Furthermore, note that some critical sections are not eligible to be commuted to H-transactional form. For example, a critical section is ineligible if any of the following occurs:
(1) the critical section is too long or is infeasible (e.g., it requires more transactional memory resources than are available in the processor);
(2) the critical section executes I/O instructions or any other operation that can alter the non-transactional state; (Note that this issue can be somewhat mitigated by emitting non-leaf routines that avoid using SAVE and RESTORE, but instead use the gcc “-mflat” calling convention, which are callable from within H-transactions.)
(3) the critical section calls non-leaf routines (typically, this stricture subsumes (2));
(4) the critical section accesses volatile variables;
(5) the critical section uses java.util.concurrent atomic operators (e.g., atomic compare-and-swap (CAS) is not permitted in some hardware transactional memory architectures and executing a CAS within a transaction causes the transaction to abort); and
(6) the critical section itself contains synchronized blocks (only leaf or terminal synchronized blocks are commutable).
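The six conditions above can be read as a single eligibility predicate. The following Java sketch shows one way a compiler might encode it; the classes CriticalSectionInfo and TxnEligibility, the field names, and the use of a cache-line count as the resource measure are all assumptions made for illustration and do not describe any particular JVM.

```java
// Hypothetical summary of what a compiler has learned about a critical section.
class CriticalSectionInfo {
    int cacheLinesTouched;             // assumed proxy for transactional resources
    boolean performsIo;                // or otherwise alters non-transactional state
    boolean callsNonLeafRoutine;
    boolean accessesVolatile;
    boolean usesAtomicOperators;       // e.g., java.util.concurrent CAS
    boolean containsSynchronizedBlock; // nested synchronized block
}

// Hypothetical eligibility check mirroring conditions (1) through (6).
class TxnEligibility {
    static boolean canCommute(CriticalSectionInfo cs, int maxCacheLines) {
        if (cs.cacheLinesTouched > maxCacheLines) return false; // (1)
        if (cs.performsIo) return false;                        // (2)
        if (cs.callsNonLeafRoutine) return false;               // (3)
        if (cs.accessesVolatile) return false;                  // (4)
        if (cs.usesAtomicOperators) return false;               // (5)
        if (cs.containsSynchronizedBlock) return false;         // (6)
        return true;
    }
}
```

A critical section that fails this check would simply be compiled in the mutual-exclusion form, without the transactional fast path.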
Hence, what is needed is a method and an apparatus for executing a long transaction without the problems described above.