Java software running on servers, or even on high-end workstations, must be designed to permit execution by a large number of CPUs (Central Processing Units). Java is commonly executed in threads. A thread is a single sequential flow of control that runs within a program. A thread is also called an execution context or a lightweight process. A plurality of threads may run at the same time. Threads share resources such as global data, memory, and critical sections of code. Shared resources have associated “locks.” A thread must acquire the lock on a resource in order to access the resource.
A key bottleneck that limits performance is the implementation of “locking,” that is, the synchronization of access by multiple threads to the same shared resources. In Java programs, a popular access control is the “monitor” structure. The underlying Java virtual machine (JVM), which is embodied in software, provides the runtime environment for the Java program and is responsible for implementing the required locking mechanism. Depending on the implementation approach taken by the JVM and on the hardware support for synchronization primitives in the platform, the performance of enterprise e-business Java software running on a multiprocessor server can vary widely.
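As a minimal sketch of the monitor structure described above, the following Java example guards a shared counter with the object's own monitor by means of the synchronized keyword. The class name MonitorCounter and the helper runWorkers are hypothetical and were introduced only for illustration; how the JVM implements the monitor underneath is not shown and varies by implementation.

```java
// Hypothetical illustration: a counter shared by several threads and
// protected by the monitor of the MonitorCounter object itself.
class MonitorCounter {
    private long count = 0;

    // A thread must own this object's monitor to execute a synchronized method.
    synchronized void increment() {
        count++;
    }

    synchronized long get() {
        return count;
    }

    // Starts `threads` workers that each increment one shared counter
    // `perThread` times, waits for them, and returns the final count.
    static long runWorkers(int threads, int perThread) {
        MonitorCounter counter = new MonitorCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // No increments are lost because each one runs under the monitor.
        System.out.println(runWorkers(4, 100_000)); // prints 400000
    }
}
```

Without the synchronized keyword on increment, concurrent updates could be lost and the final count could fall short of the expected total.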
A common hardware technique used for synchronization and implemented in most processors is an atomic read-modify-write bus cycle, caused by the execution of an instruction such as “XCHG”. In an environment in which contention for locks (and hence for the resources protected by the locks) is heavy, multiple CPUs can execute a locked read-modify-write operation simultaneously in an attempt to secure ownership of the same lock or set of locks. This is referred to as the “thundering herd” problem, and it leads to heavy system bus contention. Consequently, multiprocessor scalability is limited and severe performance penalties are incurred.
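The contention pattern described above can be sketched in Java with a naive test-and-set spin lock, in which every acquisition attempt, successful or not, is an atomic exchange. This is an illustrative sketch only: the class TasSpinLock and the helper runWorkers are hypothetical names, and whether AtomicBoolean.getAndSet is compiled to an instruction such as XCHG depends on the JVM and the platform.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of a naive test-and-set spin lock.
class TasSpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        // getAndSet is an atomic read-modify-write. Every waiting thread
        // repeats it on the same variable, even while the lock is held,
        // which is the source of the heavy contention.
        while (held.getAndSet(true)) {
            Thread.onSpinWait(); // spin hint, available since Java 9
        }
    }

    void unlock() {
        held.set(false);
    }

    // Starts `threads` workers that each add 1 to a shared total
    // `perThread` times under the spin lock, and returns the total.
    static long runWorkers(int threads, int perThread) {
        TasSpinLock spinLock = new TasSpinLock();
        long[] total = new long[1]; // shared resource guarded by spinLock
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    spinLock.lock();
                    try {
                        total[0]++;
                    } finally {
                        spinLock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 50_000));
    }
}
```

The lock is correct, but as more threads spin, more atomic exchanges target the same memory location at once, which is the behavior the passage identifies as limiting scalability.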
Sample code that illustrates this problem is shown in Table 1 below.
TABLE 1
 1  // available. If it is 1, another process is in the critical section.
 2  //
 3  spin_lock:
 4        mov    ar.ccv = 0                    // cmpxchg looks for avail (0)
 5        mov    r2 = 1                        // cmpxchg sets to held (1)
 6  spin:
 7        ld8    r1 = [lock] ;;                // get lock in shared state
 8        cmp.ne p1, p0 = r1, r2               // is lock held (i.e., lock ==
 9                                             // 1)?
10  (p1)  br.cond.spnt spin ;;                 // yes, continue spinning
11
12        cmpxchg8.acq r1 = [lock], r2 ;;      // attempt to grab lock
13        cmp.ne p1, p0 = r1, r2               // was lock empty?
14  (p1)  br.cond.spnt spin ;;                 // bummer, continue spinning
15  cs_begin:
16        // critical section code goes here . . .
17  cs_end:
18        st8.rel [lock] = r0 ;;               // release the lock

Line 4 sets ar.ccv to 0, the value that the cmpxchg instruction looks for to identify an available lock. Line 5 loads r2 with 1, the value that cmpxchg writes to change the status of the lock from 0 to 1 when an available lock is found. At line 12, there is an attempt to grab the lock. At line 13, the question is asked as to whether the lock was empty, and the answer is acted on at line 14: where the attempt to acquire the lock was unsuccessful, the process must continue spinning until it finds the resource unlocked. It is desirable to minimize the overhead associated with lock contention.
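For readers more familiar with Java than with assembly, the structure of the Table 1 loop can be approximated as follows: read the lock word with an ordinary load, spin while it appears held, and only then attempt a compare-and-exchange. This is a hedged sketch rather than a translation of the listing. The names TtasSpinLock, AVAILABLE, HELD, and runWorkers are hypothetical, and AtomicLong.set is a volatile write, which is at least as strong as the release store in the listing.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a "test, then compare-and-exchange" spin lock.
class TtasSpinLock {
    private static final long AVAILABLE = 0L; // lock word value when free
    private static final long HELD = 1L;      // lock word value when owned

    private final AtomicLong lock = new AtomicLong(AVAILABLE);

    void lock() {
        while (true) {
            // Ordinary read of the lock word (compare with the ld8 load):
            // keep spinning while the lock appears to be held.
            while (lock.get() == HELD) {
                Thread.onSpinWait(); // spin hint, available since Java 9
            }
            // Atomic compare-and-exchange (compare with cmpxchg8.acq):
            // succeeds only if the word is still AVAILABLE.
            if (lock.compareAndSet(AVAILABLE, HELD)) {
                return;
            }
            // Another thread won the race; go back to spinning.
        }
    }

    void unlock() {
        // Store AVAILABLE to release the lock (compare with st8.rel).
        lock.set(AVAILABLE);
    }

    // Starts `threads` workers that each add 1 to a shared total
    // `perThread` times under the lock, and returns the total.
    static long runWorkers(int threads, int perThread) {
        TtasSpinLock spinLock = new TtasSpinLock();
        long[] total = new long[1]; // shared resource guarded by spinLock
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    spinLock.lock();
                    try {
                        total[0]++; // critical section
                    } finally {
                        spinLock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 50_000));
    }
}
```

Even with the preliminary read, all waiting threads may see the lock become available at the same moment and issue the compare-and-exchange together, so the contention on release remains.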