1. Technical Field
The disclosed technology relates to the field of transactional computer memory.
2. Background Art
To increase performance, modern computers use SMP (Symmetric Multi-Processor) architectures where multiple processing units access some amount of shared memory. Because each processor can asynchronously access the shared memory, concurrency-control techniques are used to coordinate the processors' access to the shared memory such that the accesses do not conflict. Each processor has one or more threads-of-execution. Traditionally, the programmer of these threads uses mutual exclusion locks to control each thread's access to the shared memory.
A set of concurrently executing threads on the same or different processors can use pessimistic- or optimistic-concurrency-control to safely moderate access to shared mutable data (for example locations in a shared memory that are shared by threads executing on a single processor as well as those shared by threads executing on different processors). Pessimistic-concurrency-control prevents undesirable or inopportune interleavings of access to the shared memory by means of mutual exclusion (locks) while optimistic-concurrency-control can detect and recover from inopportune interleavings of access to the shared memory by aborting conflicting operations on the shared memory and retrying the aborted operations.
Pessimistic-concurrency-control by mutual exclusion runs counter to the software engineering principle of “abstraction and encapsulation” where the details of mutual exclusion primitives are usually left opaque to the programmer. In addition, there are difficult tradeoffs between designing a coarse-grained or fine-grained locking application because deadlock possibilities exist once more than one lock is used (and the programmer must be aware of which locks are required for correct operation by each critical section). Furthermore, with more than one lock, mutual exclusion is not “composable” in that primitive lock-based operators cannot be combined into larger composite atomic operations without imposing and understanding lock hierarchies. Thus, a programmer must know all the locks that could be accessed in each of the primitive building block operators. However, application performance improves with an appropriate number of locks. Thus, pessimistic-concurrency-control by mutual exclusion is difficult and error prone.
Transactional memory (TM) is an optimistic-concurrency-control technology that provides accesses to shared memory in a concurrent computing context that are analogous to database transactions. A transaction in the transactional memory context can be considered as a programmed procedure that performs a logical-unit-of-work that executes a series of memory load operations and memory store operations to the shared memory. These memory load operations and memory store operations logically occur at a single instant in time (that is, intermediate states are not visible to other successful transactions).
Optimistic-concurrency-control can be implemented as a transaction system that provides transactional capabilities to access the shared memory (the transactional memory). Similar to database transactions, the transaction is attempted, but if an access conflict is detected or the set of observed data values loaded from the shared memory becomes inconsistent, the transaction is aborted (without modifying the contents of the primary locations in the shared memory, although transactional metadata that resides in the shared memory may be modified) and re-attempted. Using optimistic-concurrency-control, a programmer can identify the critical code sections such that the runtime system can serialize operations on the shared memory while still allowing parallelism. Thus, transactions that do not conflict may execute in parallel, while transactions that do conflict are aborted and transparently retried. Optimistic-concurrency-control mechanisms provide the programmer with a well-understood and less error-prone data access model. Optimistic-concurrency-control mechanisms can have performance that is competitive with fine-grain locking (so long as the application has some degree of disjoint access parallelism to the shared memory).
A transaction that accesses a transactional memory can include a speculative execution phase (where memory load operations from, and memory store operations intended for, the shared memory are tracked), followed by a commit phase that, if successful, exposes the tracked memory store operations to other transactions that access the transactional memory. When the transaction loads a data value from the shared memory, it also enters a read-set entry into its read-set that identifies the location in the shared memory from which the data value was loaded. If a data value that was loaded from the shared memory during the speculative execution phase of the transaction was concurrently modified by some other transaction (executing on the same or different processor), the data values at the locations identified by the transaction's read-set during the speculative execution phase cannot be known to be consistent and so the transaction aborts. Once the transaction aborts, it can be retried with the hope that the data values loaded from the shared memory during the subsequent speculative execution phase will not be modified during the retry. The body of a transaction can be thought of as a function that takes the read-set as input (from which additional members of the read-set may be determined) and computes a write-set and possibly some thread-local outputs. If the data values corresponding to the read-set are modified while the transaction is in the midst of its computations, then the transaction's results are potentially corrupt and thus the transaction aborts (and may be retried).
A transaction system exposes the tracked memory store operations in the shared memory in accordance with either a “speculative-store-buffer” policy or an “update-in-place” policy. The transaction's write-set generally includes write-set entries that identify which locations in the shared memory are to be (or have been) modified. Under the “speculative-store-buffer” policy, transactional stores (that record the data value corresponding to a location in shared memory) are kept in a speculative-store-buffer (the write-set) pending a successful commit. If the transaction commits successfully those pending stores will have been transferred from the speculative-store-buffer to their ultimate locations in the shared memory. If the transaction aborts, the contents of the speculative store buffer are discarded and the corresponding data values at write locations in the shared memory are not altered.
Under the “update-in-place” policy tentative stores by the transaction are directed “in-place” to the write locations in the shared memory. Under this policy, the transaction system must be able to roll-back or undo the tentative stores made to the shared memory by the transaction if the transaction aborts. Generally, the original data values of the stored-to shared memory locations are kept in an undo log. If the transaction aborts, the undo log is used to roll-back the in-place stores that were made to the shared memory. In addition, transactional memory load operations and memory store operations performed by another transaction are not allowed to overwrite or load the in-place stores performed by this transaction until after this transaction commits. Under both policies, transactional memory store operations to the shared memory are deferred in the sense that memory store operations performed by one transaction are not made visible to other transactions until the one transaction successfully commits.
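The two store policies described above can be contrasted with a minimal single-threaded sketch. It assumes the shared memory is modeled as a Python dict from address to value; the class names BufferedTx and InPlaceTx are hypothetical, and the locking that keeps other transactions away from tentative stores is deliberately omitted.

```python
class BufferedTx:
    """Speculative-store-buffer policy: stores stay private until commit."""

    def __init__(self, memory):
        self.memory = memory          # assumed model: dict of address -> value
        self.write_set = {}           # address -> pending data value

    def store(self, addr, value):
        self.write_set[addr] = value  # shared memory is not touched yet

    def load(self, addr):
        # A transaction sees its own pending stores first.
        return self.write_set.get(addr, self.memory[addr])

    def commit(self):
        self.memory.update(self.write_set)   # transfer stores to shared memory
        self.write_set.clear()

    def abort(self):
        self.write_set.clear()               # discard; shared memory unchanged


class InPlaceTx:
    """Update-in-place policy: stores go to memory at once; an undo log restores them."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []            # (address, original value) in store order

    def store(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value     # tentative store made in place

    def load(self, addr):
        return self.memory[addr]

    def commit(self):
        self.undo_log.clear()         # stores are already in place

    def abort(self):
        # Roll back in reverse order so the oldest original value wins.
        for addr, original in reversed(self.undo_log):
            self.memory[addr] = original
        self.undo_log.clear()
```

Rolling back in reverse order matters when one location is stored to more than once: the first undo-log entry for that location holds the value that existed before the transaction began.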
Transactional support can be implemented in hardware, as hardware transactional memory (HTM), in software as software transactional memory (STM), or a hybrid of hardware and software (HyTM).
The Transactional Locking (TL) family of STMs (TL STM) uses versioned lockwords to support the transactional memory capability. Each location in the transactional portion of the shared memory is associated with a versioned lockword that “covers” that location (a single versioned lockword can cover multiple locations in the shared memory). The transaction maintains a record of which locations were read from the shared memory (the read-set) and which locations in the shared memory will be (or have been) written with the result data values when the transaction commits (the write-set). The transaction's read-set locks are the set of locks that cover the locations in the shared memory represented by the transaction's read-set. The transaction's write-set locks are the set of locks that cover the locations in the shared memory represented by the transaction's write-set. The read-set and/or write-set can also be partially or completely stored in the shared memory.
A versioned lockword can be a word in memory that includes a lock portion that serves as a lock in the computer science sense, and a version number field that holds a version number. The versioned lockword can be, or be contained in, a lock object. The version numbers are used to track shared memory consistency through the life of a transaction. The versioned lockwords can be stored in the shared memory in a structure that is distinct from the shared data (in that sense, the versioned lockwords are metadata about the shared locations). A function that maps a location in shared memory to a lock can associate a contiguous block of shared memory locations (for example, a stripe) with a given versioned lockword (a stripe-lock). Other data-lock arrangements can also be used.
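One possible encoding of a versioned lockword, and of a function that maps a location to its stripe-lock, is sketched below. The bit layout (lock portion in bit 0, version number in the remaining bits), the 64-byte stripe width, and the table size are all assumptions chosen for illustration, as are the function names.

```python
LOCK_BIT = 1         # assumed layout: bit 0 is the lock portion
STRIPE_BYTES = 64    # assumed stripe width (one cache line)
NUM_LOCKS = 1024     # assumed number of versioned lockwords in the metadata table


def make_lockword(version, locked=False):
    """Pack a version number and a lock bit into one integer word."""
    return (version << 1) | (LOCK_BIT if locked else 0)


def is_locked(word):
    return bool(word & LOCK_BIT)


def version_of(word):
    return word >> 1


def lock_index(address):
    """Map a shared-memory address to the index of its covering stripe-lock."""
    return (address // STRIPE_BYTES) % NUM_LOCKS
```

With these assumed constants, addresses 0 through 63 share one lockword, and address 64 * 1024 aliases back to the same lockword as address 0, which illustrates how a single versioned lockword can cover multiple locations.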
FIG. 1 illustrates a multi-processor, shared memory system 100 that represents the conceptual aspects of transactional memory in a multi-processor system using a shared memory. The multi-processor, shared memory system 100 includes a shared memory bus 101 that makes a shared memory 103 available to a first processor 105 through an Nth processor 107 using any of a variety of well known shared memory access technologies that enable the processors to operate a lock in the shared memory. The first processor 105 can execute computer instructions from its own local memory (such as a first processor local memory 108) or from the shared memory 103.
Multiple threads-of-execution can access the shared memory 103. The multiple threads can be distributed across the first processor 105 through the Nth processor 107, and each processor can itself have multiple threads. For example, the first processor 105 can include a thread that executes a first transaction 109 and a thread that executes a second transaction 111. These threads can access a transactionally shared data region 113 of the shared memory 103 as a transactional memory. The shared memory 103 can also contain a transactional metadata region 115 to provide information about the state of the transactionally shared data region 113. This information can include metadata used to lock locations for exclusive access by a transaction as well as metadata used to detect conflicts between transactions. In some embodiments, the shared memory 103 contains an optional global clock region 117 to assist with particular transactional memory protocols. A transaction is performed by a thread-of-execution. Some threads perform a single transaction during the thread's lifetime; other threads can perform multiple transactions.
The first transaction 109 can include a first transaction write-set 119 and a first transaction read-set 121 either in the first processor local memory 108 (as shown in the figure) or in the shared memory 103. Generally, other transactions would have similar structures (for example the second transaction 111 includes a second transaction write-set 123 and a second transaction read-set 125). The read-set and write-set (together with the information in the transactional metadata region 115) maintain information that is used by the transaction system to determine whether a conflict has occurred between transactions and to take appropriate action if such conflict should happen.
One skilled in the art will understand that there are many different shared memory technologies that, for example, can use processor cache lines, multiple memory access paths, etc., and that such technologies are conceptually equivalent to that shown. These different technologies provide different solutions and approaches to handling access contention between the multiple processors that share the shared memory. These problems and approaches are known to one skilled in the art. Such a person will also understand that the transactionally shared data region 113 and the transactional metadata region 115, while shown in the figure as being defined regions in the shared memory 103, can in fact be scattered throughout the shared memory 103 without limitation other than as required for the lock structure (for example, stripe-locks or cache-line locks may require more explicit placement for a fragmented implementation of the transactionally shared data region 113).
A stripe is a contiguous region of shared memory that maps to a given versioned lockword. Typical stripe widths for an STM are either one fullword or one memory cache line. If stripes are wide then an update transaction may tend to require fewer high-overhead atomic operations (for example, compare-and-swap (CAS)) at commit-time to acquire the locks covering the shared memory 103 locations that are to be updated by the result data values in the transaction's write-set. On the other hand, narrow stripes offer better potential parallelism. Some transactional memory systems enable the stripe width to be dynamically set. Such systems allow coarse-grain locking until sufficient contention is detected; at which time the system switches the coarse-grain lock to a set of fine-grain locks. By automatically splitting the locks and switching to finer grained locking these systems minimize the number of high-latency atomic operations needed to lock low-contention fields while they maximize the potential parallelism for operations that access high-contention fields.
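The stripe-width tradeoff can be made concrete by counting how many distinct stripe-locks a given write-set needs. The sketch below assumes byte addresses and a fixed stripe width; the function name locks_needed is hypothetical.

```python
def locks_needed(addresses, stripe_bytes):
    """Count the distinct stripes (and thus lock acquisitions) covering the addresses."""
    return len({address // stripe_bytes for address in addresses})


# A write-set touching five 8-byte words, four of them in the same 64-byte line.
write_addresses = [0, 8, 16, 24, 64]
narrow = locks_needed(write_addresses, 8)    # one lock per word
wide = locks_needed(write_addresses, 64)     # one lock per cache line
```

Here the narrow stripes need five lock acquisitions and the wide stripes need two, but with wide stripes an unrelated transaction that writes address 32 would contend for the same lock as this one.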
FIG. 2 illustrates a transactional load process 200 as used by the TL family of STMs. When a transaction needs to access data from the transactionally shared data region 113 the transaction invokes the transactional load process 200 through a ‘start transactional load’ terminal 201. The transactional load process 200 continues to a ‘write-set value’ decision procedure 203 that determines whether the data value to be loaded is currently stored in the transaction's write-set (for example, the first transaction write-set 119). This can occur if the data value has previously been modified by the transaction and thus is in the write-set awaiting the transaction's commit. If so, the transactional load process 200 continues to a ‘load data value from write-set’ procedure 205 to load the data value from the transaction's write-set and then the transactional load process 200 completes through an ‘end load’ terminal 207.
However, if the ‘write-set value’ decision procedure 203 determines that the data value is not in the write-set, the transactional load process 200 continues to a ‘load lock’ procedure 209 that accesses the transactional metadata region 115 to examine the lock for the portion of the transactionally shared data region 113 that contains the data value that is to be loaded. Next a ‘locked’ decision procedure 211 determines whether that data value has been locked by some other transaction. If so, the transactional load process 200 continues to a ‘retry or abort’ procedure 213 that will either abort the transaction (for example if too much time has passed attempting to acquire the lock), or retry the ‘load lock’ procedure 209 (possibly after some back-off collision avoidance delay).
However, if the ‘locked’ decision procedure 211 determines that the data value from the transactionally shared data region 113 is not locked, the transactional load process 200 continues to a ‘load data value from shared memory’ procedure 215 that loads the data value from the transactionally shared data region 113. In the TL2 implementation of STM, the transactional load process 200 continues to an ‘optional lock-validation’ procedure 217 that verifies that the version number of the examined lock is less than the value of the global clock that was read by the transaction when the transaction started (sometimes termed the read-version). If the lock's version number is not valid, the ‘optional lock-validation’ procedure 217 will abort the transaction. Otherwise the transactional load process 200 continues to a ‘record data value and lock in read-set’ procedure 219 that records a read-set entry (for the location read from the transactionally shared data region 113), together with the value of its covering lock's version number field, in the transaction's read-set. Then the transactional load process 200 completes through the ‘end load’ terminal 207. The read-set entry contains or references the value of the covering lock's version number field, the location in shared memory of the loaded data value, and can include the loaded data value itself.
In the TL1 implementation of STM, the transactional load process 200 continues directly from the ‘load data value from shared memory’ procedure 215 to the ‘record data value and lock in read-set’ procedure 219 for processing and completion as previously described.
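The load path of FIG. 2 can be sketched as a single-threaded model. The assumptions are: one covering lock per address, locks represented as (locked, version) tuples, and an abort as the only outcome of the ‘retry or abort’ procedure. The names Tx and AbortTx are hypothetical. Passing read_version=None models TL1, which skips the lock-validation step.

```python
class AbortTx(Exception):
    """Raised when the transaction must abort (the caller may retry it)."""


class Tx:
    def __init__(self, data, locks, read_version=None):
        self.data = data                  # address -> value (shared data region)
        self.locks = locks                # address -> (locked, version)
        self.read_version = read_version  # TL2 read-version; None models TL1
        self.write_set = {}               # address -> pending value
        self.read_set = {}                # address -> observed lock version

    def load(self, addr):
        # 'write-set value' decision: prefer the transaction's own pending store.
        if addr in self.write_set:
            return self.write_set[addr]
        # 'load lock' and 'locked' decision.
        locked, version = self.locks[addr]
        if locked:
            raise AbortTx("location is locked by another transaction")
        # 'load data value from shared memory'.
        value = self.data[addr]
        # 'optional lock-validation' (TL2 only): version must be below the read-version.
        if self.read_version is not None and not version < self.read_version:
            raise AbortTx("lock version is not older than the read-version")
        # 'record data value and lock in read-set'.
        self.read_set[addr] = version
        return value
```

A concurrent implementation would also need to confirm that the lockword did not change between examining the lock and loading the data value; that step is left out of this model.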
The transaction performs its logical-unit-of-work during the speculative execution phase using the data values it has loaded from the transactionally shared data region 113 (the read-set) and generates result data values (tracked by the write-set) that will be exposed in the STM (by either not rolling back changed data values or writing the result data values to their corresponding write locations in the transactionally shared data region 113) to other transactions after the transaction commits.
After the transaction has completed its speculative execution phase it enters its commit phase to commit the results of the speculative execution phase back to the transactionally shared data region 113. During the speculative execution phase the transaction has maintained the write-set that tracks which data values in the transactionally shared data region 113 are to be changed and exposed by the transaction on commit. During the commit phase the TL STM acquires the write-set locks for the portions of the transactionally shared data region 113 that are associated with the write-set and then checks that the previously observed versions of the versioned lockwords kept in the read-set still match the commit-time versions of the versioned lockword (in the transactional metadata region 115) covering the shared locations in the transactionally shared data region 113. The transaction aborts if the read-set versions do not match the transactional metadata region 115 versions at commit-time. If the versions do match the transaction is deemed successful and the commit phase exposes the contents of the write-set to other transactions (for example, by writing the write-set data values to their respective write locations in the transactionally shared data region 113). At the end of the commit phase the transaction increments the version numbers for the versioned lockwords covering the write-set and releases those locks.
FIG. 3 illustrates a TL1 transaction commit process 300 that is used by the TL1 STM to attempt to commit a transaction that has completed its speculative execution phase and has not aborted. The TL1 transaction commit process 300 is invoked by the transaction at a start commit terminal 301 and continues to an ‘acquire write-set lock’ procedure 303. The ‘acquire write-set lock’ procedure 303 locates the write locations that are to be updated with data values kept in the transaction's write-set and acquires the locks from the transactional metadata region 115 covering those write locations in the transactionally shared data region 113. Once the locks are acquired, other transactions cannot modify the data values in the transactionally shared data region 113 covered by the locks. Next, the TL1 transaction commit process 300 continues to a ‘valid read-set’ decision procedure 305 that validates the transaction's read-set by comparing the version numbers of the locks stored in the transaction's read-set with the current version numbers of those locks in the transactional metadata region 115. If the version numbers of the locks stored in the transaction's read-set are different from the current version numbers of those locks in the transactional metadata region 115 (thus, the corresponding data value(s) may have changed), the TL1 transaction commit process 300 continues to an ‘abort transaction’ procedure 307 that aborts the transaction and then to an ‘error exit’ terminal 309. The procedure that attempted the transaction can then retry the transaction.
However, if the ‘valid read-set’ decision procedure 305 successfully validated the transaction's read-set, the TL1 transaction commit process 300 continues to a ‘store write-set data to shared memory’ procedure 311 that copies the result data values from the transaction's write-set to the appropriate write locations in the transactionally shared data region 113. After the transaction has completed updating the transactionally shared data region 113, an ‘increment and release write-set locks’ procedure 313 updates the version numbers for the write-set locks (in the transactional metadata region 115) acquired by the ‘acquire write-set lock’ procedure 303 and releases those locks. Finally, the TL1 transaction commit process 300 successfully exits through an ‘end commit’ terminal 315.
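The TL1 commit steps of FIG. 3 can be sketched as a single-threaded model. It assumes one covering lock per address, each lock held as a mutable [locked, version] list, and it assumes that already-acquired locks are released when the commit aborts. The function name tl1_commit is hypothetical, and a real implementation would acquire locks with atomic operations such as CAS.

```python
def tl1_commit(data, locks, read_set, write_set):
    """Return True if the transaction commits, False if it aborts.

    data:      address -> value (shared data region)
    locks:     address -> [locked, version] (metadata region)
    read_set:  address -> lock version observed at load time
    write_set: address -> result data value
    """
    acquired = []

    def release_all():
        for held in acquired:
            locks[held][0] = False

    # 'acquire write-set lock'
    for addr in sorted(write_set):
        if locks[addr][0]:            # already held by another transaction
            release_all()
            return False
        locks[addr][0] = True
        acquired.append(addr)

    # 'valid read-set' decision: observed versions must match current versions.
    for addr, observed in read_set.items():
        if locks[addr][1] != observed:
            release_all()             # 'abort transaction'
            return False

    # 'store write-set data to shared memory'
    data.update(write_set)

    # 'increment and release write-set locks'
    for addr in acquired:
        locks[addr][1] += 1
        locks[addr][0] = False
    return True
```

Acquiring the locks in sorted address order is one common way to avoid deadlock between committing transactions; the source text does not prescribe an order.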
The TL STM just described is vulnerable to “zombie” transactions. Zombie transactions are those that have read an inconsistent set of data values from the transactionally shared data region 113 but have not yet aborted. Zombie transactions can enter infinite loops, generate traps, and otherwise misbehave as a result of attempting to process inconsistent data values from the transactionally shared data region 113 of the shared memory. Zombie transactions are one cause of corruption of buffers that have been isolated by the transaction from the transactionally shared data region 113.
An STM implementation can address zombies (and similar buffer corruption pathways) by, for example but without limitation: 1) periodically validating the read-set against the transactionally shared data region 113 during the speculative phase; 2) re-validating the current read-set after each transactional load; 3) using globally consistent version numbers (for example, as implemented in the TL2 STM); or 4) using readers-writer locks.
1) Periodic validation of the read-set during the speculative phase is suitable for language-based STMs with managed runtime environments where a just-in-time compiler can emit validation checks, and the runtime can gracefully tolerate traps, and can triage and translate such traps into aborts.
2) Re-validating the current read-set after each transactional load from the transactionally shared data region 113 detects and avoids long term zombie processing. This approach is safe and puts less burden on the compiler and runtime environment but incurs a validation cost quadratic with the size of the read-set.
3) The use of globally consistent version numbers (for example as used in the TL2 STM) avoids zombie execution. Using globally consistent version numbers efficiently avoids both the potential interconnect scalability issues of readers-writer locks and the quadratic validation latency cost of re-validating after each transactional load. The use of globally consistent version numbers (for example, the optional global clock region 117) is suitable for both managed and unmanaged execution environments.
4) Using readers-writer locks on a per-stripe basis instead of versioned lockwords also prevents zombies. Readers-writer locks provide exclusive write-access to a transaction (the “transactional writer”) or concurrent read-access to multiple transactions (the “transactional readers”). If a transactional writer has acquired the readers-writer lock in the writer mode, no transactional readers can acquire the lock. If any transactional readers have acquired the readers-writer lock in the reader mode, a transactional writer cannot acquire the lock until all the transactional readers have released the readers-writer lock. During the speculative execution phase of a transaction, the transaction's load operation first acquires a readers-writer lock (using reader mode) covering the location in the transactionally shared data region 113 being loaded and then fetches the data value from the covered location. During the commit phase the transaction acquires the readers-writer locks (using writer mode) for stripes covering the transaction's write-set; releases the previously acquired readers-writer locks (acquired using reader mode); exposes the write-set (for example by storing data from the write-set to the write locations in the transactionally shared data region 113); and releases the readers-writer locks (acquired using writer mode). The readers-writer locks (acquired using reader mode) can be released at commit-time after all the required readers-writer locks (using writer mode) have been acquired. Because the read-set is always consistent, validation is never needed and there are never any instances of zombie execution.
Readers-writer locks can be implemented as a versioned lockword that includes a readers-count field and a write-lock bit. Readers-writer locks are operated upon with atomic instructions such as CAS. The readers-writer locks operate under the following rules: the readers-writer lock can be acquired using writer mode only when the readers-count is zero and the write-lock bit is clear; a readers-writer lock can be acquired in reader mode (but only when the write-lock bit is clear) by incrementing the readers-count field. One skilled in the art of shared memory multi-processor systems will understand the performance and synchronization issues associated with atomic memory operations, cache lines, etc. One skilled in the art will understand that incrementing the readers-count field means changing the value in the readers-count field by an amount and that decrementing the readers-count field means changing the value of the readers-count field by an amount inverse to the amount (generally, the adjustment logic used to change the value of the readers-count field by an amount can increment the readers-count field and the adjustment logic used to change the value of the readers-count field by the inverse of the amount can decrement the readers-count field).
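The acquisition rules above can be sketched with a simulated CAS. The assumptions are: the write-lock bit is bit 0, the readers-count occupies the bits above it, and the version number field is omitted for brevity. AtomicWord is a hypothetical stand-in for a machine word with an atomic compare-and-swap, built here on threading.Lock because Python has no native CAS.

```python
import threading

WRITE_BIT = 1      # assumed layout: bit 0 is the write-lock bit
READER_UNIT = 2    # adding this increments the readers-count field by one


class AtomicWord:
    """Stand-in for a memory word that supports compare-and-swap (CAS)."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def cas(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def try_acquire_read(word):
    """Reader mode: allowed only while the write-lock bit is clear."""
    old = word.load()
    if old & WRITE_BIT:
        return False
    return word.cas(old, old + READER_UNIT)


def release_read(word):
    while True:
        old = word.load()
        if word.cas(old, old - READER_UNIT):
            return


def try_acquire_write(word):
    """Writer mode: allowed only when readers-count is zero and the bit is clear."""
    return word.cas(0, WRITE_BIT)


def release_write(word):
    while True:
        old = word.load()
        if word.cas(old, old & ~WRITE_BIT):
            return
```

A failed try_acquire_read can mean either that a writer holds the lock or that the CAS lost a race with another reader; a caller would normally retry in the second case.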
To avoid the situation where a set of transactional readers starve or impede one or more transactional writers by keeping the readers-count continuously above zero, a readers-writer lock implementation can allow a contending transactional writer to request that transactional readers desist (such that the transactional readers cannot acquire the readers-writer lock in reader mode) and “drain”. Once the readers-count reaches zero the transactional writer has an opportunity to acquire the readers-writer lock in writer mode. This prevents indefinite transactional writer starvation. Readers-writer locks with this capability generally include a drain flag which, when set by a potentially starving transactional writer, requests that subsequently arriving transactional readers (that is, transactional readers that intend to acquire the readers-writer lock using reader mode) stall and defer incrementing the readers-count until either (a) that count reaches zero, meaning that transactional writers had a fair chance to contend for the readers-writer lock, or (b) some transactional writer acquires the readers-writer lock and clears the drain flag. In a readers-writer lock the drain flag is often stored in a single-bit drain field.
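The drain behavior can be sketched as a small state machine. This is a single-threaded model under stated assumptions: the fields are plain attributes rather than bits of one atomically updated word, a blocked writer sets the drain flag on every failed attempt, and a stalled caller is represented by a False return value. The class name DrainingRWLock is hypothetical.

```python
class DrainingRWLock:
    def __init__(self):
        self.readers = 0      # readers-count field
        self.writer = False   # write-lock bit
        self.drain = False    # single-bit drain field

    def try_read(self):
        if self.writer:
            return False
        # Newly arriving readers stall while a drain is requested and
        # earlier readers are still present (condition (a) not yet met).
        if self.drain and self.readers > 0:
            return False
        self.readers += 1
        return True

    def release_read(self):
        self.readers -= 1

    def try_write(self):
        if self.writer:
            return False
        if self.readers > 0:
            self.drain = True     # ask arriving readers to desist
            return False
        self.writer = True
        self.drain = False        # condition (b): the writer clears the flag
        return True

    def release_write(self):
        self.writer = False
```

Without the drain flag, the second try_read in a sequence such as read, write-attempt, read would succeed, and a steady stream of overlapping readers could keep the readers-count above zero indefinitely.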
Another way to avoid the situation where a set of transactional readers starve or impede one or more transactional writers (by keeping the readers-count continuously above zero) is to implement a protocol where a single transactional writer can atomically set the write-bit (thus acquiring the write-lock) even though the readers-count is non-zero. The transactional writer, having acquired the exclusive write-lock, cannot proceed until it observes that the readers-count is zero. Transactional readers that arrive subsequent to the transactional writer having set the write-lock bit must stall, waiting for the write-bit to become clear before they atomically increment the readers-count field and proceed. Yet another technology that can be used to avoid this situation is to use a read-indicator. A read-indicator maintains state (for example, “some” or “none”) as to whether any readers exist. A read-indicator is in use if some transaction is using it and free if no transaction is using it. A read-indicator can be implemented using a reader counter or a SNZI-like read-indicator, which is subsequently described and is more scalable than a reader counter because it requires less mutation of shared data and thus less coherence traffic on SMP systems.
FIG. 4 illustrates a TL2 transaction commit process 400 that is used by the TL2 STM to attempt to commit a transaction that has completed its speculative execution phase and has not aborted. The TL2 transaction commit process 400 is invoked by the transaction at a start commit terminal 401 and continues to a ‘lock write-set’ procedure 403. The ‘lock write-set’ procedure 403 locates the locations in the transactionally shared data region 113 that are to be updated with data values kept in the transaction's write-set and acquires the locks covering those locations. Once the locks are acquired, other transactions cannot modify the data values in the transactionally shared data region 113 covered by the locks.
Once the write-set locks are acquired, the TL2 transaction commit process 400 continues to an ‘advance global clock’ procedure 405 that atomically fetches the value of and then increments the optional global clock region 117. The fetched value of the global clock is the value that will be used as the write-version when the write-set locks are released as is subsequently described.
Next, the TL2 transaction commit process 400 continues to a ‘verify read-set’ decision procedure 407 that validates the transaction's read-set by verifying that the version numbers for the locks in the transaction's read-set are consistent with the value of the global clock when the transaction started (the read-version). The ‘verify read-set’ decision procedure 407 can use techniques similar to those previously described with respect to FIG. 2. If the transaction's read-set is not valid, the TL2 transaction commit process 400 continues to an ‘abort transaction’ procedure 409 that aborts the transaction and then to an ‘error exit’ terminal 411. The procedure that attempted the transaction can then retry the transaction.
However, if the ‘verify read-set’ decision procedure 407 successfully validated the transaction's read-set, the TL2 transaction commit process 400 continues to a ‘store write-set data to shared memory’ procedure 413 that copies the data values from the transaction's write-set to the corresponding write locations in the transactionally shared data region 113. After the transaction has completed updating the transactionally shared data region 113, an ‘update and release write-set locks’ procedure 415 updates the version numbers for the locks acquired by the ‘lock write-set’ procedure 403 with the write-version value obtained by the ‘advance global clock’ procedure 405 and releases those locks. Finally, the TL2 transaction commit process 400 successfully exits through an end commit terminal 417.
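The TL2 commit steps of FIG. 4 can be sketched as a single-threaded model. The assumptions are: one covering lock per address held as a mutable [locked, version] list, the global clock held in a one-element list so it can be updated in place, acquired locks released on abort, and read-set validation using the same "version less than read-version" comparison described for FIG. 2. The function name tl2_commit is hypothetical.

```python
def tl2_commit(data, locks, clock, read_version, read_set, write_set):
    """Return True if the transaction commits, False if it aborts.

    data:         address -> value (shared data region)
    locks:        address -> [locked, version] (metadata region)
    clock:        one-element list holding the global clock value
    read_version: global clock value sampled when the transaction started
    read_set:     iterable of addresses loaded by the transaction
    write_set:    address -> result data value
    """
    acquired = []

    def release_all():
        for held in acquired:
            locks[held][0] = False

    # 'lock write-set'
    for addr in sorted(write_set):
        if locks[addr][0]:
            release_all()
            return False
        locks[addr][0] = True
        acquired.append(addr)

    # 'advance global clock': fetch the value, then increment the clock.
    write_version = clock[0]
    clock[0] += 1

    # 'verify read-set': every covering lock must predate the read-version.
    for addr in read_set:
        if not locks[addr][1] < read_version:
            release_all()         # 'abort transaction'
            return False

    # 'store write-set data to shared memory'
    data.update(write_set)

    # 'update and release write-set locks' with the write-version.
    for addr in acquired:
        locks[addr][1] = write_version
        locks[addr][0] = False
    return True
```

In this model a transaction that sampled the clock before a conflicting commit fetched its write-version will find a lock version that is not less than its read-version and abort, while a transaction that starts afterward will pass validation.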
The previous descriptions have provided sufficient information needed to allow one skilled in the art to implement the TL Family of STMs without undue experimentation. Supplemental information about the TL Family of STMs is listed in the Cross Reference to Related Applications Section herein.
There are situations where a programmer of one transaction would find it useful to isolate a memory buffer, making the buffer inaccessible to other transactions, either so that the one transaction can access the buffer without using the transaction protocol or so that the memory underlying the buffer can be released for reallocation. For example, a transaction-related process could unlink a buffer from a transactionally maintained concurrent list so that the thread can use the buffer as normal memory available to the thread's processor even though the memory underlying the buffer is shared memory. However, even though the buffer cannot be accessed by any other transaction (executing on the same or another processor) after the transaction-related process that isolates the buffer commits, latent transactional stores that were pending before that commit, issued by transactions that accessed the buffer before it was isolated, can still write into the shared memory buffer that was intended to be isolated. This unexpected behavior is known as the “privatization problem.” The problem manifests as a latent transaction overwriting a portion of shared memory that underlies a buffer that has been isolated (removed) from the transactionally shared data region 113. This results in unexpected changes to the contents of the isolated shared memory, which may have been reallocated and which, although resident in the shared memory, is intended to be outside of the transactionally shared data region 113. Other unexpected, generally asynchronous, behaviors can also occur.
In STM designs that use the “update-in-place” policy, such transactional stores may be executed by zombie transactions, that is, transactions that accessed the buffer before it was isolated but have not yet found that they were aborted by the isolating transaction; when they do find this, they roll back the data values in the buffer. In STM designs that use the “speculative store buffer” policy, all stores are deferred until after the transaction has committed successfully, so such stores can be executed only by transactions that committed before the isolating transaction did. Thus, when a transaction isolates a memory region from a transactional data structure and that region is subsequently accessed non-transactionally (that is, the region “escapes” the transactional domain), latent transactional stores pending to the region can corrupt the isolated memory region.
For example, consider the following scenario: assume transactions tx1 and tx2 operate concurrently under TL2; assume that tx2 is the isolating transaction; assume tx1 is a transaction that writes to the isolated buffer; and assume tx1/tx2 interleave as shown in Table 1.
TABLE 1
A is initially non-null
tx1: { A->Field=3; }
tx2: { tmp=A; A=new; } free (tmp)

    tx1                     tx2
    ==================      ==================
    Start                   Start
    TXLD A                  TXLD A; tmp=A
    TXST A->Field=3         TXST A=new
    Commit
      Lock A->Field
      Validate A
      ...
                            Commit
                              Lock A
                              Validate A
                              ST A=new
                            Done
                            free (tmp)
      ST A->Field=3 [!]

Note that time increases from the start to the last line. Also note that the store by tx1, “ST A->Field=3”, constitutes a use-after-free error, as tx2 expects tmp to be isolated from the transactional system at the time of the free and thus unchanged by other transactions. To avoid unexpected behaviors like the privatization problem, most STM designs (including those of the TL family previously described) do not allow concurrent transactional and non-transactional access to the same set of shared memory locations. Note, for subsequent discussion, we continue to use tx2 as the isolating transaction, and tx1 as the transaction that writes to the isolated buffer.
STMs are vulnerable to these types of unexpected behaviors whenever one non-aborting transaction tx1 reads a location “A” that some other non-aborting transaction tx2 subsequently modifies; tx1, within the same transaction, subsequently writes to one or more other locations data values that depend on the data value loaded from “A”; and tx2 returns before all of tx1's commit-time stores have completed. Critically, tx2's write-set intersects tx1's read-set, but the timing is such that tx1 committed before tx2, and thus neither transaction aborted despite the apparent conflict.
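The interleaving of Table 1 can be replayed deterministically, as in the following minimal sketch. The names Buffer and replay_table_1 are hypothetical, the two transactions are stepped by hand in a single thread rather than run concurrently, and read-set validation is simplified to an identity check on A (an assumption; the TL2 process compares lock versions against the read-version). The sketch only illustrates the ordering of events, in particular that both transactions validate successfully and that the deferred store of tx1 lands after the free.

```python
class Buffer:
    """Hypothetical heap buffer with one field and a 'freed' marker."""

    def __init__(self, name):
        self.name = name
        self.field = 0
        self.freed = False


def replay_table_1():
    old, new = Buffer("old"), Buffer("new")
    shared = {"A": old}                 # A is initially non-null

    # Speculative execution phases.
    tx1_a = shared["A"]                 # tx1: TXLD A
    tx2_tmp = shared["A"]               # tx2: TXLD A; tmp=A
    tx1_write_set = [(tx1_a, 3)]        # tx1: TXST A->Field=3 (deferred)
    tx2_write_set = {"A": new}          # tx2: TXST A=new (deferred)

    # tx1 commit: Lock A->Field, Validate A. A is unchanged, so tx1 validates;
    # tx1 is then delayed before its write-back.
    tx1_valid = shared["A"] is tx1_a

    # tx2 commit: Lock A, Validate A, ST A=new, Done. tx2 locks A itself, not
    # A->Field, so it does not contend with the lock held by tx1.
    tx2_valid = shared["A"] is tx2_tmp
    shared.update(tx2_write_set)

    # Non-transactional code following tx2: free(tmp).
    tx2_tmp.freed = True

    # tx1 resumes its write-back: ST A->Field=3 [!]
    stores_into_freed = []
    for buf, value in tx1_write_set:
        if buf.freed:
            stores_into_freed.append(buf.name)
        buf.field = value

    return {
        "tx1_valid": tx1_valid,
        "tx2_valid": tx2_valid,
        "stores_into_freed": stores_into_freed,
        "old_field": old.field,
        "new_field": new.field,
    }
```

Calling replay_table_1() reports that neither transaction failed validation and that the buffer named "old" was written after it had been marked freed, which corresponds to the line marked [!] in Table 1.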
To provide privatization capability to a transactional memory, the transactional memory can employ either “explicit privatization”, where the programmer explicitly designates regions passing out of transactional use to be quiesced, waiting for any pending transactional stores to complete before the memory is allowed to be accessed non-transactionally, or “implicit privatization”, where the STM automatically manages such lifecycle issues. “Code Generation and Optimization for Transactional Memory Constructs in an Unmanaged Language,” by Cheng et al., CGO 2007, describes an implicit privatization mechanism that quiesces threads instead of shared memory regions, potentially impacting overall scalability.
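One way to picture explicit privatization is a region that counts the commit-time stores still pending against it and makes the privatizing thread wait until that count reaches zero. The following is a minimal sketch under that assumption only; the names Region, begin_writeback, end_writeback, quiesce, and demo are hypothetical and do not describe any particular STM or the mechanism of the cited reference.

```python
import threading


class Region:
    """Hypothetical shared buffer that tracks commit-time stores in flight."""

    def __init__(self):
        self.data = {}
        self._pending = 0
        self._cond = threading.Condition()

    def begin_writeback(self):
        # A committing transaction announces stores pending to this region.
        with self._cond:
            self._pending += 1

    def end_writeback(self):
        # The committing transaction has finished its stores to this region.
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def quiesce(self, timeout=None):
        # Block until no stores are pending; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)


def demo():
    region = Region()
    region.begin_writeback()            # a committed transaction has a store pending

    def late_store():
        region.data["Field"] = 3        # the latent transactional store
        region.end_writeback()

    timer = threading.Timer(0.05, late_store)
    timer.start()
    quiesced = region.quiesce(timeout=5.0)   # privatizer waits for the store
    seen = region.data.get("Field")          # non-transactional access is now safe
    timer.join()
    return quiesced, seen
```

In this sketch the privatizing thread observes the latent store before it touches the region non-transactionally, rather than being overwritten by it afterwards. The burden remains on the programmer to call quiesce on every region that escapes the transactional domain.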
Programming explicit buffer quiescence (for example programming explicit privatization) is complex and error prone. In particular, it is insufficient for a transaction to explicitly privatize a buffer from the transactionally shared data region 113 before modifying that buffer. Assuming two variables VShared and V, the following demonstrates the problem.

    Thread 1:
    Transaction T1 { if (VShared) V=7; }

    Thread 2:
    Transaction T2 { VShared=False; };
    operate on V non-transactionally

If T1 and T2 interleave execution such that T2 executes after T1 reads VShared but before T1 assigns the value seven to V, then Thread 2's non-transactional operation on V can be overwritten by T1 when T1 resumes.
Another problematic situation is if a location from the transactional metadata region 115 is passed to a legacy library routine that is not transactional memory aware (and thus the library routine will assume that the data value in the location is not externally modified and will use non-transactional loads and stores to access the location). This will result in inconsistent operation of the library routine, as the data value of the location will sometimes be externally modified unexpectedly.
It would be advantageous to use transactional memory processes that do not have the previously described problems.