Technical Field
The embodiments herein generally relate to hardware accelerators and, more particularly, to a system and a method to convert lock-free algorithms to wait-free algorithms using hardware accelerators.
Description of the Related Art
Multithreading is the simultaneous execution of multiple software threads. It reflects the ability of a program or an operating system process to serve more than one user at a time, and to handle multiple requests from the same user, without running multiple copies of the program on the computer. Typically, central processing units (CPUs) have hardware support to efficiently execute multiple software threads simultaneously. However, CPUs with multithreading capabilities are distinguished from multiprocessing systems (such as multi-core systems) in that the simultaneously executing threads share one or more resources of a single core, including computing units, a CPU cache, and a translation lookaside buffer (TLB).
Most multiprocessing systems use a variety of techniques to ensure the integrity of shared data, including, for example, locking mechanisms, software (SW) based lock-free algorithms, hardware (HW) assisted lock-free algorithms, transactional memory, and the like. A typical sequence for implementing a lock-free algorithm includes reading a value from a store, performing a set of operations involving computation, and performing condition checks involving the read/write value (VALUE), a parameter, and/or state variables (STATE) in the store. If the operation succeeds, VALUE and STATE are updated in the store and VALUE is returned; otherwise, the request fails.
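The read, compute, check, and update sequence described above can be sketched in C++ as a single attempt that either succeeds or fails. This is only an illustrative sketch under stated assumptions: the packing of STATE and VALUE into one 64-bit word, the names `g_store`, `pack`, and `try_take`, and the "take `param` units from VALUE" operation are all hypothetical and are not taken from the embodiments.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical store layout: STATE in the upper 32 bits, VALUE in the lower 32 bits,
// so that both can be updated together by one compare-and-swap.
std::atomic<std::uint64_t> g_store{0};

constexpr std::uint64_t pack(std::uint32_t state, std::uint32_t value) {
    return (static_cast<std::uint64_t>(state) << 32) | value;
}
constexpr std::uint32_t state_of(std::uint64_t w) { return static_cast<std::uint32_t>(w >> 32); }
constexpr std::uint32_t value_of(std::uint64_t w) { return static_cast<std::uint32_t>(w); }

// One attempt of a hypothetical operation: take `param` units from VALUE.
// Returns the updated VALUE on success, or std::nullopt if the request fails.
std::optional<std::uint32_t> try_take(std::uint32_t param) {
    std::uint64_t seen = g_store.load();                // read VALUE and STATE from the store
    std::uint32_t value = value_of(seen);
    if (value < param) return std::nullopt;             // operation-specific condition check fails
    std::uint32_t new_value = value - param;            // computation
    std::uint64_t next = pack(state_of(seen) + 1, new_value);  // STATE counts successful operations
    if (g_store.compare_exchange_strong(seen, next))    // update only if the store is unchanged
        return new_value;
    return std::nullopt;                                // another thread interfered: attempt fails
}

// Usage sketch, single-threaded, so no atomicity violation can occur here.
void demo_single_thread() {
    g_store.store(pack(0, 10));
    std::optional<std::uint32_t> r = try_take(3);
    assert(r.has_value() && *r == 7);                   // succeeded, VALUE is now 7
    assert(!try_take(8).has_value());                   // condition check fails, store untouched
}
```

Note that `try_take` makes exactly one attempt; a caller that needs the operation to complete has to call it again after a failure.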
Apart from failure of the operation-specific condition check, a failure can also occur when atomicity is violated, i.e., when multiple threads try to execute the above sequence with at least one of the steps overlapping in time. An atomicity violation leads to one attempt succeeding and all other subsequent attempts failing. When there is a failure due to an atomicity violation, the application thread is expected to retry the operation (OPn), and hence the operation is not “wait-free”. Depending on the prioritization, design, and state of the system, multiple attempts may have to be made before an attempt succeeds. This makes the timing of the system unpredictable, and therefore the approach may not be suitable for use in systems requiring deterministic behavior. Detection of an atomicity violation is often performed using the value of a location. For example, if a thread reads ‘A’ as the value of the location and needs to update it to ‘N’, it may issue an atomic Compare and Swap (CAS) instruction, which updates the location to ‘N’ if it still holds ‘A’ but fails if the location contains any other value (due to another thread updating the value). However, finding that the location still holds ‘A’ does not mean that it has not been updated: the location could have been changed from ‘A’ to, say, ‘B’ and then back to ‘A’ by one or more other threads. This scenario is termed the ‘ABA’ hazard, and it can lead to incorrect results. Typical implementations of lock-free algorithms suffer from this hazard.
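Both effects described above, the unbounded retry and the ABA hazard, can be illustrated with `std::atomic` from the C++ standard library. The following is a minimal sketch rather than code from the embodiments: the function names are hypothetical, and the ABA interleaving is replayed on a single thread (the comments mark which thread would perform each step) so that the outcome is deterministic.

```cpp
#include <atomic>
#include <cassert>

// Lock-free increment: retry the CAS until it succeeds and return the number of attempts.
// Under contention the number of attempts has no fixed upper bound, which is why the
// operation is lock-free but not wait-free.
int increment_with_retry(std::atomic<int>& location) {
    int attempts = 0;
    int seen = location.load();
    do {
        ++attempts;
        // On failure, compare_exchange_weak reloads `seen` with the current value.
        // The weak form may also fail spuriously on some architectures.
    } while (!location.compare_exchange_weak(seen, seen + 1));
    return attempts;
}

// Replays the A -> B -> A interleaving and reports whether the final CAS still succeeds.
bool aba_goes_undetected() {
    std::atomic<char> location{'A'};
    char expected = location.load();     // thread 1 reads 'A'
    location.store('B');                 // another thread changes 'A' to 'B'
    location.store('A');                 // another thread changes 'B' back to 'A'
    assert(location.load() == expected); // to thread 1 the location looks untouched
    // Thread 1 now updates 'A' to 'N'. The CAS compares values only, so it succeeds
    // even though the location was modified twice in the meantime.
    return location.compare_exchange_strong(expected, 'N');
}
```

In this sketch the value-only comparison is the weak point: any scheme that wants to reject the final CAS has to track more than the value itself.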
Eliminating hazards such as the ABA problem further complicates the implementation. With Compare and Swap (CAS), it requires additional overhead; with LL/SC (Load-Link/Store-Conditional), it requires an extremely conservative approach to determining atomicity violations (mostly due to the higher cost of accurate determination), leading to atomicity failures even in cases where it would have been safe for the operation to succeed. Wait-free algorithms can be created for certain structures, but their performance is worse than that of lock-free or even lock-based approaches. In some cases, they also require memory proportional to the number of application threads. Accordingly, there remains a need for an efficient system that reduces the problems of atomicity violation and the ABA hazard and thereby facilitates ensuring the integrity of shared data.
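The additional overhead that CAS-based avoidance of the ABA hazard involves can be illustrated with a version tag, one commonly used software technique. The sketch below is an assumption made for illustration and is not the approach of the embodiments: the 64-bit layout, the names `make`, `tagged_store`, and `tagged_cas`, and the constants `kA`, `kB`, and `kN` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout: 32-bit version tag in the upper half, 32-bit value in the lower half.
using Word = std::uint64_t;
constexpr Word make(std::uint32_t version, std::uint32_t value) {
    return (static_cast<Word>(version) << 32) | value;
}
constexpr std::uint32_t version_of(Word w) { return static_cast<std::uint32_t>(w >> 32); }
constexpr std::uint32_t value_of(Word w) { return static_cast<std::uint32_t>(w); }

constexpr std::uint32_t kA = 'A', kB = 'B', kN = 'N';

// Unconditional write that also bumps the version tag.
void tagged_store(std::atomic<Word>& loc, std::uint32_t value) {
    Word seen = loc.load();
    while (!loc.compare_exchange_weak(seen, make(version_of(seen) + 1, value))) {
        // `seen` was reloaded by the failed CAS; retry with the fresh version.
    }
}

// CAS that succeeds only if neither the value nor the version changed since `seen` was read.
bool tagged_cas(std::atomic<Word>& loc, Word seen, std::uint32_t new_value) {
    return loc.compare_exchange_strong(seen, make(version_of(seen) + 1, new_value));
}

// Replays the A -> B -> A interleaving on one thread; the stale CAS is now rejected.
bool aba_is_detected() {
    std::atomic<Word> loc{make(0, kA)};
    Word seen = loc.load();              // thread 1 reads value 'A', version 0
    tagged_store(loc, kB);               // another thread: 'A' -> 'B', version 1
    tagged_store(loc, kA);               // another thread: 'B' -> 'A', version 2
    assert(value_of(loc.load()) == kA);  // the value alone looks unchanged
    return !tagged_cas(loc, seen, kN);   // the version differs, so the CAS fails
}
```

The cost is visible even in this small sketch: every location needs extra tag bits, every write has to update the tag, and a 32-bit tag can in principle wrap around. The check is also conservative, since it rejects the update after any intervening write, including ones that would have been harmless.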