This relates generally to computers that have general purpose processors and graphics processing units.
The memory used by user applications running on the general purpose processor, or central processing unit, and the memory used by a graphics processing unit are typically separated. A graphics processing unit driver copies data from the user space into driver memory for processing on a graphics processing unit. In a shared virtual memory model, data is not copied to the graphics processing unit; instead, it is shared between the graphics processing unit and the central processing unit.
Currently, in multithreaded applications, shared data is protected by locks called mutexes. Each thread that wants to access shared data must first lock a corresponding mutex to prevent other threads from accessing that data at the same time. This locking can be done by “spinning” on the lock, that is, repeatedly testing it until it becomes free, but this technique is inefficient in terms of both power and performance.
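As an illustration of spinning on a lock, the following is a minimal C++ sketch for threads running on central processing unit cores only. The names `SpinLock` and `count_with_spinlock` are hypothetical and chosen for this example; they do not come from any particular library or driver.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical illustration: a minimal spin lock built on std::atomic_flag.
class SpinLock {
public:
    void lock() {
        // Busy-wait until the flag was previously clear. The waiting thread
        // keeps its core occupied for the whole wait, which is the power and
        // performance cost described above.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }

    void unlock() {
        // Clear the flag so that one spinning thread can acquire the lock.
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Starts `threads` workers; each adds 1 to a shared counter `iterations`
// times while holding the spin lock. Returns the final counter value.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;  // shared data protected by `lock`
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter;
}
```

Because every increment happens while the lock is held, no update is lost, but each waiting thread consumes processor cycles without doing useful work.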
To make better use of the central processing unit, the operating system provides system calls that allow a thread to sleep until a mutex is available and that notify other threads when a mutex is unlocked. But this mechanism works only for threads that run on central processing unit cores.
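The sleep-and-notify behavior can be sketched with the standard C++ threading library, which on typical platforms is assumed to rely on such operating system facilities. The names `SleepingLock` and `count_with_sleeping_lock` are hypothetical, and the sketch covers only threads on central processing unit cores.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration: a lock whose waiters sleep instead of spinning.
class SleepingLock {
public:
    void lock() {
        std::unique_lock<std::mutex> guard(state_mutex_);
        // wait() blocks the calling thread until it is notified and the
        // lock is observed to be free.
        available_.wait(guard, [this] { return !locked_; });
        locked_ = true;
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> guard(state_mutex_);
            locked_ = false;
        }
        // Wake one sleeping waiter, if any, now that the lock is free.
        available_.notify_one();
    }

private:
    std::mutex state_mutex_;             // protects `locked_`
    std::condition_variable available_;  // signaled on unlock
    bool locked_ = false;
};

// Starts `threads` workers; each adds 1 to a shared counter `iterations`
// times while holding the sleeping lock. Returns the final counter value.
long count_with_sleeping_lock(int threads, int iterations) {
    SleepingLock lock;
    long counter = 0;  // shared data protected by `lock`
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter;
}
```

In practice `std::mutex` alone already provides this behavior; the explicit condition variable is used here only to make the sleep and notify steps visible.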