In concurrent systems such as multi-processing or multi-threaded systems, multiple threads may run in parallel, each thread running in its own context. The data values used by one thread, including, for instance, its register values, are not directly visible or accessible to another thread. Currently there is a lack of hardware support that allows threads to efficiently communicate data to one another.
For example, while there are systems available that use a helper thread to prefetch data for the main thread, they do not use hardware support to allow one thread to efficiently communicate data to another thread, for instance, the data that tells the helper thread when to start or stop the prefetching. Without suitable hardware support, the overhead of repeatedly invoking a helper thread for data prefetching at different points in the application program can be very high. This overhead can negate any performance benefit obtained from the data prefetching.
Moreover, most methods for inter-thread communication are meant to be used primarily for synchronization or locking purposes. Those methods usually incur additional overhead, are designed to go through the shared memory hierarchy, which incurs overhead of its own, or both. While there have also been proposals for efficient, hardware-assisted copying of register values between threads, these proposals are limited in that they can only copy the entire register file or a fixed subset of the register file, and they do not scale with increasing numbers of registers. Thus, what is desirable is a generic inter-thread communication mechanism capable of transferring arbitrary data values between any two threads.