Achieving high performance for communication between concurrent applications on modern multiprocessors remains challenging. For example, when two programs, applications, computers, threads, etc. exchange data, the “producer” of the data places it in a sending area (e.g., a send buffer), and the “consumer” receives it in a receiving area (e.g., a receive buffer). However, the performance of producer-consumer patterns tends to be limited by the synchronization mechanism required to guarantee correct use of a remote or shared buffer. To mitigate this cost, some programmers try to avoid locking altogether, while others replace locks with non-blocking synchronization.
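To make the non-blocking alternative concrete, the following is a minimal sketch of a shared buffer between one producer and one consumer that uses atomic indices instead of a lock. It is an illustration only, not a method described in this document: the names `SpscRing`, `try_push`, `try_pop`, and `transfer_sum` are hypothetical, and the sketch assumes exactly one producer thread and one consumer thread and a power-of-two capacity.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical single-producer/single-consumer ring buffer (illustrative names).
// Assumption: exactly one thread calls try_push and exactly one calls try_pop.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert((Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false if the buffer is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;         // full
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Returns false if the buffer is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                    // empty
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // release the slot
        return true;
    }

private:
    T slots_[Capacity] = {};
    std::atomic<std::size_t> head_{0};  // next index to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next index to read (consumer-owned)
};

// Sends 1..count through the ring and returns the sum seen by the consumer.
long long transfer_sum(int count) {
    SpscRing<int, 64> ring;
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        int v = 0;
        for (int i = 0; i < count; ++i) {
            while (!ring.try_pop(v)) std::this_thread::yield();
            sum += v;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

Neither side ever blocks on a lock, but each push and pop still reads the index owned by the other thread, so the synchronization cost is reduced rather than eliminated.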