Modern high-performance processors typically have multiple levels of cache memory (a cache hierarchy) that store instructions and data. Each level of the cache generally has a characteristic latency, i.e., the time between when a request is issued to the cache and when a response is received, with the primary or level one (L1) cache having lower latency than the secondary or level two (L2) cache. Maintaining low latency in the cache system is generally important to prevent a processor stall (i.e., the processor becoming idle) while it waits for data and/or instructions from the cache system.
A high-performance processor may support multiple threads of execution. Generally speaking, a “thread” is a separate execution context with its own instructions and data. A thread executing on a multi-threaded processor may need to communicate data to another thread that is concurrently executing on the same or a different processor. Currently, this communication may occur through the data cache hierarchy: the sending thread stores the data value in the cache, and the receiving thread loads it from the cache. On processors with a write-through L1 cache, the sending thread generally has to write (commit) the data to the L2 cache before writing the data to the L1 cache in order to maintain data consistency. As a result, the inter-thread communication generally incurs a minimum latency that is approximately the L2 cache latency (incurred by the sending thread writing the value to the L2 cache) plus the L1 cache latency (incurred by the receiving thread reading the value from the L1 cache).
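The latency estimate above can be illustrated with a small worked calculation. The following is a minimal sketch only: the names `CacheLatencies` and `min_inter_thread_latency` are hypothetical, and the cycle counts are illustrative assumptions, not measurements of any particular processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLatencies:
    """Hypothetical cache access latencies, in processor cycles."""
    l1_cycles: int
    l2_cycles: int


def min_inter_thread_latency(lat: CacheLatencies) -> int:
    """Approximate minimum latency for communication through a write-through L1.

    The sending thread commits the value to the L2 cache (about one L2
    latency), and the receiving thread then loads it from the L1 cache
    (about one L1 latency).
    """
    return lat.l2_cycles + lat.l1_cycles


# Illustrative values only (assumed, not taken from the text).
example = CacheLatencies(l1_cycles=3, l2_cycles=20)
print(min_inter_thread_latency(example))
```

Under these assumed values, the L2 term dominates the total, which is why a path that avoids the L2 commit on the sending side may reduce the communication latency substantially.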
What is needed is a way to achieve faster inter-thread communication between threads sharing an L1 write-through cache. Faster inter-thread communication may be desirable in many scenarios. For example, faster communication between the architectural (main) thread and speculative threads may reduce the overhead associated with transactional speculation, enabling the speculation to be profitable in more situations. Faster inter-thread communication may also be useful in the context of software scouting, where a scouting thread performing optimizations on behalf of a main thread communicates with the main thread.