In modern microprocessors, memory access operations from different applications and from the operating system are treated identically by the caches and the memory system. When a cache or memory receives a memory access operation, it usually cannot identify which thread (or process) issued the request.
This creates a problem when guaranteeing program correctness, because the memory system generally must be conservative. For example, when a memory barrier instruction, such as the PowerPC's SYNC instruction, is executed, all previous memory accesses must be guaranteed to complete before the memory barrier instruction completes. In fact, it is generally sufficient to guarantee this ordering only for memory accesses from the same thread (or process). However, because the memory system cannot distinguish between memory access operations from different threads (or processes), this valuable information is lost and the ordering must be enforced conservatively for all previous accesses.
Another problem affects the accuracy and efficiency of certain hardware prediction mechanisms, which may be confused by memory accesses from irrelevant threads. For example, consider a hardware mechanism that predicts streaming data access patterns. When multiple threads generate independent memory accesses, it is difficult or expensive for the hardware prediction mechanism to filter out "unrelated" memory accesses in order to recognize a streaming pattern generated by one thread (or process).
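The effect on a stream predictor can be illustrated with a minimal software sketch. This is a simplified model written for illustration only, not a description of any particular hardware: the class `StrideDetector`, the helper functions, the addresses, the strides, and the repeat threshold are all assumptions. The detector reports a stream once the same non-zero stride repeats a fixed number of times. When two threads' accesses are interleaved and presented without thread information, consecutive address differences never repeat, so no stream is found. When each access carries a thread identifier and a separate detector is kept per thread, both streams are recognized.

```python
class StrideDetector:
    """Hypothetical stride-based stream detector (illustrative model only)."""

    def __init__(self, threshold=2):
        self.last_addr = None      # previous address seen
        self.last_stride = None    # previous address difference
        self.matches = 0           # consecutive repeats of the same stride
        self.threshold = threshold

    def observe(self, addr):
        """Return True once the same non-zero stride has repeated `threshold` times."""
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                self.matches += 1
            else:
                self.matches = 0
            self.last_stride = stride
        self.last_addr = addr
        return self.matches >= self.threshold


def make_interleaved_accesses(count=8):
    """Interleave two independent streams as (thread_id, address) pairs."""
    accesses = []
    for i in range(count):
        accesses.append((0, 0x1000 + 64 * i))    # thread 0: stride 64
        accesses.append((1, 0x80000 + 128 * i))  # thread 1: stride 128
    return accesses


def untagged_detects(accesses):
    """One detector sees every access and ignores the thread id."""
    detector = StrideDetector()
    return any([detector.observe(addr) for _, addr in accesses])


def tagged_detects(accesses):
    """One detector per thread id; returns the set of threads with a stream."""
    detectors = {}
    found = set()
    for thread_id, addr in accesses:
        detector = detectors.setdefault(thread_id, StrideDetector())
        if detector.observe(addr):
            found.add(thread_id)
    return found


if __name__ == "__main__":
    accesses = make_interleaved_accesses()
    print("untagged detector found a stream:", untagged_detects(accesses))
    print("threads with a detected stream (tagged):", sorted(tagged_detects(accesses)))
```

In this sketch the per-thread detectors are stored in a dictionary for simplicity; a hardware design would have to bound the number of tracked threads.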
Further difficulties arise when a cache is shared by many threads (or processes). In such a situation, it is very important to be able to monitor the cache behavior of each thread (or process) in order to manage those threads (or processes) efficiently. This is difficult when there is no way to tell which thread (or process) issued a memory access request, as is the case in the memory systems of the prior art.
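The kind of per-thread monitoring described above can be sketched as follows, assuming each request arrives with a thread identifier. The model is a deliberately small direct-mapped cache; the class `SharedCache`, its parameters, and the access pattern in `run_demo` are hypothetical and chosen only to show that per-thread hit and miss counts become available once requests can be attributed to a thread.

```python
from collections import Counter


class SharedCache:
    """Hypothetical direct-mapped cache that attributes hits and misses to threads."""

    def __init__(self, num_lines=4, line_size=64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # block number held by each line
        self.hits = Counter()            # hits per thread id
        self.misses = Counter()          # misses per thread id

    def access(self, thread_id, addr):
        """Look up `addr` on behalf of `thread_id`; return True on a hit."""
        block = addr // self.line_size
        index = block % self.num_lines
        hit = self.tags[index] == block
        if hit:
            self.hits[thread_id] += 1
        else:
            self.misses[thread_id] += 1
            self.tags[index] = block     # fill the line on a miss
        return hit


def run_demo():
    """Thread 0 reuses one line while thread 1 streams through the cache."""
    cache = SharedCache()
    for i in range(1, 9):
        cache.access(0, 0)          # thread 0: repeatedly touches address 0
        cache.access(1, 64 * i)     # thread 1: touches a new line each time
    return cache


if __name__ == "__main__":
    cache = run_demo()
    for thread_id in (0, 1):
        print(thread_id, "hits:", cache.hits[thread_id], "misses:", cache.misses[thread_id])
```

Without the `thread_id` argument, the same model could report only aggregate hit and miss counts, which would hide the fact that the streaming thread periodically evicts the line reused by the other thread.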
Thus, there exists a need in the art for a cache and memory system which is able to distinguish between memory operations from distinct threads (or processes). Such a system would ensure a more efficient caching mechanism while increasing the accuracy and efficiency of hardware prediction mechanisms.