Processors have become very fast in comparison to memory, and this speed gap has continued to widen with advances in technology. The resulting gap between processor and memory speed has created a “memory wall”, whereby the processor is oftentimes starved, with nothing to do while waiting for memory. This problem is significantly exacerbated on SMT (Simultaneous Multi-Threading) processors, where threads collide with one another in the shared cache, and on multiprocessor systems, where many processors attempt to access the same memory, which is farther away.
Caches are an attempt to alleviate the high cost of going to memory by keeping data items expected to be important in fast memory that is close to the processor. These data items are kept in cache lines, which are tagged using the memory address where the data item is stored. Because a cache is usually much smaller than main memory, many memory addresses map to the same cache line.
In modern microprocessors, memory access operations from different applications and from the operating system are treated identically by the caches and the memory. When a cache or memory receives a memory access operation, it is usually unable to distinguish which thread (or process) issued the request. This causes several problems in terms of interference, cache performance, Quality-of-Service guarantees, and efficiency.
One such problem, interference, is caused by the inability to distinguish which cache lines are used by the Operating System and which are used by the application. Thus, Operating System data and application data may interfere with each other, each evicting the other's lines from the cache.
Another problem occurs because of the lack of distinction between cache lines used by different threads. The cache performance of an important thread can suffer when its cache lines are evicted prematurely by another thread. Quality-of-Service guarantees suffer as a consequence, because it is impossible to guarantee that one thread will not evict data cached by another thread. A related problem arises in guaranteeing the correctness of a program, for which the memory system generally must be conservative. For example, when a memory barrier instruction, such as the PowerPC SYNC instruction, is executed, it must be guaranteed that all previous memory accesses complete before the memory barrier instruction completes. In fact, it is generally sufficient to guarantee memory ordering only for accesses from the same thread (or process). However, because the memory system cannot distinguish between memory access operations from different threads (or processes), this valuable information is lost, and the barrier must conservatively wait for all outstanding accesses, to the detriment of performance and Quality-of-Service.
A fourth, and somewhat dissimilar, problem occurs because software cannot explicitly affect cache line placement and replacement. For example, it is impossible for software to reserve a set of cache lines for a streaming application. Efficiency suffers as a result of software's inability to exercise even a small amount of control over cache lines or the data placed in them. Currently, absent mechanisms such as those of the instant invention, there is nothing to prevent a streaming thread from thrashing the cache lines used by other threads. Specifically, the performance of the other threads in the application would improve if a few cache lines were reserved for the streaming thread: the performance of the streaming thread would not be affected, yet it would no longer thrash the cache lines of the other threads, significantly improving the efficiency and performance of the other threads utilizing the cache.
Thus, there exists a need in the art for a color-based cache monitoring system that improves memory performance and efficiency, guarantees quality-of-service, and removes interference issues.