The present invention relates generally to the field of memory access. More specifically, the present invention is directed to a method and an apparatus for improving the performance of memory access.
A cache is a special high-speed storage mechanism. A cache is usually made up of high-speed static random access memory (SRAM) instead of the slower and cheaper dynamic random access memory (DRAM) that is used for main memory. Caching is effective because most programs access the same data or instructions over and over. By keeping as much of this information as possible in the cache, the computer avoids the delay caused by accessing the slower main memory. As such, cache performance is important because it directly affects the performance of the processor.
Traditionally, each cache has two subsystems: a tag subsystem, which holds the memory addresses and determines whether there is a match for a piece of information requested by the processor, and a memory subsystem, which holds and delivers the data. A cache hit occurs when the piece of information that the processor requests is found in the cache. A cache miss occurs when the information is not in the cache.
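By way of illustration only, the tag and memory subsystems described above can be modeled with the following minimal sketch. It assumes a direct-mapped organization in which each address maps to exactly one cache line; the class name `DirectMappedCache`, its fields, and the dictionary used as main memory are hypothetical and are not taken from this disclosure.

```python
class DirectMappedCache:
    """Toy direct-mapped cache (hypothetical model, for illustration only)."""

    def __init__(self, num_lines, main_memory):
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # tag subsystem: which address each line holds
        self.data = [None] * num_lines   # memory subsystem: the cached data itself
        self.main_memory = main_memory   # slower backing store, address -> value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines   # line selected by the low part of the address
        tag = address // self.num_lines    # remaining part identifies the address
        if self.tags[index] == tag:
            self.hits += 1                 # cache hit: the data is already present
        else:
            self.misses += 1               # cache miss: fetch from main memory
            self.tags[index] = tag
            self.data[index] = self.main_memory[address]
        return self.data[index]


memory = {address: address * 10 for address in range(16)}
cache = DirectMappedCache(num_lines=4, main_memory=memory)
cache.read(5)   # miss: line 1 is empty
cache.read(5)   # hit: line 1 now holds address 5
cache.read(9)   # miss: address 9 also maps to line 1 and evicts address 5
```

In this sketch, repeated reads of the same address hit in the cache, while two addresses that map to the same line evict each other.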
One technique for improving the performance of the cache is to increase the cache size. A larger cache increases the likelihood of a cache hit. However, this technique is expensive due to the high cost of SRAM. Furthermore, the larger cache size increases the cache access cycle time.
One disadvantage of the traditional cache system is that all memory accesses are treated the same even though the memory accesses may not be the same. It is a one-size-fits-all memory hierarchy. Accessing a single piece of data is treated the same as accessing each piece of data in an array of data. There is no ability to distinguish between different workloads, and no ability to treat each workload differently.
An embodiment of the present invention provides an apparatus for memory access demarcation. Data is accessed from a first cache, which comprises a first set of addresses and corresponding data at each of the addresses in the first set. A plurality of addresses is generated for a second set of addresses. The second set of addresses follows the first set of addresses. The second set of addresses is calculated based on a fixed stride, and the second set of addresses is associated with data from a first stream. A plurality of addresses is generated for a third set of addresses. The third set of addresses follows the first set of addresses. Each address in the third set of addresses is generated by tracing a link associated with another address in the third set of addresses. The third set of addresses is associated with data from a second stream.
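By way of illustration only, the two kinds of address generation described above can be sketched as follows. The function names `stride_addresses` and `linked_addresses`, the use of the last address of the first set as the starting point, and the dictionary that stands in for the links stored in memory are all assumptions made for this example and are not taken from this disclosure.

```python
def stride_addresses(last_address, stride, count):
    """Generate `count` addresses that follow `last_address` at a fixed stride."""
    return [last_address + i * stride for i in range(1, count + 1)]


def linked_addresses(first_address, next_link, count):
    """Generate up to `count` addresses by tracing links.

    `next_link` maps an address to the address its link points to,
    or to None when the chain ends (hypothetical representation).
    """
    addresses = []
    address = first_address
    while address is not None and len(addresses) < count:
        addresses.append(address)
        address = next_link.get(address)   # follow the link stored at this address
    return addresses


# First stream: array-like data laid out at a fixed stride of 8 bytes.
array_stream = stride_addresses(0x100, 8, 3)

# Second stream: list-like data in which each element links to the next.
links = {0x200: 0x340, 0x340: 0x2A8, 0x2A8: None}
list_stream = linked_addresses(0x200, links, 5)
```

Under these assumptions, the stride-based addresses can all be computed in advance from the stride alone, whereas each link-based address becomes known only after the preceding address has been resolved.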