1. Field
The present disclosure is generally directed to memory scheduling. More particularly, the present disclosure is directed to memory scheduling for RAM caches based on tag caching.
2. Background Art
Memory performance has become increasingly important to overall system performance. As a result, there is an increased need to carefully schedule memory operations. This is particularly important with respect to a Random Access Memory (RAM) cache.
A RAM cache operates like a conventional cache. Specifically, a RAM cache stores copies of data from a plurality of memory locations in main memory. Memory requests are issued to the RAM cache first. If there is a cache hit, the requested data is returned from the RAM cache. Otherwise, the request is sent to main memory.
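The lookup order described above can be illustrated with a minimal sketch. The class name `RamCache`, the dictionary-backed storage, and the hit/miss counters are hypothetical and chosen only for illustration. Filling the cache on a miss is likewise an assumption, since the text states only that a missed request is sent to main memory.

```python
class RamCache:
    """Toy model of a RAM cache placed in front of main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> data (stand-in for main memory)
        self.lines = {}                 # cached copies: address -> data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Requests are issued to the RAM cache first.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # Cache miss: the request is sent to main memory.
        self.misses += 1
        data = self.main_memory[address]
        # Assumption: the returned data is also installed in the cache.
        self.lines[address] = data
        return data
```

In this sketch, a first read of an address counts as a miss and a repeated read of the same address counts as a hit.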
A RAM cache may be implemented with a memory technology that employs row-buffers. Memory technologies employing row-buffers may include, but are not limited to, dynamic random access memory (DRAM), embedded dynamic random access memory (eDRAM), phase change memory (PCM), and the like. A RAM cache implemented in such a technology is typically split into multiple equal-sized units called banks, with each bank having a row-buffer. Each bank is organized as a plurality of rows. Each row contains data blocks and corresponding tag blocks. The tag blocks are used to locate the data blocks in the row.
Before reading or writing a memory location, the row containing that memory location is read into the bank's row-buffer. This is called opening the row. The requested memory location is then read from or written to the row-buffer. The opened row is stored in the row-buffer until it is explicitly closed.
In such an architecture, if there is a request to a memory location in an open row, the request can be serviced immediately from the row-buffer. This is called a row-buffer hit. If, however, the request is to a memory location not in an open row, the currently open row must be closed and the row containing that memory location must be read into the row-buffer. The request is then serviced from the row-buffer. This is called a row-buffer conflict and it results in a memory stall.
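The distinction between a row-buffer hit and a row-buffer conflict can be sketched as follows. The `Bank` class and the outcome strings are hypothetical. The `"empty"` outcome, for an access to a bank with no open row, is an assumption added for completeness and is not a case named in the text.

```python
class Bank:
    """Toy model of one bank with a single row-buffer."""

    def __init__(self):
        self.open_row = None  # no row is open initially

    def access(self, row):
        # Row-buffer hit: the requested row is already open.
        if self.open_row == row:
            return "hit"
        # Assumption: an access to a bank with no open row is labeled "empty".
        # Otherwise a different row is open, which is a row-buffer conflict.
        outcome = "empty" if self.open_row is None else "conflict"
        # Close the old row (if any) and open the requested one.
        self.open_row = row
        return outcome
```

For example, accessing rows 3, 3, 7, 3 in sequence on a fresh bank yields one hit (the second access) and two conflicts (the third and fourth accesses).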
Given the performance advantage of issuing requests that hit a row-buffer, conventional memory scheduling often uses row-buffer locality aware algorithms, such as FR-FCFS (first-ready, first-come, first-served), to reduce row open/close penalties. For example, in the FR-FCFS algorithm, memory requests that would hit in a row-buffer are given priority. In addition to minimizing row open/close penalties, such scheduling reduces the cost of servicing memory requests from main memory.
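A simplified sketch of the FR-FCFS priority rule is given below. It assumes each request is a hypothetical `(bank, row)` pair listed in arrival order, and it ignores the command timing constraints that a real memory controller would also take into account. The function name `fr_fcfs_order` is chosen for illustration only.

```python
def fr_fcfs_order(requests, open_rows=None):
    """Return the service order of (bank, row) requests under a simplified FR-FCFS rule."""
    open_rows = dict(open_rows or {})  # bank -> currently open row
    pending = list(requests)           # oldest request first
    order = []
    while pending:
        # First-ready: prefer the oldest request that hits an open row.
        # Otherwise fall back to first-come, first-served (the oldest request).
        pick = next(
            (req for req in pending if open_rows.get(req[0]) == req[1]),
            pending[0],
        )
        pending.remove(pick)
        open_rows[pick[0]] = pick[1]   # servicing the request leaves its row open
        order.append(pick)
    return order
```

With the arrival order `[(0, 1), (0, 2), (0, 1), (0, 2)]` and no rows open, this sketch services both requests to row 1 before opening row 2, so only two rows are opened instead of four.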
Even with row-buffer locality aware algorithms, however, a RAM cache may suffer from inefficiencies. First, data requested from the RAM cache may not be present therein. As a result, there may be unnecessary lookups in the RAM cache. Second, even if a request hits in the RAM cache, locating a data block in the RAM cache is often expensive. Specifically, in order to locate a data block, all the tag blocks in the row must be read before the data block can be accessed, which is costly. Finally, consecutive memory addresses are often not mapped to the same row in the same bank due to the cache block size and typical address indexing schemes of RAM caches. As a result, the number of requests that fall in the same open row in the RAM cache is typically small.
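The cost of locating a data block, the second inefficiency noted above, can be illustrated with a short sketch. The function `locate_block`, the list-of-tags row layout, and the count of tag reads are hypothetical and serve only to show that the lookup cost does not depend on where, or whether, the requested tag is found.

```python
def locate_block(row_tags, wanted_tag):
    """Return (position, tag_blocks_read) for a lookup within one row."""
    # All tag blocks in the row are read, whether or not the tag is present.
    tags_read = len(row_tags)
    for position, tag in enumerate(row_tags):
        if tag == wanted_tag:
            return position, tags_read
    # The requested block is not in this row: the tag reads were unnecessary.
    return None, tags_read
```

In this sketch, a lookup in a row holding four tags reads four tag blocks both when the requested block is found and when it is absent.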