Computing system memory architectures may be structured as various levels of host processor-side caches (e.g., level one/L1 cache, level two/L2 cache, last level cache/LLC) and a system memory that includes a memory-side cache (e.g., “near memory”) and additional memory (e.g., “far memory”) that is slower to access than the memory-side cache. The processor-side caches may be organized into relatively small (e.g., 64 B) cache lines, whereas the memory-side cache may be organized into relatively large (e.g., 1 KB or 4 KB) blocks in order to reduce tag and metadata overhead. Thus, each 4 KB block in a memory-side cache might contain, for example, sixty-four of the 64 B processor-side cache lines.
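The size relationship above may be illustrated with a short arithmetic sketch. The function names and the 1 GiB near-memory capacity are assumptions chosen only for illustration and do not come from any particular system; the sketch counts one tag entry per tracked unit and ignores the width of each tag.

```python
# Illustrative arithmetic only; names and the 1 GiB capacity are hypothetical.

def lines_per_block(block_bytes: int, line_bytes: int) -> int:
    """Number of processor-side cache lines that fit in one memory-side block."""
    if block_bytes % line_bytes != 0:
        raise ValueError("block size must be a multiple of the line size")
    return block_bytes // line_bytes


def tag_entries(capacity_bytes: int, unit_bytes: int) -> int:
    """Number of tag entries needed when the cache is tracked in units of unit_bytes."""
    return capacity_bytes // unit_bytes


NEAR_MEMORY_BYTES = 1 << 30  # assumed 1 GiB memory-side cache

if __name__ == "__main__":
    print(lines_per_block(4096, 64))             # lines in a 4 KB block
    print(lines_per_block(1024, 64))             # lines in a 1 KB block
    print(tag_entries(NEAR_MEMORY_BYTES, 64))    # entries if tracked per 64 B line
    print(tag_entries(NEAR_MEMORY_BYTES, 4096))  # entries if tracked per 4 KB block
```

Under these assumptions, tracking the memory-side cache per 4 KB block rather than per 64 B line reduces the number of tag entries by a factor of sixty-four, which is the overhead reduction referred to above.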
When a search for data in the memory-side cache is unsuccessful (e.g., a cache miss occurs), a “victim” line may be selected in the memory-side cache for replacement (e.g., eviction) by the requested data, which may be retrieved from the far memory. Frequent misses in the memory-side cache may reduce performance and increase power consumption due to the retrieval of data from the relatively slow far memory. In order to reduce the likelihood of misses in the memory-side cache, each block of the memory-side cache may be compressed to make room for more data. Decompressing the memory-side cache on a block-by-block basis, however, may increase latency and overhead, since an entire block may need to be decompressed even when only a single processor-side cache line is requested. The added latency may be particularly costly when the retrieved data is in the critical path of host processor read operations. Accordingly, conventional memory architectures may still exhibit suboptimal performance and/or power consumption.
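The miss handling described above may be sketched as a small software model. The class name, its fields, and the least-recently-used (LRU) victim selection are assumptions made for illustration; the text does not specify a replacement policy, and an actual memory-side cache would be implemented in hardware.

```python
from collections import OrderedDict


class MemorySideCache:
    """Toy model of a memory-side cache in front of a slower far memory.

    Hypothetical sketch: LRU victim selection is an assumption, not a
    policy stated in the surrounding text.
    """

    def __init__(self, num_blocks, far_memory):
        self.num_blocks = num_blocks  # capacity in blocks
        self.far_memory = far_memory  # dict: block address -> data
        self.blocks = OrderedDict()   # ordered least to most recently used
        self.misses = 0
        self.evictions = []           # victim addresses, in eviction order

    def read(self, block_addr):
        if block_addr in self.blocks:
            # Hit: mark the block as most recently used.
            self.blocks.move_to_end(block_addr)
            return self.blocks[block_addr]
        # Miss: the data must come from the far memory.
        self.misses += 1
        if len(self.blocks) >= self.num_blocks:
            # Select the least recently used block as the victim.
            victim, _ = self.blocks.popitem(last=False)
            self.evictions.append(victim)
        data = self.far_memory[block_addr]
        self.blocks[block_addr] = data
        return data


if __name__ == "__main__":
    far = {addr: f"block{addr}" for addr in range(4)}
    cache = MemorySideCache(num_blocks=2, far_memory=far)
    for addr in (0, 1, 0, 2, 1):
        cache.read(addr)
    print(cache.misses, cache.evictions)
```

In this model, every miss incurs a far memory access, so a cache that holds more blocks (for example, through compression) would be expected to miss less often for the same access pattern.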