An important aspect of system performance for large symmetric multiprocessing systems with a shared store-in cache design is the ability to process stores from lower level caches in an expedient manner. In cache systems utilizing a shared pipeline structure, stores are processed by sequentially accessing the shared pipeline to access the cache arrays. Some conventional systems used static random access memory (SRAM) arrays for the cache, which had a busy time of up to 2 cycles. In a system with a shared sequential pipeline, stores could therefore be processed once every 2 cycles.
Other conventional systems introduced embedded dynamic random access memory (EDRAM) arrays for the cache. EDRAM has the advantage of being much denser, thus allowing for larger caches, but the drawback of a longer array busy time, e.g., up to 4 cycles. This longer busy time significantly reduces the store throughput of the shared pipeline, which ultimately degrades system performance.
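
The throughput effect of the longer busy time can be illustrated with a minimal sketch. It is not drawn from any particular system: the function name `stores_completed` and the 1000-cycle window are hypothetical, and the model assumes the worst case in which every store targets the same array, a store is always waiting, and each store holds the array for the full busy time.

```python
def stores_completed(total_cycles: int, busy_cycles: int) -> int:
    """Count stores issued by one shared sequential pipeline.

    Simplifying assumptions (illustrative only): a store is always
    waiting, and each store keeps the array busy for busy_cycles,
    so the next store cannot issue until the array is free again.
    """
    completed = 0
    next_free = 0  # first cycle at which the array can accept a store
    for cycle in range(total_cycles):
        if cycle >= next_free:
            completed += 1
            next_free = cycle + busy_cycles
    return completed


sram_stores = stores_completed(1000, busy_cycles=2)   # 2-cycle busy time
edram_stores = stores_completed(1000, busy_cycles=4)  # 4-cycle busy time
print(sram_stores, edram_stores)  # prints "500 250"
```

Under these assumptions, doubling the array busy time from 2 to 4 cycles halves the number of stores the shared pipeline can process in the same window.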