The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for an enhanced wiring structure for a cache supporting auxiliary data output.
Two of the key performance metrics of cache design are fetch bandwidth and access latency. Optimally, all data that can be read from the cache arrays with one access would be transferred in one cycle into the next lower level of the cache hierarchy, such as from an L2 cache to an L1 cache. Due to physical limitations, a common design point is to arrange the data arrays in logical, and often actual physical, “rows” and to transfer only as many rows in one “data shot” as routing and cycle time constraints allow.
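As an illustration of the row and “data shot” arrangement described above, the following minimal sketch computes how many data shots a transfer needs when only a limited number of rows can be moved per shot. The function name `data_shots`, its parameters, and the row counts used are hypothetical and chosen for illustration only; they are not taken from any particular cache design.

```python
def data_shots(total_rows: int, rows_per_shot: int) -> int:
    """Return the number of data shots needed to transfer total_rows rows.

    Hypothetical helper: rows_per_shot stands for the number of rows that
    routing and cycle time constraints allow in a single data shot.
    """
    if total_rows <= 0 or rows_per_shot <= 0:
        raise ValueError("row counts must be positive")
    # Integer ceiling division: a partially filled final shot still costs a shot.
    return (total_rows + rows_per_shot - 1) // rows_per_shot


# Illustrative values only (assumed, not from the source).
full_width = data_shots(total_rows=8, rows_per_shot=8)   # all rows in one shot
constrained = data_shots(total_rows=8, rows_per_shot=2)  # narrower return bus
```

Under these assumed values, the full-width case needs a single data shot while the constrained case needs four, which is why the width of the return path bears directly on fetch bandwidth.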
A major contributor to fetch bandwidth is the width of the fetch return data bus. To minimize latency, the wires used for these data buses often have to be high performance wires. In a given microprocessor technology, there is a limited amount of wire available, particularly high performance wire.
Many cache designs have more than one consumer of data reads from the cache. A common case is a store-through design where background data is read from the cache arrays to form valid stores for the next higher cache level. Other cases may include array test logic or co-processors attached as separate consumers. The wiring resources must be shared between all of these data consumers.
Often, particularly in a microprocessor core, one of the consumers is a “most important” or primary consumer. The primary consumer should get as much of the resources as possible. This would be the case, for example, for an L2 cache's data return path to an L1 cache versus the L2 cache's store path to the L3 cache. Still, performance for the secondary consumers remains an important design point.