The present invention relates to a pipelined processor generally and, more particularly, to a data-path for a data-cache within the processor.
Performance of a pipelined processor is determined in part by the speed at which data is moved through the memory stage of a data pipeline. A basic operation of the memory stage is to store data to, and load data from, a data-cache memory. A specific operation of a data-path associated with the data-cache memory includes byte-aligning or shifting the data for presentation to a central processor unit. Other specific operations of the data-path include driving a local data bus and gathering data from multiple sources for presentation to the data-cache memory. Each of the above operations has the potential to introduce delays that can ultimately affect the performance of the pipelined processor.
The architecture of the data-path before and after the data-cache memory influences the performance of the memory stage of the data pipeline. The data-path leading into the data-cache memory can degrade performance by presenting the data such that each store operation constrains access to the data-cache memory for multiple run cycles. The data-path following the data-cache memory can also degrade performance by delaying presentation of data read from the data-cache memory to other devices within the processor.
The present invention concerns a circuit comprising a data-cache memory and a data-path circuit. The data-cache memory may be configured to (i) store a cache input data item among a plurality of associative sets and (ii) present a plurality of cache output data items. The data-path circuit may be configured to (i) independently shift each of the plurality of cache output data items and (ii) multiplex the plurality of shifted cache output data items to present an output data item.
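The read side of this arrangement can be illustrated with a minimal behavioral sketch. It is not the claimed circuit. It assumes a 32-bit cache output data item per associative set, little-endian byte numbering, and a hit indication that names the selected set. The names `shift_item`, `data_path_read`, `way_outputs`, and `hit_way` are hypothetical and chosen only for illustration.

```python
from typing import Sequence

WORD_BITS = 32                      # assumed width of one cache output data item
WORD_MASK = (1 << WORD_BITS) - 1


def shift_item(word: int, byte_offset: int, size_bytes: int) -> int:
    """Byte-align one cache output data item.

    Moves the addressed bytes down to bit 0 and masks off the rest
    (little-endian byte numbering is an assumption of this sketch).
    """
    shifted = (word & WORD_MASK) >> (8 * byte_offset)
    return shifted & ((1 << (8 * size_bytes)) - 1)


def data_path_read(way_outputs: Sequence[int], hit_way: int,
                   byte_offset: int, size_bytes: int) -> int:
    """Model the data-path: shift each set's output, then multiplex.

    way_outputs holds one cache output data item per associative set.
    hit_way is the (hypothetical) select signal for the multiplexer.
    """
    # (i) Shift every cache output data item independently of the others.
    shifted = [shift_item(w, byte_offset, size_bytes) for w in way_outputs]
    # (ii) Multiplex the shifted items to present a single output data item.
    return shifted[hit_way]
```

For example, with `way_outputs = [0x11223344, 0xAABBCCDD]`, `hit_way = 1`, `byte_offset = 1`, and `size_bytes = 2`, the sketch returns `0xBBCC`. Because the shift is applied to each item before selection, the shifters in this model do not depend on the select signal. Whether that ordering shortens the critical path in hardware depends on the implementation.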
The objects, features and advantages of the present invention include providing a method and/or architecture for implementing a data-cache data-path that may (i) improve the cycle time at which data can be stored in the data-cache memory; (ii) improve the cycle time at which data read from the data-cache memory can be presented to other devices; and/or (iii) eliminate false long paths that complicate timing analysis of the data-path.