1. Technical Field
The present invention relates generally to processors and computing systems, and more particularly, to distributed bus arbitration within a processor in which the request and data paths from multiple slices have differing latencies.
2. Description of the Related Art
Present-day high-speed processors are highly integrated and employ asynchronous designs that permit communication between various resources and one or more processor cores, caches and memory in a highly efficient manner, so that data transfer and other communication occur at rates approaching the limits of digital signal propagation within the processor.
In particular, the internal buses of high-speed processors are permitted to transfer data and commands over paths having differing latencies, and logic is provided to ensure that the data and commands are properly validated and transferred in order without requiring long synchronous cycles limited by the maximum propagation times. This is especially true where data values, program instructions, commands and control signals are pipelined through many logic stages, with the number of stages through which these signals pass depending greatly on chip layout.
One such logical implementation within a processing system is a distributed arbitration scheme that includes a processor core arbiter and one or more slice arbiters. The distributed scheme permits the processor core to receive an early indication of data transfer requests from a resource. In the distributed case, the timing of the early indication depends on the physical location where the data resides, whereas a centralized arbitration scheme generally provides such an indication only after the additional cycles needed to relay requests to a central point, make the arbitration decision, and relay the decision to the processor core. Centralized arbitration therefore introduces too great a delay in indicating to the processor core that data is available.
In such a distributed arbitration scheme, when a resource coupled to a slice arbiter is ready to transfer data to the processor core from one or more slices, the slice arbiter assigns the bus needed for the transfer and thereby indicates to the slices when they may place their data on the bus. At the same time, the requests are sent in parallel to the core arbiter so that the processor core receives an early indication of a data transfer operation. The core arbiter receives each request after the latency of the requesting slice has elapsed, enforces the same arbitration decision being made in parallel at the slice arbiter, provides an early indication to the processor core that data will be arriving, and subsequently ensures that valid data from the slices is transferred at the appropriate times, when the slice data is available for latching (or loading) at the processor core.
In general, the logic required to handle a sequence of single-cycle data transfer operations is not overly complex, since the latency of each requester is known, and further grants at the slice arbiter (and, in parallel, at the core arbiter) can be blocked in particular cycles after a grant to another slice, based on the known latencies of the slices. Further requests from the longest-latency slice need not be blocked at all, and requests from faster slices are blocked in those cycles in which their data would reach the core but could not be selected for loading by either the slice arbiter or the core arbiter, because data from a previously arbitrated request is already being selected.
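The single-cycle blocking rule can be illustrated with a small cycle-level sketch. It assumes, hypothetically, that each slice has one fixed latency (the illustrative table `LATENCY`), that data granted to slice s in cycle t reaches the core multiplexer in cycle t + LATENCY[s], and that the slice arbiter grants at most one slice per cycle in lowest-index-first priority. The names `blocked_cycles` and `arbitrate` are illustrative only and do not come from the description.

```python
"""Sketch of single-cycle blocking in a distributed slice arbiter.

Hypothetical model: slice s has a fixed latency LATENCY[s]; data granted
to slice s in cycle t reaches the core multiplexer in cycle t + LATENCY[s];
at most one slice is granted per cycle, lowest slice index first.
"""

LATENCY = {0: 1, 1: 2, 2: 4}  # hypothetical per-slice latencies in cycles


def blocked_cycles(grant_cycle, granted_slice, latency):
    """Cycle in which each faster slice must be blocked after a grant.

    A later grant in cycle t2 to slice s2 would collide at the core when
    t2 + latency[s2] == grant_cycle + latency[granted_slice]. That cycle
    lies after grant_cycle only if s2 is faster than the granted slice,
    so the longest-latency slice is never blocked.
    """
    blocks = {}
    for s, lat in latency.items():
        delta = latency[granted_slice] - lat
        if delta > 0:
            blocks[s] = grant_cycle + delta
    return blocks


def arbitrate(requests, latency):
    """Grant single-cycle requests; `requests` maps cycle -> set of slices.

    Returns (grants, arrivals): grants maps grant cycle -> slice, and
    arrivals maps core arrival cycle -> slice. Blocked or losing requests
    are simply dropped in this sketch (a real requester would retry).
    """
    grants, arrivals, blocked = {}, {}, set()
    for t in sorted(requests):
        eligible = sorted(s for s in requests[t] if (s, t) not in blocked)
        if not eligible:
            continue
        s = eligible[0]
        grants[t] = s
        arrival = t + latency[s]
        # With the blocking rule applied, two grants never share a core cycle.
        assert arrival not in arrivals, "multiplexer contention"
        arrivals[arrival] = s
        for other, cycle in blocked_cycles(t, s, latency).items():
            blocked.add((other, cycle))
    return grants, arrivals


if __name__ == "__main__":
    # Slowest slice 2 requests in cycle 0; faster slices request later.
    grants, arrivals = arbitrate({0: {2}, 1: {1}, 2: {0}, 3: {0}}, LATENCY)
    print("slice arbiter grants:", grants)
    print("core arrivals:", dict(sorted(arrivals.items())))
```

In this run, slice 0's requests in cycles 2 and 3 are blocked, because their data would reach the core in cycles 3 and 4, which are already claimed by the grants to slices 1 and 2.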
As described, the core arbiter makes the same arbitration decisions as the slice arbiter, but due to the differing latencies from the slices to the core arbiter, the grants do not necessarily occur in the same order as at the slice arbiter. Nevertheless, the same requests granted by the slice arbiter are granted by the core arbiter. Since the order of the grants at the slice arbiter and at the core arbiter will not necessarily match, the data is resynchronized at the processor core (for example, by using the address or tag of the returned data). The core arbiter controls a multiplexer that selects among the individual buses coupling the slices to the core. Because the core arbiter determines which slice to grant in a given cycle, it can generate the appropriate multiplexer selector to load the data into the appropriate core register.
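The reordering seen at the core arbiter, and the multiplexer selection it drives, can be sketched as follows. The sketch assumes, hypothetically, that a request granted by the slice arbiter in cycle t to slice s is seen by the core arbiter, and its data reaches the core multiplexer, in cycle t + LATENCY[s], and that each grant carries a tag used for resynchronization. The `Grant` record, `core_mux_schedule` and `core_grant_order` are illustrative names, not part of the description.

```python
"""Sketch of the core arbiter's view of slice arbiter grants.

Hypothetical model: a request granted by the slice arbiter in cycle t to
slice s is seen by the core arbiter, and its data reaches the core
multiplexer, in cycle t + LATENCY[s].
"""
from dataclasses import dataclass

LATENCY = {0: 1, 1: 2, 2: 4}  # hypothetical per-slice latencies in cycles


@dataclass(frozen=True)
class Grant:
    cycle: int     # cycle in which the slice arbiter granted the bus
    slice_id: int  # slice placing data on its bus
    tag: str       # address/tag the core uses to resynchronize the data


def core_mux_schedule(slice_grants, latency):
    """Map each core cycle to the grant whose bus the multiplexer selects."""
    schedule = {}
    for g in slice_grants:
        cycle = g.cycle + latency[g.slice_id]
        # Blocking at the slice arbiter is assumed to prevent collisions.
        assert cycle not in schedule, "multiplexer contention"
        schedule[cycle] = g
    return dict(sorted(schedule.items()))


def core_grant_order(slice_grants, latency):
    """Tags in the order the core arbiter grants the same set of requests."""
    return [g.tag for g in core_mux_schedule(slice_grants, latency).values()]


if __name__ == "__main__":
    grants = [Grant(0, 2, "A"), Grant(1, 0, "B")]  # slice arbiter order: A, B
    for cycle, g in core_mux_schedule(grants, LATENCY).items():
        print(f"cycle {cycle}: select slice {g.slice_id} (tag {g.tag})")
    print("core arbiter order:", core_grant_order(grants, LATENCY))
```

Here the slice arbiter grants tag A before tag B, while the core arbiter selects slice 0 (tag B) in cycle 2 and slice 2 (tag A) in cycle 4; the set of grants is the same, only the order differs, which is why the tag is needed to resynchronize the data.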
As described above, the core arbiter makes its decisions using logic consistent with the grant logic of the slice arbiter, adjusted for the known cycle differences between the latencies of the slices. Because the core arbiter knows when the slice arbiter granted the associated bus to each slice, all data provided by the slices can be used, and it is not necessary to notify a slice that a data transfer failed (due to contention for the multiplexer in a given cycle), since the distributed arbitration scheme ensures successful completion of all transfers granted by the slice arbiter.
However, if such a system encountered multi-cycle requests, data would be incorrectly provided to the processor core, forcing retry operations or resulting in incorrect data transfer. For example, a request involving the highest-latency slice will be granted at the slice arbiter before an immediately subsequent request from the lowest-latency slice, but the core arbiter will grant the request from the lowest-latency slice first, since the request from the highest-latency slice will not arrive at the core arbiter until much later. For single-cycle requests, the blocking described above is sufficient to prevent multiplexer contention between such out-of-order decisions, but once a multi-cycle request has been granted, the existing blocking scheme is insufficient to avoid contention.
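The failure of single-cycle blocking for multi-cycle requests can be shown with a short sketch. It assumes, hypothetically, that a transfer granted to slice s in cycle t with n beats occupies the core multiplexer in n consecutive cycles starting at t + LATENCY[s], and that the existing rule blocks a faster slice only in the single cycle where its first beat would coincide with the first beat of an earlier grant. The names `mux_cycles`, `single_cycle_block` and `contention` are illustrative only.

```python
"""Sketch of why single-cycle blocking fails for multi-cycle transfers.

Hypothetical model: a transfer granted to slice s in cycle t with n beats
occupies the core multiplexer in cycles t + LATENCY[s] through
t + LATENCY[s] + n - 1. The single-cycle rule blocks a faster slice only
in the one cycle where its first beat would meet an earlier first beat.
"""

LATENCY = {0: 1, 2: 4}  # hypothetical: slice 0 fastest, slice 2 slowest


def mux_cycles(cycle, slice_id, beats, latency):
    """Core cycles occupied by a transfer of `beats` consecutive beats."""
    start = cycle + latency[slice_id]
    return set(range(start, start + beats))


def single_cycle_block(cycle, slice_id, other, latency):
    """Cycle in which the single-cycle rule blocks `other`, or None."""
    delta = latency[slice_id] - latency[other]
    return cycle + delta if delta > 0 else None


def contention(grants, latency):
    """Core cycles claimed by more than one (cycle, slice, beats) grant."""
    seen, clashes = set(), set()
    for cycle, slice_id, beats in grants:
        occupied = mux_cycles(cycle, slice_id, beats, latency)
        clashes |= seen & occupied
        seen |= occupied
    return clashes


if __name__ == "__main__":
    # Slow slice 2: 2-beat grant in cycle 0. Fast slice 0: 4-beat grant in cycle 1.
    grants = [(0, 2, 2), (1, 0, 4)]
    print("single-cycle rule blocks slice 0 in cycle",
          single_cycle_block(0, 2, 0, LATENCY))
    print("contended core cycles:", sorted(contention(grants, LATENCY)))
```

In this example the grant to slice 0 in cycle 1 is permitted, since the single-cycle rule blocks slice 0 only in cycle 3, yet its beats occupy core cycles 2 through 5 and overlap the slow slice's data in cycles 4 and 5.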
It is therefore desirable to provide an arbitration system and method that improve multi-cycle data transfer operation in a distributed arbitration system.