1. Technical Field
The present invention relates generally to computer systems, and in particular to a computer system designed as a system on a chip (SoC). Still more particularly, the present invention relates to a method and system for providing a SoC with a bus architecture that supports sequences with varying latency and/or frequency requirements.
2. Description of the Related Art
The computer industry has made significant developments in integrated circuit (IC) technology in recent years. For example, ASIC (application specific integrated circuit) technology has evolved from a chip-set philosophy to an embedded core based system-on-a-chip (SoC) concept. The system-on-a-chip concept refers to a system in which, ideally, all the necessary integrated circuits are fabricated on a single die or substrate. An SoC IC includes various reusable functional blocks, such as microprocessors, interfaces (e.g., external bus interface), memory arrays, and DSPs (digital signal processors). Such pre-designed functional blocks are commonly called "cores".
In a SoC, processed requests are sent from a core referred to as an initiator to a target (which may also be a core). An initiator (or master, or bus master, as it is sometimes called) is any device capable of generating a request and placing that request on the bus to be transmitted to a target. Thus, for example, either a processor or a DMA controller may be an initiator. Targets (or slaves) are the components that receive the initiator-issued requests and respond according to set protocols.
In order to complete the connections between initiators and targets, the SoC includes an on-chip bus utilized to connect multiple initiators and targets. The system bus consists of an interface to the initiators, a separate interface to the targets, and logic between the interfaces. The logic between the interfaces is called a "bus controller". This configuration is typical among system-on-a-chip (SoC) buses, where all the initiators, targets and the bus controller are on the same chip (die).
One example of a bus utilized by SoC computer systems is the CoreConnect™ processor local bus (PLB). (CoreConnect™ is a registered trademark of International Business Machines.) In an SoC with a PLB architecture, each device attaches to a central resource called the "PLB Macro". The "PLB Macro" is a block of logic that acts as the bus controller, interconnecting all the devices (including initiators and targets) of the SoC. The PLB Macro primarily includes an arbitration function, routing logic, and buffering and registering logic. The devices communicate over the bus via a (PLB) protocol in a synchronous manner. The protocol includes rules that control how transmission processes are to be completed, including, for example, the number of clocks (system clock cycles) taken to perform certain sequences. Among these sequences are (1) the time from request at the initiating device to snoop result at the initiating device, and (2) the time from read data at the source device (the target) to read data at the destination device (the initiator), etc.
SoC fabrication involves various design considerations that enable differentiation among the resulting chips. Each chip is designed/fabricated with a set of devices, which may be different from (or similar to) the devices utilized by another chip. When each chip has a unique set of devices, the resulting chip/die sizes are different. Furthermore, chips may be built from a variety of chip technologies, which have different timing characteristics.
The time for a signal to propagate across a chip depends on the "distance" the signal must travel and the characteristics of the chip technology. As utilized herein, the term "distance" is a generalized term describing the combined effects of actual wire distance, wire dimensions, net capacitance, gate characteristics, etc. As a consequence, the amount of time for a signal to propagate from one device to another (including the time to propagate between a device and the PLB Macro) differs significantly from chip to chip. These inevitable variations in "distance" between devices mean that (1) running the bus at a single frequency and (2) operating the protocol sequences at a single latency are not optimal for a variety of chips.
Currently, the simplest method of addressing the above problem is to define a protocol with a fixed set of latencies and then adjust the frequency based on the distances between devices. In this method, the various sequences that make up the protocol are actually run at more than one latency. This method is utilized in CoreConnect™ PLB3 and PLB4. However, several drawbacks are seen with this method, including:
(1) the devices must be capable of operating over a variety of frequencies. This is often problematic, particularly for devices that attach to other off-chip devices that operate at a fixed frequency;
(2) at lower frequencies, bandwidth and latency are degraded, which results in a loss of performance. The latency loss is the result of sequences taking a fixed number of clocks (ticks or cycles) while each clock tick becomes longer; and
(3) the system (collection of devices) is "optimized" for the longest (slowest) path among the devices. Therefore, devices cannot operate at a higher frequency.
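Drawback (2) can be illustrated with a short calculation. The following Python sketch is illustrative only: the function names, the 3-clock sequence, the 16-byte bus width, and the frequencies are hypothetical values chosen for the example and are not taken from any PLB specification. It shows that a sequence fixed at a given number of clocks takes proportionally longer in absolute time as the bus frequency is lowered, while peak bandwidth falls by the same factor.

```python
def sequence_latency_ns(clocks: int, freq_mhz: float) -> float:
    """Absolute latency of a sequence that always takes `clocks` bus clocks."""
    period_ns = 1000.0 / freq_mhz  # one clock period in nanoseconds
    return clocks * period_ns


def peak_bandwidth_mb_per_s(bus_width_bytes: int, freq_mhz: float) -> float:
    """Peak bandwidth assuming one transfer of `bus_width_bytes` per clock."""
    return bus_width_bytes * freq_mhz


if __name__ == "__main__":
    # Hypothetical 3-clock sequence on a 16-byte bus at two frequencies.
    for freq in (200.0, 100.0):
        print(freq, sequence_latency_ns(3, freq), peak_bandwidth_mb_per_s(16, freq))
```

Halving the frequency in this example doubles the absolute latency of the fixed-clock sequence and halves the peak bandwidth.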
A more sophisticated method of addressing the problem involves defining the bus protocol such that protocol sequences are allowed to take a variable number of clock ticks (a range of latencies). During chip integration (i.e., the design process of connecting all the devices on the die), the maximum distances between devices are determined, and the appropriate latencies are set for the corresponding paths.
Often, this technique is utilized such that the latency for all devices is set based on the longest path between any two devices. Thus, even nearby devices utilize the latency associated with the longest path. The CoreConnect PLB3 and PLB4 buses also utilize this technique for the master-request-to-slave-request path. However, this technique is also not optimal for many chips. Paths that are long are set to take multiple clocks for propagation, and this results in the following drawbacks:
(1) bandwidth is degraded because a new sequence cannot begin on each clock;
(2) timing analysis is more difficult to perform when paths require more than one clock for propagation. This is because timing analysis software tools require the operator to identify and specify the number of clocks associated with any path that requires more than one clock, since the default number of clocks is one; and
(3) if all paths are set to a latency based on the single longest path, then devices that are close to one another cannot take advantage of their proximity.
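Drawback (3) can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions: the path names, delays, and the 5 ns clock period are hypothetical, and the helper functions are not part of any PLB tool. It computes the number of clocks each path would need on its own, the single latency imposed when every path is set from the longest one, and the clocks that nearby devices give up as a result.

```python
import math


def clocks_needed(delay_ns: float, period_ns: float) -> int:
    """Smallest whole number of clocks that covers a path's propagation delay."""
    return max(1, math.ceil(delay_ns / period_ns))


def uniform_latency(path_delays_ns: dict, period_ns: float) -> int:
    """Latency applied to every path when it is set from the longest path."""
    return max(clocks_needed(d, period_ns) for d in path_delays_ns.values())


def wasted_clocks(path_delays_ns: dict, period_ns: float) -> dict:
    """Clocks each path spends beyond what its own delay requires."""
    uniform = uniform_latency(path_delays_ns, period_ns)
    return {name: uniform - clocks_needed(d, period_ns)
            for name, d in path_delays_ns.items()}


if __name__ == "__main__":
    # Hypothetical path delays in nanoseconds, with a 5 ns clock period.
    paths = {"cpu_to_bus": 3.0, "dma_to_bus": 4.5, "ext_if_to_bus": 11.0}
    print(uniform_latency(paths, 5.0))
    print(wasted_clocks(paths, 5.0))
```

In this example the two short paths could complete in one clock each but are held to the three clocks required by the longest path.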
The present invention recognizes the flaws in the two design methods described above and realizes that it would be desirable to provide a SoC designed to optimize the transmission of signals on the bus given the multiplicity of frequencies and latencies of propagation. The invention recognizes that it would be further desirable to provide this feature without requiring degradation in either timing or any other parameter of SoC bus operation. These and other benefits are provided by the invention described herein.
Disclosed is a method of designing a system on a chip (SoC) to operate with varying latencies and frequencies. A layout of the chip is designed with specific placement of devices, including a bus controller, initiator, and target devices. The time for a signal to propagate from a source device to a destination device is determined relative to a default propagation time. A pipeline stage is then inserted into a bus path between said source device and destination device for each additional default propagation time that the signal takes to propagate. Each device (i.e., initiators, targets, and bus controller) is designed with logic to control a protocol that functions with a variety of response latencies. With the additional logic, the devices do not need to be changed when pipeline stages are inserted in the various paths.
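One possible reading of the stage-count rule above is sketched below in Python. The formula is an assumption drawn from the summary, not a definitive statement of the method: a path whose delay fits within one default propagation time needs no stage, and each further default propagation time (or fraction thereof) adds one stage. The function name and the delay values are hypothetical.

```python
import math


def pipeline_stages_needed(delay_ns: float, default_ns: float) -> int:
    """Stages to insert so that each segment fits in one default propagation time.

    Assumed rule: stages = ceil(delay / default) - 1, never less than zero.
    """
    if delay_ns <= 0 or default_ns <= 0:
        raise ValueError("delays must be positive")
    return max(0, math.ceil(delay_ns / default_ns) - 1)


if __name__ == "__main__":
    # Hypothetical path delays against a 5 ns default propagation time.
    for delay in (4.0, 5.0, 7.5, 11.0):
        print(delay, pipeline_stages_needed(delay, 5.0))
```

The sketch treats the stages as evenly dividing the path; in an actual layout the placement of each stage would depend on the physical routing.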
In the described embodiment, the bus controller is a PLB5 macro with associated PLB5 operating protocol and the default propagation time is one clock cycle. Registers are utilized as the pipeline stages that are inserted within the paths. One aspect of the design involves an algorithm that first identifies a signal that does not meet a default timing requirement of the SoC operating parameters. That signal has a corresponding group of related signals to complete an operation and the other signals within the group are identified as well. Pipeline stages are inserted as necessary in the paths of signals within the group. In some instances, a pipeline stage is also inserted within the PLB5 Macro.
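The group-based insertion step described above might be sketched as follows. This Python example is illustrative only and rests on several assumptions that are not stated in the text: the signal and group names are hypothetical (they are not PLB5 signal names), a signal fails timing when its delay exceeds one clock period, and every signal in a group receives the same number of stages as the worst signal in that group so that related signals remain aligned.

```python
import math
from dataclasses import dataclass


@dataclass
class Signal:
    name: str        # hypothetical signal name
    group: str       # hypothetical label tying related signals together
    delay_ns: float  # propagation delay from source to destination


def plan_pipeline_stages(signals: list, period_ns: float) -> dict:
    """Return {signal name: stages} for each signal in a group that fails timing."""
    worst = {}  # group -> stages required by its slowest signal
    for s in signals:
        stages = max(0, math.ceil(s.delay_ns / period_ns) - 1)
        worst[s.group] = max(worst.get(s.group, 0), stages)
    # Assumption: all signals in a failing group get the same stage count.
    return {s.name: worst[s.group] for s in signals if worst[s.group] > 0}


if __name__ == "__main__":
    signals = [
        Signal("req_valid", "request", 7.0),   # misses a 5 ns clock period
        Signal("req_addr", "request", 4.0),    # meets timing, same group
        Signal("rd_data", "read_data", 3.0),   # meets timing, different group
    ]
    print(plan_pipeline_stages(signals, 5.0))
```

Here only the group containing the failing signal is pipelined, and the group whose signals all meet the default timing is left untouched.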