The present invention relates in general to data processing systems, and in particular, to bus systems with independent read and write data buses.
Recent advances in silicon densities now allow for the integration of numerous functions onto a single silicon chip. With this increased density, peripherals formerly attached to the processor at the card level are now integrated onto the same die as the processor. As a result, chip designers must now address issues traditionally handled by the system designer. In particular, the on-chip buses used in such system-on-a-chip (SOC) designs must be sufficiently flexible and robust to support a wide variety of embedded system needs.
The IBM Blue Logic core program, for example, provides the framework to efficiently realize complex system-on-a-chip designs. Typically, an SOC contains numerous functional blocks representing a very large number of logic gates. Designs such as these are best realized through a macro-based approach. Macro-based designs provide numerous benefits during logic entry and verification, but the ability to reuse intellectual property is often the most significant benefit. From generic serial ports to complex memory controllers and processor cores, each SOC generally requires the use of common macros.
Many single chip solutions used in applications today are designed as custom chips, each with its own internal architecture. Logical units within such a chip are often difficult to extract and reuse in different applications. As a result, the same function is often redesigned from one application to another. Reuse is promoted by ensuring macro interconnectivity, which is accomplished by using common buses for inter-macro communications. The IBM CoreConnect architecture, for example, provides three buses for interconnecting cores, library macros, and custom logic. These buses are the Processor Local Bus (PLB), On-chip Peripheral Bus (OPB) and Device Control Register (DCR) Bus. Other chip vendors may have similar SOC core architectures, for example the Advanced Microcontroller Bus Architecture (AMBA) commercially available from ARM Ltd.
FIG. 1 illustrates how the prior art CoreConnect architecture is used to interconnect macros in the PowerPC 405 GP embedded controller. High-performance, high-bandwidth blocks such as the PowerPC 405 CPU core, PCI bridge and SDRAM controller reside on the PLB 102, while the OPB 101 hosts lower data rate peripherals. The daisy chain DCR bus 104 provides a relatively low-speed data path for passing configuration and status information between the PowerPC 405 CPU core and other on-chip macros. A PLB Arbiter 103 handles contention between devices on PLB 102.
The CoreConnect architecture shares many similarities with other advanced bus architectures in that they support data widths of 32 bits and higher, utilize separate read and write data paths and allow multiple masters. For example, the CoreConnect architecture and AMBA 2.0 both provide high-performance features including pipelining, split transactions and burst transfers. Many custom designs utilizing the high-performance features of the CoreConnect architecture are available in the marketplace today.
In most SOC designs the CPU is a key element of the chip. Modern RISC based CPUs often require a large number of memory read operations to run a particular application. This is caused by several factors. One factor is that complex operations are made up of long streams of simple instructions. These instructions may sometimes exist in a local cache. Often, however, the relatively small size of the cache or the non-locality of reference code will force misses or line memory read operations. Several newer CPUs are super-scalar and have multiple execution pipelines, which can multiply the number of read transfers required. Since the capability exists to manufacture so many transistors on a chip, many other complicated functions such as graphics, communications, and DMA controllers may also be integral to the chip. All these factors contribute to the need for a bus structure which can support large amounts of memory read traffic. Depending on the particular application, read data bus traffic may be two to three times greater than write data bus traffic.
In an SOC design, the utilization of the on-chip bus structure is an important consideration. Efficient use of the bus produces better system throughput and better response in real-time applications.
An implementation of a high performance on-chip bus architecture is the IBM CoreConnect™ Processor Local Bus (PLB). This bus structure contains separate read and write data buses for simultaneous read and write transfers. The PLB bus structure allows multiple slave devices to communicate with multiple master devices under the control of a central bus arbitration unit. The arbiter grants requesting masters control of the bus to communicate with the various slaves. There are, as stated above, separate read and write data and control buses coupled to a common address and transfer qualifier bus. This arrangement allows read and write operations to be performed simultaneously, or "overlapped".
In a system running an application with two to three times more reads than writes, the write data bus has the potential to be idle for a large percentage of the time. There is therefore a need for a bus architecture which maintains the normal simultaneous, overlapped read and write transfers while offering a dynamic option to use idle bus time when executing a specific application results in an imbalance between read and write traffic.
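The idle time described above can be estimated with a rough calculation. The following Python sketch is illustrative only and is not part of the described bus architecture: it assumes each data bus moves one data beat per cycle and that the busier bus is kept fully occupied, and the function name `write_bus_idle_fraction` is hypothetical.

```python
from fractions import Fraction


def write_bus_idle_fraction(read_beats: int, write_beats: int) -> Fraction:
    """Estimate the fraction of cycles in which the write data bus is idle.

    Simplifying assumptions (not taken from the text): each bus carries one
    data beat per cycle, and the busier of the two buses is saturated, so
    the elapsed time equals max(read_beats, write_beats) cycles.
    """
    elapsed_cycles = max(read_beats, write_beats)
    if elapsed_cycles == 0:
        raise ValueError("at least one data beat is required")
    # The write bus is busy for write_beats of the elapsed cycles.
    return 1 - Fraction(write_beats, elapsed_cycles)


# Read traffic two and three times the write traffic, as in the text.
two_to_one = write_bus_idle_fraction(read_beats=2, write_beats=1)
three_to_one = write_bus_idle_fraction(read_beats=3, write_beats=1)
```

Under these assumptions, a two-to-one read-to-write ratio leaves the write data bus idle for half of the cycles, and a three-to-one ratio leaves it idle for two thirds of them.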
In a bus structure that has independent read and write data buses, the bus arbiter determines when there is an imbalance between the read and write traffic. An auxiliary read data bus is added to the slave devices and coupled to the bus arbiter. When the bus arbiter determines there is an imbalance in the read traffic and additional read bandwidth is needed, the bus arbiter asserts an auxiliary read command to the slave devices. A slave device claiming the auxiliary read sends its read data to the bus arbiter on the added auxiliary read data bus. The slave device claiming the auxiliary read sends one or more write data acknowledges to the bus arbiter, depending on the transfer size. The bus arbiter, in response to the auxiliary read data acknowledge, routes the auxiliary read data to the appropriate master and conveys the slave write data acknowledge to the master read data acknowledge. Because the write data acknowledge is routed to the read data acknowledge of the master, the bus system appears to the master to be executing a normal read data transfer. If pending priority requests indicate that the bus needs to revert to its normal functionality, then the auxiliary commands may be de-asserted and the normal independent read and write bus structure may be re-established. For the allocated period of time, the independent read and write data buses appear (from a bandwidth perspective) as dual read data buses. Other embodiments use tri-state bi-directional buses, where a separate auxiliary read data bus does not need to be added. In such an embodiment, a bi-directional write data bus has the functionality of a uni-directional write data bus and an added uni-directional auxiliary read data bus.
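The arbitration behavior summarized above may be illustrated with a small behavioral model. The Python sketch below is a simplification under stated assumptions and is not the claimed implementation: every request is a single data beat, a pending write always takes the write path (standing in for the "pending priority requests" that de-assert the auxiliary command), and the names `Request`, `arbitrate`, `read_data_ack` and `write_data_ack` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    master: int  # id of the requesting master (hypothetical field)
    kind: str    # "read" or "write"; each request is one data beat


def arbitrate(requests):
    """Return a log of (cycle, bus, master, slave_ack, master_ack) tuples."""
    reads = deque(r for r in requests if r.kind == "read")
    writes = deque(r for r in requests if r.kind == "write")
    log = []
    cycle = 0
    while reads or writes:
        # Normal read transfer on the read data bus.
        if reads:
            r = reads.popleft()
            log.append((cycle, "read_bus", r.master,
                        "read_data_ack", "read_data_ack"))
        if writes:
            # A pending write uses the write path; no auxiliary read.
            w = writes.popleft()
            log.append((cycle, "write_bus", w.master,
                        "write_data_ack", "write_data_ack"))
        elif reads:
            # Write path idle and another read pending: auxiliary read.
            # The slave answers with a write data acknowledge, which the
            # arbiter presents to the master as a read data acknowledge.
            r = reads.popleft()
            log.append((cycle, "aux_read_bus", r.master,
                        "write_data_ack", "read_data_ack"))
        cycle += 1
    return log


# Three reads and one write: a 3:1 read-to-write imbalance.
demo = [Request(0, "read"), Request(1, "read"),
        Request(2, "read"), Request(3, "write")]
demo_log = arbitrate(demo)
demo_cycles = max(entry[0] for entry in demo_log) + 1
```

In this model the three reads and one write complete in two cycles, whereas the same traffic would occupy the read data bus for three cycles if reads could use only that bus. The master receiving the auxiliary read sees only a read data acknowledge, consistent with the transfer appearing to it as a normal read.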
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.