There has been a rapid increase in the clock speed of microprocessors. Presently available examples, such as the PowerPC 750 processor available from International Business Machines Corporation, can operate at clock speeds of 400 MHz or more. As microprocessor speeds have increased, their prices have decreased. It is now practical to embed high-performance microprocessors in controller systems such as disk controllers.
Such controller systems typically comprise a microprocessor or microcontroller connected to a memory subsystem and to one or more controlled devices via an external bus system. The external buses usually operate at lower speeds than the processor, so it is difficult to operate all elements of the controller system at the same high frequency. Moreover, as more devices are attached to the bus, the load on the bus drivers of the external bus system increases, and the increased load lowers the clock speed at which the bus can operate.
In some controller systems, the external bus system comprises a hierarchy of buses interconnected by bridge devices. The bridge devices isolate the signals carried by the buses in the hierarchy, and each bus interconnects a different group of devices. This arrangement reduces the effective load on the external bus system, so a reasonable clock speed can be maintained. However, the intermediate bridge devices add delay, or latency, to memory accesses. These latencies reduce the processing speed of the processor, particularly for communications between the processor and devices at the far ends of the bus system. Further, accesses to each bus in the bus system are usually subject to arbitration, which adds further latency. Further still, some buses, such as PCI buses, operate asynchronously relative to the devices they interconnect. Such asynchronous operation introduces still more latency, because additional clock cycles are needed to provide valid data signals on the buses.
When the processor issues a write command, the delay introduced by the bridge devices can be reduced by buffering the command in the bridge devices. The buffers reduce the waiting, or “stall”, time in the processor to the time taken to send the write command over the bus directly connected to the processor. For example, caches usually send, or “post”, data cache line flush commands to the memory subsystem, enabling the processor to continue operating while the data is transferred between successive levels of the memory subsystem. Such “posted write” systems are well known.
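The posted-write mechanism can be illustrated with a minimal Python sketch. The `PostingBridge` class, its cycle counts, and the dictionary standing in for memory are all hypothetical; the model only shows that the processor's stall per write is the local bus transfer time, while the cost of forwarding the data downstream is paid later by the bridge.

```python
from collections import deque


class PostingBridge:
    """Hypothetical bridge with a posted-write FIFO (illustrative model only)."""

    def __init__(self, local_bus_cycles: int, forward_cycles: int) -> None:
        # Cycles to move one command across the bus attached to the processor.
        self.local_bus_cycles = local_bus_cycles
        # Cycles the bridge later spends forwarding one posted write downstream.
        self.forward_cycles = forward_cycles
        self.posted = deque()

    def write(self, address: int, data: int) -> int:
        """Queue a write and return how long the processor stalls, in cycles."""
        self.posted.append((address, data))
        # The processor waits only for the local bus transfer; the bridge
        # forwards the data later, in the background.
        return self.local_bus_cycles

    def drain(self, memory: dict) -> int:
        """Forward every posted write to memory; return the bridge's cycles."""
        cycles = 0
        while self.posted:
            address, data = self.posted.popleft()
            memory[address] = data
            cycles += self.forward_cycles
        return cycles


# Hypothetical timings: 4 cycles on the local bus, 30 cycles to forward.
bridge = PostingBridge(local_bus_cycles=4, forward_cycles=30)
memory: dict = {}
stall = sum(bridge.write(addr, addr * 2) for addr in range(3))
bridge_cycles = bridge.drain(memory)
print(stall, bridge_cycles)  # 12 90
```

With the assumed timings, three writes stall the processor for 12 cycles in total, while the bridge spends 90 cycles forwarding them after the processor has moved on.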
For read commands, however, there is no comparably satisfactory solution. In general, a read command incurs all of the arbitration and synchronization delays at each bridge device in the bus system. Also, before a read command is executed, all posted write commands within each bridge device should be completed, in case they affect the region of memory being read.
In the PCI-X bus system, the strict ordering of read and posted write commands can be relaxed to reduce read latency. However, this can adversely affect the execution of program code by the processor, and the latency inherent in read commands remains substantial.
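The read-side problem can be approximated with a simple additive cost model. In this sketch, `BridgeHop`, `read_latency`, and every cycle count are hypothetical; the model assumes that both the read request and the returned data pay arbitration and synchronization at each bridge, and that posted writes queued ahead of the read must drain first unless ordering is relaxed. The `relaxed_ordering` flag is only a rough stand-in for the relaxation mentioned for PCI-X, not a model of the PCI-X ordering rules themselves.

```python
from dataclasses import dataclass


@dataclass
class BridgeHop:
    """One bridge between the processor and the target memory (hypothetical)."""
    arbitration_cycles: int       # waiting to win the next bus
    sync_cycles: int              # crossing into the next clock domain
    pending_posted_writes: int    # writes queued in the bridge ahead of the read
    cycles_per_posted_write: int  # time to forward each queued write


def read_latency(hops: list, memory_cycles: int, relaxed_ordering: bool = False) -> int:
    """Estimate the processor stall for one read, in cycles.

    Assumes the request and the returned data each pay arbitration and
    synchronization at every hop, and that, unless ordering is relaxed,
    posted writes in each bridge drain before the read may pass.
    """
    total = memory_cycles
    for hop in hops:
        if not relaxed_ordering:
            # Strict ordering: flush posted writes in case they touch the read address.
            total += hop.pending_posted_writes * hop.cycles_per_posted_write
        # The outgoing request and the returning data each cross the bridge.
        total += 2 * (hop.arbitration_cycles + hop.sync_cycles)
    return total


hops = [BridgeHop(8, 3, 2, 10), BridgeHop(8, 3, 0, 10)]
strict = read_latency(hops, memory_cycles=20)
relaxed = read_latency(hops, memory_cycles=20, relaxed_ordering=True)
print(strict, relaxed)  # 84 64
```

Even with relaxed ordering, the read in this example still pays for every arbitration and synchronization step (64 cycles versus 84 under strict ordering), whereas a posted write would stall the processor only for the first bus transfer.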
Further difficulties arise if accesses through the bridge devices become so frequent that they approach the bandwidth of the intervening buses in the bus system. This situation arises when, for example, many short random accesses are made to remote memory locations. The problem can be alleviated by providing sufficient memory as close as possible to the processor; for example, the system may be provided with large L1 and L2 caches. However, in many controller systems it is also desirable to place at least some memory close to other devices of the controller system, such as device interfaces. Such interfaces are also intolerant of latency, so if memory is too remote from these devices, performance suffers.
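Why many short accesses can saturate a bus is easy to see with a fixed-overhead model, in which every transaction pays the same setup cost regardless of its size. The function `bus_utilization` and the bus parameters below (100 MHz, 8 bytes per data cycle, 6 overhead cycles per access) are hypothetical and are not taken from any particular bus specification.

```python
import math


def bus_utilization(accesses_per_second: float, bytes_per_access: int,
                    overhead_cycles: int, bytes_per_cycle: int,
                    bus_hz: float) -> float:
    """Fraction of bus cycles consumed (hypothetical fixed-overhead model)."""
    # Data phases needed for the payload, rounded up to whole bus cycles.
    data_cycles = math.ceil(bytes_per_access / bytes_per_cycle)
    # Every transaction also pays a fixed overhead (e.g. arbitration, address).
    cycles_per_access = overhead_cycles + data_cycles
    return accesses_per_second * cycles_per_access / bus_hz


# Hypothetical bus: 100 MHz, 8 bytes per data cycle, 6 overhead cycles per access.
# Both cases move 40 MB/s of payload.
short = bus_utilization(5_000_000, 8, 6, 8, 100e6)   # many 8-byte random accesses
burst = bus_utilization(156_250, 256, 6, 8, 100e6)   # fewer 256-byte bursts
print(f"{short:.3f} {burst:.3f}")  # 0.350 0.059
```

Both cases move the same payload, but the short random accesses consume about six times as many bus cycles, because the fixed overhead dominates each small transfer.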
In a typical controller system, data that is written by a device interface and read by the processor is posted through bridge devices to memory located close to the processor. Memory that is written by the processor and read by the device interface chip is located close to the device interface chip. The amount of data that can be transferred between the processor and the device interface is limited by the capacity of the intervening bus system. Also, at least some memory must be both read from and written to by the processor, the device interface chip, or both. Furthermore, there are practical limits, in terms of both addressability and cost, on the amount of memory that can be assigned to the processor. An embedded disk controller typically requires a cache that is too large to connect directly to a fast processor at reasonable cost. Conventionally, therefore, such caches have been attached to the processor via a memory controller remote from the processor. The problem with this arrangement is that accesses to the remote memory are very slow, typically taking hundreds to thousands of processor clock cycles. Such access times can exceed the execution time of the code waiting to be executed by the processor. The imbalance between memory access time and processor cycle time grows as microprocessors get faster.