Peripheral Component Interconnect (PCI) devices have become very important to embedded system architects. PCI allows designers to take advantage of the availability and pricing of chipsets mass-produced for the PC (personal computer) market. The high-performance aspects of this architecture impose demanding hardware requirements for efficient use. Due to the relatively slow nature of accesses to dynamic random-access memory (DRAM), it is usually necessary to implement a temporary storage device--such as a first-in first-out buffer (FIFO)--in silicon to buffer the accesses to system memory. This results in higher system cost due to complexity (increased number of logic gates) and input-output (increased number of device pins) requirements.
Faster DRAM and higher clock speeds are a potential means of increasing the performance of a microprocessor-based computing system; however, there are limits to the effectiveness of this type of solution. For one thing, in conventional PC systems that use PCI devices, an increase in processing speed at the main CPU (i.e., the microprocessor) and at its RAM and ROM (read only memory) devices does not necessarily help to increase the throughput of the PCI device/main CPU sub-system. This is so because the PCI devices operate at one clock rate in conventional systems, and the CPU and its memory devices operate at a second (typically faster) clock rate. The bus systems used by the PCI devices are isolated from the CPU bus systems (especially the data busses) by a "PCI bridge" circuit.
In conventional circuits that use embedded PCI devices, one common implementation provides a "CPU bus" that connects the CPU, its memory devices, and one side of the PCI bridge circuit. A second bus, the "PCI bus," is used to connect one or more PCI devices to the other side of the PCI bridge circuit. In a second common implementation of an embedded PCI system, a "CPU bus" is provided to connect the CPU to one side of the PCI bridge circuit, and a second bus, the "Memory bus," is provided to connect the memory devices to a second side of the PCI bridge circuit. A third bus, the "PCI bus," is used to connect one or more PCI devices to a third side of the PCI bridge circuit.
An example of a conventional PCI bridge circuit is disclosed in U.S. Pat. No. 5,632,021, which discloses a memory system that connects a primary PCI bus and a secondary PCI bus, but prevents these two PCI buses from having a "livelock" condition. Between the two PCI buses are a pair of PCI bridges, in which one bridge acts only as a target on the primary bus and as a master on the secondary bus, and the second bridge acts as a master on the primary bus and as a target on the secondary bus. A "livelock" condition occurs when the bus master on one of the buses can monopolize the bus to the exclusion of the bus masters on the other bus. This occurs because PCI bridges have a pipeline delay, in which the PCI bridge introduces one or more cycles of delay between the time a piece of data enters the bridge and the time that data emerges from the bridge.
Another conventional PCI bridge circuit is disclosed in U.S. Pat. No. 5,434,996, which describes a circuit within a bus bridge that operates in two (2) clock domains, in which one bus is a CPU bus that operates faster than a PCI bus. The circuit allows data, addresses, or other types of information to be transferred between the first and second clock domains whether or not an internal bus clock is operating in a synchronous or asynchronous mode.
A further PCI multi-bus computer system is disclosed in U.S. Pat. No. 5,581,714 for a logic system that improves bus-to-bus data transfers in a multi-bus computer system. A system bus has a slave device attached, a peripheral bus has a master device attached, and a host bridge connects the two buses. The system bus permits burst read transfers of data stored in the slave device only if the first address corresponds to a prescribed system bus boundary. The peripheral bus (e.g., a PCI bus) is not subject to address boundary limitations, and permits burst read transfers beginning at any address. The peripheral bus includes a primary PCI bus which, after running through a secondary PCI bridge, is connected to a secondary PCI bus and its devices.
Some of the inherent limitations of the available conventional PCI circuits include (as noted above) (1) a high pin-out requirement and (2) buffering between buses. In the first instance, assuming that both the CPU and system memory have 32-bit data widths, the PCI bridge will require at least 64 pins for the datapath alone. This increases pin count, which in turn increases cost. In the second instance, since the buses run at different speeds or at different bandwidths, it is necessary to buffer data in FIFOs. Generally, the larger the FIFO, the better the performance. Larger FIFOs, however, increase design complexity and design size, thereby also increasing system cost.
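Both limitations can be illustrated with simple arithmetic. The following is a minimal sketch only: the function names, the 8-word burst, and the drain rate of one word every two cycles are hypothetical values chosen for illustration, and only the two 32-bit data widths come from the discussion above.

```python
def bridge_datapath_pins(bus_widths_bits):
    """Data pins needed when each attached bus has its own dedicated data pins."""
    return sum(bus_widths_bits)


def peak_fifo_occupancy(burst_words, drain_every):
    """Peak number of words held in a FIFO during one burst.

    Toy model (an assumption, not a real bridge design): one word is
    written per cycle for `burst_words` cycles, and one word is removed
    on every `drain_every`-th cycle by the slower side.
    """
    occupancy = 0
    peak = 0
    for cycle in range(burst_words):
        occupancy += 1                      # fast side writes one word
        peak = max(peak, occupancy)
        if (cycle + 1) % drain_every == 0:  # slow side reads one word
            occupancy -= 1
    return peak


# 32-bit CPU bus plus 32-bit memory bus, as in the text above.
print(bridge_datapath_pins([32, 32]))   # 64

# Hypothetical burst: 8 words in, drained at half the write rate.
print(peak_fifo_occupancy(8, 2))        # 5
```

In this toy model a FIFO shallower than the peak occupancy would force the faster bus to stall, which is one way to read the statement that a larger FIFO generally gives better performance.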
In addition to the higher costs involved with conventional embedded PCI systems, as noted above, such systems will always be limited in performance by lost portions of clock cycles, because major portions of a clock cycle must often be spent waiting for data information or address information to be set up before being strobed into (or from) the microprocessor, or into (or from) a memory device such as a DRAM integrated circuit. The main reason for this inefficiency is the fact that DRAM chips are asynchronous.
The asynchronous nature of most PC computing systems does not lend itself well to communication with a PCI device. This is so because PCI devices use bus timings that have very tight tolerances. In a PCI device, all data/address transfers are done in a single clock cycle, and further, the PCI specification allows only a 10 nsec propagation delay. In view of these requirements, the PCI bridge circuit must essentially convert signals between a relatively "loose" asynchronous memory system (having relatively "long" setup times) and a relatively "tight" synchronous system that is compatible with the PCI specifications. A good deal of communications inefficiency results from this type of conversion.
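The tightness of this timing can be illustrated with a short calculation. The sketch below assumes a nominal PCI clock of 33.33 MHz, a figure that is not stated above; only the 10 nsec propagation delay is taken from the text, and the function names are hypothetical.

```python
ASSUMED_PCI_CLOCK_MHZ = 33.33   # assumption: nominal PCI clock, not from the text
PROPAGATION_DELAY_NS = 10.0     # propagation delay stated in the text


def clock_period_ns(freq_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / freq_mhz


def remaining_budget_ns(freq_mhz, propagation_ns):
    """Time left in one cycle after the propagation delay is subtracted.

    Everything else in a single-cycle transfer (output delay, setup
    time, clock skew) has to fit into this remainder.
    """
    return clock_period_ns(freq_mhz) - propagation_ns


period = clock_period_ns(ASSUMED_PCI_CLOCK_MHZ)
budget = remaining_budget_ns(ASSUMED_PCI_CLOCK_MHZ, PROPAGATION_DELAY_NS)
print(f"cycle: {period:.1f} ns, left after propagation: {budget:.1f} ns")
```

Under this assumed clock rate, propagation alone consumes roughly a third of each cycle, which leaves little margin for the long setup times of an asynchronous memory system.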
Since a typical application for a PCI device embedded in a PC system is to act as a network controller (such as an ETHERNET controller), it will be understood that the greater the efficiency in creating higher throughput, the more powerful the network controller. More power means greater commercial advantage for the product that can solve some of the inherent inefficiencies of the conventional PCI bridge systems.
It would be advantageous to eliminate the PCI bridge in some computer systems so that the data transfers between memory devices and PCI devices could be accomplished without the timing losses inherent in the asynchronous DRAM chips used in conventional PCs. Synchronous DRAM (SDRAM), which is a higher-performance type of memory device, provides a means of overcoming this limitation.
An example of a synchronous DRAM integrated circuit is disclosed in U.S. Pat. No. 5,636,173 which includes two (2) banks of memory arrays. A controller can initiate in the first system clock cycle an active command to control an active operation on the first bank memory array, and at a second clock cycle can initiate a read or write command to transfer from or to the first bank memory array. In the second clock cycle, the controller also can initiate a "command" to control a precharge operation on the second bank memory array. To hide the precharging, the precharge command is issued to the bank memory array not being accessed while the bank memory array being accessed is in a burst mode.
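The benefit of hiding the precharge behind a burst on the other bank can be shown with a toy cycle count. This is a simplified sketch under stated assumptions, not the circuit of the cited patent: the burst length and precharge time are hypothetical, accesses are assumed to alternate strictly between the two banks, and activate and read latencies are ignored.

```python
def serial_cycles(burst_len, precharge_cycles, n_accesses):
    """Cycles used when every access waits for its own precharge."""
    return n_accesses * (precharge_cycles + burst_len)


def interleaved_cycles(burst_len, precharge_cycles, n_accesses):
    """Cycles used when the idle bank precharges during the other bank's burst.

    Only the part of a precharge that outlasts the concurrent burst
    remains visible; the first access still pays the full precharge.
    """
    exposed = max(0, precharge_cycles - burst_len)
    return (precharge_cycles
            + n_accesses * burst_len
            + (n_accesses - 1) * exposed)


# Hypothetical numbers: 4-word bursts, 2-cycle precharge, 4 accesses.
print(serial_cycles(4, 2, 4))        # 24
print(interleaved_cycles(4, 2, 4))   # 18
```

When the burst is at least as long as the precharge, the precharge cost disappears from every access after the first, which is the effect the two-bank arrangement is meant to achieve.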
An example of a conventional controller used with a synchronous DRAM is disclosed in U.S. Pat. No. 5,630,096, to maximize the throughput of memory requests. The controller maintains the spacing between commands to conform with the specifications for the synchronous DRAM while preventing gaps from occurring in the data slots. The controller allows memory requests and commands to be issued out of order so that the throughput may be maximized by overlapping required operations that do not specifically involve a data transfer. The controller also schedules memory request commands as closely together as possible to maximize the throughput of the memory requests within the timing constraints of the synchronous DRAM. The controller sorts memory requests that are received and, to maximize the throughput, the multiple memory requests are prioritized in a different order than the requests were received. In addition, memory requests are tagged to indicate a sending order, and conflicting memory requests are arbitrated, then queued and the arbitration process is decoded to simultaneously update schedule constraints.
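The idea of issuing tagged requests in a different order than they were received can be sketched as follows. This is an illustrative heuristic only and is not the scheduling method of the cited patent: the `Request` type, the `reorder` function, and the rule of preferring a request whose bank differs from the one just accessed are all assumptions made for this example.

```python
from collections import namedtuple

# tag: arrival order of the request; bank: memory bank it targets.
Request = namedtuple("Request", ["tag", "bank"])


def reorder(requests):
    """Greedy issue order that avoids back-to-back hits on the same bank.

    At each step the oldest pending request that targets a different
    bank than the previous one is issued; if none exists, the oldest
    pending request is issued regardless.
    """
    pending = list(requests)
    issued = []
    last_bank = None
    while pending:
        choice = next((r for r in pending if r.bank != last_bank), pending[0])
        pending.remove(choice)
        issued.append(choice)
        last_bank = choice.bank
    return issued


arrivals = [Request(0, 0), Request(1, 0), Request(2, 1), Request(3, 1)]
print([r.tag for r in reorder(arrivals)])   # [0, 2, 1, 3]
```

Because each request keeps its tag, results returned out of order can still be matched to the order in which the requests were sent.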
A further example of a synchronous DRAM control circuit for a memory module is disclosed in U.S. Pat. No. 5,666,321, in which the control circuitry allows the memory module to operate in a mode that is partially asynchronous. Address transition detection is used to begin decoding the column-address immediately after a new column-address is present on the address bus lines, without waiting for the column-address strobe signal to synchronize with the rising or falling edge of the synchronizing clock signal. In addition, latching circuitry can be used in which a new column-address may be decoded and held without a buffer until the column-address strobe signal notifies the circuitry that the column-address is correct and is to be input into the microprocessor. This improves the access time of read and write operations in synchronous DRAM memory by up to a complete clock cycle.