The present invention relates to an intelligent data bus interface, and more particularly, to an intelligent data bus interface having a multi-port memory and a data processor.
A typical interface between the bus of a host computer and the bus of a slave computer or I/O device includes a dual-port memory having two independent bi-directional ports. The dual-port memory generally resides in an overlapping address space of both computers. Data is transferred between the computers by having one computer write to that computer's address space associated with the dual-port memory, and by having the other computer subsequently read the data from the other computer's address space associated with the dual-port memory. The dual-port memory is particularly advantageous as an interface between a host bus and a slave bus that operate at different data rates. The interface requires interrupt-driven access to the central processing units (CPUs) of the host computer and the slave computer for protocol, control, and data processing functions. Each CPU is burdened by the interface interrupts, which can demand a significant percentage of the CPU's processing power.
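The dual-port transfer described above can be sketched as a toy model in which one block of storage is visible through two address windows, one per bus. This is only an illustration: the class name, the base addresses, and the port labels are hypothetical, and the sketch ignores timing, arbitration, and interrupts.

```python
class DualPortMemory:
    """Toy model: one storage array mapped at a different base address on each bus."""

    def __init__(self, size, host_base, slave_base):
        self._cells = bytearray(size)
        # Hypothetical base addresses of the memory window on each bus.
        self._base = {"host": host_base, "slave": slave_base}

    def _span(self, port, address, length):
        # Translate a bus address into an offset inside the shared storage.
        start = address - self._base[port]
        if start < 0 or start + length > len(self._cells):
            raise IndexError("access falls outside the dual-port window")
        return start, start + length

    def write(self, port, address, data):
        start, end = self._span(port, address, len(data))
        self._cells[start:end] = data

    def read(self, port, address, length):
        start, end = self._span(port, address, length)
        return bytes(self._cells[start:end])


# The host writes through its window; the slave reads the same cells through its own.
memory = DualPortMemory(size=16, host_base=0xD0000, slave_base=0x8000)
memory.write("host", 0xD0004, b"ping")
received = memory.read("slave", 0x8004, 4)
```

The same offset is reached from either side, which is all the sketch is meant to show; the notification step that tells the reader new data is present is the interrupt traffic the paragraph identifies as a burden on each CPU.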
A typical intelligent interface has an input/output processor (IOP) that includes a dedicated microprocessor which allows the host computer's CPU to be relieved of many input/output (I/O) related tasks. Accordingly, higher I/O performance may be achieved while lowering the processing burden on the host CPU. In addition, the IOP may perform higher levels of the I/O protocol, carry out data conversions such as encryption/decryption, and execute intelligent runtime optimizations, like read-ahead caching and write-data merging.
An example of an existing intelligent data bus interface is disclosed as a network bridge 100 in U.S. Pat. No. 5,130,981 to Murphy. The term "intelligent" arises from a dedicated system processor 101 that is included in the network bridge. In the Murphy network bridge, a single-port random access memory (RAM) 102 is used to store data packets received by the network bridge from a first network 105 and from a second network 106 through first and second DMA controllers 103, 104, respectively. The system processor and the DMA controllers have access to the data packets stored in the RAM through a 3-port RAM interface which prevents the system processor or the DMA controllers from simultaneously accessing the single-port RAM. Ideally, the 3-port RAM interface 107 allocates the access to the single-port RAM so that the processor and the two DMA controllers have equal access priority to the single-port RAM. However, to provide equal access priority, the access cycle time of the RAM must be three times the access cycle time of the processor or the DMA controllers thus limiting the maximum bandwidth of the network bridge to about one-third of the RAM's bandwidth. The 3-port RAM interface gates access to the single-port RAM in a way that prevents simultaneous and asynchronous access by the processor and the DMA controllers to the single-port memory. Further, all data transfers and all requests between the two networks must occur through the RAM in the Murphy patent.
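The equal-priority sharing of a single-port RAM can be illustrated with a minimal round-robin sketch. It is not Murphy's arbitration logic; the requester names and the cycle count are assumptions chosen only to show that three always-requesting parties each receive about one-third of the RAM's access cycles.

```python
from collections import Counter
from itertools import cycle


def round_robin_grants(requesters, ram_cycles):
    """Grant one RAM access per cycle, rotating through the requesters in order."""
    grants = Counter()
    order = cycle(requesters)
    for _ in range(ram_cycles):
        grants[next(order)] += 1
    return grants


# Hypothetical workload: all three requesters want the RAM on every cycle.
total_cycles = 300
grants = round_robin_grants(["processor", "dma_1", "dma_2"], total_cycles)
shares = {name: count / total_cycles for name, count in grants.items()}
```

Under these assumptions each requester is granted 100 of the 300 cycles, which corresponds to the one-third bandwidth ceiling noted above.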
Generally, the operating systems of a host computer and a slave computer must implement hardware-specific functions to use an existing data bus or network interface. Currently, standardization efforts for intelligent data bus interfaces are being directed toward de-coupling the operating system from the specific intelligent interface hardware by defining a standard intelligent interface protocol that is independent of the operating system. Therefore, intelligent interface hardware that implements the standard protocol is compliant with all operating systems supporting the standard protocol. Exemplary standards are the Intelligent I/O (I2O) architecture, the Intel/Microsoft Virtual Interface (VI) Architecture, the Uniform Driver Interface, and the IEEE SCI Physical Layer API. The I2O standard specifically defines an intelligent I/O hardware architecture, byte transport protocols between a host CPU and an IOP, the transport driver interfaces, the message protocol, and the IOP initialization and configuration.
A currently available intelligent IOP 2, such as an Intel i960RP, is shown in FIG. 1 in a simplified form. The Intel IOP is positioned to operate within the I2O standard. The IOP includes a microprocessor (processor) 4 having an internal memory 6, a memory bus interface unit (MIU) 8, a local bus interface unit (BIU) 12, and two direct memory access (DMA) interfaces 14, 18. The processor, MIU, BIU and DMA interfaces are connected together by an internal bus 10. The IOP's large number of I/O interfaces allows it to act as an intelligent I/O bridge, as well as to perform initialization and control functions for the bus interface. The IOP's primary DMA interface 14 is typically connected to a host CPU (not shown) via the host's local peripheral component interconnect (PCI) bus 16. A secondary DMA interface 18 is connected to the network hardware (not shown) via a second PCI bus 20. A PCI-PCI bridge 22 allows DMA data exchange between the PCI buses 16, 20 without using the IOP or reducing the throughput at either bus. The IOP's microprocessor uses the internal bus 10 to connect to the MIU 8, to the BIU 12, and to the two DMA interfaces 14, 18.
In a typical I/O operation, the IOP 2 receives a request that is posted to a specific address in its on-chip internal memory 6. The processor 4 subsequently decodes the request and responds by configuring an appropriate network interface (not shown) using the PCI bus 20. The network interface performs the request and copies the resulting data to/from the host computer's memory, using the PCI-PCI bridge 22. After completion of the DMA transaction, the IOP 2 receives an appropriate interrupt, which triggers a completion operation for the request.
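The sequence of steps in this typical I/O operation can be sketched as a toy model that records each stage in order. The class, method, and stage names are hypothetical labels for the steps in the paragraph above; the sketch is not the i960RP programming interface and performs no real bus access.

```python
from dataclasses import dataclass, field


@dataclass
class IopModel:
    """Toy model of the request flow; each stage is only logged, not performed."""

    inbound_requests: list = field(default_factory=list)  # stands in for internal memory 6
    log: list = field(default_factory=list)

    def post_request(self, request):
        # The host posts a request to an address in the IOP's internal memory.
        self.inbound_requests.append(request)
        self.log.append("posted")

    def service_next(self):
        request = self.inbound_requests.pop(0)
        self.log.append("decoded")               # processor 4 decodes the request
        self.log.append("interface_configured")  # set up over the PCI bus 20
        self.log.append("dma_via_bridge")        # data moves through the PCI-PCI bridge 22
        self.log.append("interrupt")             # DMA completion interrupt reaches the IOP
        self.log.append("completed")             # completion operation for the request
        return request


iop = IopModel()
iop.post_request({"op": "read", "length": 4096})  # hypothetical request contents
finished = iop.service_next()
```

Note that in this flow the payload itself never passes through the processor; only the control steps do, which is why the flow suits scheduling and control work.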
Performance limitations resulting from the architecture of the IOP 2 are evident when additional I/O data processing is required, such as data encryption/decryption, packet-by-packet flow control, or implementation of higher protocol layers in the intelligent network interface. For such I/O data processing, the processor 4 must have direct access to the data stream. The processor's access to the data stream can be provided using several techniques, of which two are outlined below.
In a first technique, the processor 4 performs programmed I/O reads directly from internal elasticity buffers of the network interface using the PCI bus 20, processes the data internally, and, using the host PCI bus 16, writes the data directly to the target memory in the primary PCI address space. Unfortunately, during such programmed I/O, any access latency at either of the two DMA interfaces 14, 18 may substantially slow the processor 4, thus reducing its processing efficiency. Further, the total available bandwidth through the IOP is limited to one half the bandwidth of the internal bus 10 minus any processor program code fetches (program cache misses) and access latencies or stalls at either of the two DMA interfaces 14, 18.
In a second technique, the PCI bus access latencies may be avoided by utilizing DMA engines in the two interfaces 14, 18 for moving data to and from the local memory (not shown) via the MIU 8 through a local memory bus 24. The local memory acts as an elasticity buffer for any I/O processing. The high-speed internal memory 6 also may be an elasticity buffer for avoiding PCI bus latencies. The internal memory 6 is typically small, e.g., 1 Kbyte, which requires a tight coupling between the sender and receiver of data packets. Using the internal memory 6, the total available bandwidth is limited to one half the bandwidth of the internal bus 10, less any code fetches by the processor 4. The access latency to either of the PCI buses is amortized over larger data bursts and can be neglected. However, in the event that the internal memory 6 is too small and the local memory (not shown) is used, the total data throughput is limited to one quarter the bandwidth of the internal bus 10, minus the bandwidth required for any code fetches.
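The one-half and one-quarter limits stated for the two techniques can be expressed as a small calculation. One way to read them, offered here as an interpretation rather than something the text states, is that each byte crosses the internal bus 10 twice when it is staged in the internal memory 6 or moved by programmed I/O, and four times when it is staged in the local memory. The function name, the bandwidth figures, and the choice to subtract code-fetch bandwidth after dividing are all assumptions.

```python
def throughput_bound(bus_bandwidth, bus_crossings, code_fetch_bandwidth=0.0):
    """Upper bound on stream throughput when every byte crosses the bus several times.

    All quantities use the same arbitrary unit (for example, Mbyte/s).
    """
    if bus_crossings < 1:
        raise ValueError("bus_crossings must be at least 1")
    bound = bus_bandwidth / bus_crossings - code_fetch_bandwidth
    return max(0.0, bound)  # a bound below zero just means no capacity is left


# Hypothetical figures: 100 units of internal bus bandwidth, 5 units of code fetches.
half_bound = throughput_bound(100.0, bus_crossings=2, code_fetch_bandwidth=5.0)
quarter_bound = throughput_bound(100.0, bus_crossings=4, code_fetch_bandwidth=5.0)
```

With these illustrative numbers the bounds come to 45 and 20 units respectively, so a data stream that must pass through the local memory would need an internal bus more than four times faster than the stream itself.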
As the bandwidth limitations of the two techniques described above indicate, the architecture of the existing IOP 2 is advantageous if the IOP is performing scheduling and control operations. However, if the IOP is required to access a high-speed data stream, its performance is more limited due to the inherent bandwidth constraints of its architecture.
Accordingly, there exists a definite need for an intelligent data bus interface having an internal processor that can access and operate on a high-speed data stream without unduly affecting the stream's data rate through the interface. Further, there exists a need for an intelligent data bus interface that can bridge data across the interface without requiring the interface to have a super-high speed internal data bus that must operate at a bandwidth that is many times larger than the bandwidth of a high-speed data stream. The present invention satisfies these needs and provides further related advantages.