The present invention relates to a method and apparatus for implementing peripheral component interconnect (PCI) combining function for PCI bridges.
When an input/output adapter (IOA) writes into system memory with a packet that is smaller than the system cache line size, two performance problems are created. First, the system memory controller must snoop the main processor, and if the data resides in its cache, the data must be flushed out to main memory, or cast out, before the IOA write can occur. Second, the system memory controller must issue a read-modify-write. If this type of traffic is heavy, system performance can be greatly degraded. But if a write were aligned to the cache line size, such that the starting address of the write is on a cache line boundary and the size is an integral number of cache lines, then the write would have a smaller impact on system performance, since no cast out and no read-modify-write would occur. Thus a high bandwidth IOA that always aligns its direct memory accesses (DMAs) causes less degradation to system performance than a lower bandwidth IOA that does not align its DMAs.
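The alignment condition described above can be illustrated with a minimal sketch. The 128-byte cache line size and the function names are assumptions made for illustration only; the actual line size depends on the host system.

```python
CACHE_LINE_BYTES = 128  # assumed cache line size, for illustration only


def is_cache_line_aligned(start_addr: int, length: int,
                          line_bytes: int = CACHE_LINE_BYTES) -> bool:
    """True when the write starts on a line boundary and covers whole lines."""
    if length <= 0:
        return False
    return start_addr % line_bytes == 0 and length % line_bytes == 0


def partial_lines_touched(start_addr: int, length: int,
                          line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Count cache lines that the write covers only partially.

    Each partially covered line is one that would need a
    read-modify-write in the situation described in the text.
    """
    if length <= 0:
        return 0
    end_addr = start_addr + length            # one past the last byte written
    first_line = start_addr // line_bytes
    last_line = (end_addr - 1) // line_bytes
    count = 0
    for line in range(first_line, last_line + 1):
        line_lo = line * line_bytes
        line_hi = line_lo + line_bytes
        if start_addr > line_lo or end_addr < line_hi:
            count += 1
    return count
```

Under these assumptions, a 32-byte write at address 0x1000 touches one line partially, a 128-byte write at 0x1000 touches none, and a 128-byte write at 0x1040 straddles two lines and touches both partially.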
Most IOAs do DMA a continuous data stream to system memory, but many do so with a very small packet size, for example 4 bytes or 32 bytes, and with a large delay between each packet. The impact of these types of IOAs on system performance, such as on IBM RS6000 and IBM AS/400 system performance, is measurable and is a concern. These performance impacts get worse with increasing speed of those systems' main processors.
A peripheral component interconnect (PCI) local bus system often includes a primary 64-bit PCI bus and multiple, such as eight, secondary PCI buses. The PCI local bus is a high performance, 32-bit or 64-bit bus with multiplexed address and data lines. The bus is used as an interconnect mechanism between highly integrated peripheral controller components, peripheral add-in boards, and processor and memory systems.
A peripheral component interconnect (PCI) bridge can be used for combining. Combining occurs when sequential memory write transactions with a single data phase or burst and independent of active byte enables are combined into a single PCI bus transaction using linear burst ordering. Under certain conditions, PCI bridges that receive write data may attempt to convert a transaction with a single or multiple data phases into a large transaction to optimize data transfer.
U.S. Pat. No. 5,915,104, issued Jun. 22, 1999, discloses a PCI bridge that acts as an interface between the PCI bus and a packet switched router. Write gathering is used to gather a plurality of write transactions on the PCI bus into write buffers, which are then sent by the bridge as one 128-byte, cache line sized transfer to the routing mechanism.
A need exists for an improved method and apparatus for implementing peripheral component interconnect (PCI) combining function for PCI bridges. It is desirable to provide such method and apparatus for implementing peripheral component interconnect (PCI) combining function for PCI bridges that can combine multiple secondary bus packet writes into a single aligned host bus write and that can alleviate the host read-modify-write penalty.
A principal object of the present invention is to provide a method and apparatus for implementing peripheral component interconnect (PCI) combining function for PCI bridges. Other important objects of the present invention are to provide such method and apparatus for implementing peripheral component interconnect (PCI) combining function for PCI bridges substantially without negative effect; and that overcome many of the disadvantages of prior art arrangements.
In brief, a method and apparatus are provided for implementing peripheral component interconnect (PCI) combining function for PCI bridges. A programmable boundary for a combined operation is selected. A write request is received. Responsive to the write request, checking for a combined operation hit is performed. Responsive to an identified combined operation hit, a combined operation is accepted. Checking for the selected programmable boundary for the combined operation is performed. Responsive to identifying the programmable boundary for the combined operation, the combined operation is launched to a destination bus.
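The sequence summarized above (receive a write, check for a combined operation hit, accept it, check the boundary, launch) can be modeled in software as a rough sketch. The class and method names are hypothetical. The sketch assumes that a hit means the write begins exactly where the pending combined data ends, and that no single write crosses the programmed boundary; neither detail is specified in the text above.

```python
class CombiningBuffer:
    """Toy model of a bridge buffer that combines sequential writes."""

    def __init__(self, boundary: int):
        self.boundary = boundary      # programmable boundary in bytes
        self.start = None             # start address of the pending operation
        self.data = bytearray()       # bytes accumulated so far

    def _launch(self):
        """Hand the pending operation to the destination bus (here: return it)."""
        op = (self.start, bytes(self.data))
        self.start = None
        self.data = bytearray()
        return op

    def write(self, addr: int, payload: bytes):
        """Accept one write request; return the operations launched as a result."""
        launched = []
        # Miss: a pending operation exists but this write is not sequential to it.
        if self.start is not None and addr != self.start + len(self.data):
            launched.append(self._launch())
        # Start a new combined operation if none is pending.
        if self.start is None:
            self.start = addr
        # Hit (or fresh start): accept the data into the combined operation.
        self.data += payload
        # Boundary check: launch once the combined data ends on the boundary.
        if (self.start + len(self.data)) % self.boundary == 0:
            launched.append(self._launch())
        return launched
```

With a 128-byte boundary, four sequential 32-byte writes starting at 0x1000 produce nothing on the first three calls and a single 128-byte operation on the fourth. A non-sequential write flushes whatever was pending as a shorter operation.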
In accordance with features of the invention, a programmable timer is identified for the combined operation. Responsive to the programmable timer expiring, the combined operation is launched to a destination bus. The programmable boundary for a combined operation is selected responsive to reading an adapter type; one of combining with a 128-byte boundary, combining with a 256-byte boundary, combining with a 512-byte boundary, or a posted memory write (PMW) is selected.
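As a hedged illustration of these features, the following sketch models mode selection by adapter type and a countdown timer that forces a launch. The adapter type names, the mapping table, the fallback to a posted memory write for unknown adapters, and the tick-based timer are all assumptions introduced for this example and are not taken from the text.

```python
from enum import Enum


class WriteMode(Enum):
    COMBINE_128 = 128          # combine up to a 128-byte boundary
    COMBINE_256 = 256          # combine up to a 256-byte boundary
    COMBINE_512 = 512          # combine up to a 512-byte boundary
    POSTED_MEMORY_WRITE = 0    # no combining; forward as a PMW


# Hypothetical adapter types, for illustration only.
ADAPTER_MODES = {
    "adapter_small_packets": WriteMode.COMBINE_128,
    "adapter_medium_packets": WriteMode.COMBINE_256,
    "adapter_streaming": WriteMode.COMBINE_512,
}


def select_mode(adapter_type: str) -> WriteMode:
    """Pick a mode from the adapter type; unknown types fall back to PMW (assumed)."""
    return ADAPTER_MODES.get(adapter_type, WriteMode.POSTED_MEMORY_WRITE)


class CombineTimer:
    """Countdown timer; when it expires, the pending operation should launch."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks   # programmable timeout value
        self.remaining = None                # None means the timer is idle

    def restart(self):
        """Arm the timer, for example when a combined operation is started."""
        self.remaining = self.timeout_ticks

    def tick(self) -> bool:
        """Advance one tick; return True exactly once, when the timer expires."""
        if self.remaining is None:
            return False
        self.remaining -= 1
        if self.remaining <= 0:
            self.remaining = None
            return True
        return False
```

In this model the timer bounds how long partially combined data can wait for a boundary that may never be reached, which matters for the adapters described earlier that leave a large delay between packets.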