The present invention relates generally to computer systems, and more particularly to communication that takes place on a device bus between components in a computer system.
Computer systems utilize internal busses to provide communication between various components of the computer system. For example, in personal computer systems such as IBM-compatible Pentium- or P6-class computers, a main CPU bus is provided to communicate control, data, and address information from the CPU to memory and other components, and vice-versa. Likewise, a device bus is typically provided in personal computers to allow add-on devices, such as video cards or other peripherals, to communicate with the CPU, memory, other add-on devices, and other components of the computer system.
One popular device bus standard is the Peripheral Component Interconnect (PCI) bus by Intel Corporation, which is used by several IBM-compatible personal computers. The PCI bus allows add-on interface cards such as video cards, controller cards, input/output (I/O) cards, modems, sound cards, and other devices to be connected to the computer system and communicate with the CPU, memory, and other components. The PCI bus is a much more efficient bus than previous bus standards, such as ISA, EISA, and VESA, and can provide a peak bandwidth of 133 megabytes/sec for fast communication. In addition, fully detailed specifications are provided for the PCI bus as an industry-wide standard to allow device manufacturers to take full advantage of PCI bus capabilities.
The CPU may often communicate with PCI devices such as a video card. In addition, PCI devices may often access components such as memory, which can be shared with other components of the computer system such as the CPU. Since the CPU and PCI devices often process data at different rates, the CPU and PCI devices need to be synchronized. To facilitate fast and efficient synchronization between the PCI devices and the CPU, a write buffer is often implemented.
A typical implementation of a write buffer in a PCI computer system 10 is shown in FIG. 1. CPU 100 is coupled to a main bus 102 and a CPU-PCI bridge 104. The main bus is used for communication between the CPU 100 and other well-known components provided in the computer system (not shown). The CPU-PCI bridge 104 directs the communication of data between the CPU 100 and PCI devices 108 coupled to a PCI bus 106. CPU-PCI bridge 104 is also coupled to devices such as shared system memory 110 to allow both CPU 100 and PCI devices 108 to access memory 110.
PCI devices 108 send and receive data via PCI bus 106 and may include one or more PCI masters 112, PCI slaves 114, and an ISA controller 116. PCI masters 112 are devices that are able to send data to and receive data from other components in the computer system and to gain control of the PCI bus to enable such communication. PCI masters 112 may request data, for example, from shared memory 110 or PCI slaves 114. The PCI slaves 114 are devices that typically store data for retrieval by PCI masters 112 or CPU 100. Often, a PCI master and a PCI slave are both included in a PCI device. For example, a typical PCI device is a video card that may include a PCI master and a PCI slave. The PCI master portion of the video card requests data from video memory on the card and data from shared memory 110 which was stored by the CPU 100. The PCI slave portion of the card allows other PCI masters to gain access to the video memory on the video card. PCI master 112a and PCI slave 114a, shown in FIG. 1, illustrate such a configuration.
ISA controller 116 can also be provided as a PCI device (e.g., having both a master and slave in the same device) in the system 10 to allow compatibility with ISA devices. ISA controller 116 acts as a bridge between the PCI bus 106 and an ISA bus 120. Other components, such as memory 118, can be coupled to the ISA bus and accessible via the ISA bus. Since ISA bus devices are still widely available, many systems include an ISA controller 116.
CPU-PCI bridge 104 facilitates more efficient communication between the PCI bus and other components and includes a host slave 122, a write buffer 124, an arbiter 126, a bridge PCI master 127, and a bridge PCI slave 128. Host slave 122 decodes all data from the CPU 100 so that the data can be routed to the proper destination. The host slave 122 temporarily stores data that is to be sent to PCI devices 108 into locations 123 in write buffer 124 until the data can be sent to the PCI devices. Since the data transfer rates of PCI devices 108 and CPU 100 are typically different, the temporary storage of data frees up the CPU for other communications, thus providing a much faster implementation than if no write buffer 124 were used. When synchronization permits, bridge PCI master 127 gains control of the PCI bus and "flushes" the write buffer, i.e., sends the data in the write buffer 124 out to the PCI devices. If the write buffer becomes full, the CPU is delayed from writing to the write buffer until a location in the write buffer becomes available. When data is sent out on the PCI bus, all PCI devices 108 see the data. Since the data includes an identification of the PCI device that the data is intended for, the correct PCI device 108 will respond and receive the data. In addition, arbiter 126 arbitrates PCI bus access requests from the CPU 100 and PCI devices 108. Bridge PCI slave 128 allows the CPU-PCI bridge 104 to function as a slave when a PCI master 112 attempts to access shared memory 110.
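The buffering behavior described above can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the class name `WriteBuffer`, the default capacity of four locations, and the dictionary of devices are hypothetical and do not appear in FIG. 1. The sketch shows the CPU being stalled when the buffer is full and the buffer being drained in first-in, first-out order to the addressed device.

```python
from collections import deque


class WriteBuffer:
    """Toy model of write buffer 124 (hypothetical names; capacity is an assumption)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = deque()  # each entry: (target device id, data)

    def post(self, target, data):
        """CPU side: store a write. Returns False when the buffer is full,
        which models the CPU being delayed until a location frees up."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((target, data))
        return True

    def flush(self, devices):
        """Bridge master side: drain all entries in FIFO order.
        Each entry carries its target id, so only that device receives it."""
        sent = 0
        while self.entries:
            target, data = self.entries.popleft()
            devices[target].append(data)
            sent += 1
        return sent
```

In this model a `False` return from `post` stands in for the CPU wait described above; a real bridge would hold off the CPU bus cycle in hardware rather than return a status value.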
A problem with the bus configuration disclosed in FIG. 1 occurs with respect to write coherency. The term "coherency", as used herein, refers to the order of data written to the various devices, which is important to maintain so that the correct data is received at the proper destinations. When the CPU writes data into shared memory 110, there is the possibility that the ordering of written data will not be maintained. For example, the CPU first writes data into a location 123 of write buffer 124, where the data is intended to be provided to PCI master 112a. The CPU then writes different data into a location 113 of shared memory 110. Next, PCI master 112a attempts to read the data from location 113 of shared memory 110. However, the CPU intended PCI master 112a to receive the data stored in location 123 of write buffer 124 before the PCI master accessed the shared memory 110. For example, the data in location 123 may have been meant to change a condition or value for the PCI master, and that change is missed if PCI master 112a accesses shared memory first. Thus, the coherency or "strong write ordering" of the CPU has been violated in this example when PCI master 112a is allowed to access the shared memory data before the write buffer data.
In another example, the CPU writes data in write buffer 124 that is intended for a PCI slave 114. The CPU then sets data such as a flag in location 113 in shared memory 110 to indicate completion of that write operation. A PCI master 112 then gains control of the PCI bus 106 before the data in the write buffer 124 is flushed to PCI slave 114. If the PCI master 112 is allowed to access shared memory 110, the PCI master will see the state of the flag in location 113 and get a false indication of the state of the memory locations in PCI slave 114.
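The flag example can be traced step by step in a few lines of Python. This is a minimal sketch with hypothetical names (`flag_hazard`, the `"data"` and `"flag"` keys, and the `"stale"`/`"fresh"` values are illustrative only); it models no bus timing and simply replays the order of events described above.

```python
def flag_hazard():
    """Replay the flag scenario with no flush before the master's read."""
    write_buffer = []                   # posted CPU writes awaiting the PCI bus
    slave_memory = {"data": "stale"}    # memory inside the PCI slave
    shared_memory = {"flag": 0}         # models location 113

    # 1. CPU posts a write destined for the PCI slave.
    write_buffer.append(("data", "fresh"))
    # 2. CPU sets the completion flag directly in shared memory.
    shared_memory["flag"] = 1
    # 3. A PCI master wins the bus before the buffer is flushed.
    flag_seen = shared_memory["flag"]
    data_seen = slave_memory["data"]

    return flag_seen, data_seen, len(write_buffer)
```

The function returns a flag of 1 together with the old slave contents and one entry still pending in the buffer, which is the false indication described above.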
To compensate for this problem, prior art devices "flush" the write buffer each time before a PCI master is allowed to access the shared memory 110. The flushing procedure entails refusing shared memory access to the PCI master and regaining control of the PCI bus. All data in write buffer 124 is then automatically sent out on the PCI bus to the intended PCI devices. The PCI master must then "retry" its request to gain access to shared memory. This process ensures that PCI devices 108 always receive their intended data from write buffer 124 before a PCI master may access shared memory 110, and thus maintains the intended write ordering.
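The flush-and-retry procedure can be sketched as follows. The class `PriorArtBridge`, its method names, and the single `flush_done` state bit are assumptions made for illustration; in particular, granting the very next request after a flush is a simplification, and arbitration among several masters is not modeled.

```python
class PriorArtBridge:
    """Toy model of the prior art flush-and-retry behavior (hypothetical names)."""

    def __init__(self):
        self.write_buffer = []      # posted CPU writes: (device id, data)
        self.shared_memory = {}
        self.devices = {}           # device id -> list of data received
        self.flush_done = False     # True once a refused request has flushed
        self.retries = 0

    def cpu_post_write(self, device, data):
        self.write_buffer.append((device, data))

    def cpu_write_memory(self, address, data):
        self.shared_memory[address] = data

    def master_read(self, address):
        """First attempt: refuse, flush everything, and ask for a retry.
        Next attempt: grant the shared memory access."""
        if not self.flush_done:
            for device, data in self.write_buffer:
                self.devices.setdefault(device, []).append(data)
            self.write_buffer.clear()
            self.flush_done = True
            self.retries += 1
            return ("retry", None)
        self.flush_done = False
        return ("ok", self.shared_memory.get(address))
```

With this model, the master in the flag example is refused once, the slave receives its data during the flush, and only the retried read returns the flag, so the write ordering is preserved at the cost of one extra bus transaction.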
This prior art automatic flushing technique is effective at reducing coherency problems with the bus system 10. However, it is very inefficient. All the data in the write buffer is indiscriminately flushed whenever a PCI master accesses the shared memory, whether or not the data actually needs to be flushed. The data need not be flushed, for example, if the CPU has not written or changed any data in the shared memory, or if the CPU has not previously written to write buffer 124. The prior art technique thus can cause significant and unnecessary delays in data transactions, since the PCI bus must be reserved to flush all the write buffer data before a PCI master can access the shared memory through the PCI bus, and the PCI master must retry to gain access to shared memory. These delays can become even more significant for applications such as multimedia, in which the CPU streams write data to a graphics frame buffer simultaneously with a PCI master streaming read data from DRAM, since transactions often have to be retried and the write buffer has to be disabled.
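The cost of indiscriminate flushing can be made concrete by counting, over a trace of bus events, how many forced flushes fall under the two conditions just named. The sketch below is one possible reading of those conditions, not a definitive rule: the event names `"post"`, `"mem"`, and `"read"` and the function `count_flushes` are hypothetical, and the model resets both conditions after every flush.

```python
def count_flushes(trace):
    """Return (total flushes, flushes that were not needed) for a trace.

    Events (hypothetical encoding):
      "post" - CPU writes an entry into the write buffer
      "mem"  - CPU writes shared memory
      "read" - a PCI master accesses shared memory (always forces a flush)
    """
    pending = 0          # entries currently held in the write buffer
    mem_written = False  # CPU has written shared memory since the last flush
    total = 0
    unnecessary = 0
    for event in trace:
        if event == "post":
            pending += 1
        elif event == "mem":
            mem_written = True
        elif event == "read":
            total += 1
            # Not needed if nothing is buffered or shared memory is unchanged.
            if pending == 0 or not mem_written:
                unnecessary += 1
            pending = 0
            mem_written = False
        else:
            raise ValueError(f"unknown event: {event!r}")
    return total, unnecessary
```

For a streaming workload in which the master reads far more often than the CPU both posts a write and updates shared memory, most of the counted flushes land in the second figure, each one costing a refused transaction and a retry.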
What is needed, therefore, is a method and apparatus that makes more efficient use of a device bus, such as the PCI bus, during memory accesses while maintaining the coherency of previous memory writes.