1. Field of the Invention
This invention relates to digital data communication, and more particularly to a system and method for coordinating the transfer of digital data between functional modules in a computer system.
2. Description of the Related Art
In an advanced multimedia PC system, the system memory is a central resource. Unlike PC systems of a previous generation, which are typically designed around the central processing unit (CPU), the memory system in such an advanced multimedia system is much more heavily utilized than before. Typically, the various blocks or subsystems of the system share a communication link, or bus, for exchanging data. However, at higher processing speeds and system complexities, data transfer between subsystems (e.g., I/O devices) or between a subsystem and the main memory remains a performance bottleneck. Maximizing bus speed to accommodate the demands of higher processing speeds and interconnecting large numbers of I/O devices require higher bus availability and bandwidth. A fast response time for I/O operations can be achieved by minimizing bus access time through streamlining the communication path. The bus bandwidth can be increased using buffering and by transferring larger units of data, at the expense of latency. However, bus speed improvement is not straightforward, since modules of a multimedia system typically include subsystems of widely varying latencies and data transfer rates.
One method for increasing the effective bus bandwidth shared by multiple devices utilizes a split transaction bus, which is released by a subsystem as soon as possible, even when only a portion of a transaction is complete, to allow use by another requester. However, once the bus is released, additional transactional overhead is incurred by the bus master to re-acquire the bus. For example, in a split transaction protocol, to resume an interrupted bus transaction, a memory system is required to signal the requester to indicate readiness. Thus, in such a system, additional transaction overhead is required to communicate the requester's identity to the memory system. Also, a split transaction protocol is typically more expensive to implement, primarily because of the need to keep track of the parties of a bus transaction.
One approach to improving bus access adds bus masters to the system. A master module initiates and controls all bus requests. In a system having a single bus master (e.g., a processor bus is typically controlled by a CPU), all requests are initiated and controlled by the single bus master, thus requiring involvement by the bus master in every bus transaction. The overhead cost thus incurred can be substantial. For example, a single sector read from a disk may require the processor to intervene many times.
Faster and higher bandwidth subsystems cannot require the processor to intervene in every bus transaction. Thus, an alternative architectural scheme using multiple bus masters has been developed. With multiple bus masters, a bus arbiter is necessary. In a bus arbitration scheme, an I/O device signals a bus request and carries out a bus transaction only when the bus request is granted. One problem with such a scheme results from the arbiter becoming a bottleneck for bus usage. In particular, where an arbiter receives multiple request lines, the transactional overhead in choosing among independent devices requesting bus access and notifying the selected device creates delays.
Accordingly, there is a need for a bus system and architecture by which both bus access and effective bus bandwidth are enhanced. There is also a need for a bus arbitration scheme that increases the effective bandwidth of a bus without incurring additional latencies resulting from additional transactional overhead of a bus arbitration scheme and the possibility of the bus arbiter becoming the I/O roadblock. There is also a need for a bus architecture that provides enhanced bus access to I/O devices without incurring additional arbitration-related latencies.
This invention provides a memory-centric, modularized single-chip processor system, and includes an on-chip split-transaction bus with independent address and data lines. A centralized bus arbiter module uses REQ/GNT protocols to independently arbitrate concurrent ownership of a data bus. Multiple concurrent “virtual” channels are defined by the arbiter, each virtual channel being owned by a master-slave pair. Concurrent ownership by multiple master/slave pairs results in high utilization of a single bus, and eliminates re-arbitration delays.
The inter-module digital signal communication system of this invention includes a bus having separate data lines and address lines. Also, the handshaking transactions among the master, slave, and arbiter to determine bus ownership are de-coupled from data transfer operations on the data bus, thus allowing any access delays over the address lines to overlap with data transfer operations, resulting in high utilization of the available data bus bandwidth. Both the address bus and the data bus may be any width. In one embodiment, the address bus is 32 bits wide, and the data bus either 64 bits or 128 bits wide.
In another embodiment, multiple data buses may be employed. In a memory system which supports burst access, the present invention reduces the number of master/slave handshaking transactions. Accordingly, the architecture of this invention is scaleable to accommodate multiple data buses of any width.
Modules connected to the bus include adaptable interface logic circuits and drivers. These interface circuits permit both on-chip and external modules (i.e., off-chip modules) to utilize the virtual channel capabilities of the architecture of the system of this invention. The logic circuits in a master or slave module recognize and store their virtual channel assignments, and enable the corresponding interface drivers when data transfer access to the bus is granted to the assigned virtual channel.
In accordance with one aspect of the system of this invention, a bus arbiter assigns a virtual channel to each master/slave pair requesting the data bus for data transfer between the master module and a slave module. Each virtual channel represents a timeslice on the bus and is owned by a separate master/slave pair, thereby permitting multiple master/slave pairs to have concurrent ownership of the singular data bus.
Once the bus arbiter grants data access of the bus to a virtual channel, the arbiter asserts a “channel active” signal to initiate actual data transfers between the master/slave pair assigned to that virtual channel. The arbiter multiplexes between virtual channels based on the readiness of master/slave pairs and on the pre-assigned priority of each master.
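By way of illustration only, the assignment of virtual channels and the selection of the active channel described above might be modeled in software as follows. This is a minimal sketch, not the disclosed hardware: the names `VirtualChannel`, `Arbiter`, `request`, `release`, and `active_channel` are hypothetical, and the convention that a lower number means a higher priority is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VirtualChannel:
    channel_id: int
    master: str
    slave: str
    priority: int        # assumption: lower number = higher priority
    ready: bool = False  # models the "channel ready" signal of the pair


class Arbiter:
    """Toy model of a central arbiter that owns a fixed pool of channels."""

    def __init__(self, num_channels):
        self.free_ids = list(range(num_channels))
        self.channels = {}

    def request(self, master, slave, priority):
        """Assign a free virtual channel to a master/slave pair.

        Returns the channel id, or None when every channel is owned.
        """
        if not self.free_ids:
            return None
        cid = self.free_ids.pop(0)
        self.channels[cid] = VirtualChannel(cid, master, slave, priority)
        return cid

    def release(self, cid):
        """Return a channel to the pool once its transfer is complete."""
        del self.channels[cid]
        self.free_ids.append(cid)

    def active_channel(self):
        """Pick the channel whose 'channel active' signal is asserted now.

        Only ready channels compete; the best priority among them wins.
        """
        ready = [c for c in self.channels.values() if c.ready]
        if not ready:
            return None
        return min(ready, key=lambda c: c.priority).channel_id
```

In this model the one-time cost is the call to `request`; afterwards the pair only toggles `ready`, and the arbiter switches channels without any further handshake.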
Concurrent ownership of the data bus by multiple master/slave pairs advantageously enhances bus accessibility over conventional split-transaction bus protocols since the transactional overhead associated with bus re-acquisition protocols between a master/slave pair is eliminated. Since each channel, hence each master/slave pair, has its own unique channel active signal, data transfer between the master/slave pair commences immediately upon the arbiter asserting the appropriate “channel active” signal. Initial acquisition, or re-acquisition of the data bus is accomplished without the handshaking protocols associated with a conventional split transaction bus each time the data bus is acquired. As a result, the virtual channel architecture of this invention permits virtually simultaneous parallel transfers on a single data bus, or on multiple data buses, and allows for maximum bus accessibility and bandwidth utilization of the memory resource.
In accordance with another aspect of the device of this invention, various priority protocols may be used to determine which concurrent owner is granted access to the bus. Concurrent data transfers and maximum utilization of available data bus bandwidth are realized by timeslicing the data bus into multiple virtual channels according to a priority multiplexing scheme. Types of priority schemes that may be utilized include preemptive priority, simple priority, fixed priorities, and dynamically allocated, or stochastic, priorities. In one embodiment, a preemptive priority scheme is used which interrupts data transfer in a lower priority channel when a “channel ready” signal is asserted by a party of a virtual channel having a higher priority. The current transfer is preempted and data transfer between the higher priority master/slave pair commences. The preempted master/slave pair continues to assert a channel ready signal indicating it is ready and able to access the data bus. If during the data transfer a latency occurs by which the higher priority master/slave pair de-asserts its channel ready signal, the arbiter selects the next highest priority virtual channel awaiting processing, and begins data transfer in that channel. However, under a preemptive priority scheme, as soon as the higher priority channel asserts a ready signal, the lower priority channel is again preempted and forced to wait until the higher priority data transfer is completed.
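The preemptive behavior described above can be sketched, purely for illustration, as a cycle-by-cycle selection. The function name `preemptive_schedule`, the representation of the channel ready signals as one set per cycle, and the lower-number-wins ranking are all assumptions made for this example.

```python
def preemptive_schedule(ready_by_cycle, priority):
    """Return the channel that owns the data bus in each cycle.

    ready_by_cycle: list of sets; each set holds the channels asserting
                    their channel ready signal in that cycle.
    priority:       dict mapping channel -> rank (assumption: lower rank
                    means higher priority).

    Under preemption the owner is re-evaluated every cycle, so a higher
    priority channel takes the bus the moment it becomes ready.
    """
    owners = []
    for ready in ready_by_cycle:
        if ready:
            owners.append(min(ready, key=priority.__getitem__))
        else:
            owners.append(None)  # bus idle: no channel is ready
    return owners
```

For example, with channel A ranked above channel B, a trace in which A stalls for one cycle lets B run during the stall, and B is preempted again as soon as A reasserts its ready signal.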
Preemptive priority, discussed above, results in the arbiter granting a virtual channel immediate access to the data bus when that virtual channel has the highest priority among the remaining virtual channels. In a simple priority scheme, a virtual channel having a higher priority than another virtual channel currently being processed will wait until the currently processed virtual channel deasserts its ready signal, whereupon the higher priority channel is immediately serviced.
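For comparison, a simple (non-preemptive) priority scheme might be sketched as follows. Again this is only an illustrative model: `simple_priority_schedule` is a hypothetical name, ready signals are modeled as one set per cycle, and a lower rank is assumed to mean a higher priority.

```python
def simple_priority_schedule(ready_by_cycle, priority):
    """Return the channel that owns the data bus in each cycle.

    The current owner keeps the bus for as long as it stays ready. Only
    when it deasserts its ready signal does the arbiter pick again,
    choosing the best-ranked channel among those that are ready.
    """
    owners = []
    current = None
    for ready in ready_by_cycle:
        if current not in ready:  # owner stalled or finished (or bus idle)
            current = min(ready, key=priority.__getitem__) if ready else None
        owners.append(current)
    return owners
```

In this model a higher priority channel that becomes ready mid-transfer waits, and is serviced in the first cycle after the current owner drops its ready signal.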
In a fixed priority scheme which may be used in combination with either the preemptive or conditional priority types, each I/O module is assigned a fixed priority. By comparison, a dynamic allocation priority scheme shifts priorities among the various I/O modules according to a predetermined or pre-defined protocol. For example, a statistical weighting priority scheme may assign a higher priority to those I/O modules consistently requesting the largest amount of data or requesting data more frequently. Priority is dynamically allocated to those modules as these weightings change over time.
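One possible reading of a statistical weighting scheme is sketched below for illustration. The text does not specify a weighting formula, so the exponentially decayed running total used here, the `decay` parameter, and the names `WeightedPriority`, `record`, and `ranking` are all assumptions.

```python
class WeightedPriority:
    """Rank I/O modules by a decayed running total of data requested."""

    def __init__(self, modules, decay=0.5):
        self.decay = decay  # assumption: older demand counts for less
        self.weight = {m: 0.0 for m in modules}

    def record(self, requests):
        """Fold one interval of demand into the weights.

        requests: dict mapping module -> amount of data requested in the
                  latest interval; modules not listed requested nothing.
        """
        for m in self.weight:
            self.weight[m] = self.decay * self.weight[m] + requests.get(m, 0)

    def ranking(self):
        """Modules ordered from highest to lowest priority.

        The sort is stable, so modules with equal weight keep the order
        in which they were registered.
        """
        return sorted(self.weight, key=lambda m: -self.weight[m])
```

As demand shifts from one interval to the next, the ranking shifts with it, which is the sense in which priority is dynamically allocated in this sketch.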
In another embodiment, a particular I/O module may be given high priority for a predetermined or pre-defined quantity (a “cap”) of data transferred, with a lower priority assigned to it after the module has exhausted its cap. This scheme allows precise and flexible allocation of the available bandwidth among various bus masters. For example, the graphics processor may be assigned a high priority virtual channel, but be limited to only an 8 megabyte data transfer per access in order to update and refresh the graphics display. After the 8 megabyte data transfer, however, the priority assigned to the graphics output module may be reduced.
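The cap mechanism can be illustrated with a small model. This is a sketch under stated assumptions: the class name `CappedPriority`, its fields, the lower-rank-wins convention, and the interpretation of a megabyte as 2**20 bytes are not taken from the text.

```python
from dataclasses import dataclass


@dataclass
class CappedPriority:
    high: int             # rank while under the cap (lower = more urgent)
    low: int              # rank once the cap is exhausted
    cap_bytes: int        # quantity of data allowed at the high rank
    transferred: int = 0  # bytes moved so far in the current access

    def rank(self):
        """Current rank: high until the cap is used up, low afterwards."""
        return self.high if self.transferred < self.cap_bytes else self.low

    def record_transfer(self, nbytes):
        """Account for data moved while the channel was active."""
        self.transferred += nbytes

    def new_access(self):
        """Begin a new access, which restores the full cap."""
        self.transferred = 0
```

With `cap_bytes` set to 8 megabytes, a graphics module in this model would rank high for the first 8 megabytes of an access and drop to the low rank for any data beyond that.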
The concurrent data transfer capability of the architecture of this invention is enabled by essentially eliminating re-arbitration delays between multiple virtual channels. The protocol for acquiring a virtual channel is typically a one-time overhead. Address counters in the logic interface of each module keep track of the data transferred in the virtual channel and eliminate re-access handshaking in the event a data transfer is interrupted.
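The role of the address counters may be illustrated as follows. This is a minimal sketch, assuming word-addressed transfers; the name `AddressCounter` and its interface are hypothetical and are not drawn from the disclosed interface logic.

```python
class AddressCounter:
    """Tracks progress of one transfer so it can resume after preemption."""

    def __init__(self, start, length):
        self.next_addr = start     # next word still to be transferred
        self.end = start + length  # one past the last word

    def transfer(self, max_words):
        """Move up to max_words while the channel is active.

        Returns the addresses moved in this burst. Because the counter
        remembers where it stopped, a later call continues from that
        point without any new address handshake.
        """
        stop = min(self.next_addr + max_words, self.end)
        moved = list(range(self.next_addr, stop))
        self.next_addr = stop
        return moved

    @property
    def done(self):
        return self.next_addr >= self.end
```

A transfer of six words that is interrupted after four simply picks up at the fifth word the next time its channel is made active.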
Moreover, a preemptive priority multiplexing scheme of the architecture of this invention minimizes latency to critical agents such as the CPU, by providing it top priority, or by increasing the likelihood that it will find the data bus immediately available since there are multiple channels available. In one embodiment, availability of the data bus to the CPU may be ensured by dedicating one virtual channel exclusively to the CPU.
In accordance with another aspect of the virtual channel architecture of this invention, multiple read and write buffers are provided at the memory controller interface to the split-transaction bus. The read and write buffers are positioned between the data and address buses and the memory controller. A dedicated read buffer for each virtual channel in the main memory interface permits prefetching data from main memory for a current request during the latency, as the corresponding master awaits access to the data bus, and permits storing the prefetched data in the buffer assigned to the virtual channel. Once the arbiter asserts as active the virtual channel assigned to that master, at least a portion of the data requested by that master is made immediately available for transfer via the read buffer. This effectively eliminates the inefficiencies normally associated with retrieving data directly from main memory.
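The prefetching role of a per-channel read buffer may be sketched as below, for illustration only. The buffer is modeled as a bounded FIFO; the name `ReadBuffer`, the `depth` parameter, and the use of a Python list to stand in for main memory are assumptions.

```python
from collections import deque


class ReadBuffer:
    """Bounded FIFO dedicated to one virtual channel."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def prefetch(self, memory, addr, count):
        """Fill the buffer from memory while the master waits for the bus.

        Stops when count words are fetched or the buffer is full.
        Returns the next address that has not yet been fetched.
        """
        while count > 0 and len(self.fifo) < self.depth:
            self.fifo.append(memory[addr])
            addr += 1
            count -= 1
        return addr

    def drain(self, max_words):
        """Hand buffered words to the data bus once the channel is active."""
        n = min(max_words, len(self.fifo))
        return [self.fifo.popleft() for _ in range(n)]
```

In this model the memory latency is paid during `prefetch`, before the channel is made active, so the first call to `drain` can supply data immediately.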
By providing a set of read buffers pre-assigned to data bus virtual channels, performance (i.e., utilization of available bandwidth) of the virtual channel architecture of this invention is further enhanced. The number of read/write channels and the depth of each channel depend upon the characteristics of the agents in the system and the system applications. Additionally, by providing buffers in the memory controller interface, parallel transfers between two non-memory agents are allowed contemporaneous with the memory itself being fully utilized.
Once data transfer in a virtual channel is complete, the data source acknowledges completion of data transfer and relinquishes the virtual channel for use by other subsystems.