This invention generally relates to data communication for a processing unit in a computer, e.g., a microcomputer.
Processing of 3-dimensional graphics and video usually involves transmission and processing of a large amount of graphic data. Consumer multimedia applications such as educational software and computer games, for example, may require processing of a single 3-dimensional image in excess of 20 MB of data. Such data must be transmitted to a graphic controller, which has a graphic accelerator and a graphic memory, from the processor, the system main memory (i.e., RAMs), or another device connected to a communication bus (such as a CD-ROM). Hence, 3D graphics and video demand a large bandwidth for data transmission and a large storage space in the system memory or graphic memory.
One standard communication bus for connecting input and output devices in personal computers is Intel's peripheral component interconnect (“PCI”) bus. FIG. 1 shows that a PCI chipset 104 is implemented as a communication hub and control for the processor 101, the main memory 106, and the PCI bus 110. The graphic controller 120 is connected as a PCI device and transfers graphic data to a display. Other types of buses can also be connected to the PCI bus 110 through another control chipset. The current PCI bus, limited in bandwidth to 132 MB/s, is often inadequate to support many graphic applications. In addition, since the PCI bus 110 is shared by the graphic controller 120 and other PCI devices 130, the actual PCI bandwidth available for graphic data is further reduced. Therefore, the PCI bus 110 forms a bottleneck for many graphic applications.
Pre-fetching graphic data to the graphic memory can alleviate the bottleneck of the PCI bus without increasing the size of the graphic memory (usually about 2-4 MB). But the performance of the graphic controller may still be limited because the PCI bus is shared. Another approach increases the size of the graphic memory, but this may not be practical for the mass PC market.
In recognition of the above limitations, Intel developed an accelerated graphic port (“AGP”) designated to transmit graphic data to the graphic controller at a peak bandwidth higher than the maximum bandwidth of the current PCI bus, e.g., up to 1.066 GB/s as supported by the Fast Writes in the latest AGP specification 2.0. FIG. 2 schematically shows an AGP chipset 210 (e.g., Intel's 440LX AGPset) replacing the PCI chipset 104 in FIG. 1. The graphic controller 120 is connected through the AGP 220 rather than the PCI bus 110. The AGP 220 allows the graphic controller 120 to execute data directly from the cache, the main memory 106, or other PCI devices 130 by reducing or eliminating caching from the graphic memory. Hence, the graphic memory can remain small to reduce cost. In addition, AGP 220 reduces the data load on the PCI bus 110 and frees up the PCI bus 110 for the processor to work with other PCI devices 130.
It is desirable to further improve the efficiency of data transmission and processing in personal computers and other systems. In AGP-based computers, for example, transmission of graphic data may be specially designed to fully utilize the high bandwidth of the AGP.
The present disclosure provides devices and associated methods for controlling data transfer from a storage device (e.g., a processor cache) to a receiving device (e.g., a graphic processor) in a predetermined ordering. Such predetermined ordering can be used to improve the efficiency of data transmission from the storage device to the receiving device.
One embodiment of the device includes a first circuit to receive data and associated address information from the storage device and a second circuit to reorder the data into ordered packets each in the predetermined ordering. The first circuit is configured to process the address information to determine a data ordering of the received data according to their addresses in the storage device. This data ordering is fed to the second circuit which accordingly performs the reordering operation.
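The two-circuit arrangement described above can be illustrated with a minimal software sketch. It is only an illustration under stated assumptions, not the disclosed hardware: the chunk width (8 bytes), the packet size (4 chunks), the choice of ascending address as the predetermined ordering, and the function names determine_ordering and reorder_packet are all hypothetical.

```python
from typing import List, Optional

CHUNK_BYTES = 8        # assumed width of one data chunk (hypothetical)
CHUNKS_PER_PACKET = 4  # assumed number of chunks per ordered packet (hypothetical)


def determine_ordering(addresses: List[int]) -> List[int]:
    """Sketch of the first circuit: derive, from each chunk's address,
    the slot the chunk should occupy within its packet."""
    return [(addr // CHUNK_BYTES) % CHUNKS_PER_PACKET for addr in addresses]


def reorder_packet(chunks: List[bytes], slots: List[int]) -> List[Optional[bytes]]:
    """Sketch of the second circuit: place each received chunk into its
    slot so that the packet comes out in ascending-address order
    (the ordering assumed here for illustration)."""
    packet: List[Optional[bytes]] = [None] * CHUNKS_PER_PACKET
    for chunk, slot in zip(chunks, slots):
        packet[slot] = chunk
    return packet


# Chunks arrive out of order, e.g., starting from the middle of a line.
arrival_addresses = [0x110, 0x118, 0x100, 0x108]
arrival_chunks = [b"C", b"D", b"A", b"B"]

slots = determine_ordering(arrival_addresses)
ordered = reorder_packet(arrival_chunks, slots)
```

In this sketch the ordering information is computed from the addresses alone and handed to the reordering step, mirroring the division of work between the first and second circuits.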
The first and second circuits may be pipelined through a queue circuit to improve the efficiency of the reordering operation. The queue circuit may include a token queue and a data queue that respectively receive and store the tokens and the data from the first circuit.
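The decoupling provided by the queue circuit can be sketched in software as follows. This is a hedged illustration only: it assumes that each token records the packet slot of one data chunk, that chunks are 8 bytes wide, and that a packet holds 4 chunks; the class name ReorderPipeline and its methods are hypothetical and do not come from the disclosure.

```python
from collections import deque
from typing import Deque, List, Optional


class ReorderPipeline:
    """Two stages coupled by a token queue and a data queue (sketch)."""

    def __init__(self, chunks_per_packet: int = 4, chunk_bytes: int = 8) -> None:
        self.n = chunks_per_packet          # assumed packet size
        self.chunk_bytes = chunk_bytes      # assumed chunk width
        self.token_queue: Deque[int] = deque()
        self.data_queue: Deque[bytes] = deque()

    def receive(self, address: int, data: bytes) -> None:
        """First stage: compute a token (assumed here to be the slot
        index) from the address and enqueue token and data side by side."""
        slot = (address // self.chunk_bytes) % self.n
        self.token_queue.append(slot)
        self.data_queue.append(data)

    def emit_packet(self) -> Optional[List[Optional[bytes]]]:
        """Second stage: once a full packet's worth of entries is queued,
        drain both queues in lockstep and build the ordered packet."""
        if len(self.token_queue) < self.n:
            return None  # not enough queued entries yet
        packet: List[Optional[bytes]] = [None] * self.n
        for _ in range(self.n):
            packet[self.token_queue.popleft()] = self.data_queue.popleft()
        return packet


pipe = ReorderPipeline()
pipe.receive(0x108, b"B")
partial = pipe.emit_packet()          # fewer than four chunks queued
for addr, chunk in [(0x110, b"C"), (0x118, b"D"), (0x100, b"A")]:
    pipe.receive(addr, chunk)
full = pipe.emit_packet()
```

Because the first stage only appends to the queues and the second stage only removes from them, the first stage can keep accepting new data while the second stage is still assembling an earlier packet, which is the efficiency benefit the pipelining is meant to provide.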
One application of the disclosed devices and methods is to improve data transfer from a processor to a graphic controller, for example in AGP-based personal computers.