In the following, a non-unified memory architecture (Non-UMA) controller is defined as one that includes two separate DRAM controllers, for instance one core logic memory controller and one display memory controller, which interface with the system controller and the graphical accelerator respectively. A Unified Memory Architecture (UMA) is defined as one in which the graphical memory is physically allocated within the system memory. For example, when an SDRAM is used as the system memory, the top memory of any row of the SDRAM can be allocated to serve as the graphical memory under the UMA. Furthermore, under the UMA, the graphical texture data can be dynamically allocated in the non-graphical memory area of the SDRAM, and no separate local graphical memory is physically allocated for the graphical accelerator.
FIG. 1 illustrates a traditional, or Non-UMA based, personal computer system, which typically separates the graphical memory 18 from the system memory 13. Each memory is interfaced through a memory controller (not shown) located in the graphical accelerator 16 and in the host bridge 12, respectively. The graphical accelerator 16 and the host bridge 12 may interface with each other via the Accelerated Graphics Port (AGP) bus 140, a Peripheral Component Interconnect (PCI) bus, or any other proprietary bus that can deliver high throughput in the application. The host bridge 12, situated among the host bus 110, the AGP bus 140, the DRAM bus 120 and the PCI bus 130, provides paths and buffering so that data originating on one bus can flow efficiently to another bus.
The possible data flows are enumerated in the following. The CPU 10 accesses the system memory 13 through the host bus 110, the host bridge 12, and the system memory bus 120. The CPU 10 accesses the PCI devices 15 through the host bus 110, the host bridge 12, and the PCI bus 130. A PCI bus master among the PCI devices 15 accesses the system memory 13 through the PCI bus 130, the host bridge 12, and the system memory bus 120. The CPU 10 accesses the graphical memory 18 through the host bus 110, the host bridge 12, the AGP bus 140, the graphical accelerator 16, and the graphical memory bus 150. The CPU 10 accesses the buffers (not shown) in the graphical accelerator 16 through the host bus 110, the host bridge 12, and the AGP bus 140. The graphical accelerator 16 accesses the graphical memory 18 through the graphical memory bus 150 to retrieve the frame buffer data and/or the local texture data, either for performing graphical operations in the graphical accelerator 16 or for refreshing the screen data. As is well known in the art, texture data are display data associated with the three-dimensional (3D) display effect. The graphical accelerator 16 may also access the system memory 13 through the AGP bus 140 to retrieve texture data stored in the system memory 13. A PCI bus master 15 may also write video data to the graphical accelerator 16 through the PCI bus 130, the host bridge 12, and the AGP bus 140.
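For illustration only, the data flows enumerated above can be tabulated as a small lookup structure. The sketch below is not part of the described system: the dictionary layout and the helper name `flows_through` are hypothetical, and each path lists only the hops that the text itself names.

```python
# Reference numerals follow FIG. 1; the layout and helper are illustrative only.
PATHS = {
    ("CPU 10", "system memory 13"):
        ["host bus 110", "host bridge 12", "system memory bus 120"],
    ("CPU 10", "PCI devices 15"):
        ["host bus 110", "host bridge 12", "PCI bus 130"],
    ("PCI bus master 15", "system memory 13"):
        ["PCI bus 130", "host bridge 12", "system memory bus 120"],
    ("CPU 10", "graphical memory 18"):
        ["host bus 110", "host bridge 12", "AGP bus 140",
         "graphical accelerator 16", "graphical memory bus 150"],
    ("CPU 10", "graphical accelerator 16"):
        ["host bus 110", "host bridge 12", "AGP bus 140"],
    ("graphical accelerator 16", "graphical memory 18"):
        ["graphical memory bus 150"],
    # The text names only the AGP bus for this flow, so only that hop is listed.
    ("graphical accelerator 16", "system memory 13"):
        ["AGP bus 140"],
    ("PCI bus master 15", "graphical accelerator 16"):
        ["PCI bus 130", "host bridge 12", "AGP bus 140"],
}


def flows_through(hop):
    """Return the (source, destination) pairs whose listed path includes `hop`."""
    return [pair for pair, hops in PATHS.items() if hop in hops]
```

Calling `flows_through("AGP bus 140")` returns the four flows that share the AGP bus 140 in this tabulation.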
Depending on how much functionality the graphical accelerator 16 supports, the data types that may flow on the AGP bus 140 are classified as follows.
CPU 10 reads or writes frame buffer data in the graphical memory 18 via the AGP bus 140. Typically, a read ahead cache and a post write buffer are provided in the graphical accelerator 16 to efficiently utilize the AGP bus bandwidth.
The CPU 10 writes graphical commands, known as 2D or 3D graphic commands, into the graphical accelerator 16. Typically, a command queue (not shown) is allocated to store the commands, so that the AGP bus 140 can be released once the command has been posted into the command queue. In addition, the command queue can be extended into the graphical memory 18. In this way, the depth of the command queue is virtually infinite, with the result that a command can always be posted into the command queue with zero wait states. Each posted command is retrieved for processing when it becomes the next entry to be processed.
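The behavior of a command queue that extends into the graphical memory 18 can be sketched as follows. This is a minimal model under stated assumptions, not the actual circuit: the class name `CommandQueue`, the parameter `on_chip_depth`, and the use of a second queue to stand in for the region of the graphical memory 18 are all hypothetical.

```python
from collections import deque


class CommandQueue:
    """On-chip FIFO that overflows into a region of graphical memory (modeled)."""

    def __init__(self, on_chip_depth):
        if on_chip_depth < 1:
            raise ValueError("on_chip_depth must be at least 1")
        self.on_chip_depth = on_chip_depth
        self.on_chip = deque()  # entries held inside the accelerator
        self.spill = deque()    # stands in for the area in graphical memory 18

    def post(self, command):
        # Posting never stalls. Once entries have spilled, new commands also go
        # to the spill area so that first-in-first-out order is preserved.
        if self.spill or len(self.on_chip) >= self.on_chip_depth:
            self.spill.append(command)
        else:
            self.on_chip.append(command)

    def next_command(self):
        # Retrieve the oldest command, then refill the on-chip FIFO from the
        # spill area if anything is waiting there.
        if not self.on_chip:
            return None
        command = self.on_chip.popleft()
        if self.spill:
            self.on_chip.append(self.spill.popleft())
        return command
```

In this model `post` always succeeds immediately, which corresponds to the zero-wait-state posting described above, while `next_command` returns commands in the order in which they were posted.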
Processing commands typically involves retrieving the graphical data in the graphical memory 18 such that the graphical data can be computed or transformed by the graphical engine (not shown) in the graphical accelerator 16. The computed data are then written back to the graphical memory 18 for display later on.
As is well known in the art, a 3D rendering operation involves the texture, brightness, depth, color, hue and tint of the surfaces of the objects in a graphic image. Generally, 3D rendering has a voracious appetite for memory bandwidth and continues to put upward pressure on the performance of the graphical memory 18. The AGP bus 140 links the graphical accelerator 16 with the host bridge 12 such that some of the 3D rendering structures, such as the texture data, may be effectively shifted into the system memory 13, relieving the cost pressure on the local graphical memory 18. Typically, the texture data are the first structure targeted for shifting to the system memory 13. When this is implemented, the texture data are normally read from the system memory 13 through the AGP bus 140.
The PCI master 15 writes the video data into the graphical memory 18. This path allows a PCI video capture subsystem to transfer data into the graphical memory 18 through a CPU write buffer (not shown) in the graphical accelerator 16.
Usually a CRT First-In-First-Out (FIFO) memory is provided in the graphical accelerator 16. The CRT FIFO generates a request signal to periodically retrieve the display data from the graphical memory 18 and store the display data in the CRT FIFO, so that the screen data are updated periodically and the display remains continuous. To ensure that the display motion is continuous, this request signal is normally treated as the second highest priority request for the graphical memory 18. The DRAM refresh request is normally generated every 15.6 μs (microseconds) to enforce the refresh cycle required by the DRAM to maintain data integrity of the memory subsystem. The DRAM refresh request is generally given the highest priority.
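As a worked illustration of the refresh interval quoted above, the sketch below shows one common way in which such a figure arises. The DRAM geometry is an assumption that the text does not state: 4096 rows that must all be refreshed within a 64 ms retention window. The function name is hypothetical.

```python
def refresh_interval_us(retention_ms, rows):
    """Average spacing between row refresh requests, in microseconds."""
    return retention_ms * 1000 / rows


# Assumed geometry (not stated in the text): 4096 rows, 64 ms retention window.
interval = refresh_interval_us(64, 4096)  # 15.625, i.e. roughly 15.6 microseconds
```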
A typical system controller within the host bridge 12 is set forth in FIG. 2. A command queue 30 is employed to serve a highly pipelined host bus 110. The host bus 110 may be interfaced with any kind of processor or with multiple processors, such as the Pentium, Pentium II, Power 600, and so forth. The AGP pipe 32, the PCI pipe 34, the Write FIFO 36 and the Read FIFO 38 are employed to buffer data from the originating bus before the data are consumed on the destination bus. As is well known in the art, the AGP commands are executed in a pipelined manner within the AGP pipe 32. Likewise, the PCI commands are executed in a pipelined manner within the PCI pipe 34.
The control circuitry (not shown) associated with the command queue 30 interprets each command and dispatches it to the target destination. For instance, a memory write command is handled by posting the write data into the Write FIFO 36, while a memory read command gives rise to a DRAM read cycle. The read data are stored in the Read FIFO 38 before they are forwarded to the host bus 110. Any attempt by a PCI master 15 to access the AGP device 16 or the system memory 13 is translated into a host cycle and then consumed by the command queue 30.
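The dispatch behavior described above can be sketched as follows. This is an illustrative model only: the command dictionary keys (`kind`, `addr`, `data`), the function names, and the use of a dictionary to stand in for the DRAM are hypothetical, and ordering hazards between a posted write and a later read to the same address are not modeled.

```python
from collections import deque


def dispatch(command, write_fifo, read_fifo, dram):
    """Route one host command; `command` is a dict with hypothetical keys."""
    if command["kind"] == "mem_write":
        # Post the write data; the host bus need not wait for the DRAM cycle.
        write_fifo.append((command["addr"], command["data"]))
    elif command["kind"] == "mem_read":
        # A read gives rise to a DRAM read cycle; the data wait in the Read FIFO
        # until they are forwarded to the host bus.
        read_fifo.append(dram.get(command["addr"], 0))
    else:
        raise ValueError("unknown command kind: " + str(command["kind"]))


def drain_writes(write_fifo, dram):
    """Retire posted writes to DRAM in arrival order."""
    while write_fifo:
        addr, data = write_fifo.popleft()
        dram[addr] = data
```

A write is complete from the host's point of view as soon as `dispatch` returns, whereas the DRAM itself is updated only when `drain_writes` runs.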
Today's high-performance graphical engines have a voracious appetite for memory bandwidth. Furthermore, a high-bandwidth memory subsystem is required for the UMA, which must serve the access requests from the host processor (CPU) and the bus masters (PCI) in addition to the access requests from the graphical engine. Such bandwidth can be readily achieved with some of today's advanced memory technologies. For instance, the peak memory bandwidth can reach 800 MB/s by building an 8-byte wide system memory bus running at 100 MHz with today's mature PC100 SDRAM technology. However, even if the system is equipped with a wide-bandwidth memory system under the UMA, an intelligent arbitration scheme is required to utilize the memory bandwidth efficiently, such that the following goals, listed in priority order, are fulfilled.
(1) To ensure data integrity in the memory subsystem.
(2) To keep the display continuous, and thus smooth.
(3) To meet the shorter latency requirements of some bus masters, such as USB-compliant or IEEE 1394-compliant devices. The latency is the duration between the time at which a request is issued and the time at which the memory bus is granted to the request.
(4) To attain higher graphical performance.
(5) To minimize the impact on the host bus performance.
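Two points from the passage above can be sketched in a few lines. The first function reproduces the peak-bandwidth arithmetic for an 8-byte bus at 100 MHz. The second is a naive strict-priority reading of goals (1) to (5) and should not be taken as the arbitration scheme itself. The mapping of each goal to a requester name in `PRIORITY` is an assumption made here for illustration, as are all identifiers.

```python
def peak_bandwidth_mb_per_s(bus_width_bytes, clock_mhz):
    """Peak transfer rate, assuming one transfer per clock: bytes x MHz = MB/s."""
    return bus_width_bytes * clock_mhz


# Hypothetical requester names, ordered after goals (1) to (5) above.
PRIORITY = [
    "dram_refresh",        # (1) data integrity
    "crt_fifo",            # (2) continuous display
    "low_latency_master",  # (3) e.g. USB or IEEE 1394 devices
    "graphics_engine",     # (4) graphical performance
    "host",                # (5) host bus accesses
]


def grant(pending):
    """Return the highest-priority requester in `pending`, or None if empty."""
    for requester in PRIORITY:
        if requester in pending:
            return requester
    return None
```

With these definitions, `peak_bandwidth_mb_per_s(8, 100)` evaluates to 800, and `grant({"host", "crt_fifo"})` selects `"crt_fifo"`. A strict ordering of this kind can starve the lower-priority requesters, which is one reason a more intelligent scheme is called for.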