1. Field of the Invention
The present invention pertains to the field of computer graphics. More particularly, the present invention relates to an apparatus and method for increasing the bandwidth to a graphics subsystem.
2. Related Art
FIG. 1A illustrates a block diagram of an example computer system 100A in which the present invention can be used. Computer system 100A includes a graphics subsystem 102, a central processing unit (CPU) 104, a system memory 108 (also known as main memory) and a system bridge 106. A processor bus (also known as a host bus or a front bus) 110 couples CPU 104 and system bridge 106. A memory bus 112 couples system memory 108 and system bridge 106. Additionally, peripheral bus 114 couples various input and/or output (I/O) devices 120, also referred to as peripheral devices, to system bridge 106.
Examples of peripheral devices that are used to input commands and information include keyboards and cursor control devices (e.g., a mouse, trackball, joystick, etc.). Examples of peripheral devices that are used to output information include a printer and a display screen. Additional examples of peripheral devices include floppy disk drives, hard disk drives, optical disk drives, and the like. In addition to coupling other I/O devices 120 to system bridge 106, FIG. 1A also shows peripheral bus 114 coupling graphics subsystem 102 to system bridge 106.
In a simple form, each bus (i.e., processor bus 110, memory bus 112, and peripheral bus 114) is essentially a collection of wires and connectors for transferring data between subsystems (i.e., CPU 104, system memory 108, I/O devices 120, and graphics subsystem 102) of computer system 100A. Alternatively, each bus may be specifically designed for the type of device it supports. For example, peripheral bus 114 can be a Peripheral Component Interconnect (PCI) bus, which is a self-configuring personal computer local bus designed by Intel Corporation, Santa Clara, Calif., specifically for peripheral devices. A PCI bus typically provides a bandwidth of 133 MBytes/sec when running at 33 MHz. When peripheral bus 114 is a PCI bus, I/O devices 120 are sometimes referred to as PCI-based devices.
Graphics subsystem 102 (also known as a graphics controller, graphics accelerator, graphics chip, graphics hardware, graphics board, or graphics card) is the hardware of computer system 100A that is dedicated to enabling computer system 100A to display images, such as three dimensional (3D) objects, on a display (not shown). Graphics subsystem 102 typically includes its own local processor for computing graphical transformations, and the like. Additionally, graphics subsystem 102 typically has its own local graphics memory, such as texture memory and a frame buffer, which are reserved for storing data associated with graphical representations. The local graphics memory can be conventional DRAM, or other special types of memory such as video RAM (VRAM), which enables both video circuitry and the local processor to simultaneously access the local graphics memory. Such dedicated local graphics memory is typically more expensive than system memory 108 and cannot be used by computer system 100A for other purposes (that is, non-graphics purposes) when it is not being fully utilized by graphics subsystem 102.
As computer graphics become more realistic and complex, increased burdens are placed on computer system 100A. For example, the generation of full motion animated 3D graphics requires the performance of continual intensive geometry calculations that define an object in 3D space. These geometry calculations can be performed by CPU 104, which is well-suited for performing these calculations because it can handle the floating point type operations that are often required. Alternatively, these calculations can be performed by the local processor of graphics subsystem 102.
Graphics subsystem 102 processes various types of graphics data. For example, graphics subsystem 102 processes texture data in order to create lifelike surfaces and shadows within a 3D image. Often one of the most critical aspects of 3D graphics is the processing of texture maps, the bitmaps which are used to represent in detail the surfaces of 3D objects. Texture map processing consists of fetching one, two, four, eight, or more texels (texture elements) from a bitmap, averaging them together based on some mathematical approximation of the location in the bitmap (or multiple bitmaps) needed on the final image, and then writing the resulting pixel to the local graphics memory of graphics subsystem 102. The texel coordinates are functions of the 3D viewpoint, the geometry of the object onto which the bitmap is being projected, and the location of the bitmap on the object. Other types of graphics data that are processed by graphics subsystem 102 include geometry data, also referred to as polygon descriptions (e.g., triangles consisting of three vertices), normals, color indices, and the like.
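By way of illustration only, the four-texel case of the texture map processing described above can be sketched as bilinear filtering. The function name, the grey-level bitmap layout, and the clamping behavior at the bitmap edge are assumptions made for this sketch and are not taken from any particular graphics subsystem.

```python
import math

def sample_bilinear(texture, u, v):
    """Fetch the four texels around (u, v) and blend them by distance.

    texture: list of rows of grey-level values (hypothetical layout).
    u, v: continuous texel coordinates, in texel units.
    """
    height, width = len(texture), len(texture[0])
    # Clamp the sample point to the bitmap (an assumed edge policy).
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    # Blend horizontally on the two rows, then vertically between them.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

texture = [[0.0, 100.0],
           [100.0, 200.0]]
pixel = sample_bilinear(texture, 0.5, 0.5)  # midpoint of the four texels
```

Each resulting pixel in this sketch requires four texel reads, which suggests why texture traffic grows quickly with image resolution.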
Typically, graphics data, such as texture maps, are read from an I/O device 120 (such as a hard drive) and loaded into system memory 108. For example, texture map data travels through peripheral bus 114, system bridge 106, and memory bus 112 before being loaded into system memory 108. The texture map can then be read into CPU 104, from system memory 108, when it is to be used. CPU 104 performs any necessary transformation and then caches the results. The cached data is either written back to system memory 108 or sent (pushed) from CPU 104 to graphics subsystem 102. If the transformed textures are written back to system memory 108, graphics subsystem 102 can read (pull) the transformed textures from system memory 108. Upon receiving the transformed textures and/or any other types of graphics data, graphics subsystem 102 can immediately use them or write them in its local graphics memory.
Thus, for computer system 100A shown in FIG. 1A, graphics data that is destined for or generated by graphics subsystem 102 must always travel over peripheral bus 114. Therefore, the bandwidth of peripheral bus 114 (i.e., 133 MBytes/sec, if peripheral bus 114 is a typical current PCI bus) limits the rate at which texture maps, and other graphics data, can be transferred to and from graphics subsystem 102. Additionally, since graphics subsystem 102 shares peripheral bus 114 with several other I/O devices 120, congestion often occurs on peripheral bus 114. Accordingly, peripheral bus 114 is often a bottleneck in computer system 100A of FIG. 1A.
Solutions for overcoming the above deficiencies have been proposed. For example, FIG. 1B illustrates the use of a dedicated graphics bus 116 which avoids the problems associated with graphics subsystem 102 sharing bus resources with various other I/O devices 120. As shown in FIG. 1B, graphics bus 116 couples graphics subsystem 102 and system bridge 106. An example of such a dedicated graphics bus 116 is an Accelerated Graphics Port (AGP) compatible bus. AGP, which is an interface specification developed by Intel Corporation, Santa Clara, Calif., is based on PCI, but is designed especially for the throughput demands of 3D graphics. Rather than using the PCI bus for graphics data, AGP introduces a dedicated point-to-point channel so that graphics subsystem 102 can directly access system memory 108. An AGP channel is 32 bits wide and runs at 66 MHz. This translates into a total bandwidth of 266 MBytes/sec as opposed to a current typical PCI bandwidth of 133 MBytes/sec. AGP also supports two optional faster modes with throughput of 533 MBytes/sec and 1.07 GBytes/sec. In the arrangement of FIG. 1B, if graphics bus 116 is an AGP bus, then system bridge 106 can be Intel's 440BX chipset.
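By way of illustration only, the PCI and AGP bandwidth figures quoted above can be approximated as bus width multiplied by clock rate. The function name, the 32-bit PCI width, and the nominal clock rates of 33.33 MHz and 66.66 MHz are assumptions not stated in the text; modeling the two faster AGP modes as two and four transfers per clock is likewise an assumption of this sketch.

```python
def peak_bandwidth_mbytes(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in MBytes/sec: bytes per transfer times transfers per microsecond."""
    return (width_bits / 8) * clock_mhz * transfers_per_clock

pci = peak_bandwidth_mbytes(32, 33.33)          # roughly 133 MBytes/sec
agp_base = peak_bandwidth_mbytes(32, 66.66)     # roughly 266 MBytes/sec
agp_fast = peak_bandwidth_mbytes(32, 66.66, 2)  # roughly 533 MBytes/sec
agp_faster = peak_bandwidth_mbytes(32, 66.66, 4)  # roughly 1.07 GBytes/sec
```

These are peak figures only; sustained throughput on a shared bus would be expected to be lower because of arbitration and congestion.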
Another example of a bus 116 that can be used to transmit graphics data is a serial bus, such as a FIREWIRE (also known as IEEE 1394) compliant bus. FIREWIRE is a serial bus interface standard offering high-speed communications and isochronous real-time data services. More specifically, FIREWIRE, which is a trademark of Apple Computer, Inc., Cupertino, Calif., is a bus standard that supports data transfer rates of 100, 200, or 400 Mbits/sec. Other companies use other names, such as I-link and Lynx, to describe their IEEE 1394 compliant products.
System bridge 106 performs system interconnect functions. That is, one of the main purposes of system bridge 106 is to facilitate data transfers throughout computer systems 100A, 100B. For example, system bridge 106 enables CPU 104 to system memory 108 access to occur independently of CPU 104 to I/O device 120 (e.g., hard drive) access. Additionally, for example, system bridge 106 enables CPU 104 to read data from an I/O device 120 while simultaneously sending data to graphics subsystem 102. Thus, system bridge 106 can be a component dedicated to system interconnect functions, such as a crossbar switch. In one embodiment, system bridge 106 also controls access to system memory 108, and thus performs the functions of a memory controller. Alternatively, memory control functions can be performed by a separate subsystem, such as a dedicated memory controller (not shown), or can reside within system memory 108.
Typically, virtual to physical memory translation functions are performed by CPU 104. However, in the embodiment where system bridge 106 provides the functions of a memory controller, system bridge 106 can also support virtual memory and paging by translating virtual addresses into physical addresses. For example, system bridge 106 can include a page table which is indexed by a page number. Each page table entry (PTE) gives the physical page number corresponding to the virtual one. This is combined with a page offset to give the complete physical address. A PTE may also include information about whether the page has been written to, when it was last used, what kind of processes (user mode, supervisor mode) may read and write it, and whether it should be cached.
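By way of illustration only, the page table lookup described above can be sketched as follows. The 4096-byte page size, the dictionary-based table, and the field names in each page table entry are hypothetical choices made for this sketch.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; the text does not specify one

# Hypothetical page table: virtual page number -> page table entry (PTE).
page_table = {
    0: {"physical_page": 7, "dirty": False, "cacheable": True},
    1: {"physical_page": 3, "dirty": True, "cacheable": False},
}

def translate(virtual_address):
    """Translate a virtual address into a physical address via the page table."""
    # Split the address into a page number (the table index) and a page offset.
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table.get(page_number)
    if entry is None:
        raise KeyError(f"page fault: virtual page {page_number} is not mapped")
    # Combine the physical page number with the unchanged page offset.
    return entry["physical_page"] * PAGE_SIZE + offset

physical = translate(1 * PAGE_SIZE + 0x10)  # virtual page 1, offset 0x10
```

Here virtual page 1 maps to physical page 3, so the offset 0x10 is carried over unchanged into physical page 3.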
In one embodiment, system bridge 106 can assist in maintaining cache-coherency, which means that data in a cache is updated and moved appropriately as it is accessed by a subsystem (such as CPU 104) of the whole computer system 100A, 100B. For example, when an I/O device 120 writes data into system memory 108, that data is also stored in a cache. If a CPU 104 attempts to read from that same memory location, CPU 104 will actually be provided with a copy of the data stored in the cache. In addition to speeding access to the data, this scheme also serves to prevent multiple memory accesses for the same piece of data, which frees up memory bus 112 for other accesses. Further, system bridge 106 may support multiplexing of system memory 108. Of course, system bridge 106 does not need to support all of these features to work with the present invention.
Examples of components that can perform the interconnect functions of system bridge 106, without performing memory control functions, are the Crossbow ASIC which is part of the Crossbar System Interconnect designed by Silicon Graphics, Inc. (SGI), Mountain View, Calif., and the Ultra Port Architecture (UPA) interconnect designed by Sun Microsystems, Palo Alto, Calif. Examples of system bridges 106 that can perform both system interconnect and memory control functions are Intel's 440BX AGP chipset and the Cobalt graphics chipset designed by SGI. Each of these exemplary system bridges 106 supports some features that are not supported by the other examples. An important feature of system bridge 106, with respect to the present invention, is that it interconnects the various subsystems/devices of computer systems 100A, 100B, and allows the various subsystems/devices to access one another. For example, in a preferred embodiment, system bridge 106 enables CPU 104 to access data from system memory 108 while an I/O device 120 (e.g., a hard drive) simultaneously sends data to system memory 108. Additionally, a system bridge 106, such as SGI's Cobalt graphics chipset, may even perform some graphics operations that are typically performed by graphics subsystem 102 or CPU 104.
Graphics busses transport graphics data. Graphics data includes graphics commands that pertain to texture, geometry, normals, colors, and the like. A graphics command can be a graphics application program interface (API) command, or other read or write type commands. At a lower level, a graphics command can be any type of machine code command that graphics subsystem 102 understands (for example, a read or write command). An example 3D graphics language is OPENGL, which was developed by SGI. Another example of an API designed for manipulating and displaying 3D objects is Direct3D, which was developed by Microsoft Corp., Redmond, Wash. Of course, graphics data can also include simpler two dimensional (2D) graphics commands.
Regardless of whether graphics data (e.g., texture maps, transformed textures, geometry data, or graphics commands) is read from system memory 108 or transferred directly from CPU 104, the graphics data must travel over graphics bus 116. Accordingly, graphics bus 116 may be a bottleneck when large amounts of graphics data are being transferred to graphics subsystem 102. This is especially true when graphics data is being simultaneously transferred from both CPU 104 and system memory 108 to graphics subsystem 102. Accordingly, the bandwidth to graphics subsystem 102 must be increased in order to take advantage of increasingly powerful advanced graphics hardware.
The typical methods used for increasing bandwidth are to increase the width and/or the speed of a bus. With respect to increasing speed, there are electrical limits on the speed that a bus can handle. With respect to width, hardware complexity and cost typically increase as the width of a bus increases. For example, as the width of a bus increases, the number of required physical connections between the bus and a subsystem also increases. Such an increase in physical connections may not be compatible with existing hardware (such as connectors) of subsystems (such as system bridge 106). One recent example of a graphics bus having a relatively high bandwidth is Intel's AGP, which is discussed above. However, although an AGP bus provides a relatively high bandwidth, the AGP bus can still become a bottleneck where a graphics subsystem can handle more data than the AGP bus can deliver.
Accordingly, there is a need to increase the bandwidth to graphics subsystem 102. This will allow a greater throughput of graphics data. In addition, increases in the bandwidth to graphics subsystem 102 may enable existing system resources, such as system memory 108, to be utilized in preference to dedicated local memory within graphics subsystem 102. This is beneficial because system memory 108 is usually much less expensive than local graphics memory. Additionally, unlike local graphics memory, system memory 108 can be used by other subsystems of computer system 100A, 100B, for other purposes (that is, non-graphics purposes), when it is not needed by graphics subsystem 102.
The present invention, which is directed toward an apparatus and method for increasing the bandwidth to a graphics subsystem, can be used in a computer system that includes a central processing system and a system memory, each of which is coupled to a system bridge. More specifically, the apparatus and method of the present invention can be used to increase the throughput of graphics commands that can be transferred between the system bridge and a graphics subsystem.
In one embodiment, the apparatus of the present invention includes a graphics bus scheduler, a plurality of busses, a graphics bus de-scheduler, and buffers. A first buffer temporarily stores multiple graphics commands which are transferred from the system bridge to the first buffer in a specific order. The graphics bus scheduler tags each of the multiple graphics commands with tags that indicate the specific order of the multiple graphics commands, assigns each of the multiple graphics commands to one of a plurality of busses, and transfers each of the multiple graphics commands from the first buffer to its assigned one of the plurality of busses.
In one example, the graphics bus scheduler assigns each of the multiple graphics commands to the plurality of busses according to a type of command (e.g., geometry, texture). In another example, the graphics bus scheduler assigns the multiple graphics commands to the plurality of busses in such a manner as to create a pipeline effect across the plurality of busses.
The plurality of busses transfer the multiple graphics commands between the graphics bus scheduler and a graphics bus de-scheduler. When the graphics bus de-scheduler accepts the multiple graphics commands transferred across the plurality of busses, the accepted multiple graphics commands do not necessarily have the specific order that the commands had within the first buffer. Accordingly, once it has accepted the multiple graphics commands, the graphics bus de-scheduler transfers the accepted multiple graphics commands into a second buffer according to the tags, such that the multiple graphics commands regain the specific (i.e., original) order within the second buffer. The reordered (also referred to as regrouped) multiple graphics commands are then transferred from the second buffer to the graphics subsystem.
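By way of illustration only, the tagging, bus assignment, and regrouping described above can be sketched in software. The round-robin assignment, the use of sequence numbers as tags, the random arrival order, and all function and command names are assumptions made for this sketch; they do not describe the hardware of any particular embodiment.

```python
import random
from collections import deque

def schedule(commands, num_busses):
    """Tag each command with its position in the first buffer and assign it
    to a bus (round-robin here, which is an assumed policy)."""
    busses = [deque() for _ in range(num_busses)]
    for tag, command in enumerate(commands):
        busses[tag % num_busses].append((tag, command))
    return busses

def deschedule(busses, rng):
    """Accept commands from the busses in an arbitrary interleaving, then
    restore the original order in the second buffer using the tags."""
    arrivals = []
    pending = [bus for bus in busses if bus]
    while pending:
        bus = rng.choice(pending)  # models busses delivering at different rates
        arrivals.append(bus.popleft())
        pending = [b for b in pending if b]
    second_buffer = [cmd for _, cmd in sorted(arrivals, key=lambda item: item[0])]
    return arrivals, second_buffer

first_buffer = ["set_texture", "vertex_a", "vertex_b", "vertex_c", "draw"]
busses = schedule(first_buffer, num_busses=2)
arrivals, second_buffer = deschedule(busses, random.Random(0))
```

Whatever interleaving the busses produce in `arrivals`, sorting by tag returns `second_buffer` to the order of `first_buffer`.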
In an alternative non-regrouping embodiment, the apparatus of the present invention transfers graphics commands from the system bridge directly to specific functional components of the graphics subsystem. In this embodiment the graphics bus scheduler can assign each of the multiple graphics commands to one of the plurality of busses according to a type of command. Further, in this embodiment the multiple graphics commands are not tagged and are not regrouped. Accordingly, in this embodiment the graphics bus de-scheduler and buffers can be omitted.
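By way of illustration only, the non-regrouping embodiment described above can be sketched as a routing step that selects a bus by command type and applies no tags. The type-to-bus mapping, the command payloads, and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from command type to a dedicated bus index.
BUS_FOR_TYPE = {"geometry": 0, "texture": 1}

def route_by_type(commands):
    """Assign each (type, payload) command to the bus reserved for its type.

    No tags are attached, because commands are delivered directly to the
    functional component served by each bus and are never regrouped.
    """
    busses = defaultdict(list)
    for command_type, payload in commands:
        busses[BUS_FOR_TYPE[command_type]].append(payload)
    return dict(busses)

routed = route_by_type([
    ("texture", "brick.bmp"),
    ("geometry", "triangle_0"),
    ("geometry", "triangle_1"),
])
```

In this sketch, order is preserved only among commands of the same type, since each bus delivers its commands in the order they were assigned.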
By substantially increasing the bandwidth between system bridge 106 and graphics subsystem 102, the amount of local graphics memory (such as local texture memory 718 and/or frame buffer 714) of graphics subsystem 102 can be radically reduced when using the present invention.
Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.