The invention relates generally to vertex information processing in video graphics systems. More particularly, the present invention relates to a method and apparatus for efficiently processing vertex information in a video graphics system, especially when such vertex information is stored by an application in a memory location that is inaccessible by the system's graphics processing engine.
Video graphics systems are commonly used to display two-dimensional (2D) and three-dimensional (3D) objects on display devices, such as computer monitors and television screens. Such systems receive drawing commands and object configuration information from software applications, such as video games or Internet browser applications, process the commands based on the object configuration information, and provide appropriate signals to the display devices to illuminate pixels on the device screens, thereby displaying the objects. A block diagram for a typical video graphics system 100 is depicted in FIG. 1. The video graphics system 100 includes, inter alia, a host processing unit 101, a peripheral component interconnect (PCI) bus 103, a graphics processor 105, memory 107, 109 and a display device 111. The graphics processor 105 is typically located on a video card 113 together with local memory 109 that is accessed and used regularly by the graphics processor 105.
The PCI bus 103 typically includes appropriate hardware to couple the host processing unit 101 to the system memory 107 and the graphics processor 105, and to couple the graphics processor 105 to the system memory 107. For example, depending on the system configuration, the PCI bus 103 may include a memory and bus controller integrated circuit (IC) and an accelerated graphics port (AGP) bus to facilitate direct memory access (DMA) transfers of data stored in a graphics processor-accessible component 123 of the system memory 107 to the graphics processor 105. The display device 111 is typically a conventional cathode ray tube (CRT) display, liquid crystal display (LCD), or other display. Although not shown for purposes of clarity, other components, such as a video frame buffer, a video signal generator, and other known 3D pipeline components, are commonly incorporated between the graphics processor 105 and the display device 111 to properly display objects rendered by the graphics processor 105.
The host processing unit 101 is typically a central processing unit (CPU) or an equivalent microprocessor-based computer. The host processing unit 101 generally executes several software applications with respect to video graphics processing, including a host application 115, a runtime layer 117, and a graphics driver application 119. These applications 115-119 are typically stored on the hard disk component of the system memory 107, a memory card, a floppy disk, a CD-ROM, or some other computer-readable storage medium. The host application 115 is the application that initiates all drawing commands and provides all information necessary for the other graphics applications and processing components to display objects on the display device 111. For example, the host application 115 might be a word processing application, a video game, a computer game, a spreadsheet application, or any other application that requires two-dimensional or three-dimensional objects to be displayed on a display device 111.
In graphics systems, each object to be displayed is typically divided into one or more graphics primitives. Common primitives include a point, a line, and a triangle. Each primitive includes a respective number of vertices. For example, a point primitive has one vertex, a line primitive has two vertices, and a triangle primitive has three vertices. Each vertex has information associated with it to indicate, inter alia, its position in a reference coordinate system and its color. In most applications, such vertex information consists of a vector of multiple parameters to indicate the vertex's position and other optional properties. For example, the vector may include parameters relating to the vertex's normal, diffuse color, specular color, other color data, texture coordinates, and fog data. Consequently, the host application 115 not only issues drawing commands, but also provides the vertex information for each vertex of each primitive to be drawn to display each object of a graphics scene.
The runtime layer 117 provides a well-defined application programming interface (API) to the host application 115 and a well-defined device driver interface (DDI) to the graphics driver application 119. That is, the runtime layer 117 is a software layer that enables various host applications 115 to interface smoothly with various graphics driver applications 119. One example of a runtime layer application 117 is the "DIRECTX7" application that is commercially available from Microsoft Corporation of Redmond, Wash.
The graphics driver application 119 is the application that provides drawing commands to the graphics processor 105 in a manner understandable by the graphics processor 105. In most circumstances, the graphics driver application 119 and the video card 113 containing the graphics processor 105 are sold as a set to ensure proper operation of the graphics rendering portion of the system (i.e., the portion of the graphics system 100 that receives vertex information from the host application 115, processes the vertex information, and generates the appropriate analog signals to illuminate the pixels of the display device 111 as indicated in the vertex information).
During its execution, the host application 115 stores vertex information in either the system memory 107 or the local memory 109 on the video card 113. To store the vertex information, the host application 115 first requests allocation of portions of the respective memory 107, 109 and then stores the vertex information in the allocated portions. The allocated portions of memory 107, 109 are typically referred to as vertex buffers (VBs) 125. The system memory 107 is generally divided into several components 121, 123, some of which are accessible by the graphics processor 105 and others of which are inaccessible by the graphics processor 105. The inaccessible components 121 of system memory 107 typically include all cacheable and swappable components of system memory 107. The host application 115 selects where to allocate the vertex buffers 125 and store the vertex information. As described in more detail below with respect to FIG. 2, the host application's selection of where to store the vertex information can significantly impact the speed and efficiency of graphics processing.
After the host application 115 stores the vertex information in one or more vertex buffers 125, the host application 115 issues drawing commands to the graphics driver 119 via the runtime layer 117. Each drawing command typically includes an instruction (e.g., "draw"), a memory identification (system memory 107 or video card local memory 109), an address in the identified memory 107, 109 of a vertex buffer 125, and a quantity of vertices in the vertex buffer 125. Upon receiving the commands, the graphics driver 119 processes and reformats the commands into a form executable by the graphics processor 105, and stores the processed/reformatted commands in allocated areas of system memory 107 or video card local memory 109 that are accessible by the graphics processor 105. Such areas of memory 107, 109 are typically referred to as command buffers (CBs) 127. After filling a particular command buffer 127 with a group of drawing commands, the graphics driver 119 dispatches the command buffer 127 by sending a signal to the graphics processor 105 instructing the processor 105 to fetch and process the commands in the command buffer 127. Typically, the graphics driver 119 is filling command buffers 127 faster than the graphics processor 105 can process the commands. Consequently, queuing algorithms are typically employed between the graphics driver 119 and the graphics processor 105 to allow the graphics processor 105 to quickly begin processing a new command buffer 127 upon completion of processing a prior buffer 127. The graphics processor 105 typically processes the command buffers 127 in the order in which they are dispatched by the graphics driver 119.
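The command flow described above can be modeled in a few lines of Python. This is only an illustrative sketch, not an actual driver implementation: the names `DrawCommand`, `Driver`, and `process_all`, the memory labels, and the buffer capacity of two commands are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DrawCommand:
    """Hypothetical model of one drawing command from the host application."""
    instruction: str      # e.g. "draw"
    memory_id: str        # "system" (memory 107) or "local" (memory 109)
    vb_address: int       # address of the vertex buffer in that memory
    vertex_count: int     # quantity of vertices in the vertex buffer


@dataclass
class Driver:
    """Fills command buffers and dispatches them in order (illustrative only)."""
    capacity: int = 2                                  # assumed commands per buffer
    current: list = field(default_factory=list)        # buffer being filled
    dispatched: deque = field(default_factory=deque)   # FIFO queue of filled buffers

    def submit(self, cmd: DrawCommand) -> None:
        self.current.append(cmd)
        if len(self.current) == self.capacity:
            self.dispatch()

    def dispatch(self) -> None:
        # Hand the filled buffer to the queue and start a new one.
        if self.current:
            self.dispatched.append(self.current)
            self.current = []


def process_all(queue: deque) -> list:
    """Stand-in for the graphics processor: drain buffers in dispatch order."""
    processed = []
    while queue:
        buffer = queue.popleft()
        processed.extend(cmd.vb_address for cmd in buffer)
    return processed


driver = Driver()
for address in (0x1000, 0x2000, 0x3000):
    driver.submit(DrawCommand("draw", "system", address, 15))
driver.dispatch()                       # flush the partially filled buffer
order = process_all(driver.dispatched)  # addresses in the order they were submitted
```

The first-in, first-out queue stands in for the queuing algorithms mentioned above: the driver can keep filling new buffers while earlier ones wait, and the processor always takes the oldest dispatched buffer next.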
The types of commands issued by the host application 115 and the locations of the vertex buffers 125 for the commands substantially impact the speed at which the commands can be processed by the graphics rendering portion of the system 100. The commands promulgated by the host application 115 may be in various forms depending on the individual selection of the host application developer. Common types of commands include primitive lists, primitive strips, indexed primitive lists, and indexed primitive strips. The primitive list and primitive strip commands are less processing efficient, but may be used in virtually any video graphics system; whereas, the indexed primitive list and indexed primitive strip commands are more processing efficient, provided that the graphics rendering portion of the system 100 has DMA transfer capability. These commands and the processing speed effects of these commands with respect to vertex buffer location can be more readily understood with reference to FIG. 2.
FIG. 2 illustrates an exemplary two-dimensional object 200 to be rendered for display by the video graphics system 100. As shown, the exemplary object (e.g., rectangle 200) is divided into multiple graphics primitives (e.g., triangle primitives 201-216) and each primitive 201-216 includes multiple vertices 218-232. As described above, each vertex 218-232 has respective vertex information (e.g., position and color information) associated with it. The vertex information for each vertex 218-232 can range from eight bytes to eighty bytes or more in length depending on which vertex properties are specified for the vertices by the host application 115.
A primitive list command contains a list of vertices for each primitive 201-216 to be rendered. Receipt of a primitive list command from the host application 115 requires the graphics driver 119 to create and store a command in the command buffer 127 that includes the vertex information for each vertex 218-232 of each primitive 201-216 with no vertex information reuse. Thus, the primitive list command requires the graphics driver 119 to copy the vertex information for each vertex 218-232 in the list from the vertex buffer 125 into the command buffer 127. For the object 200 depicted in FIG. 2, a primitive list command would include forty-eight (48) vertices, three for each triangle primitive 201-216. Accordingly, the graphics driver 119 must copy the vertex information for all forty-eight vertices 218-232 into the command buffer 127 and the graphics processor 105 must then read the vertex information for all forty-eight vertices 218-232 from the command buffer 127. If the vertex information for each vertex 218-232 is twenty bytes long, the primitive list command requires transmission of at least 960 bytes of information to the graphics processor 105 in order for the graphics processor 105 to render the object 200. The primitive list command is the least processing efficient command.
The primitive strip command is more processing efficient because it incorporates some vertex information reuse. With respect to the exemplary object 200 of FIG. 2, each primitive strip command received from the host application 115 would include only ten vertices for its respective strip (e.g., ten vertices 218-227 for strip A and ten vertices 223-232 for strip B), where each strip contains the triangle primitives required to render one-half of the rectangular object 200. The primitive strip command is organized such that, when using triangle primitives, any three sequential vertices constitute a triangle primitive. Therefore, by using two primitive strip commands to instruct the graphics driver 119 to render the object 200, the graphics driver 119 need only copy the vertex information for twenty vertices from the vertex buffer 125 into the command buffer 127 in order to instruct the graphics processor 105 to render the object 200. The graphics processor 105 would then read the vertex information for the twenty vertices from the command buffer 127 in order to process the commands. If, as discussed above, the vertex information for each vertex 218-232 is twenty bytes long, each primitive strip command requires transmission of 200 bytes of information to the graphics processor 105. Therefore, although the use of two primitive strip commands is more efficient than using a single primitive list command (400 bytes of information versus 960 bytes of information), both primitive list and primitive strip commands are inefficient because they require redundant transmission of at least some vertex information.
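The strip rule stated above (any three sequential vertices constitute a triangle primitive) can be checked with a short Python sketch. The function name `strip_to_triangles` is hypothetical, and the vertex labels are listed in plain numeric order purely as placeholders; the actual ordering of the vertices within strip A of FIG. 2 is not assumed here, and winding order is ignored.

```python
def strip_to_triangles(strip):
    """Expand a triangle strip: every run of three consecutive vertices is a triangle."""
    return [tuple(strip[i:i + 3]) for i in range(len(strip) - 2)]


VERTEX_BYTES = 20                       # per-vertex size used in the example above

strip_a = list(range(218, 228))         # ten vertex labels, placeholder order
triangles = strip_to_triangles(strip_a)

strip_bytes = len(strip_a) * VERTEX_BYTES        # vertex data sent as one strip
list_bytes = len(triangles) * 3 * VERTEX_BYTES   # same triangles sent as a list
```

Ten strip vertices expand to eight triangles, so one strip carries 200 bytes of vertex information where the equivalent list form would carry 480 bytes; doubling both figures for the two strips gives the 400-byte and 960-byte totals quoted above.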
The indexed primitive list command is more processing efficient than the primitive list and primitive strip commands because it does not require redundant transmission of vertex information to the graphics processor 105. In this command, the host application 115 provides a list of indices (IN) corresponding to the vertices in a vertex buffer 125, an address of the vertex buffer 125 in a particular memory 107, 109, and the number of vertices for which vertex information is stored in the vertex buffer 125. The graphics driver 119 passes these indices, the quantity of vertices, and the vertex buffer address along to the graphics processor 105. The graphics processor 105 then reads the vertex information for each indexed vertex directly from the vertex buffer 125 in order to process the command and render the object 200. If each index is two bytes long, the command generated by the graphics driver 119 to instruct the graphics processor 105 to render the object 200 depicted in FIG. 2 includes 96 bytes of index information (two bytes for each of forty-eight indices corresponding to the forty-eight vertices of the sixteen primitives 201-216 of the object 200). Accordingly, the graphics processor 105 must retrieve and process approximately 396 bytes of information (96 bytes from the command buffer 127 and 300 bytes from the vertex buffer 125) to render the object 200 when an indexed primitive list command is used, in contrast to 960 bytes or 400 bytes of information when a primitive list command or a primitive strip command, respectively, is used. Therefore, the amount of time required for the graphics processor 105 to acquire and process an indexed primitive list command is generally less, and in some instances substantially less, than the amount of time required to acquire and process primitive list or primitive strip commands, thereby improving overall graphics processing speed and efficiency.
However, since the indexed primitive list command requires the graphics processor 105 to be able to read the vertex information from the vertex buffer 125, indexed primitive list commands may only be used in graphics systems with DMA capability. If an indexed primitive list command is received by a graphics driver 119 in a video graphics system that does not have vertex DMA capability, the graphics driver 119 must convert the indexed primitive list command into a regular primitive list command before storing the command in a command buffer 127. Converting the indexed primitive list command into a regular primitive list command is considerably slower than processing the indexed primitive list command because the graphics driver 119 must de-reference all the indices in the indexed primitive list command and copy all the vertex information associated with the indexed vertices into the command buffer 127.
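The de-referencing step described above can be illustrated with a minimal Python sketch. It assumes a vertex buffer laid out as fixed-size records of twenty bytes each, matching the earlier example; the function name `dereference_indices` and the sample index pattern are hypothetical and do not reflect the actual indices of FIG. 2.

```python
def dereference_indices(indices, vertex_buffer, stride):
    """Copy the record for each index, in order, into a new command payload."""
    payload = bytearray()
    for index in indices:
        start = index * stride
        payload += vertex_buffer[start:start + stride]
    return bytes(payload)


STRIDE = 20                                    # bytes per vertex (assumed)
# Fifteen vertices; each record is filled with its own index for easy checking.
vertex_buffer = b"".join(bytes([v]) * STRIDE for v in range(15))
# Forty-eight indices (sixteen triangles); this pattern is made up for illustration.
indices = [(3 * t + k) % 15 for t in range(16) for k in range(3)]

payload = dereference_indices(indices, vertex_buffer, STRIDE)
```

Every index costs a lookup and a twenty-byte copy, so the 96 bytes of indices grow into a 960-byte payload, which is the same size as an ordinary primitive list command for the object.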
In an indexed primitive strip command, similar to the indexed primitive list command, the host application 115 provides the graphics driver 119 a list of indices (IN) corresponding to the vertices in a vertex buffer 125, a quantity of vertices in the vertex buffer 125, and an address of the vertex buffer 125 in a particular memory 107, 109. However, in contrast to the indexed primitive list command, the host application 115 takes advantage of index reuse to reduce the number of indices that must be provided to render any particular object 200. Thus, to request display of the object 200 of FIG. 2, two indexed primitive strip commands would be used, each command including ten indices. If, as discussed above, each index is two bytes long, the commands generated by the graphics driver 119 to instruct the graphics processor 105 to render the object 200 depicted in FIG. 2 would include 40 bytes of index information (two bytes for each of the twenty indices corresponding to the twenty vertices of the sixteen primitives 201-216 of the object 200). Accordingly, the graphics processor 105 must retrieve and process approximately 340 bytes of information (40 bytes from the command buffer 127 and 300 bytes from the vertex buffer 125) to render the object 200 when indexed primitive strip commands are used, in contrast to 396 bytes, 400 bytes, or 960 bytes of information when an indexed primitive list command, primitive strip commands, or a primitive list command, respectively, are used. Therefore, for the rectangular object 200 of FIG. 2, two indexed primitive strip commands would be most processing efficient for a DMA-capable graphics system. However, it should be noted that the indexed primitive list command might be most processing efficient in certain circumstances when the object 200 to be rendered is not rectangular in shape.
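The four byte totals compared above follow from simple arithmetic, reproduced here as a Python sketch. The constants are taken from the example in the text (sixteen triangles, fifteen vertices 218-232, twenty-byte vertex information, two-byte indices, and two strips of ten vertices each). The function name `transfer_bytes` is hypothetical, and command overhead such as instructions and addresses is ignored, as it is in the approximate figures above.

```python
def transfer_bytes(triangles=16, unique_vertices=15, strips=2,
                   vertices_per_strip=10, vertex_bytes=20, index_bytes=2):
    """Bytes the graphics processor must read for each command type."""
    list_entries = triangles * 3                   # three vertices per triangle
    strip_entries = strips * vertices_per_strip    # vertices across all strips
    vertex_data = unique_vertices * vertex_bytes   # read once from the vertex buffer
    return {
        "primitive list": list_entries * vertex_bytes,
        "primitive strips": strip_entries * vertex_bytes,
        "indexed primitive list": list_entries * index_bytes + vertex_data,
        "indexed primitive strips": strip_entries * index_bytes + vertex_data,
    }


totals = transfer_bytes()
for name, size in totals.items():
    print(f"{name}: {size} bytes")
```

With the default values this yields 960, 400, 396, and 340 bytes respectively. Changing `vertex_bytes` shows how quickly the gap widens for larger vertex formats, since only the non-indexed commands multiply the full vertex size by the number of list or strip entries.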
Although the indexed primitive list and indexed primitive strip commands are preferred in DMA-capable graphics systems, locations of the vertex buffers 125 can detrimentally impact the benefits of using those commands. As noted above, the host application 115 selects the memory location for the vertex buffer 125. As also noted above, some components 121 of system memory 107 (e.g., cacheable and swappable components) are not accessible by the graphics processor 105. When the selected vertex buffer memory location is an area or component 121 of system memory 107 that is inaccessible by the graphics processor 105, the graphics driver 119 cannot simply, or with minimal processing, pass along the received indices and vertex buffer address. In such instances, prior art systems require the graphics driver 119 to convert the indexed command into a non-indexed primitive list command, thereby eliminating all the processing efficiency of using an indexed command in the first place. For example, in prior art systems, when the graphics driver 119 receives either an indexed primitive list command or an indexed primitive strip command from the host application 115 referencing a vertex buffer 125 located in a memory component 121 that is inaccessible by the graphics processor 105, the graphics driver 119 copies the vertex information for all the vertices from the vertex buffer 125 into the command buffer 127, thereby converting the original indexed primitive list or primitive strip command into a primitive list command and eliminating all the processing efficiency of using the indexed command.
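The prior art fallback described above amounts to a single branch on where the vertex buffer resides. The following Python sketch models that branch for an indexed primitive list command only; the `Region` enumeration, the `GPU_ACCESSIBLE` set, and the function `command_buffer_bytes` are hypothetical names introduced for illustration, and the byte sizes are those of the earlier example.

```python
from enum import Enum


class Region(Enum):
    """Where the host application placed the vertex buffer (illustrative labels)."""
    VIDEO_LOCAL = "video card local memory 109"
    SYSTEM_ACCESSIBLE = "system memory component 123"
    SYSTEM_INACCESSIBLE = "system memory component 121"


GPU_ACCESSIBLE = {Region.VIDEO_LOCAL, Region.SYSTEM_ACCESSIBLE}


def command_buffer_bytes(region, index_count, index_bytes=2, vertex_bytes=20):
    """Bytes the driver writes to the command buffer for an indexed list command."""
    if region in GPU_ACCESSIBLE:
        # The processor can fetch vertices itself: pass the indices through.
        return index_count * index_bytes
    # Prior-art fallback: copy full vertex information for every index.
    return index_count * vertex_bytes


fast_path = command_buffer_bytes(Region.SYSTEM_ACCESSIBLE, 48)
fallback = command_buffer_bytes(Region.SYSTEM_INACCESSIBLE, 48)
```

For the forty-eight indices of the example object, the pass-through path writes 96 bytes to the command buffer while the fallback writes 960 bytes, which is why the benefit of the indexed command disappears when the vertex buffer sits in an inaccessible component.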
One approach to resolving the above vertex buffer location problem is to require the host application 115 to store vertex information only in memory components 109, 123 that are accessible by the graphics processor 105. However, there are many host application developers and issuing such an edict may not be well received or followed by all developers. Moreover, even if all new host applications 115 did store their vertex information in graphics processor-accessible memory components 109, 123, many existing applications 115 do not do so, but still issue indexed commands. Thus, such a requirement would not improve processing performance of existing graphics systems.
Therefore, a need exists for a method and apparatus for efficiently processing vertex information in a video graphics system that facilitate use of indexed commands without loss of efficiency in the event that vertex information is stored by a host application in a memory location that is inaccessible by the system's graphics processor.