The invention relates generally to vertex information processing in video graphics systems. More particularly, the present invention relates to a method and apparatus for improving processing throughput in a video graphics system, especially when the vertex information processing load of the system's graphics processing engine is substantial.
Video graphics systems are commonly used to display two-dimensional (2D) and three-dimensional (3D) objects on display devices, such as computer monitors and television screens. Such systems receive drawing commands and object configuration information from software applications, such as video games or Internet browser applications, process the commands based on the object configuration information, and provide appropriate signals to the display devices to illuminate pixels on the device screens, thereby displaying the objects. A block diagram for a typical video graphics system 100 is depicted in FIG. 1. The video graphics system 100 includes, inter alia, a host processing unit 101, a peripheral component interconnect (PCI) bus 103, a graphics processor 105, memory 107, 109 and a display device 111. The graphics processor 105 is typically located on a video card 113 together with local memory 109 that is accessed and used regularly by the graphics processor 105.
The PCI bus 103 typically includes appropriate hardware to couple the host processing unit 101 to the system memory 107 and the graphics processor 105, and to couple the graphics processor 105 to the system memory 107. For example, depending on the system configuration, the PCI bus 103 may include a memory and bus controller integrated circuit (IC) and an accelerated graphics port (AGP) bus to facilitate direct memory access (DMA) transfers of data stored in the system memory 107 to the graphics processor 105. The display device 111 is typically a conventional cathode ray tube (CRT) display, liquid crystal display (LCD), or other display. Although not shown for purposes of clarity, other components, such as a video frame buffer, a video signal generator, and other known 3D pipeline components, are commonly incorporated between the graphics processor 105 and the display device 111 to properly display objects rendered by the graphics processor 105.
The host processing unit 101 is typically a central processing unit (CPU) or an equivalent microprocessor-based computer. The host processing unit 101 generally executes several software applications with respect to video graphics processing, including a host application 115, an operating system runtime layer 117, and a graphics driver application 119. These applications 115-119 are typically stored on the hard disk component of the system memory 107, a memory card, a floppy disk, a CD-ROM, or some other computer-readable storage medium. The host application 115 is the application that initiates all drawing commands and provides all information necessary for the other graphics applications and processing components to display objects on the display device 111. For example, the host application 115 might be a word processing application, a video game, a computer game, a spreadsheet application, or any other application that requires two-dimensional or three-dimensional objects to be displayed on a display device 111.
In graphics systems, each object to be displayed is typically divided into one or more graphics primitive groups. Common primitive groups include a point list, a line list, and a triangle list. Each primitive group includes a respective number of vertices. For example, a point list primitive group has one or more vertices making up one or more points, a line list primitive group has two or more vertices making up one or more lines, and a triangle list primitive group has three or more vertices making up one or more triangles. Each vertex has information associated with it to indicate, inter alia, its position in a reference coordinate system and its color. In most applications, such vertex information consists of a vector of multiple parameters to indicate the vertex's position and other optional properties. For example, the vector may include parameters relating to the vertex's normal vector, diffuse color, specular color, other color data, texture coordinates, and fog data. Consequently, the host application 115 not only issues drawing commands, but also provides the vertex information for each vertex of each primitive to be drawn to display each object of a graphics scene.
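By way of illustration only, the vertex information and primitive groups described above might be modeled as in the following sketch. The names `Vertex`, `VERTICES_PER_PRIMITIVE`, and `primitive_count` are hypothetical and do not come from any particular graphics API. The sketch assumes list-type primitive groups in which each primitive consumes its own vertices, and it assumes that leftover vertices that do not complete a primitive are ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Vertex:
    """Hypothetical per-vertex record: a position plus optional properties."""
    position: Tuple[float, float, float]              # in the reference coordinate system
    normal: Optional[Tuple[float, float, float]] = None
    diffuse: Optional[Tuple[float, float, float, float]] = None
    specular: Optional[Tuple[float, float, float, float]] = None
    tex_coords: Optional[Tuple[float, float]] = None
    fog: Optional[float] = None


# Vertices consumed by each primitive in a list-type primitive group.
VERTICES_PER_PRIMITIVE = {"point_list": 1, "line_list": 2, "triangle_list": 3}


def primitive_count(kind: str, vertex_count: int) -> int:
    """Number of complete primitives a list-type group yields (assumption:
    trailing vertices that do not complete a primitive are dropped)."""
    return vertex_count // VERTICES_PER_PRIMITIVE[kind]
```

Under these assumptions, a triangle list holding six vertices yields two triangles, and a line list holding four vertices yields two lines.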
The operating system runtime layer 117 provides a well-defined application programming interface (API) to the host application 115 and a well-defined device driver interface (DDI) to the graphics driver application 119. That is, the operating system runtime layer 117 is a software layer that enables various host applications 115 to interface smoothly with various graphics driver applications 119. One example of an operating system runtime layer application 117 is the "DIRECTX7" component application of the "WINDOWS" family of operating systems that is commercially available from Microsoft Corporation of Redmond, Wash.
The graphics driver application 119 is the application that provides drawing commands to the graphics processor 105 in a manner understandable by the graphics processor 105. In most circumstances, the graphics driver application 119 and the video card 113 containing the graphics processor 105 are sold as a set to ensure proper operation of the graphics rendering portion of the system (i.e., the portion of the graphics system 100 that receives vertex information from the host application 115, processes the vertex information, and generates the appropriate analog signals to illuminate the pixels of the display device 111 as indicated in the vertex information).
During its execution, the host application 115 stores vertex information in either the system memory 107 or the local memory 109 on the video card 113. To store the vertex information, the host application 115 first requests allocation of portions of the respective memory 107, 109 and then stores the vertex information in the allocated portions. The allocated portions of memory 107, 109 are typically referred to as vertex buffers (VBs) 125. In addition, the host application 115 stores transformation matrices in either the system memory 107 or the local memory 109 on the video card 113. The graphics driver application 119 supplies the transformation matrices to the graphics processor 105. The transformation matrices are used by the graphics processor 105 to transform the position vector of each vertex from the reference coordinate system used by the application 115 to construct the primitives of the object to the coordinate system used to construct objects in a viewing frustum of the display device 111.
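As a minimal sketch of the position transformation described above, the following assumes a 4x4 transformation matrix stored as a sequence of rows, a column-vector convention, and a homogeneous coordinate of 1 appended to each position. These conventions, and the name `transform_position`, are assumptions made for illustration; the actual matrix layout used by a given graphics processor may differ.

```python
def transform_position(matrix, position):
    """Apply a 4x4 matrix (sequence of four rows) to a 3D position.

    Assumes a column-vector convention: result = matrix * [x, y, z, 1].
    Returns the transformed homogeneous coordinates (x', y', z', w').
    """
    x, y, z = position
    column = (x, y, z, 1.0)
    return tuple(
        sum(row[i] * column[i] for i in range(4))
        for row in matrix
    )


# Hypothetical example matrix: a pure translation by (2, 3, 4).
TRANSLATE_2_3_4 = (
    (1.0, 0.0, 0.0, 2.0),
    (0.0, 1.0, 0.0, 3.0),
    (0.0, 0.0, 1.0, 4.0),
    (0.0, 0.0, 0.0, 1.0),
)
```

With this translation matrix, the position (1, 1, 1) maps to (3, 4, 5) with a homogeneous coordinate of 1.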
After the host application 115 stores the vertex information in one or more vertex buffers 125, the host application 115 issues drawing commands to the graphics driver 119 via the runtime layer 117. Each drawing command typically includes an instruction (e.g., "draw"), a memory identification (system memory 107 or video card local memory 109), an address in the identified memory 107, 109 of a vertex buffer 125, and a quantity of vertices in the vertex buffer 125. Upon receiving the commands, the graphics driver 119 processes and reformats the commands into a form executable by the graphics processor 105, and stores the processed/reformatted commands in groups in allocated areas of system memory 107 or video card local memory 109 that are accessible by the graphics processor 105. Such areas of memory 107, 109 are typically referred to as command buffers (CBs) 127. An exemplary command buffer 127 is illustrated in FIG. 2.
The exemplary command buffer 127 includes five drawing commands 201-205, although actual command buffers 127 may include many more commands 201-205. As shown in FIG. 2, each command 201-205 in the buffer 127 preferably includes a draw instruction 207, a memory identifier 209 (system memory 107 or local video card memory 109), a vertex buffer address 211 within the identified memory and a quantity of vertices 213 in the vertex buffer 125. Execution of one or more drawing commands 201-205 is typically required to render a frame of video for display on the display device 111. For example, as illustrated in FIG. 2, execution of drawing commands 201-203 is required to render video frame 1; whereas, execution of drawing commands 204-205 is required to render video frame 2.
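The four fields of each drawing command, and the grouping of commands into frames, can be sketched as follows. This is an illustrative model only: the names `MemoryId`, `DrawCommand`, `build_example_command_buffer`, and `commands_for_frame` are hypothetical, the addresses and vertex quantities are placeholder values rather than values taken from FIG. 2, and tagging each command with its frame number is an assumption made so that the frame grouping can be shown.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryId(Enum):
    SYSTEM = "system memory 107"
    LOCAL = "video card local memory 109"


@dataclass(frozen=True)
class DrawCommand:
    instruction: str      # draw instruction 207
    memory: MemoryId      # memory identifier 209
    vb_address: int       # vertex buffer address 211 within the identified memory
    vertex_count: int     # quantity of vertices 213 in the vertex buffer


def build_example_command_buffer():
    """Five commands: three for video frame 1 and two for video frame 2.

    Each entry is (frame number, command). Addresses and counts are
    placeholders chosen for illustration.
    """
    return [
        (1, DrawCommand("draw", MemoryId.SYSTEM, 0x1000, 3)),
        (1, DrawCommand("draw", MemoryId.LOCAL, 0x2000, 6)),
        (1, DrawCommand("draw", MemoryId.SYSTEM, 0x3000, 12)),
        (2, DrawCommand("draw", MemoryId.LOCAL, 0x4000, 9)),
        (2, DrawCommand("draw", MemoryId.SYSTEM, 0x5000, 3)),
    ]


def commands_for_frame(buffer, frame):
    """Return the drawing commands that must execute to render one frame."""
    return [command for f, command in buffer if f == frame]
```

In this model, rendering video frame 1 requires the first three commands and rendering video frame 2 requires the remaining two, mirroring the grouping described for commands 201-203 and 204-205.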
After filling a particular command buffer 127 with a group of drawing commands 201-205, the graphics driver 119 dispatches the command buffer 127 by sending a signal to the graphics processor 105 instructing the processor 105 to fetch and process the commands 201-205 in the command buffer 127. Typically, the graphics driver 119 is filling command buffers 127 faster than the graphics processor 105 can process the drawing commands 201-205 in the buffers 127. Consequently, queuing algorithms are typically employed between the graphics driver 119 and the graphics processor 105 to allow the graphics processor 105 to quickly begin processing a new command buffer 127 upon completion of processing a prior buffer 127. After the graphics processor 105 has completed processing a command buffer 127, the graphics processor 105 notifies the graphics driver 119 and the host application 115 by writing a command buffer status indication to a completed command buffer register in a graphics processor-accessible memory component of system memory 107. The notification may be a single bit (e.g., one for processed and zero for pending) or may be multiple bits (e.g., if additional status information is desired). Alternatively, the graphics driver 119 may receive the notification directly from the graphics processor 105 via the PCI bus 103. The graphics processor 105 typically processes the command buffers 127 in the order in which they are dispatched by the graphics driver 119.
In certain circumstances, such as when the vertex information of one or more drawing commands 201-205 in one or more command buffers 127 requires complex lighting processing, the graphics processor's performance slows to the point where the application 115 and/or the graphics driver 119 must stop providing drawing commands until the graphics processor 105 catches up. A typical gauge for determining the speed at which the graphics processor 105 is operating relative to the host processing unit 101 is the number of video frames queued for processing in one or more command buffers 127. A video frame is the displayed frame resulting from the complete processing of one or more drawing commands, which may be contained in one or more command buffers. Once the host processing unit 101 is a threshold number (e.g., two or three) of frames ahead of the graphics processor 105, the host processing unit 101 will stop issuing new drawing commands related to new video frames until the graphics processor 105 catches up (i.e., until the number of queued video frames is below the threshold). For example, the graphics processor 105 may be displaying a first frame (e.g., frame A) and processing the next frame (e.g., frame B). If the video frame threshold is three, the host processing unit 101 can issue drawing commands for the next three video frames (e.g., frames C-E). If the graphics processor 105 is slowed for some reason (e.g., due to complex lighting calculations) and is not finished processing frame B by the time the host processing unit 101 is finished issuing drawing commands for frame E, the host processing unit 101 must wait for the graphics processor 105 to finish processing frame B before it can begin issuing drawing commands for frame F (i.e., the frame after frame E). Such waiting is inefficient and reduces system throughput.
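The frame-threshold behavior in the foregoing example can be sketched as a small model. The class `FrameQueue` and its methods are hypothetical names introduced for illustration. The sketch assumes that only frames issued by the host but not yet started by the graphics processor count toward the threshold, which matches the example in which frames C-E may be issued while frame B is being processed.

```python
from collections import deque


class FrameQueue:
    """Toy model of the host running ahead of the graphics processor."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.queued = deque()     # frames issued by the host, not yet started
        self.in_progress = None   # frame the graphics processor is processing

    def host_can_issue(self):
        # The host may issue a new frame only while the number of queued
        # frames is below the threshold.
        return len(self.queued) < self.threshold

    def host_issue(self, frame):
        """Queue a frame; return False if the host must wait instead."""
        if not self.host_can_issue():
            return False
        self.queued.append(frame)
        return True

    def gpu_finish_current(self):
        """Finish the in-progress frame and start the next queued one."""
        finished = self.in_progress
        self.in_progress = self.queued.popleft() if self.queued else None
        return finished
```

With a threshold of three and frame B in progress, the model accepts frames C, D, and E, refuses frame F, and accepts frame F only after `gpu_finish_current()` reports that frame B is complete. The refusal corresponds to the idle waiting that reduces system throughput.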
Therefore, a need exists for a method and apparatus for improving processing throughput in a video graphics system, wherein the method and apparatus substantially reduce the idle time of the host application and graphics driver, particularly during periods of peak processing by the graphics processor.