A push buffer is a mechanism for one processor to send data and commands to another processor. For example, in a typical personal computer system with a central processing unit (CPU) and a graphics processing unit (GPU), the CPU writes commands and data into the push buffer and the GPU processes the data and commands in the same order in which they were written. Typically, a dedicated hardware FIFO memory between the CPU and GPU is not appropriate due to the large amount of memory needed for the push buffer. The data in the push buffer therefore typically resides in either the computer's main memory (RAM) or in the graphics memory attached to the GPU.
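By way of illustration only, the in-order, memory-resident behavior described above may be sketched as a simple ring buffer shared by one writer and one reader. The class name `PushBuffer`, the `put` and `get` pointers, and the fixed capacity are assumptions made for this sketch; they are not taken from any particular hardware or driver.

```python
class PushBuffer:
    """Toy push buffer: one writer (e.g. a CPU) and one reader (e.g. a GPU).

    Hypothetical model only. A Python list stands in for a region of
    main memory or graphics memory.
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.put = 0  # next slot the writer will fill
        self.get = 0  # next slot the reader will consume

    def write(self, entry):
        """Append an entry; return False if the buffer is full."""
        nxt = (self.put + 1) % self.capacity
        if nxt == self.get:  # one slot is kept empty to tell full from empty
            return False
        self.slots[self.put] = entry
        self.put = nxt
        return True

    def read(self):
        """Return the oldest unread entry, or None if nothing is pending."""
        if self.get == self.put:
            return None
        entry = self.slots[self.get]
        self.get = (self.get + 1) % self.capacity
        return entry


def demo_push_buffer():
    """Write three commands, then drain them in the order written."""
    pb = PushBuffer(8)
    for cmd in ["SET_STATE", "DRAW", "CLEAR"]:
        pb.write(cmd)
    drained = []
    entry = pb.read()
    while entry is not None:
        drained.append(entry)
        entry = pb.read()
    return drained
```

In this sketch the reader always observes entries in the order in which they were written, which is the property the push buffer relies upon.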
Prior art FIG. 1 illustrates a push buffer 100, in accordance with the prior art. As shown, the push buffer 100 resides in memory 101 and includes a plurality of portions 102 (e.g. one or more entries, etc.). While not shown, each portion 102 includes a data section and a header section, and may be written by any desired mechanism (e.g. a driver, a first processor, etc.). In the context of the present description, the push buffer 100 may refer to any buffer with a plurality of portions that are capable of being read and executed by another processor (not shown). The portions 102 in the push buffer 100 can be compressed by sharing header sections such that fewer header sections are required.
In use, the prior art push buffer 100 is capable of being written to include various commands. For example, as shown in FIG. 1, a call command 104 may be written such that, when an associated portion 102 of the push buffer 100 is read, the reading processor jumps anywhere (e.g. ahead, etc.) so as to skip a plurality of the portions 102 of the push buffer 100 (note skipped portions 106), and begin reading at the jump destination 107. Still yet, a return command 108 may be reached after reading the desired portion 102, so that operation may continue after the call command 104.
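By way of illustration only, the call and return behavior described above may be sketched as follows. The tuple encoding of entries, the `END` marker, and the use of a return stack are assumptions made for this sketch; the description above does not specify how a return address is stored or how the end of the stream is signalled.

```python
def run_push_buffer(buf):
    """Read entries in order, honoring hypothetical CALL/RETURN entries.

    Entry forms (all assumed for illustration):
      ("CMD", payload)  ordinary entry; its payload is recorded as executed
      ("CALL", target)  jump to index `target`, remembering the next index
      ("RETURN",)       resume at the most recently remembered index
      ("END",)          stop reading
    """
    executed = []
    return_stack = []
    pc = 0  # index of the entry currently being read
    while pc < len(buf):
        entry = buf[pc]
        kind = entry[0]
        if kind == "CALL":
            return_stack.append(pc + 1)  # continue here after RETURN
            pc = entry[1]
        elif kind == "RETURN":
            if not return_stack:
                break  # unmatched RETURN: stop rather than guess
            pc = return_stack.pop()
        elif kind == "END":
            break
        else:
            executed.append(entry[1])
            pc += 1
    return executed


example_buf = [
    ("CMD", "a"),
    ("CALL", 4),   # skips entries 2 and 3 for now
    ("CMD", "b"),
    ("END",),
    ("CMD", "x"),  # jump destination
    ("CMD", "y"),
    ("RETURN",),
]
```

With `example_buf`, `run_push_buffer` yields `["a", "x", "y", "b"]`: the entries following the call are skipped, the entries at the jump destination are read, and reading then continues after the call.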
As a performance optimization, the processor prefetches large pieces of the push buffer 100. Unfortunately, the prefetched portions 102 are thrown away after the call command 104, causing wasted memory bandwidth and, therefore, reduced performance. Further, when the processor starts to read from the jump destination 107, undesired “bubbles” in processing occur due to the time it takes to get the portions 102 from the memory 101, thus creating latency issues.
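By way of illustration only, the wasted prefetch described above may be counted with a toy model. The fixed prefetch window, the rule that the entire window is discarded on every redirect, and the entry encoding are assumptions made for this sketch and do not describe any specific processor.

```python
def simulate_prefetch(buf, window):
    """Return (fetches, wasted) for a reader that prefetches `window` entries.

    Assumed model: the reader holds a contiguous prefetched range [lo, hi).
    A CALL or RETURN discards whatever remains unread in that range, and
    the next read triggers a fresh fetch from memory (a "bubble").
    """
    pc = 0
    lo = hi = 0            # empty prefetched range
    fetches = 0            # number of separate fetches from memory
    wasted = 0             # prefetched entries discarded without being read
    return_index = None
    while pc < len(buf):
        if not (lo <= pc < hi):
            lo, hi = pc, min(pc + window, len(buf))
            fetches += 1
        kind = buf[pc][0]
        if kind in ("CALL", "RETURN"):
            wasted += hi - (pc + 1)  # unread entries behind the redirect
            lo = hi = 0              # invalidate the prefetched range
            if kind == "CALL":
                return_index = pc + 1
                pc = buf[pc][1]
            else:
                pc = return_index
        elif kind == "END":
            break
        else:
            pc += 1
    return fetches, wasted


call_buf = [
    ("CMD", "a"), ("CALL", 4), ("CMD", "b"), ("END",),
    ("CMD", "x"), ("CMD", "y"), ("RETURN",),
]
linear_buf = [("CMD", "a"), ("CMD", "b"), ("CMD", "c"), ("CMD", "d")]
```

Under these assumptions, `simulate_prefetch(call_buf, 4)` returns `(3, 2)`, that is, three separate fetches with two prefetched entries discarded, whereas `simulate_prefetch(linear_buf, 4)` returns `(1, 0)`.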
Still yet, the prior art push buffer 100 may be used when conducting index buffer rendering, utilizing an index buffer 110. Typically, during index buffer rendering, the push buffer 100 may be loaded with data of the index buffer 110, namely, the index buffer location in memory, index bit widths, parameters, and a number of associated indices. Thus, the processor may read the data of the index buffer 110 in the push buffer 100.
Thus, in use, the aforementioned data of the index buffer 110 may be used to read indices from memory. With such indices, particular locations may be read within a set of vertex buffers in the push buffer 100. Such vertex buffers may contain various vertex attributes (e.g. position, color, surface normal, texture coordinate, etc.). To this end, the vertex attributes may be sent to a geometry unit for further processing, and then passed further down an associated pipeline, etc.
Unfortunately, an inefficient process is thus provided due to the requirement of a first read operation from the push buffer 100 in order to obtain the information (e.g. index buffer parameters, etc.) necessary to identify the indices which are, in turn, used to perform another read operation for the desired vertex attributes. Such a “two-level” method introduces further undesired latencies.
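By way of illustration only, the "two-level" read described above may be sketched as follows. The field names `index_offset`, `index_bits`, and `index_count`, the little-endian index layout, and the modeling of a vertex buffer as a plain list are assumptions made for this sketch.

```python
import struct


def fetch_vertices(push_entry, memory, vertex_buffer):
    """Resolve vertex attributes through two dependent reads.

    Level 1: the push buffer entry supplies the index buffer parameters
             (location, bit width, number of indices).
    Level 2: those parameters are used to read the indices from memory,
             and each index then selects a vertex.
    """
    offset = push_entry["index_offset"]
    count = push_entry["index_count"]
    # Map the index bit width to a struct format character (unsigned).
    fmt_char = {8: "B", 16: "H", 32: "I"}[push_entry["index_bits"]]
    indices = struct.unpack_from("<" + fmt_char * count, memory, offset)
    return [vertex_buffer[i] for i in indices]


# Hypothetical data: 4 bytes of padding, then three 16-bit indices 2, 0, 1.
memory = bytes(4) + struct.pack("<3H", 2, 0, 1)
vertex_buffer = ["v0", "v1", "v2"]
push_entry = {"index_offset": 4, "index_bits": 16, "index_count": 3}
```

Here `fetch_vertices(push_entry, memory, vertex_buffer)` returns `["v2", "v0", "v1"]`. The vertex reads cannot begin until the index read completes, and the index read cannot begin until the push buffer entry has been read, which is the dependency chain that gives rise to the latency noted above.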
Alternatively, the indices in the index buffer 110 can be copied into the push buffer 100, but doing so incurs undesirable overhead that consumes memory bandwidth and processing time. As another alternative, a push buffer call and return, or two jumps, can be used to make the index buffer 110 behave as if it were copied into the push buffer 100, but this introduces the undesired “bubbles” described above. This further results in indirection and associated latency.
There is thus a need for overcoming these and/or other problems associated with the prior art.