The present invention relates to computer graphics, particularly to memory read and write commands between primitives.
Background: Computer Graphics and Rendering
Modern computer systems normally manipulate graphical objects as high-level entities. For example, a solid body may be described as a collection of triangles with specified vertices, or a straight line segment may be described by listing its two endpoints with three-dimensional or two-dimensional coordinates. Such high-level descriptions are a necessary basis for high-level geometric manipulations. These descriptions also have the advantage of providing a compact format which does not consume memory space unnecessarily.
Such higher-level representations are very convenient for performing the many required computations. For example, ray-tracing or other lighting calculations may be performed and a projective transformation can be used to reduce a three-dimensional scene to its two-dimensional appearance from a given viewpoint. However, when an image containing graphical objects is to be displayed, a very low-level description is needed. For example, in a conventional CRT display, a “flying spot” is moved across the screen (one line at a time), and the beam from each of three electron guns is switched to a desired level of intensity as the flying spot passes each pixel location. Thus, at some point the image model must be translated into a data set which can be used by a conventional display. This operation is known as “rendering.”
A graphics-processing system typically interfaces to the display controller through a “frame store” or “frame buffer”. The frame store can be written to randomly by the graphics processing system, and also provides the synchronous data output needed by the video output driver. (Digital-to-analog conversion is also provided after the frame buffer.) This interface relieves the graphics-processing system of most of the burden of synchronization for video output. Nevertheless, the amounts of data which must be moved around are very sizable and the computational and data-transfer burden of placing the correct data into the frame buffer can still be very large.
Even if the computational operations required are quite simple, they must be performed repeatedly on a large number of data points. If blending is desired, additional bits (e.g., another 8 bits per pixel) will be required to store an “alpha” (or “transparency value”) for each pixel. For a high-resolution display refreshed many times per second, this implies manipulation of more than 3 billion bits per second without allowing for any of the actual computations being performed. Thus, this environment has unique data-manipulation requirements.
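The bandwidth figure above follows from simple arithmetic. The display parameters in this sketch (a 1280×1024 display refreshed at 72 Hz, with 24 bits of color plus 8 bits of alpha per pixel) are illustrative assumptions chosen to be consistent with the figure quoted in the text. They are not stated there, and the function name is hypothetical.

```python
def frame_buffer_bits_per_second(width: int, height: int,
                                 bits_per_pixel: int, refresh_hz: int) -> int:
    """Bits that must be moved per second just to refresh the display once per frame."""
    return width * height * bits_per_pixel * refresh_hz

# Assumed configuration (not from the text): 24 bits of color plus 8 bits of alpha.
COLOR_BITS, ALPHA_BITS = 24, 8
rate = frame_buffer_bits_per_second(1280, 1024, COLOR_BITS + ALPHA_BITS, 72)
print(f"{rate:,} bits/s")  # 3,019,898,880 bits/s
```

Without the alpha bits the same assumed display needs about 2.26 billion bits per second, so blending alone accounts for a third more traffic before any computation is done.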
If the display is unchanging, no demand is placed on the rendering operations. However, some common operations (such as zooming or rotation) require every object in the image space to be re-rendered, and slow rendering makes the rotation or zoom appear jerky, which is highly undesirable. Thus, efficient rendering is an essential step in translating an image representation into the correct pixel values. The need for efficient rendering is particularly acute in animation applications, where newly rendered updates to a computer graphics display must be generated at regular intervals.
The rendering requirements of three-dimensional graphics are particularly heavy. One reason is that, even after the three-dimensional model has been translated to a two-dimensional model, some computational tasks may be bequeathed to the rendering process. (For example, color values will need to be interpolated across a triangle or other primitive.) These computational tasks tend to burden the rendering process. Another reason is that, since three-dimensional graphics are much more lifelike, users are more likely to demand a fully rendered image. (By contrast, in the two-dimensional images created, e.g., by a GUI or a simple game, users learn not to expect all areas of the scene to be active or filled with information.)
FIG. 2 is a very high-level view of other processes performed in a 3D graphics computer system. A three-dimensional image which is defined in some fixed 3D coordinate system (a “world” coordinate system) is transformed into a viewing volume (determined by a view position and direction), and the parts of the image which fall outside the viewing volume are discarded. The visible portion of the image volume is then projected onto a viewing plane, in accordance with the familiar rules of perspective. This produces a two-dimensional image, which is then mapped into device coordinates. It is important to understand that all of these operations occur prior to the operations performed by the rendering subsystem of the present invention.
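The sequence of steps above can be illustrated for a single point. This is a deliberately simplified sketch under stated assumptions: the camera is axis-aligned and looks along +z, so the viewing transform reduces to a translation; clipping is reduced to a near-plane test plus a [-1, 1] window test on the viewing plane; and `to_device` and its parameters are hypothetical names, not part of the described system.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

def to_device(p: Vec3, eye: Vec3 = (0.0, 0.0, 0.0), focal: float = 1.0,
              near: float = 0.1, width: int = 640, height: int = 480
              ) -> Optional[Tuple[float, float]]:
    """Map a world-space point to device (pixel) coordinates, or None if discarded."""
    # 1. World -> viewing coordinates. Assumption: axis-aligned camera looking
    #    along +z, so the viewing transform is only a translation by the eye.
    x, y, z = p[0] - eye[0], p[1] - eye[1], p[2] - eye[2]
    # 2. Discard points outside the viewing volume (closer than the near plane).
    if z < near:
        return None
    # 3. Perspective projection onto the viewing plane at distance `focal`.
    u, v = focal * x / z, focal * y / z
    if not (-1.0 <= u <= 1.0 and -1.0 <= v <= 1.0):
        return None  # outside the window on the viewing plane
    # 4. Map to device coordinates (pixel y grows downward).
    return ((u + 1.0) * 0.5 * width, (1.0 - v) * 0.5 * height)
```

For example, `to_device((1.0, 1.0, 2.0))` projects to (0.5, 0.5) on the viewing plane and lands at pixel (480.0, 120.0), while a point behind the eye returns `None`.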
Background: Read-Modify-Write
In graphics systems, the rendering of primitives relies on read-modify-write operations. Information regarding primitives is read from specific memory locations; if the information is modified, it must be written back so that the new values can be used in later processing. In a heavily pipelined system, the individual steps of a read-modify-write operation can be widely separated in time. A situation can therefore occur in which a second read of a memory location is needed while data from a first read of the same location has been modified but not yet written back. If this situation is not properly handled, the second read will return the same stale data as the first read rather than the modified value. In graphics, this is guaranteed not to occur within a single primitive, because the rasterization rules forbid it, but it can occur between primitives.
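The hazard described above can be reproduced with a small model. This is a hypothetical sketch, not the hardware's behavior: each primitive performs a read-modify-write that adds 1 to every pixel it covers, a write becomes visible only `write_latency` steps after its read, and `drain_between_primitives` stands in for forcing all outstanding writes to complete before the next primitive's reads.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

def render(primitives: List[List[int]], write_latency: int,
           drain_between_primitives: bool) -> Dict[int, int]:
    """Apply each primitive as a read-modify-write (+1) on the pixels it covers."""
    memory: Dict[int, int] = {}
    pending: Deque[Tuple[float, int, int]] = deque()  # (visible_at, address, value)
    step = 0

    def retire(upto: float) -> None:
        # Commit every pending write whose latency has elapsed (FIFO order).
        while pending and pending[0][0] <= upto:
            _, addr, value = pending.popleft()
            memory[addr] = value

    for pixels in primitives:
        if drain_between_primitives:
            retire(float("inf"))  # wait for all outstanding writes to commit
        for addr in pixels:
            retire(step)
            value = memory.get(addr, 0)                              # read
            pending.append((step + write_latency, addr, value + 1))  # modify; write later
            step += 1
    retire(float("inf"))  # flush remaining writes at the end
    return memory
```

With a write latency of 4 steps, two primitives that each add 1 to pixel 0 leave it at 1 when reads are not held back, because the second read sees the stale value; draining outstanding writes first gives the correct value of 2.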
Currently, the solution to this second-read problem is to force all outstanding writes to complete before any reads for the new primitive begin. This solution is generally implemented through a message such as SuspendReads (or the PrepareToRender message used in earlier generations of pipelined graphics processors). The Read Unit (of the Read/Write Unit pair for either the localbuffer or the framebuffer) receives the SuspendReads message and writes it to the address FIFO, which links the Read Unit to the Memory Controller. The message is also forwarded down the pipeline. The Read Unit continues to send messages to the address FIFO; however, the Memory Controller, which processes reads at high priority, will not issue reads on these addresses once it encounters the SuspendReads command. The Write Unit inserts write addresses and data into the Write FIFO. When the Write Unit receives the SuspendReads message, it inserts the message into its queue (renamed ResumeReads for clarity), and the message is then passed further down the pipeline. Because read processing has been suspended, the Memory Controller can process write requests, and it does so until it reaches the ResumeReads message. Once the Memory Controller knows that the last writes have completed (or are unconditionally committed), it acts on the ResumeReads message and releases its read portion to allow further reads.
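The message flow above can be modeled at a very coarse level. The sketch below is a hypothetical, untimed model rather than the actual Memory Controller logic: both FIFOs are pre-filled, the `SUSPEND` and `RESUME` tokens stand for the SuspendReads and ResumeReads messages, and `memory_controller` is an invented name. Reads are given priority until a SuspendReads token is reached; writes are then drained until the ResumeReads token releases reads again.

```python
from collections import deque

SUSPEND, RESUME = "SuspendReads", "ResumeReads"

def memory_controller(address_fifo, write_fifo):
    """Return the order in which reads and writes are issued to memory.

    address_fifo: read addresses from the Read Unit, plus SUSPEND tokens.
    write_fifo:   (address, data) pairs from the Write Unit, plus RESUME tokens.
    """
    reads, writes = deque(address_fifo), deque(write_fifo)
    memory, log = {}, []
    reads_suspended = False
    while reads or writes:
        if not reads_suspended and reads:
            item = reads.popleft()
            if item == SUSPEND:
                reads_suspended = True   # stop issuing reads from here on
            else:
                log.append(("read", item, memory.get(item, 0)))
        elif writes:
            item = writes.popleft()
            if item == RESUME:
                reads_suspended = False  # all earlier writes are committed
            else:
                addr, data = item
                memory[addr] = data
                log.append(("write", addr, data))
        else:
            break  # suspended with no ResumeReads yet: reads would stall here
    return log
```

With `address_fifo = [0, 1, SUSPEND, 0]` and `write_fifo = [(0, 10), (1, 11), RESUME]`, the final read of address 0 returns 10, the value written for the earlier primitive, rather than the stale 0.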
This message-passing mechanism is very simple and robust. However, the delay between the suspend and resume commands reaching the Memory Controller and being acted upon can be close to 45 cycles, or more. This is a large increase over previous chips, and it arises from three sources: significantly deeper pipelining in the core units of the graphics processor; more cycles of memory latency, largely because the memories are synchronous; and re-synchronization between the core and memory clock domains.
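Efforts to increase the small-primitive rate by reducing the number of setup cycles have exposed the suspend/resume feedback path as a bottleneck: once a small primitive needs only a few setup cycles, a suspend/resume delay of some 45 cycles per primitive dominates, and it must be overcome before the small-primitive rate can increase further.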
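A rough calculation shows why this delay dominates. Only the 45-cycle suspend/resume figure comes from the text; the 200 MHz core clock and 8 setup cycles per small primitive are hypothetical values chosen for illustration, and the sketch assumes the stall is fully exposed once per primitive.

```python
def primitives_per_second(setup_cycles: int, stall_cycles: int, clock_hz: float) -> float:
    """Peak small-primitive rate when each primitive also pays `stall_cycles`."""
    return clock_hz / (setup_cycles + stall_cycles)

CLOCK_HZ = 200e6      # hypothetical core clock
SETUP = 8             # hypothetical setup cycles for a small primitive
SUSPEND_RESUME = 45   # suspend/resume delay quoted in the text

no_stall = primitives_per_second(SETUP, 0, CLOCK_HZ)
with_stall = primitives_per_second(SETUP, SUSPEND_RESUME, CLOCK_HZ)
print(f"{no_stall / 1e6:.1f} M/s vs {with_stall / 1e6:.2f} M/s")  # 25.0 M/s vs 3.77 M/s
```

Under these assumptions the stall accounts for 45 of every 53 cycles, so further reductions in setup cycles barely move the rate until the per-primitive suspension itself is avoided.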
A Read Monitor Unit
This application discloses an innovative system and method for increasing rendering efficiency in pipelined graphics systems. In the disclosed embodiments, reading of pixel information during the rendering of a primitive is suspended if that pixel information has been touched by a previous write. In some embodiments, reads of pixel information are also suspended periodically, when the table tracking this information becomes full. In some embodiments, a Read Monitor Unit controlled by the graphics system's Memory Controller is used to track pixels which have been affected by rendered primitives. In some embodiments, a history list is used to avoid suspension of reads for overlapping primitives. In a particular embodiment, each entry of the table used to track affected pixels holds two bits: the first bit tracks whether the pixel has been touched by a primitive since the last SuspendReads command was invoked, and the second bit tracks whether the pixel has been touched by the current primitive. When a power-on reset or a SuspendReads command occurs, both bits are reset. The second bit is also reset at the start of rendering for each primitive. In a separate embodiment, a unique number is assigned to each primitive to be rendered, and that number is recorded for each active pixel touched by the primitive. If an earlier primitive has already touched a pixel, suspension of reads can be invoked and the table reset (i.e., every entry is marked invalid).
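The two-bit table can be sketched as follows. This is a minimal model under stated assumptions, not the Read Monitor Unit itself: `TwoBitReadMonitor` and its methods are hypothetical names; the table is assumed to be direct-mapped, indexed by pixel address modulo its size; and a read is assumed to trigger SuspendReads when its entry has the first bit set but the second bit clear, i.e. an earlier primitive touched it since the last suspend. Table-full suspension and the history list are not modeled.

```python
class TwoBitReadMonitor:
    """Hypothetical model of the two-bit pixel-tracking table."""

    def __init__(self, num_entries: int = 1024) -> None:
        self.num_entries = num_entries
        self.suspend_count = 0
        self.reset()  # power-on reset clears both bits

    def reset(self) -> None:
        # Power-on reset or SuspendReads: clear both bits in every entry.
        self.touched_since_suspend = [False] * self.num_entries  # first bit
        self.touched_by_current = [False] * self.num_entries     # second bit

    def begin_primitive(self) -> None:
        # Only the second bit is cleared at the start of each primitive.
        self.touched_by_current = [False] * self.num_entries

    def read(self, address: int) -> bool:
        """Record a pixel read; return True if a SuspendReads was issued first."""
        i = address % self.num_entries  # assumed direct-mapped indexing
        suspended = False
        if self.touched_since_suspend[i] and not self.touched_by_current[i]:
            # An earlier primitive touched this entry since the last suspend, so
            # its write may still be outstanding: suspend reads and clear the table.
            self.suspend_count += 1
            self.reset()
            suspended = True
        self.touched_since_suspend[i] = True
        self.touched_by_current[i] = True
        return suspended
```

Because several addresses can share an entry, aliasing can cause an unnecessary suspension but, under these assumptions, never hides a real one: an entry can carry the second bit only if it was either clear or already triggered a suspension when the current primitive first touched it. The primitive-number embodiment could be modeled the same way by storing a primitive number per entry in place of the two bits and suspending when the stored number differs from the current primitive's.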
The disclosed innovations, in various embodiments, provide one or more of at least the following advantages:
- savings in processing time by preventing unnecessary suspension of reads;
- increased throughput in primitive rendering;
- cost savings due to less stringent processing requirements.
The presently preferred embodiment offers the advantage of avoiding an automatic SuspendReads and clearing of the table every 32 primitives.