This invention relates to computer graphics, and in particular to a digital differential analyzer (DDA) with parallel processing paths for rendering images.
Conventional computer systems manipulate graphical objects as high-level entities. Such graphical objects can be represented using a collection of graphical shapes such as lines, triangles, trapezoids, and other polygons. The shapes can be defined using a collection of end points having two-dimensional (2-D) or three-dimensional (3-D) coordinates. This high-level description simplifies the definition and storage of graphical objects.
To display the graphical objects, the high-level description is transformed into a low-level description suitable for display on, for example, a CRT. This transformation process is generally referred to as "rendering." The rendering process typically includes the decomposition of a high-level entity (or a graphical object) into a set of graphical primitives (e.g., lines and triangles) for further processing. Each primitive is decomposed into a series of fragments, with each fragment being a part of a primitive. Each fragment is further decomposed into a set of picture elements (or pixels) that can be displayed on the CRT. A fragment may, however, cover only part of a pixel. A more detailed description of the rendering process is provided in U.S. Pat. No. 5,594,854 entitled "GRAPHICS SUBSYSTEM WITH COARSE SUBPIXEL CORRECTION", issued Jan. 14, 1997 (hereinafter, the '854 patent), and incorporated herein by reference. The rendering process is also described below.
The display of graphical objects typically requires intensive mathematical computation. For zooming or rotation, the objects in the image space are continually re-rendered. For 3-D graphics, the computational requirement is especially acute because of the additional computations required to transform a 3-D object into a 2-D image. Furthermore, the demand to produce more fully rendered 3-D images is even greater due to a higher user expectation for realism. Despite these intensive computational requirements, the rendering process needs to be performed in an expedient manner, since slow rendering can cause the display of objects (e.g., during zooming or rotation) to appear unacceptably jerky. Thus, efficient rendering is essential in transforming graphical objects into high quality images.
To expedite the rendering process, a digital differential analyzer (DDA) is typically used to perform arithmetic computations. The DDA can be used, for example, to produce linear gradation (i.e., linear interpolation) of color, intensity, and other graphical information over an image area. For a primitive (e.g., polygon, triangle, or line), the DDA incrementally interpolates intermediate parameter values (e.g., shading values) at corresponding centers of pixels based on a start parameter value at a particular vertex and gradients of the parameter.
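The incremental interpolation described above can be sketched in a few lines. This is a minimal, illustrative model and not the circuit of the invention. It borrows the parameter names dPdx and dPdyDom that this description uses for the gradient registers, and it assumes that dPdyDom already folds in any horizontal shift of the dominant edge, so each new span starts at row_start + dPdyDom. Floating-point values are used for readability, whereas a hardware DDA would typically use fixed-point arithmetic.

```python
from dataclasses import dataclass


@dataclass
class DDAParams:
    start: float    # parameter value P at the start vertex (pixel center)
    dPdx: float     # gradient of P per pixel along a span (x direction)
    dPdyDom: float  # assumed: change of the span-start value per scanline along the dominant edge


def interpolate(params, spans):
    """Interpolate P over a primitive given as pixel counts per scanline.

    Every pixel costs one addition instead of a multiply, which is the point of a DDA.
    """
    rows = []
    row_start = params.start
    for count in spans:
        p = row_start
        row = []
        for _ in range(count):
            row.append(p)
            p += params.dPdx          # step to the next pixel center on this span
        rows.append(row)
        row_start += params.dPdyDom   # step down the dominant edge to the next scanline
    return rows


print(interpolate(DDAParams(start=0.0, dPdx=0.25, dPdyDom=1.0), [4, 3]))
# [[0.0, 0.25, 0.5, 0.75], [1.0, 1.25, 1.5]]
```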
The operation of the DDA can usually be decomposed into three phases: (1) setup, (2) prepare-to-render, and (3) render. For conventional DDAs, these phases occur sequentially. Furthermore, the rendering process is typically performed and completed for a particular primitive before the next primitive is rendered.
The setup phase includes operations necessary to prepare the DDA. Typically, the DDA includes a set of registers that contain, for example, the start value and the gradient values. These values are typically loaded into the registers during the setup phase for each primitive being rendered. The prepare-to-render phase can be as simple as receiving a message to start the rendering process. Upon receiving the message, the render phase commences and the DDA renders the primitive.
The setup phase is an overhead of the rendering process and results in inefficiencies in the operation of the DDA. Ideally, the setup phase should consume no additional clock cycles. However, this is generally not true for conventional DDAs. Thus, the setup "cost" is normally amortized over the total number of clock cycles required to render a primitive.
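A rough cycle count shows why the amortized setup cost matters most for small primitives. The sketch below assumes the conventional sequential behavior described above, folds the prepare-to-render phase into the setup count, and uses hypothetical cycle counts chosen only to show the trend.

```python
def sequential_cycles(primitives):
    """Total cycles when setup and render run back to back for each primitive.

    primitives: list of (setup_cycles, render_cycles) pairs; prepare-to-render
    is assumed to be folded into setup_cycles.
    """
    return sum(setup + render for setup, render in primitives)


def setup_overhead(primitives):
    """Fraction of all cycles spent in the setup phase."""
    total_setup = sum(setup for setup, _ in primitives)
    return total_setup / sequential_cycles(primitives)


# Hypothetical cycle counts: same setup cost, very different primitive sizes.
print(setup_overhead([(10, 10)]))   # small primitive: 0.5
print(setup_overhead([(10, 990)]))  # large primitive: 0.01
```

With a fixed setup cost, a primitive that covers only a few pixels can spend as much time in setup as in rendering, while a large primitive hides the overhead almost entirely.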
A computer graphics system usually includes multiple DDAs, with each DDA assigned to a particular task. For example, a set of DDAs (e.g., one for each of the red, green, and blue colors) may be used to produce linear gradation of color over an image area. An additional DDA may be used to interpolate depth for a primitive of a 3-D object, to determine which portions of the primitive are actually visible from a synthetic camera's point of view (i.e., visible surface determination).
To increase throughput in the rendering process, the DDAs in a computer graphics system can be operated in a pipeline structure. Pipelining is an implementation technique that improves throughput by overlapping the execution of multiple instructions. A pipelined graphics system is discussed in the aforementioned U.S. Pat. No. 5,594,854. However, the '854 patent discusses pipelining at the subsystem level (i.e., concurrent operation of multiple DDAs). The DDAs of the '854 patent operate in a conventional manner in that the setup, prepare-to-render, and render phases are performed sequentially for one primitive at a time.
As can be seen from the above, an improved DDA that has reduced or no setup time, and that can process multiple fragments concurrently, would improve the rendering process.
The invention provides a digital differential analyzer (DDA) with parallel processing paths. This DDA architecture provides efficient rendering of graphical images with minimal increase in hardware complexity. In particular, the inefficiency related to the setup phase of the rendering process could be eliminated or greatly reduced in some embodiments of the invention.
An embodiment of the invention provides parallel processing paths through the use of a pipeline that is implemented by double buffering. In this embodiment, some of the input data registers (i.e., for the dPdx and dPdyDom parameters) are implemented with double buffers. Each double buffer includes an external register that corresponds to a setup path and an internal register that corresponds to a render path. While the render phase is being performed for the current primitive using the internal registers, the setup phase for the next primitive can be performed and the external registers can be updated. The two paths are synchronized with a prepare-to-render instruction.
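The double-buffered registers described above can be modeled as follows. This is a behavioral sketch only: the class and method names (GradientRegs, DoubleBufferedDDA, setup, prepare_to_render, render_step) are hypothetical, and only the dPdx and dPdyDom registers are double buffered, matching this embodiment. The setup path writes only the external copy, the render path reads only the internal copy, and prepare-to-render is the single point where the two meet.

```python
from dataclasses import dataclass, field


@dataclass
class GradientRegs:
    dPdx: float = 0.0
    dPdyDom: float = 0.0


@dataclass
class DoubleBufferedDDA:
    # Hypothetical model: external registers belong to the setup path,
    # internal registers belong to the render path.
    external: GradientRegs = field(default_factory=GradientRegs)
    internal: GradientRegs = field(default_factory=GradientRegs)

    def setup(self, dPdx, dPdyDom):
        # Setup path: stage the next primitive's gradients without disturbing rendering.
        self.external = GradientRegs(dPdx, dPdyDom)

    def prepare_to_render(self):
        # Synchronization point: the staged values become the render-path values.
        self.internal = GradientRegs(self.external.dPdx, self.external.dPdyDom)

    def render_step(self, p):
        # Render path: one incremental step along x, using the internal register only.
        return p + self.internal.dPdx


dda = DoubleBufferedDDA()
dda.setup(0.5, 1.0)        # setup for primitive A
dda.prepare_to_render()    # A's gradients move to the render path
dda.setup(2.0, 3.0)        # setup for primitive B while A is still rendering
print(dda.render_step(1.0))  # still uses A's dPdx: 1.5
```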
A specific embodiment of the invention provides a DDA that includes at least one input buffer and a number of parallel processing paths. The input buffer receives at least one sequence of messages. The parallel processing paths couple to the input buffer and are capable of executing multiple messages substantially concurrently. At least one processing path includes an arithmetic unit.
Another specific embodiment of the invention provides a DDA that includes at least one input buffer, at least one output buffer, and an arithmetic unit. The input buffer receives input data values associated with an object, and the output buffer stores calculated output data values. The arithmetic unit operatively couples to the input and output buffers and computes the calculated output data values based on selected ones of the values from the input and output buffers. The DDA is capable of receiving input data values and calculating output data values substantially concurrently.
Yet another specific embodiment of the invention provides a digital differential analyzer (DDA) that includes a first and a second multiplexer, a first and a second register, and a first arithmetic unit. The first arithmetic unit couples to the first and second multiplexers and to the first and second registers. The first multiplexer further couples to the first and second registers. The first multiplexer further receives a starting value for the parameter P and the second multiplexer receives a set of gradient values for the parameter P. The first arithmetic unit computes a first result and a second result, with each result being based on a set of the values from the first and second multiplexers. The first result is stored in the first register and the second result is stored in the second register. The DDA can further include a second arithmetic unit that couples to the first arithmetic unit. The second arithmetic unit allows the DDA to concurrently process multiple fragments of an object.
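One way a second arithmetic unit chained to the first could let the DDA produce more than one fragment per step is sketched below. The arrangement is an assumption made for illustration, not the circuit claimed here: the first unit advances the running value by a doubled gradient (assumed to be held in a register as 2*dPdx), and the second unit, fed by the first, adds dPdx to produce the adjacent fragment in the same step.

```python
def render_span_two_wide(start, dPdx, count):
    """Interpolate `count` fragments along a span, two per step.

    Hypothetical two-unit datapath: unit 1 yields fragment x, unit 2 (fed by
    unit 1) yields fragment x + 1. Returns (values, steps_taken).
    """
    values = []
    p = start
    step2 = 2 * dPdx          # assumption: doubled gradient kept in a register
    steps = 0
    while len(values) < count:
        a = p                 # first arithmetic unit: fragment x
        b = a + dPdx          # second arithmetic unit: fragment x + 1
        values.append(a)
        if len(values) < count:
            values.append(b)  # drop the extra fragment on an odd-length span
        p += step2
        steps += 1
    return values, steps


print(render_span_two_wide(0.0, 0.25, 5))  # ([0.0, 0.25, 0.5, 0.75, 1.0], 3)
```

A single-unit DDA would need five steps for the same five-pixel span; under this assumed arrangement the span takes three.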
Yet another specific embodiment of the invention provides a computer subsystem that includes a rasterizer and at least one DDA. The rasterizer generates one or more sequences of messages, with each message including an instruction and its associated data. The DDA couples to the rasterizer and receives the one or more sequences of messages. Each DDA includes parallel processing paths capable of executing multiple messages substantially concurrently. In one implementation, each DDA includes at least one input buffer to receive input data values associated with an object, at least one output buffer to store calculated output data values, and an arithmetic unit operatively coupled to the at least one input buffer and the at least one output buffer. The arithmetic unit provides the calculated output data values based on selected ones of the values from the at least one input buffer and the at least one output buffer.
Yet another specific embodiment of the invention provides a method for rendering graphical objects. The method includes: (1) receiving a high-level description of the graphical objects; (2) transforming the high-level description into a plurality of sequences of messages, with each message including an instruction and its associated data; (3) receiving setup information for a particular primitive of the object; and (4) rendering another particular primitive. The receiving of setup information and the rendering are performed substantially concurrently within one DDA. The invention further provides a computer program product that implements the method described herein.
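The benefit of performing setup and rendering concurrently can be estimated with a simple two-stage timing model. The model is an assumption for illustration: prepare-to-render takes zero cycles, setup for the next primitive may begin as soon as the previous setup values have been handed to the render path, and rendering of a primitive begins only when both its setup is complete and the previous primitive has finished rendering. Cycle counts are hypothetical.

```python
def overlapped_cycles(primitives):
    """Total cycles when setup of primitive i+1 overlaps rendering of primitive i.

    primitives: list of (setup_cycles, render_cycles) pairs.
    """
    setup_start = 0   # time the setup path (external registers) becomes free
    render_done = 0   # time the render path becomes free
    for setup, render in primitives:
        setup_done = setup_start + setup
        # prepare-to-render: wait for both the new setup and the previous render
        render_start = max(setup_done, render_done)
        render_done = render_start + render
        # values were handed to the render path, so the next setup may begin
        setup_start = render_start
    return render_done


prims = [(10, 10)] * 3
print(sum(s + r for s, r in prims), overlapped_cycles(prims))  # 60 40
```

Once the pipeline is full, each primitive costs roughly the larger of its setup and render times rather than their sum.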
The foregoing, together with other aspects of this invention, will become more apparent when referring to the following specification, claims, and accompanying drawings.