The end result of most graphics applications is the generation of an image on a display or piece of paper. The basic elements of the image are the individual pixels and the location of each pixel. Thus, the basic values manipulated by a processing system are representations of the pixels and their locations, or addresses.
A typical representation of a pixel for a computer display uses RGB (red, green, blue) format. In this format, multiple bits for each of the R, G and B values indicate the intensity of the red, green and blue guns of the CRT. The combination of the different intensities gives the desired color. In an alternate format, rather than specifying the intensity of each gun directly, the values are used to index into a color look-up table (LUT), which then provides the desired intensity value for each gun of the CRT. One typical format uses 8 bits for each of the R, G and B values. In some formats, another field, called alpha (α), is used to represent either the transparency or the relative coverage of an object over the pixel for 3-D applications. In 3-D applications, depending on the viewpoint, the pixel to be displayed at a particular X, Y location must be chosen from among a number of planes of pixels along the Z axis. If a pixel in front is relatively transparent, the pixel behind might be allowed to show through, giving some combination of the colors of the two pixels.
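The 8-bit-per-component format and the use of alpha to combine a front pixel with the pixel behind it can be illustrated with a minimal sketch. The function names, the 24-bit layout with R in the high byte, and the rounding rule are assumptions made for illustration; they are not taken from any particular system described here.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit intensities into one 24-bit value (assumed layout: R high, B low)."""
    return (r << 16) | (g << 8) | b


def unpack_rgb(pixel):
    """Split a packed 24-bit pixel back into its (R, G, B) components."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


def blend(front, back, alpha):
    """Combine two packed pixels; alpha (0..255) is the coverage of the front pixel.

    alpha = 255 shows only the front pixel, alpha = 0 only the pixel behind it.
    """
    mixed = []
    for f, b in zip(unpack_rgb(front), unpack_rgb(back)):
        # Weighted average of the two intensities, rounded to the nearest integer.
        mixed.append((f * alpha + b * (255 - alpha) + 127) // 255)
    return pack_rgb(*mixed)
```

For example, blending a pure red front pixel over a pure blue pixel with alpha = 128 gives a roughly even mix of the two colors.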
Alternate types of pixel representations are also used. For instance, CYM (cyan, a light blue; magenta, a purplish red; and yellow) is typically used for printers, with a subtractive operation between the intensities, rather than an additive one as for RGB. Yet another representation, YUV, gives a luminance value and two chroma values, and was developed for broadcast television to be compatible with black and white TVs, which use only the luminance value. Black and white monitors use a grey scale, which is specified by the intensity of each pixel.
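The relationship between these representations can be sketched as follows. The text above gives no conversion formulas, so the ones used here are assumptions: the subtractive components are taken as the 8-bit complement of RGB, and the YUV conversion uses the commonly cited analog coefficients (0.299, 0.587, 0.114 for luminance, with scaled colour differences for chroma). The function names are hypothetical.

```python
def rgb_to_cmy(rgb):
    """Subtractive components as the complement of the additive ones (8-bit values)."""
    r, g, b = rgb
    return (255 - r, 255 - g, 255 - b)


def rgb_to_yuv(rgb):
    """Luminance plus two chroma values, using assumed analog-TV coefficients."""
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance: all a black and white set needs
    u = 0.492 * (b - y)                     # blue colour difference
    v = 0.877 * (r - y)                     # red colour difference
    return (y, u, v)
```

A grey pixel, with equal R, G and B, has both chroma values at (approximately) zero, which is why the luminance value alone suffices for a grey-scale display.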
The display of an image in graphics processing requires producing the appropriate pixel values for each pixel at each address in a frame buffer, which stores information to be displayed on the screen. The pixel values at each location in the frame buffer may be created by various rendering techniques, which are the operations for creating the image in the frame buffer. Rendering will typically use a number of primitives, which are basic building blocks of the pictures, such as points, lines, polygons, circles, etc.
For 3-D images, additional factors include the particular viewpoint used and the appropriate lighting effects, such as shading and reflection. In addition, particular objects may be colored or textured to make them more realistic. The lighting and shading effects are accomplished by varying the intensities of the individual pixels in the area that is to be shaded or is to show a reflection. At the processing level, this typically involves the multiplication of pixel values by a constant. This is one example of a graphics intensive operation in which many pixels may need to be multiplied by the same or varying constants.
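The multiplication of pixel values by a constant can be sketched as below. This is only an illustration under stated assumptions: the constant is expressed as an 8-bit fixed-point fraction k/256, pixels are (R, G, B) tuples of 8-bit values, and the names `shade` and `shade_region` are hypothetical.

```python
def shade(pixel, k):
    """Scale each 8-bit component by k/256, with k an integer in 0..256."""
    return tuple((v * k) >> 8 for v in pixel)


def shade_region(frame, region, k):
    """Apply the same shading constant to every pixel in an inclusive rectangle."""
    x0, y0, x1, y1 = region
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            frame[y][x] = shade(frame[y][x], k)
```

Even this small loop performs three multiplications per pixel, which suggests why shading a large area is multiplication intensive.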
In 3-D graphics, an image is generated in three dimensions. This is stored in memory as an X, Y plane representing the first face of a cube, with a series of slices through the cube representing different Z positions, each having its own X, Y plane representation. When viewing the image on the display, the viewpoint must be selected to determine which X, Y plane is visible from that viewpoint. This typically involves a comparison operation in which the pixels' Z positions are compared to each other to determine which pixel to put in the frame buffer for display. In addition, the pixel position may be compared to the Z position of the viewpoint. Thus, extensive comparison operations are another attribute of graphics manipulation.
In addition to determining which image is in front for 3-D applications, images may be clipped for a variety of reasons. For instance, if a triangle is partially behind a square, it may be most efficient to render the triangle, then render the square, and clip the triangle where it is hidden by the square. In addition, windows may be generated, or the edge of a screen may vary depending upon the scale of an image to be displayed, thus creating additional boundaries where an image must be cut off. Thus, it is often necessary to compare a particular pixel position, or address, to an edge boundary or clipping position.
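The comparison of a pixel address against a clipping boundary can be sketched as follows. The rectangular, inclusive clip window and the names `inside` and `draw_clipped` are assumptions made for illustration only.

```python
def inside(x, y, clip):
    """Compare a pixel address against the four edges of a clip rectangle."""
    x_min, y_min, x_max, y_max = clip
    return x_min <= x <= x_max and y_min <= y <= y_max


def draw_clipped(frame, pixels, clip, color):
    """Write only those pixels whose addresses fall within the clip rectangle."""
    for x, y in pixels:
        if inside(x, y, clip):
            frame[y][x] = color
```

Each candidate pixel costs up to four comparisons here, so clipping adds to the comparison workload already noted for depth testing.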
In video graphics, there are additional complications for processing images. In particular, a large number of different images must be generated very rapidly, thus requiring fast throughput plus large amounts of memory. Compression techniques are thus very important in reducing the amount of memory required. One such compression technique involves motion estimation. In movement from one frame to another, often most of the image will not change, with only a portion moving. The moving portion, or the whole image if all of it is moving, will typically be shifted, and thus storage can be saved by indicating the amount of shift rather than storing a whole new image. The amount of movement is typically determined by comparing a block of pixels in one image frame to those in another image frame, and moving the positions of the blocks around relative to each other until a best match is obtained.
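The block comparison described above can be sketched as an exhaustive search. The matching criterion is not specified in the text; the sum of absolute differences used here is a common choice and is an assumption, as are the function names, the square block shape and the search range parameter. Frames are taken to be lists of rows of grey-scale values.

```python
def sad(prev, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in prev."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
    return total


def best_shift(prev, cur, bx, by, size, search):
    """Try every shift within +/- search pixels; return (cost, dx, dy) of the best match."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip shifts that would place the candidate block outside the frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(prev, cur, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

A cost of zero means the block was found unchanged at the returned offset in the previous frame, so only the shift needs to be stored. The nested loops also show why the search is comparison intensive: every candidate shift touches every pixel of the block.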
Other considerations in video include the need to deal with images in multiple formats, and the need to convert one format into another to make it compatible with other images.
There are three major barriers to achieving high performance in graphics computer systems. The first barrier is floating point processing throughput. Graphics applications typically perform a large number of figure manipulation operations, such as transformations and clippings, using floating point data. The second barrier is integer or fixed point processing throughput. Graphics applications also typically perform a large number of display operations, such as scan conversion and color interpolation, using integer or fixed point data. The third barrier is memory references. The above-described operations typically require a large number of memory references for reading from and writing into, for example, the frame buffer and the Z-buffer.
Historically, the CPUs in early prior art computer systems were responsible for both graphics and non-graphics functions. No special hardware was provided to assist these early CPUs in performing the large amount of floating point and fixed point processing, or the memory references. While the designs of these early prior art computer systems were simple, their performance was typically slow.
Some later prior art computer systems provided auxiliary display processors, which offloaded some of the display related operations from the CPUs. However, the CPUs were still responsible for most of the graphics processing. Typically, the bandwidth of the system buses of these later prior art computer systems was increased correspondingly to accommodate the increased amount of communication between the processors over the buses. The auxiliary display processors might even be provided with their own memory to reduce memory contention between the processors. While performance generally increased, the approach was costly and complex and might not be scalable.
Other later prior art computer systems provided auxiliary graphics processors with even richer graphics functions. These auxiliary graphics processors offloaded most of the graphics processing from the CPUs. Under this approach, extensive dedicated hardware as well as a sophisticated software interface between the CPUs and the auxiliary graphics processors had to be provided. While performance increased even more, the approach was even more costly and more complex than the display processor approach.
In the case of microprocessors, as the technology continues to allow more and more circuitry to be packaged in a small area, it is increasingly desirable to integrate graphics capabilities into the general purpose CPU instead. Some modern prior art computer systems have begun to do that. However, the amount and nature of the graphics functions integrated in these modern prior art computer systems are typically still very limited. Particular graphics functions known to have been integrated include frame buffer checks, add with pixel merge, and add with Z-buffer merge. Much of the graphics processing on these modern prior art systems is still performed by the general purpose CPU without additional built-in graphics capabilities, or by the auxiliary display/graphics processors.
The performance of a CPU in doing graphics operations may be affected by the structure of the CPU itself. For instance, most modern CPUs employ a cache memory and a TLB (translation look-aside buffer). The cache memory is a small memory storing instructions or data frequently accessed by a computer program. This is based on the realization that many application programs execute loops or repeatedly access data located close together in memory. Thus, a speed savings can be achieved by keeping a small amount of data and instructions on the microprocessor chip itself, or in an external, dedicated cache, either of which is more quickly accessed than main memory. However, in graphics functions, the cache may be overwhelmed when processing a large image. The TLB is a small cache of page translations from a virtual address used by a program to a physical address in memory, and misses may occur more often for graphics operations because of the amount of data that needs to be addressed.
In RISC (reduced instruction set computing) processors, a superscalar approach is used in which multiple, relatively simple instructions are executed in parallel. This requires a number of parallel execution units for performing these instructions. In addition, these processors are typically pipelined, with each instruction entering the pipeline to be followed by another instruction, so that multiple instructions are being processed in the pipeline at the same time. Accordingly, the execution units and pipelines must be constructed so that it is unlikely that any two sequential instructions will require the same execution unit, since such a conflict would prevent them from being issued in parallel.
One implementation of a RISC microprocessor incorporating graphics capabilities is the Motorola MC88110. This microprocessor, in addition to its integer execution units, and multiply, divide and floating point add units, adds two special purpose graphics units. The added graphics units are a pixel add execution unit, and a pixel pack execution unit. The Motorola processor allows multiple pixels to be packed into a 64-bit data path used for other functions in the other execution units. Thus, multiple pixels can be operated on at one time. The packing operation in the packing execution unit packs the pixels into the 64-bit format. The pixel add operation allows the adding or subtracting of pixel values from each other, with multiple pixels being subtracted at one time in a 64-bit field. This requires disabling the carry normally generated in the adder on each 8-bit boundary. The Motorola processor also provides for pixel multiply operations which are done using a normal multiply unit, with the pixels being placed into a field with zeros in the high order bits, so that the multiplication result will not spill over into the next pixel value representation.
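The effect of disabling the carry on each 8-bit boundary can be emulated in software, as in the following sketch. This is not the MC88110 instruction set or its exact semantics; it is an illustrative emulation that assumes eight 8-bit pixel values per 64-bit word, with each value wrapping around modulo 256 on overflow. The names `pack8`, `unpack8` and `packed_add8` are hypothetical.

```python
HIGH = 0x8080808080808080   # top bit of each 8-bit lane
LOW = 0x7F7F7F7F7F7F7F7F    # low seven bits of each 8-bit lane


def pack8(values):
    """Pack eight 8-bit values into one 64-bit word, value i at bits 8*i .. 8*i+7."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack8(word):
    """Split a 64-bit word back into its eight 8-bit values."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def packed_add8(a, b):
    """Add eight lanes at once without letting a carry cross an 8-bit boundary."""
    # Adding only the low 7 bits of each lane cannot carry out of the lane.
    low_sum = (a & LOW) + (b & LOW)
    # The top bit of each lane is then a XOR b XOR (carry into that bit).
    return low_sum ^ ((a ^ b) & HIGH)
```

With an ordinary 64-bit add, an overflow in one pixel would corrupt its neighbor; here a lane holding 255 plus 1 wraps to 0 and the adjacent lane is unaffected.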
The Intel i860 microprocessor incorporated a graphics unit which allowed it to execute Z-buffer graphics instructions. These are basically the multiple operations required to determine which pixel should be in front of the others in a 3-D display.
The present invention addresses two areas where complications arise in providing native graphics capability in a CPU. One area is the computation intensive operation of determining which images in a 3-D image are to be displayed on the screen. This is done on a pixel by pixel basis by determining which pixel is in front of the other from the viewpoint selected, and writing the pixel which is in front to the frame buffer for display. This computation involves loading the Z or depth value of each pixel and comparing the depth values of two pixels to each other. If the depth value of the current pixel is less than that of the next pixel (i.e., the current pixel is in front of the other pixel), a branch is made to write the current pixel to the frame buffer (and its depth to the Z-buffer), and a return is made to the comparison sequence. Such a computation is especially time consuming since the data representing pixel depth have a limited number of bits, and thus the large data paths of modern microprocessors are largely wasted on a single pixel.
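The load, compare, branch and write sequence described above can be sketched as a one-pixel-at-a-time loop. This illustrates the conventional approach only, not the invention; the function name, the tuple layout of the incoming pixels, and the convention that a smaller depth value is nearer the viewer (as in the paragraph above) are assumptions.

```python
def z_buffer_write(frame, zbuf, fragments):
    """Process candidate pixels one at a time; keep whichever is nearest at each address.

    Each fragment is (x, y, z, color); a smaller z means closer to the viewpoint.
    """
    for x, y, z, color in fragments:
        # One load and one comparison per pixel, followed by a conditional branch.
        if z < zbuf[y][x]:
            frame[y][x] = color   # write the pixel that is in front
            zbuf[y][x] = z        # and record its depth for later comparisons
```

Each iteration handles a single depth value, so a 64-bit data path carries only one short field per comparison, which is the waste the paragraph above points out.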
An additional complication with graphics capability in a general purpose CPU is the proliferation of opcodes. Typically, a fixed bit field is established for all the opcodes for a microprocessor. As more and more specific graphics operations are added, the number of available opcodes is rapidly used up, and the decode logic for the opcode becomes increasingly complex. Accordingly, it is desirable to minimize the number of opcodes with such additional capability, which is in conformance with the philosophy behind superscalar architecture.