The present invention relates to processing information received from a microprocessor, particularly to reordering out-of-order data by means of a table structure.
FIFO structures are used in computers for various functions such as buffering and pipelining. A FIFO structure is one in which the first object put into the structure is also the first object that must come out. A physical example is rolling marbles through a pipe. The first marble that goes into the pipe is also the first one that must come out the other end. Thus the pipe can be thought of as a FIFO structure.
Since the first computer display was attached to MIT's Whirlwind computer in 1950, enormous advances have been made in the systems generating graphical pictures and in the display hardware which enables users of the system to view and interact with the pictures. Because the graphics display forms a large part of the physical user interface of the system, the evolution of display technology has been a major contributing factor in the growth of the computer industry.
Video graphics terminals, also known as graphics displays, show pictures and text. Familiar examples are computer monitors or television sets. Visually, one may think of a screen of the video display as constructed of many small "dots" called pixels. The smallest object that can be shown on the video display screen is one pixel.
A modern computer monitor, for example, may have a rectangular screen 1,280 pixels wide by 1,024 pixels high. Therefore the screen would contain over one million pixels (1,280 × 1,024). In a video terminal, each pixel generally requires storage or transmission of data about its properties, such as its color and brightness. For some computer monitors, a pixel's properties may be stored in one byte. Because one byte usually allows only 256 color choices, other monitors and graphics processors may use more than one byte of memory to store information about a pixel. In any event, more than one million bytes might have to be transmitted over a computer bus to update a 1,280 × 1,024 computer screen one time. The computer screen might be updated thirty times each second if full-motion video is displayed. This means that at least thirty million bytes of pixel information might cross the computer bus every second for display of full-motion video.
Sending thirty million bytes of pixel information per second over the computer bus is not desirable because it ties up the bus. The computer cannot use the bus for other purposes while the pixel-bytes are being transmitted. For example, on a 66 MHz byte-wide bus, almost half the available transmission capability would be used. Further complicating the matter is the fact that each pixel-byte usually has "overhead" bytes transmitted along with it. The "overhead" bytes contain addressing information to make sure that the pixel-byte gets to the correct destination. The overhead bytes use even more of the bus transmission capability, leaving little or no room for the computer's other communication needs.
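The arithmetic behind these figures can be checked with a short sketch. The screen size, frame rate, and bus width come from the text; the variable names are illustrative, and the assumption that a 66 MHz byte-wide bus moves one byte per clock (66 million bytes per second) is ours.

```python
# Figures from the text; names are illustrative only.
WIDTH, HEIGHT = 1280, 1024           # screen size in pixels
BYTES_PER_PIXEL = 1                  # one byte per pixel (256 colors)
FRAMES_PER_SECOND = 30               # full-motion video
# Assumption: a 66 MHz byte-wide bus transfers one byte per clock cycle.
BUS_BYTES_PER_SECOND = 66_000_000

pixels_per_frame = WIDTH * HEIGHT
bytes_per_second = pixels_per_frame * BYTES_PER_PIXEL * FRAMES_PER_SECOND

# Fraction of the bus consumed, using the text's round figure of
# thirty million bytes per second and the exact figure computed above.
bus_fraction_round = 30_000_000 / BUS_BYTES_PER_SECOND
bus_fraction_exact = bytes_per_second / BUS_BYTES_PER_SECOND

print(pixels_per_frame, bytes_per_second)
print(round(bus_fraction_round, 3), round(bus_fraction_exact, 3))
```

With the round figure of thirty million bytes per second the bus is a little under half occupied, which matches the text. The exact pixel count gives a somewhat higher fraction, and neither figure includes overhead bytes.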
One solution to the problem of these extra "overhead" bytes is to chain several related pixel-bytes together and transmit them in one transaction (known as a burst transaction). This is called write-combining because several individual bus writes have been combined into one bus write. The number of pixel-bytes is not reduced but the number of "overhead" bytes is reduced. A write-combine transmission may only require the same number of overhead bytes as a single pixel-byte transmission. As an example, currently some microprocessors may combine thirty-two pixel-bytes into one write-combine transmission. Thus thirty-two pixel-bytes are transmitted with approximately a ninety-seven percent reduction in "overhead" bytes.
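The ninety-seven percent figure follows from replacing thirty-two sets of overhead bytes with one. A minimal sketch, assuming every transaction carries the same amount of overhead regardless of its payload (the function name and parameters are hypothetical):

```python
def overhead_reduction(writes_combined: int, overhead_per_transaction: int = 1) -> float:
    """Fraction of overhead bytes saved by sending one burst instead of
    `writes_combined` separate single-byte transactions."""
    separate = writes_combined * overhead_per_transaction  # one overhead set per write
    combined = overhead_per_transaction                    # one overhead set per burst
    return 1 - combined / separate


reduction = overhead_reduction(32)
print(f"{reduction:.3%}")
```

For thirty-two combined writes the saving is 1 - 1/32, or 96.875 percent, which the text rounds to ninety-seven percent.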
The individual pixel-bytes are stored, one at a time, in a write-combine buffer. When certain conditions are satisfied, the contents of the buffer are evicted onto the computer bus. One feature of write-combine buffers, for example in the INTEL PENTIUM II architecture, is that if the size of the write-combine buffer is larger than the size of a discrete transfer on a bus, the order in which contents of the buffer are evicted to the bus is generally undefined. In essence, this means that the contents of the buffer are not necessarily put on the bus in the order in which they were written.
This re-ordering of the write-combine buffer contents generally does not matter when writing to memory, such as a frame buffer, because the final result will be the same. However, the re-ordering becomes important when writing to a FIFO buffer because the output of the FIFO must be used in sequence. In other words, when writing to an array of memory such as a frame buffer, the write order doesn't necessarily matter because the memory may only be accessed after all the writes are finished. When writing to a FIFO, order matters because the current output must be used sequentially before the next one becomes available (returning to the pipe example, the marble showing at the end of the pipe must be removed before the next one can come out).
FIG. 2 displays a typical write-combine buffer, as implemented in an INTEL PENTIUM PRO processor. In the embodiment shown, a write-combining buffer 200 is comprised of a single line having a data portion 210, a tag portion 220 and a validity portion 230. The data portion 210 can store up to 32 bytes of user data. The validity portion 230 is used to store valid bits corresponding to each data byte of data portion 210. The valid bits indicate which of the bytes of data portion 210 contain useful data.
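The single-line buffer of FIG. 2 can be modeled roughly as follows. This is a sketch only: the class and method names are hypothetical, and the choice to store the line's base address (address divided by 32) in the tag portion is an assumption, since the text does not say what the tag holds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LINE_SIZE = 32  # bytes in the data portion, per the text


@dataclass
class WriteCombineLine:
    # Assumption: the tag identifies which 32-byte region the line covers.
    tag: Optional[int] = None
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))
    # One valid bit per data byte.
    valid: List[bool] = field(default_factory=lambda: [False] * LINE_SIZE)

    def write_byte(self, address: int, value: int) -> None:
        line_address, offset = divmod(address, LINE_SIZE)
        if self.tag is None:
            self.tag = line_address
        elif self.tag != line_address:
            raise ValueError("address falls outside the region held by this line")
        self.data[offset] = value
        self.valid[offset] = True

    def useful_bytes(self) -> Dict[int, int]:
        """Return only the bytes whose valid bit is set, keyed by offset."""
        return {i: self.data[i] for i in range(LINE_SIZE) if self.valid[i]}


line = WriteCombineLine()
line.write_byte(0x1004, 0xAB)
line.write_byte(0x1007, 0xCD)
print(line.tag, line.useful_bytes())
```

Only the two written offsets are reported as useful; the remaining thirty bytes of the data portion stay marked invalid.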
When a microprocessor writes to a location in a write-combine buffer that is already occupied, the contents of the buffer are evicted. Some eviction (also known as flushing) schemes, such as that employed by the INTEL PENTIUM PRO, allow for partial eviction of the write-combine buffer. For example, instead of evicting the contents of its entire 32 byte buffer, a microprocessor may only evict 8 bytes. What this means is that it is possible for writes to be evicted to the bus out-of-order. The evicted 8 bytes in the example above could "jump" ahead of other contents of the write-combine buffer.
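The effect of partial eviction can be imitated in a few lines. The sketch below is not a model of any real processor: it simply splits a 32-byte line into 8-byte pieces and shuffles them with a seeded random generator to stand in for the "undefined" eviction order. The function name is hypothetical.

```python
import random
from typing import List, Tuple


def evict_in_chunks(line: bytes, chunk_size: int = 8, seed: int = 0) -> List[Tuple[int, bytes]]:
    """Split a buffer line into (offset, chunk) pairs and return them in an
    arbitrary order, standing in for out-of-order partial eviction."""
    chunks = [(offset, line[offset:offset + chunk_size])
              for offset in range(0, len(line), chunk_size)]
    random.Random(seed).shuffle(chunks)  # arbitrary but repeatable order
    return chunks


line = bytes(range(32))
arrivals = evict_in_chunks(line)

# Because each chunk carries its offset, the receiver can restore the
# original order no matter how the chunks arrived.
restored = b"".join(chunk for _, chunk in sorted(arrivals))
print([offset for offset, _ in arrivals], restored == line)
```

The point of the sketch is that the offset travelling with each chunk is enough to recover the written order, however the chunks are evicted.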
A frame buffer is memory that contains a digital representation of an image to be displayed on a monitor. A typical frame buffer will contain one byte of color information about each pixel in the monitor screen. A microprocessor writes the image data into the frame buffer, creating a virtual image. When the frame buffer is filled, the virtual image is output to the monitor through video circuitry to produce a viewed image on the monitor. Because the frame buffer is not used until it is full, it does not matter in what sequence the pixel color bytes are written to the frame buffer.
A common method for microprocessors to communicate with Input-Output (I/O) devices is memory-mapping. Essentially memory-mapped I/O means that certain areas of a microprocessor's memory address space are reserved for communications with I/O devices. A video graphics card is one example of an I/O device that is generally memory-mapped. For the purpose of writing data, memory-mapping allows the microprocessor to treat the I/O device as if it were memory.
Originally, calculations needed to display graphics were handled exclusively by the microprocessor. As video graphics demands became greater, the microprocessor devoted a larger percentage of its time to handling graphics calculations. To ease this burden on the microprocessor, a separate graphics processor is generally used to handle graphics calculations.
The graphics processor is often a memory-mapped device. When writing to a graphics processor, microprocessors typically "see" the graphics processor as frame buffer memory. This means that the microprocessor "thinks" that it is writing data to memory, not to a graphics processor, and strict sequential ordering is unimportant. In fact, it is actually writing data and commands to the graphics processor. If the sequence of commands to the graphics processor is not maintained, unpredictable behavior by the computer will result. Thus, order of writes to a graphics processor is very important.
As discussed above, write-combine buffers can evict data to the bus out of order. Without some method of reordering the data, a memory-mapped graphics processor is unable to take advantage of the benefits of microprocessor write-combining.
Write combining is a mechanism used by some CPUs to improve the speed at which they can transfer data to memory or another device. A write-combine transfer means that multiple writes have been combined to form a single write, so the transfer can be done more efficiently. In general, the mechanism implemented by the CPU combines all writes within an address range (typically 32 bytes), and any write outside this range (or other event) causes the combined write to be flushed. If the size of the write combining buffer is larger than the size of a discrete transfer on the bus, the order in which the contents of the buffer are flushed is generally undefined because partial writes are used to flush the buffer.
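The combining rule described here can be sketched as a small function. It is a simplified illustration under stated assumptions: a single 32-byte buffer, flushing only when a write lands outside the current range or when the input ends, and no modeling of the other events that can trigger a flush. The function name and the burst representation are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Burst = Tuple[int, Dict[int, int]]  # (base address, {offset: byte value})


def combine_writes(writes: Iterable[Tuple[int, int]], line_size: int = 32) -> List[Burst]:
    """Group single-byte writes into bursts, one per address range."""
    bursts: List[Burst] = []
    current_line = None
    pending: Dict[int, int] = {}
    for address, value in writes:
        line = address // line_size
        if current_line is not None and line != current_line:
            # A write outside the current range flushes the combined write.
            bursts.append((current_line * line_size, pending))
            pending = {}
        current_line = line
        pending[address % line_size] = value
    if pending:
        # Flush whatever is left at the end of the input.
        bursts.append((current_line * line_size, pending))
    return bursts


bursts = combine_writes([(0x100, 1), (0x101, 2), (0x11F, 3), (0x120, 4)])
print(bursts)
```

The first three writes fall in the same 32-byte range and leave as one burst; the fourth write starts a new range and so forces the flush.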
The re-ordering of data generally does not matter when writing to memory, such as a frame buffer, because the final result is the same. The order of writes does matter when writing to a buffer of a graphics processor, however, because the data may be commands that must be executed in a specific order by the graphics processor. A dynamic write-order organizer re-orders the data and commands written to a FIFO buffer so that they may be executed in the proper order.
Although a FIFO may be loaded by writing to a single address, it is common practice to use a base address and offset addresses for subsequent writes. This practice produces more efficient transfers on certain types of buses (e.g. PCI). In the preferred embodiment, the dynamic write-order organizer uses offset addressing because offset addresses are desirable as an indication of the ordering of the writes.
In the presently preferred embodiment, when the data is written to the FIFO, the offset address bits are stored in the FIFO alongside the data (the number of bits in the offset address depends on the size of the write-combine buffer). When data is read from the FIFO it is written directly into a table; the address alongside the data in the FIFO is used as the index into the table. Each entry in the table also has a flag to mark the validity of the entry. If the flag is valid (True) for the entry to be written to, the write stalls until the flag is cleared by the read process. When the write completes the flag is set to True.
In the presently preferred embodiment, a separate process continually attempts to read from the table, starting at the first location (which corresponds to the base address + zero offset). The read is not allowed to happen until the valid flag is set True. When the first location is valid, it is read and the data passed on as though it had been read from the FIFO, and the flag cleared to False. The read index is incremented and the valid flag tested again. This procedure is repeated until the end of the table has been reached and then starts again at the first entry.
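The write and read processes described in the two preceding paragraphs can be sketched together as a software model. This is an illustrative, single-threaded simulation rather than the hardware embodiment: the class and function names are hypothetical, the two processes are interleaved by a simple loop, and values are assumed never to be None so that None can signal "not ready".

```python
from typing import Iterable, List, Optional, Tuple


class ReorderTable:
    def __init__(self, size: int) -> None:
        self.data: List[Optional[int]] = [None] * size
        self.valid: List[bool] = [False] * size  # one validity flag per entry
        self.read_index = 0                      # base address + zero offset

    def try_write(self, offset: int, value: int) -> bool:
        """Store a value at its offset; return False if the write must stall."""
        if self.valid[offset]:
            return False          # entry not yet consumed by the reader
        self.data[offset] = value
        self.valid[offset] = True
        return True

    def try_read(self) -> Optional[int]:
        """Return the next in-order value, or None if it has not arrived."""
        if not self.valid[self.read_index]:
            return None
        value = self.data[self.read_index]
        self.valid[self.read_index] = False
        # Wrap to the first entry after the end of the table.
        self.read_index = (self.read_index + 1) % len(self.data)
        return value


def reorder(fifo_entries: Iterable[Tuple[int, int]], table_size: int) -> List[int]:
    table = ReorderTable(table_size)
    output: List[int] = []
    for offset, value in fifo_entries:
        if not table.try_write(offset, value):
            # In this one-thread model the reader has already drained all it
            # can, so a stalled write can never be released.
            raise RuntimeError("write stalled on an entry the reader cannot reach")
        while True:
            ready = table.try_read()
            if ready is None:
                break
            output.append(ready)
    return output


entries = [(2, 12), (3, 13), (0, 10), (1, 11), (1, 21), (0, 20)]
ordered = reorder(entries, table_size=4)
print(ordered)
```

The offsets arrive out of order, but the values leave in the order of their offsets, with the read index wrapping to the first entry for the second pass through the table.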
Without a safety check, a programming error could cause this mechanism to lock up. If the addresses used are not consecutive, the read process will stall waiting for a write that will never arrive, and the write process will stall waiting for an entry to clear that will never be read. This condition is detected by testing the flags of all entries in the table between the entry being read and the entry where the write process is stalled trying to write. If there are any invalid flags between these two entries, a programming error has been detected and the table entries are reset.
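The safety check might look like the following sketch. It rests on an interpretation: "between" is taken to mean the entries from the read index up to, but not including, the stalled write index, walking the table circularly. The function names are hypothetical, and whether the read index is also reset is not stated in the text, so only the flags are cleared here.

```python
from typing import List


def gap_before_stalled_write(valid: List[bool], read_index: int, write_index: int) -> bool:
    """Return True if any entry from the read index up to (not including)
    the stalled write index is invalid, i.e. a write is missing."""
    size = len(valid)
    i = read_index
    while i != write_index:
        if not valid[i]:
            return True
        i = (i + 1) % size  # the table is treated as circular
    return False


def reset_table(valid: List[bool]) -> None:
    """Clear every validity flag after a programming error is detected."""
    for i in range(len(valid)):
        valid[i] = False


# Reader waits on entry 0, which was never written; writer is stalled on entry 2.
flags = [False, True, True, False]
error_detected = gap_before_stalled_write(flags, read_index=0, write_index=2)
if error_detected:
    reset_table(flags)
print(error_detected, flags)
```

When every entry between the two indices is valid, the reader can still make progress and will eventually release the stalled write, so no error is reported.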
The disclosed innovations, in various embodiments, provide one or more of at least the following advantages:
re-ordering data so that commands evicted from a write-combine buffer may be executed in the order written by a microprocessor
a safety check to detect programming errors and information loss
a general method of reordering information written to a buffer for systems that use write-combining buffers
a reduction in bus traffic due to ability to use write-combining features of modern microprocessors for graphics operations