This invention relates to graphics systems, and more particularly to addressing of programmable registers.
Personal computers (PCs) and other computer systems have a variety of controller integrated circuits (ICs) or chips that control subsystems such as graphics, disks, and general system logic. Such controller chips are usually programmable. For example, the graphics controller can be programmed with the display resolution, such as the number of pixels in a horizontal line, or the number of lines on a screen. Memory-controller chips can be programmed with numbers of clock cycles for memory accesses, so that the timing signals generated by the controller chip can be adjusted for faster memory chips or faster bus clocks.
Advanced graphics systems often employ specialized engines, such as a bit-block-transfer (BitBlt) engine. Graphics data and commands can be written to a command first-in-first-out (FIFO) by a host processor, allowing the BitBlt engine to read and process graphics data and commands at its own pace.
The host microprocessor's address space is typically partitioned into memory and input/output (I/O) address spaces. While a large memory address space such as 4 GigaBytes (32 address bits) is provided, the I/O address space is typically much smaller, perhaps only 64 Kbytes (16 address bits). I/O addresses are used for accessing peripheral devices such as I/O ports, disk drives, modems, mouse and keyboard, and the controller chips. Often certain ranges of I/O addresses are reserved for certain types of peripherals, such as graphics, disks, and parallel ports. Thus the number of I/O addresses available to a peripheral controller chip is often limited.
Some of the programmable registers may be assigned addresses in the memory space rather than the I/O space. Since memory accesses are often faster than I/O accesses, memory-mapped registers can be accessed more quickly, improving performance. Frequently-accessed registers are often memory-mapped rather than I/O-mapped.
Programmable Registers FIGS. 1, 2
FIG. 1 shows a computer system with a controller chip with programmable registers. A central processing unit (CPU) 12 is a microprocessor that executes instructions in a program stored in memory 14 or in a BIOS ROM (not shown). Display 16 is controlled by graphics controller 10. Programs executing on CPU 12 can update the information shown on display 16 by writing to a frame buffer inside or controlled by graphics controller 10. Graphics controller 10 reads lines of pixels from the frame buffer and transfers them to display 16, which can be a cathode-ray tube (CRT) monitor or a flat-panel display.
Bus 11 connects CPU 12 and graphics controller 10, and includes an address bus and a data bus. Bus 11 may be divided into separate sections by buffer chips. Often a high-speed bus such as a PCI (Peripheral Component Interconnect) or AGP (Accelerated Graphics Port) bus is used to connect to graphics controller 10.
Graphics controller 10 includes programmable registers 20 that control various features. For example, power-saving modes, display characteristics, timing, and shading can be controlled by CPU 12 writing to programmable registers 20. Registers are frequently written during 3D rendering or BitBlt operations.
FIG. 2 highlights an address decoder that selects a data register for access. A shared address/data bus is used where the address is output during a first bus cycle while the data is output during a second bus cycle. During a first bus cycle, the CPU outputs an address on the bus to decoder 31. This address is decoded by decoder 31, causing selector 34 to select one of the registers in programmable registers 20 for access. The other programmable registers are deselected and cannot be accessed until a new address is written to decoder 31.
In the second bus cycle, the CPU writes a data value to the bus. The data written by the CPU is written through selector 34 to the register in programmable registers 20 that was selected by the address in decoder 31. The CPU may also read the selected register rather than write the selected register since selector 34 provides a bi-directional data path, depending on the read/write control signal from the CPU. For the PCI bus, address decoding takes 1, 2, or 3 clock cycles and data is written on the fourth clock cycle. A two-cycle idle time is necessary. Thus each PCI bus transaction requires 6 clock cycles.
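The two-cycle access described for FIG. 2 can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `RegisterFile` and its methods are hypothetical, register addresses are treated as simple list indices, and bus timing (including the PCI clock counts) is not modeled.

```python
class RegisterFile:
    """Toy model of decoder 31, selector 34, and programmable registers 20."""

    def __init__(self, num_registers):
        self.registers = [0] * num_registers
        self.selected = None  # register index latched by the decoder

    def address_cycle(self, address):
        # First bus cycle: the decoder latches the address and selects
        # exactly one register; all others stay deselected.
        if not 0 <= address < len(self.registers):
            raise ValueError("address outside the register range")
        self.selected = address

    def data_cycle(self, data=None, write=True):
        # Second bus cycle: data passes through the selector in the
        # direction given by the read/write control signal.
        if self.selected is None:
            raise RuntimeError("no register selected")
        if write:
            self.registers[self.selected] = data
            return None
        return self.registers[self.selected]


regs = RegisterFile(8)
regs.address_cycle(3)
regs.data_cycle(0x2A)                  # write 0x2A to register 3
regs.address_cycle(3)
value = regs.data_cycle(write=False)   # read register 3 back
```

In this model a register other than the selected one cannot be touched until another address cycle occurs, which mirrors the behavior of decoder 31.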
The values written to programmable registers 20 are used to control features of the controller chip. For example, programmable registers 20 can output a number of pixels per horizontal line, and a number of lines in a screen, to counters 38 in a graphics controller. When the number of pixels written to the display matches the value of pixels/line from programmable registers 20, then a horizontal sync HSYNC pulse is generated. When the number of lines counted matches the total number of lines from programmable registers 20, then the vertical sync VSYNC is generated. Controls for windows within a screen can likewise come from programmable registers 20, such as for a movie window as described in Transparent Blocking of CRT Refresh Fetches During Video Overlay Using Dummy Fetches, U.S. Pat. No. 5,754,170 by Ranganathan et al., and assigned to NeoMagic Corp.
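The comparison of counters 38 against the programmed values might be sketched as follows. This is a simplified illustration, not the circuit of any particular controller: the function name `sync_pulses` is hypothetical, and blanking intervals and sync pulse widths are ignored.

```python
def sync_pulses(pixels_per_line, lines_per_screen, pixel_clocks):
    """Count HSYNC and VSYNC pulses generated over a number of pixel clocks."""
    pixel_count = 0
    line_count = 0
    hsyncs = 0
    vsyncs = 0
    for _ in range(pixel_clocks):
        pixel_count += 1
        if pixel_count == pixels_per_line:       # matches the pixels/line register
            hsyncs += 1
            pixel_count = 0
            line_count += 1
            if line_count == lines_per_screen:   # matches the total-lines register
                vsyncs += 1
                line_count = 0
    return hsyncs, vsyncs


# A tiny 4-pixel by 3-line "screen" scanned for two full frames.
hsyncs, vsyncs = sync_pulses(pixels_per_line=4, lines_per_screen=3, pixel_clocks=24)
```

Changing either programmed value changes where the pulses fall, which is how writing the registers adjusts the display resolution.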
FIG. 3 shows standard bus cycles to program registers. During the first bus cycle, a first address A1 is output on the bus from the CPU to the controller chip. Address A1 is the address of a first programmable register. In the second bus cycle, data D1 is output on the bus from the CPU to the controller chip. The controller chip stores data D1 from the bus into the programmable register for address A1.
A second data value is written to a second programmable register during the third and fourth bus cycles. Address A2 is output during the third bus cycle while data D2 is output during the fourth bus cycle. The controller chip writes data D2 to the register identified by address A2. A third data value is written to another programmable register in the fifth and sixth bus cycles. Data D5 is written to the controller chip's register for address A5.
Each programmable register written requires a 2-bus-cycle access where the address is followed by the data. The programmable registers can be written in any order, but the correct address must precede the data value in each pair of bus cycles. Data may be read from, rather than written to, the programmable registers by not asserting a write signal from the CPU.
Burst Access FIGS. 4, 5
High-speed busses often support higher data bandwidth using a burst access. During a burst-access cycle, the address input in the first bus cycle is followed by several data values input over several bus cycles. A predefined burst order is used to determine the addresses of the data values in the burst sequence.
FIG. 4 is a diagram of data being bursted into programmable registers. Burst decoder 33 receives a starting address A1 during a first bus cycle. Selector 34 routes the data to the A1 data register in programmable registers 20 having the starting address (A1) in the second bus cycle.
During the next 3 bus cycles, data values are received without addresses. The addresses of these three data values are implied by the burst rules. The burst rules define the address order during burst cycles. For purely sequential burst rules, the implied addresses of the next 3 data values are A1+1, A1+2, and A1+3. Often the burst addresses are interleaved so the addresses are somewhat mixed in order: A1+2, A1+1, then A1+3. The burst order is usually a fixed order defined by the architecture. Although a purely sequential burst is used as the example, other semi-sequential or interleaved burst orders may be substituted. The burst sequence is usually for sequential addresses (1,2,3,4), or semi-sequential addresses (1,3,2,4, or 1,4,2,3, or others) in some predefined sequence.
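How the burst rules imply an address for each data value can be sketched in a few lines. The function names and the representation of a burst order as a tuple of offsets from the starting address are assumptions made for illustration; an actual bus architecture fixes its own order.

```python
def burst_addresses(start, order=(0, 1, 2, 3)):
    """Return the implied register addresses for one burst.

    `order` lists the offset from the starting address for each data
    value, so (0, 1, 2, 3) is a purely sequential burst.
    """
    return [start + offset for offset in order]


def burst_write(registers, start, data, order=(0, 1, 2, 3)):
    """Write each data value to the address implied by its position."""
    for address, value in zip(burst_addresses(start, order), data):
        registers[address] = value


registers = {}
burst_write(registers, 0x10, ["D1", "D2", "D3", "D4"])   # sequential burst

# The interleaved example from the text: A1, A1+2, A1+1, then A1+3.
interleaved = burst_addresses(0x10, order=(0, 2, 1, 3))
```

Only the starting address is supplied by the caller; every other address is derived from the fixed order.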
During the third bus cycle, burst decoder 33 causes selector 34 to route the second data value D2 to the next data register (A2) in programmable registers 20. Then in the fourth bus cycle, burst decoder 33 causes selector 34 to route the third data value D3 to the third data register (A3) in programmable registers 20. Finally, in the fifth bus cycle, burst decoder 33 causes selector 34 to route the fourth data value D4 to the fourth data register (A4) in programmable registers 20.
FIG. 5 is a timing diagram of a burst access of programmable registers. In the first bus cycle, address A1 is sent from the CPU to the controller chip. This is the starting address of the burst access, identifying the first data register to be written. In the second bus cycle, data value D1 is sent to the controller chip and written into the A1 programmable register. Then in the third bus cycle, data value D2 is written to the A2 register. In the fourth bus cycle, data value D3 is written to the A3 register, while in the fifth bus cycle, data value D4 is written to the A4 register. The burst can stop after four data values are written, or continue with data value D5 being written to the A5 register.
Only the starting address A1 was written to the controller chip. The other addresses A2, A3, A4, A5 were not sent across the bus from the CPU to the controller chip. These addresses are implied by the burst rules.
Since only one address is sent for four or more data values, more of the bus bandwidth is used for data transfers than for address transfers. This improves the efficiency of the bus, allowing data to be written to the controller chip more quickly. Higher performance results.
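The bandwidth advantage can be checked with simple arithmetic. The sketch below counts only address and data bus cycles and ignores any idle cycles; the function names are hypothetical.

```python
def standard_cycles(num_registers):
    # One address cycle plus one data cycle for every register.
    return 2 * num_registers


def burst_cycles(num_registers):
    # One starting-address cycle, then one data cycle per register.
    return 1 + num_registers


def data_fraction(total_cycles, num_registers):
    # Share of bus cycles that carry data rather than addresses.
    return num_registers / total_cycles


n = 4
std = standard_cycles(n)             # 8 bus cycles
bst = burst_cycles(n)                # 5 bus cycles
std_share = data_fraction(std, n)    # 0.5
bst_share = data_fraction(bst, n)    # 0.8
```

For four registers, half of the standard-access cycles carry data, compared with four of five cycles for the burst of FIG. 5.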
The data values burst in must exactly follow the burst sequence defined by the burst rules. Data cannot be written out of order without stopping the burst and inputting a new address.
Non-Sequential Register Access Using Command FIFO FIGS. 6, 7
FIG. 6 shows that non-sequential programmable registers are sometimes accessed. Often programs or software drivers only need to update some of the programmable registers while other programmable registers are not updated. Host 26 can write graphics commands and data to command FIFO 21. For each register in programmable registers 20 that is to be written, two entries are written to command FIFO 21. The first entry is an address of the programmable register, while the second entry is the data or command to be written to the programmable register.
For example, the first pair in command FIFO 21 is the pair of entries A1, D1. Data D1 is to be written to the register at address A1. In the example of FIG. 6, only registers A1, A2, A4, and A6 in programmable registers 20 need to be updated. Registers A3 and A5 do not need to be written. Host 26 can use burst cycles to fill command FIFO 21, but the graphics controller or BitBlt engine does not use burst cycles to write to programmable registers 20 from command FIFO 21, since the registers written are out-of-sequence. Using a burst access to write programmable registers 20 would require that the intervening registers A3, A5 also be written.
FIG. 7 is a timing diagram of writing to non-sequential programmable registers from the command FIFO. Since registers A3, A5 are not being written, a burst access to write the registers is not possible. Standard address-data cycles are used, and the data registers are programmed one at a time.
In the first and second bus cycles address A1 and data D1 are sent to the controller chip to program register A1 with data D1. A bus-idle period may follow as shown in this example.
Register A2 is programmed with data D2 in the next bus cycles, while register A4 is programmed with data D4 in other bus cycles. Finally register A6 is programmed with data D6 in the last bus cycles.
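Draining address/data pairs from the command FIFO, as in FIGS. 6 and 7, might look like the following sketch. The function name `drain_command_fifo`, the use of small integers as register addresses, and the data values are all illustrative assumptions; bus-idle periods are not counted.

```python
from collections import deque


def drain_command_fifo(fifo, registers):
    """Pop address/data pairs from the FIFO and write each register.

    Returns the number of address and data bus cycles used.
    """
    bus_cycles = 0
    while fifo:
        address = fifo.popleft()   # first entry of the pair: register address
        data = fifo.popleft()      # second entry of the pair: data or command
        registers[address] = data
        bus_cycles += 2            # one address cycle plus one data cycle
    return bus_cycles


# Only registers A1, A2, A4, and A6 are updated; A3 and A5 are skipped.
fifo = deque([1, 0x11, 2, 0x22, 4, 0x44, 6, 0x66])
registers = {n: 0 for n in range(1, 7)}
cycles = drain_command_fifo(fifo, registers)
```

Registers A3 and A5 keep their old contents, but each of the four writes costs a full address cycle.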
While command FIFO 21 improves efficiency of host-to-register transfers, a large FIFO may be required. Since a register address is stored with each data entry, two entries in command FIFO 21 are needed for each register programmed. One address could be shared over many register accesses using a burst access if all registers in a sequence were accessed, but often registers are not programmed in the sequential burst order. Sometimes only relatively few registers are written. When even one register in the burst sequence is not written, then burst access may not be possible.
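The trade-off can be made concrete by counting FIFO entries for the FIG. 6 example. This sketch assumes a purely sequential burst order and integer register addresses; the function names are hypothetical.

```python
def fifo_entries_pairs(targets):
    # One address entry plus one data entry per register written.
    return 2 * len(targets)


def fifo_entries_burst(targets):
    # A sequential burst must cover every register from the lowest to the
    # highest target, so intervening registers are written as well.
    first, last = min(targets), max(targets)
    span = list(range(first, last + 1))
    unwanted = [r for r in span if r not in targets]
    return 1 + len(span), unwanted


targets = [1, 2, 4, 6]                                   # A1, A2, A4, A6
pair_entries = fifo_entries_pairs(targets)               # 8 entries
burst_entries, unwanted = fifo_entries_burst(targets)    # 7 entries, [3, 5]
```

The burst needs fewer entries but overwrites registers A3 and A5, which is why it is not usable when those registers must be left unchanged.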
What is desired is more efficient use of a command FIFO to access programmable registers. It is desired to access programmable registers through a command FIFO without storing separate addresses for each register. It is desired to access registers that are not in a sequential burst-sequence order. It is desired to program only a subset of the registers in a sequence while still sharing register address entries in the command FIFO. A more efficient method to access non-sequential programmable registers is desired.