Personal computers (PCs) and other computer systems have a variety of controller integrated circuits (ICs) or chips. These controller chips control subsystems such as graphics, the keyboard, hard, floppy, and optical disks, and general system logic such as the memory bus. Controller chips are continually being improved to increase the performance and feature sets of computer subsystems.
Controller chips are usually programmable. For example, the graphics controller can be programmed with the display resolution, such as the number of pixels in a horizontal line, or the number of lines on a screen. Memory-controller chips can be programmed with numbers of clock cycles for memory accesses, so that the timing signals generated by the controller chip can be adjusted for faster memory chips or faster bus clocks.
When the computer is initialized or booted, lower-level software such as the BIOS or graphics drivers can program the controller chips by writing values into programmable registers on the controller chips. Users or higher-level programs can adjust features such as resolutions by writing different values to these registers. For example, a video game can change the resolution and color depth by writing to resolution registers in the graphics controller chip when the game program is started.
The microprocessor's address space is typically partitioned into memory and input/output (I/O) address spaces. While a large memory address space such as 4 GigaBytes (32 address bits) is provided, the I/O address space is typically much smaller, perhaps only 64 Kbytes (16 address bits). I/O addresses are used for accessing peripheral devices such as I/O ports, disk drives, modems, mouse and keyboard, and the controller chips. Often certain ranges of I/O addresses are reserved for certain types of peripherals, such as graphics, disks, and parallel ports. Thus the number of I/O addresses available to a peripheral controller chip is often limited.
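The address-space sizes quoted above follow directly from the number of address bits. The short sketch below only checks that arithmetic; the function name is hypothetical and byte-granular addressing is assumed.

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of byte addresses reachable with the given number of address bits."""
    return 1 << address_bits

# Sizes quoted in the text (byte addressing is an assumption of this sketch).
memory_space = address_space_bytes(32)   # 4 GigaBytes
io_space = address_space_bytes(16)       # 64 Kbytes

# Ratio of memory addresses to I/O addresses.
ratio = memory_space // io_space
```

Under these assumptions the memory space is 65,536 times larger than the I/O space, which illustrates why I/O addresses tend to be the scarcer resource.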
Some of the programmable registers may be assigned addresses in the memory space rather than the I/O space. Since memory accesses are often faster than I/O accesses, memory-mapped registers can be accessed more quickly, improving performance. Frequently-accessed registers are often memory-mapped rather than I/O-mapped.
Programmable Registers--FIGS. 1, 2
FIG. 1 shows a computer system with a controller chip having programmable registers. A central processing unit (CPU) 12 is a microprocessor that executes instructions in a program stored in memory 14 or in a BIOS ROM (not shown). Display 16 is controlled by graphics controller 10. Programs executing on CPU 12 can update the information shown on display 16 by writing to a frame buffer inside or controlled by graphics controller 10. Graphics controller 10 reads lines of pixels from the frame buffer and transfers them to display 16, which can be a cathode-ray tube (CRT) monitor or a flat-panel display.
Bus 11 connects CPU 12 and graphics controller 10, and includes an address bus and a data bus. Bus 11 may be divided into separate sections by buffer chips. Often a high-speed bus such as a PCI (Peripheral Component Interconnect) or AGP (Accelerated Graphics Port) bus is used to connect to graphics controller 10.
Graphics controller 10 includes programmable registers 20 that control various features. For example, power-saving modes, display characteristics, timing, and shading can be controlled by CPU 12 writing to programmable registers 20. Registers are frequently written during 3D rendering or bitblt operations.
FIG. 2 highlights an address decoder that selects a data register for access. A shared address/data bus is used where the address is output during a first bus cycle while the data is output during a second bus cycle. During a first bus cycle, the CPU outputs an address on the bus to decoder 32. This address is decoded by decoder 32, causing selector 34 to select one of the registers in programmable registers 20 for access. The other programmable registers are deselected and cannot be accessed until a new address is written to decoder 32.
In the second bus cycle, the CPU writes a data value to the bus. The data written by the CPU is written through selector 34 to the register in programmable registers 20 that was selected by the address in decoder 32. The CPU may also read the selected register rather than write the selected register since selector 34 provides a bi-directional data path, depending on the read/write control signal from the CPU. For the PCI bus, address decoding takes 1, 2, or 3 clock cycles and data is written on the fourth clock cycle. A two-cycle idle time is necessary. Thus each PCI bus transaction requires 6 clock cycles.
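The two-cycle access described for FIG. 2 can be modeled in a few lines. This is a minimal behavioral sketch, not the circuit itself: the class and method names are hypothetical, registers are indexed by small integers, and bus timing (decode clocks, idle clocks) is not modeled.

```python
class RegisterBlock:
    """Toy model of decoder 32, selector 34, and programmable registers 20."""

    def __init__(self, num_registers: int):
        self.registers = [0] * num_registers
        self.selected = None  # register index latched by the decoder

    def address_cycle(self, address: int) -> None:
        """First bus cycle: decode the address and select one register."""
        if not 0 <= address < len(self.registers):
            raise ValueError("address does not decode to a register")
        self.selected = address

    def data_cycle(self, write: bool, data: int = 0):
        """Second bus cycle: write or read the selected register."""
        if self.selected is None:
            raise RuntimeError("no address has been decoded yet")
        if write:
            self.registers[self.selected] = data
            return None
        return self.registers[self.selected]


block = RegisterBlock(8)
block.address_cycle(3)              # address phase selects register 3
block.data_cycle(True, 0x55)        # data phase writes it
block.address_cycle(3)
read_back = block.data_cycle(False) # read/write control deasserted: read
```

The same selected register serves both directions, mirroring the bi-directional data path through selector 34.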
The values written to programmable registers 20 are used to control features of the controller chip. For example, programmable registers 20 can output a number of pixels per horizontal line, and a number of lines in a screen, to counters 38 in a graphics controller. When the number of pixels written to the display matches the value of pixels/line from programmable registers 20, then a horizontal sync HSYNC pulse is generated. When the number of lines counted matches the total number of lines from programmable registers 20, then the vertical sync VSYNC is generated. Controls for windows within a screen can likewise come from programmable registers 20, such as for a movie window as described in "Transparent Blocking of CRT Refresh Fetches During Video Overlay Using Dummy Fetches", U.S. Pat. No. 5,754,170 by Ranganathan et al., and assigned to NeoMagic Corp.
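The compare-and-pulse behavior of counters 38 can be sketched as follows. This is a simplified illustration: the function name is hypothetical, blanking and sync-width timing are ignored, and the two arguments stand in for the pixels/line and total-lines values held in programmable registers 20.

```python
def count_syncs(pixels_per_line: int, lines_per_screen: int, pixel_clocks: int):
    """Count HSYNC and VSYNC pulses generated over a number of pixel clocks."""
    pixel_count = 0
    line_count = 0
    hsyncs = 0
    vsyncs = 0
    for _ in range(pixel_clocks):
        pixel_count += 1
        if pixel_count == pixels_per_line:      # match on pixels/line register
            hsyncs += 1
            pixel_count = 0
            line_count += 1
            if line_count == lines_per_screen:  # match on total-lines register
                vsyncs += 1
                line_count = 0
    return hsyncs, vsyncs


# One full 640 x 480 screen: one HSYNC per line, one VSYNC per screen.
one_frame = count_syncs(640, 480, 640 * 480)
```

Changing the register values changes where the pulses fall, which is how a program adjusts resolution without altering the counter hardware.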
FIG. 3 shows standard bus cycles to program registers. During the first bus cycle, a first address A1 is output on the bus from the CPU to the controller chip. Address A1 is the address of a first programmable register. In the second bus cycle, data D1 is output on the bus from the CPU to the controller chip. The controller chip stores data D1 from the bus into the programmable register for address A1.
A second data value is written to a second programmable register during the third and fourth bus cycles. Address A2 is output during the third bus cycle while data D2 is output during the fourth bus cycle. The controller chip writes data D2 to the register identified by address A2.
A third data value is written to another programmable register in the fifth and sixth bus cycles. Data D5 is written to the controller chip's register for address A5.
Each programmable register written requires a 2-bus-cycle access where the address is followed by the data. The programmable registers can be written in any order, but the correct address must precede the data value in each pair of bus cycles. Data may be read from rather than written to the programmable registers by not asserting a write signal from the CPU.
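The address-then-data pairing of FIG. 3 can be listed out programmatically. The sketch below is illustrative only: the function name and the "ADDR"/"DATA" tags are hypothetical, and the symbolic names A1, D1, and so on are taken from the figure description.

```python
def standard_cycles(writes):
    """Expand (address, data) pairs into the bus cycles of standard accesses."""
    cycles = []
    for address, data in writes:
        cycles.append(("ADDR", address))  # address phase comes first
        cycles.append(("DATA", data))     # data phase follows immediately
    return cycles


# The three register writes described for FIG. 3.
fig3 = standard_cycles([("A1", "D1"), ("A2", "D2"), ("A5", "D5")])
```

Three register writes expand to six bus cycles, half of which carry addresses rather than data.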
Burst Access--FIGS. 4, 5
High-speed busses often support higher data bandwidth using a burst access. During a burst-access cycle, the address input in the first bus cycle is followed by several data values input over several bus cycles. A predefined burst order is used to determine the addresses of the data values in the burst sequence.
FIG. 4 is a diagram of data being bursted into programmable registers. Burst decoder 33 receives a starting address A1 during a first bus cycle. Selector 34 routes the data to the A1 data register in programmable registers 20 having the starting address (A1) in the second bus cycle.
During the next 3 bus cycles, data values are received without addresses. The addresses of these three data values are implied by the burst rules. The burst rules define the address order during burst cycles. For purely sequential burst rules, the implied addresses of the next 3 data values are A1+1, A1+2, and A1+3. Often the burst addresses are interleaved so the addresses are somewhat mixed in order: A1+2, A1+1, then A1+3. The burst order is usually a fixed order defined by the architecture. Although a purely sequential burst is used as the example, other semi-sequential or interleaved burst orders may be substituted. The burst sequence is usually for sequential addresses (1,2,3,4), or semi-sequential addresses (1,3,2,4, or 1,4,2,3, or others) in some predefined sequence.
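The implied addresses of a four-value burst can be computed from the starting address and a fixed table of offsets. In this sketch the function and constant names are hypothetical, and the interleaved table is just the one example order mentioned above; a real architecture defines its own order.

```python
SEQUENTIAL = (0, 1, 2, 3)    # A1, A1+1, A1+2, A1+3
INTERLEAVED = (0, 2, 1, 3)   # A1, A1+2, A1+1, A1+3 (one example order)

def burst_addresses(start: int, offsets=SEQUENTIAL):
    """Return the register address implied for each data value in a burst."""
    return [start + offset for offset in offsets]


sequential = burst_addresses(0x10)
interleaved = burst_addresses(0x10, INTERLEAVED)
```

Only the starting address is supplied by the CPU; every other address comes from the offset table, which is why the data must arrive in exactly that order.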
During the third bus cycle, burst decoder 33 causes selector 34 to route the second data value D2 to the next data register (A2) in programmable registers 20. Then in the fourth bus cycle, burst decoder 33 causes selector 34 to route the third data value D3 to the third data register (A3) in programmable registers 20. Finally, in the fifth bus cycle, burst decoder 33 causes selector 34 to route the fourth data value D4 to the fourth data register (A4) in programmable registers 20.
FIG. 5 is a timing diagram of a burst access of programmable registers. In the first bus cycle, address A1 is sent from the CPU to the controller chip. This is the starting address of the burst access, identifying the first data register to be written. In the second bus cycle, data value D1 is sent to the controller chip and written into the A1 programmable register. Then in the third bus cycle, data value D2 is written to the A2 register. In the fourth bus cycle, data value D3 is written to the A3 register, while in the fifth bus cycle, data value D4 is written to the A4 register. The burst can stop after four data values are written, or continue with data value D5 being written to the A5 register.
Only the starting address A1 was written to the controller chip. The other addresses A2, A3, A4, A5 were not sent across the bus from the CPU to the controller chip. These addresses are implied by the burst rules.
Since only one address is sent for four or more data values, more of the bus bandwidth is used for data transfers than for address transfers. This improves the efficiency of the bus, allowing data to be written to the controller chip more quickly. Higher performance results.
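The efficiency gain can be expressed as the fraction of bus cycles that carry data. The sketch below counts bus cycles only; the function name is hypothetical, and idle or decode clocks are deliberately left out.

```python
from fractions import Fraction

def data_fraction(data_values: int, burst: bool) -> Fraction:
    """Fraction of bus cycles that carry data rather than an address."""
    # A burst sends one address in total; standard cycles send one per data value.
    address_cycles = 1 if burst else data_values
    return Fraction(data_values, data_values + address_cycles)


burst_of_four = data_fraction(4, burst=True)      # 4 data cycles out of 5
standard_four = data_fraction(4, burst=False)     # 4 data cycles out of 8
```

For four data values, a burst devotes four of five bus cycles to data, while standard cycles devote only half.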
The data values burst in must exactly follow the burst sequence defined by the burst rules. Data cannot be written out of order without stopping the burst and inputting a new address.
Non-Sequential Register Access Breaks Burst--FIGS. 6, 7
FIG. 6 shows that non-sequential programmable registers are sometimes accessed. Often programs or software drivers only need to update some of the programmable registers while other programmable registers are not updated. In the example of FIG. 6, only registers A1, A2, A4, and A6 in programmable registers 20 need to be updated. Registers A3 and A5 do not need to be written.
Using a burst access would require that the intervening registers A3, A5 also be written. However, the current values of these registers might not be known, and thus additional read cycles are required to determine the current values to write back during a burst. Having to read these registers can negate the advantage of the burst, so standard cycles are commonly used. The address for each register is sent over the bus before each data value.
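A rough cycle count shows why the extra reads can negate the burst. The sketch below rests on several assumptions that are not stated in the text: each read of an intervening register is a standard two-bus-cycle access, the burst is allowed to run across all six registers, and idle clocks are ignored. The function names are hypothetical.

```python
def forced_burst_cost(targets, start: int, end: int):
    """Bus cycles to update `targets` by bursting every register in start..end."""
    span = range(start, end + 1)
    fill_ins = [a for a in span if a not in targets]  # registers to read first
    read_cycles = 2 * len(fill_ins)   # one address + one data cycle per read
    burst_cycles = 1 + len(span)      # one address, then one data per register
    return fill_ins, read_cycles + burst_cycles

def standard_cost(targets) -> int:
    """Bus cycles to update `targets` with standard address-data pairs."""
    return 2 * len(targets)


targets = {1, 2, 4, 6}                       # registers A1, A2, A4, A6
fill_ins, forced = forced_burst_cost(targets, 1, 6)
standard = standard_cost(targets)
```

Under these assumptions the forced burst costs more bus cycles than four standard accesses, consistent with standard cycles being the common choice for this case.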
FIG. 7 is a timing diagram of writing to non-sequential programmable registers. Since registers A3, A5 are not being written, a burst access is not possible. Standard address-data cycles are used, and the data registers are programmed one at a time.
In the first and second bus cycles address A1 and data D1 are sent to the controller chip to program register A1 with data D1. A bus-idle period of 2 clock cycles follows. The bus-idle period is needed during bus mastering mode to allow time for bus recovery or arbitration. The previous device must stop driving the bus before the next device drives the bus to avoid bus conflicts.
Register A2 is programmed with data D2 in two more bus cycles, while register A4 is programmed with data D4 in another pair of bus cycles. Finally register A6 is programmed with data D6 in a last pair of bus cycles. A total of 8 bus cycles are needed. Also, bus-idle periods of about 2 clocks are needed between each pair of bus cycles. This increases the total time from 8 to 12 bus cycles.
If the four registers were the four in the sequential burst order, then only 5 bus cycles (8 PCI clocks) are needed. However, when the registers are not in the exact sequential burst order, 8 or more bus cycles are needed.
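The choice between the two access types can be sketched as a simple check on the register addresses. This is an illustrative model assuming a purely sequential burst order; the function name is hypothetical, and only bus cycles are counted, not idle clocks.

```python
def plan_writes(addresses):
    """Pick burst or standard access for an ordered list of register addresses."""
    if not addresses:
        raise ValueError("at least one register address is required")
    n = len(addresses)
    # A burst is possible only if each address is exactly one past the previous.
    sequential = all(b - a == 1 for a, b in zip(addresses, addresses[1:]))
    if sequential:
        return "burst", 1 + n       # one address cycle plus n data cycles
    return "standard", 2 * n        # an address cycle before every data cycle


in_order = plan_writes([1, 2, 3, 4])     # registers A1 through A4
with_gaps = plan_writes([1, 2, 4, 6])    # registers A1, A2, A4, A6
```

A single gap in the address list is enough to force the slower standard cycles for every register in the group.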
While burst access is efficient, it is not always useful since registers are not always programmed in the sequential burst order. Sometimes only a relatively few registers are written. When even one register in the burst sequence is not written, then burst access may not be possible.
What is desired is burst access of programmable registers. It is desired to access programmable registers using burst access even when the registers being accessed are not in the burst-sequence order. It is desired to program only a subset of the registers in a burst sequence while still using efficient burst access cycles. A higher-speed method to access non-sequential programmable registers is desired.