Computer systems such as personal computers (PCs) have a variety of controller integrated circuits (ICs) or chips. These controller chips control subsystems such as graphics, the keyboard, hard, floppy, and optical disks, and general system logic such as the memory bus. Controller chips are continually being improved to increase the performance and feature sets of computer subsystems.
Controller chips are often programmable. For example, the graphics controller can be programmed with the display resolution, such as the number of pixels in a horizontal line, or the number of lines on a screen. Memory-controller chips can be programmed with the number of clock cycles for memory accesses, so that the timing signals generated by the controller chip can be adjusted for faster memory chips or faster bus clocks.
When the computer is initialized or booted, lower-level software such as the BIOS or graphics drivers can program the controller chips by writing values into programmable registers on the controller chips. Users or higher-level programs can adjust features such as resolutions by writing different values to these registers. For example, a video game can change the resolution and color depth by writing to resolution registers in the graphics controller chip when the game program is started.
The microprocessor's address space is typically partitioned into memory and input/output (I/O) address spaces. While a large memory address space such as 4 gigabytes (32 address bits) is provided, the I/O address space is typically much smaller, perhaps only 64 kilobytes (16 address bits). I/O addresses are used for accessing peripheral devices such as I/O ports, disk drives, modems, mouse and keyboard, and the controller chips. Often certain ranges of I/O addresses are reserved for certain types of peripherals, such as graphics, disks, and parallel ports. Thus the number of I/O addresses available to a peripheral controller chip is often limited.
As new features are added to a controller chip, additional registers are needed to control these features. The number of registers needed can quickly exceed the number of discrete addresses in the peripheral's I/O range. I/O addresses are often shared among many registers by using an indexing scheme.
Many registers on a controller chip can be accessed through a pair of I/O addresses using an index register. Registers are accessed in a two-step process. First the microprocessor writes the register's index into a first I/O address, the index register. The controller chip reads the register's index and then couples that register to the chip's input and output. In the second cycle, the microprocessor reads or writes a second I/O address, the data register. The data read or written to the data register is coupled to the register selected by the index written in the first cycle. Other registers on the controller chip can be read or written by writing a different value for the index in the first cycle.
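To make the two-step protocol concrete, the following is a minimal Python sketch that models a controller chip in software rather than issuing real I/O instructions. The class `IndexedController`, the helpers `write_indexed` and `read_indexed`, and the 256-entry register file are hypothetical; the port numbers 22 and 23 hex follow the FIG. 3 example.

```python
class IndexedController:
    """Hypothetical model of a controller chip whose byte-wide registers are
    reached through an index port and a data port (22/23 hex as in FIG. 3)."""

    INDEX_PORT = 0x22
    DATA_PORT = 0x23

    def __init__(self, num_regs=256):
        self.regs = [0] * num_regs  # the indexed registers
        self.index = 0              # contents of the index register

    def io_write(self, port, value):
        value &= 0xFF  # byte-wide I/O cycle
        if port == self.INDEX_PORT:
            self.index = value             # first cycle: latch the index
        elif port == self.DATA_PORT:
            self.regs[self.index] = value  # second cycle: write selected register
        else:
            raise ValueError(f"unmapped port {port:#x}")

    def io_read(self, port):
        if port == self.INDEX_PORT:
            return self.index
        if port == self.DATA_PORT:
            return self.regs[self.index]   # second cycle: read selected register
        raise ValueError(f"unmapped port {port:#x}")


def write_indexed(chip, indx, data):
    """Two-cycle indexed write: select with the index, then transfer data."""
    chip.io_write(IndexedController.INDEX_PORT, indx)
    chip.io_write(IndexedController.DATA_PORT, data)


def read_indexed(chip, indx):
    """Two-cycle indexed read: select with the index, then read the data port."""
    chip.io_write(IndexedController.INDEX_PORT, indx)
    return chip.io_read(IndexedController.DATA_PORT)


chip = IndexedController()
write_indexed(chip, 0x10, 0xAB)
write_indexed(chip, 0x11, 0xCD)
print(hex(read_indexed(chip, 0x10)))  # 0xab
```

Note that the index register keeps its last value between accesses; the model makes this visible as the `index` attribute, which is the state that later becomes a problem when several threads share the port pair.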
Each access of an indexed register on the controller chip thus takes two cycles rather than one: first the index is written, then the register identified by the index is accessed. While this slows performance, indexed registers are often used only infrequently, so overall system performance is not hampered. Registers that are accessed more frequently can be assigned their own I/O addresses, while less-frequently-accessed registers can be indexed, sharing the index and data I/O addresses.
Indexed Registers--FIG. 1
FIG. 1 shows a computer system with a controller chip having indexed registers. A central processing unit (CPU) 12 is a microprocessor that executes instructions in a program stored in memory 14 or in a BIOS ROM (not shown). Display 16 is controlled by graphics controller 10. Programs executing on CPU 12 can update the information shown on display 16 by writing to a frame buffer inside or controlled by graphics controller 10. Graphics controller 10 reads lines of pixels from the frame buffer and transfers them to display 16, which can be a cathode-ray tube (CRT) monitor or a flat-panel display. Bus 11 connects CPU 12 and graphics controller 10, and includes an address bus and a data bus. Bus 11 may be divided into separate sections by buffer chips.
Graphics controller 10 includes indexed registers 20 that control various features. For example, power-saving modes, display characteristics, timing, and shading can be controlled by CPU 12 writing to indexed registers 20.
FIG. 2 highlights an index register that selects a data register for access. During a first access cycle, the CPU writes an index to index register 32. This index is decoded by selector 34, which selects one of the registers in indexed registers 20 for access. The other indexed registers are deselected and cannot be accessed until a new index is written to index register 32.
In the second CPU access cycle, the CPU writes a data value to a second I/O address. The data written by the CPU passes through selector 34 to the register in indexed registers 20 that was selected by the index in index register 32. Because selector 34 provides a bi-directional data path, the CPU may instead read the selected register, depending on the read/write control signal from the CPU.
The values written to indexed registers 20 are used to control features of the controller chip. For example, indexed registers 20 can output a number of pixels per horizontal line, and a number of lines in a screen, to counters 38 in a graphics controller. When the number of pixels written to the display matches the value of pixels/line from indexed registers 20, then a horizontal sync HSYNC pulse is generated. When the number of lines counted matches the total number of lines from indexed registers 20, then the vertical sync VSYNC is generated. Controls for windows within a screen can likewise come from indexed registers 20, such as for a movie window as described in "Transparent Blocking of CRT Refresh Fetches During Video Overlay Using Dummy Fetches", U.S. Pat. No. 5,754,170 by Ranganathan et al., and assigned to NeoMagic Corp.
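The counter behavior can be sketched in a few lines of Python. This is a simplified, hypothetical model: the function `count_syncs` and its parameters stand in for counters 38 and the two indexed-register values, and it ignores blanking intervals and sync-pulse widths that a real graphics controller would also time.

```python
def count_syncs(pixels_per_line, lines_per_screen, frames=1):
    """Simplified model of sync counters fed by two indexed-register values.

    pixels_per_line and lines_per_screen stand in for values read from the
    indexed registers. Blanking and sync-pulse widths are ignored.
    Returns (hsync_pulses, vsync_pulses).
    """
    hsync = vsync = 0
    pixel_count = line_count = 0
    for _ in range(frames * pixels_per_line * lines_per_screen):
        pixel_count += 1                        # one pixel sent to the display
        if pixel_count == pixels_per_line:      # matches the pixels/line value
            hsync += 1
            pixel_count = 0
            line_count += 1
            if line_count == lines_per_screen:  # matches the lines/screen value
                vsync += 1
                line_count = 0
    return hsync, vsync


print(count_syncs(640, 480))  # (480, 1)
```

Changing the resolution in this model amounts to passing different register values, which mirrors how a program reprograms the indexed registers rather than the counters themselves.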
FIG. 3 shows an index-register decoder in a controller chip. I/O address (port) 22 hexadecimal (hex) is used as the index register, while I/O address 23 hex is used for the data register. The CPU first writes the index of the desired register to the index register by executing an I/O output instruction with an address of 22 and the index as the data. Comparator 19 detects that the address matches 22, the index register, and its output is ANDed by gate 17 with a strobe generated by logic 21 when the access cycle is for an I/O address rather than a memory address. In the second access cycle, the CPU writes to the data register's port, address 23. Comparator 18 detects address 23 and outputs a one to AND gate 15. When logic 21 generates a strobe, AND gate 15 pulses its output REG_ACC, which strobes the data into the indexed register selected by the index.
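The decoder is purely combinational, so it can be sketched as a single Python function. The names `decode`, `is_io_cycle`, `strobe`, and `index_load` are assumptions made for this sketch; only REG_ACC is named in the description, and the output of AND gate 17 is given the hypothetical name `index_load`.

```python
INDEX_PORT = 0x22  # address matched by comparator 19
DATA_PORT = 0x23   # address matched by comparator 18


def decode(address, is_io_cycle, strobe):
    """Combinational sketch of the FIG. 3 decoder.

    Returns (index_load, reg_acc): index_load (a hypothetical name) is the
    output of AND gate 17, and reg_acc is REG_ACC from AND gate 15.
    """
    io_strobe = bool(is_io_cycle and strobe)          # logic 21: I/O cycles only
    index_load = (address == INDEX_PORT) and io_strobe  # comparator 19, gate 17
    reg_acc = (address == DATA_PORT) and io_strobe      # comparator 18, gate 15
    return index_load, reg_acc


print(decode(0x22, True, True))   # (True, False): latch the index
print(decode(0x23, True, True))   # (False, True): pulse REG_ACC
print(decode(0x23, False, True))  # (False, False): memory cycle, ignored
```

Gating both comparator outputs with the same I/O strobe is what keeps a memory access to address 22 or 23 from disturbing the index or the indexed registers.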
FIG. 4 illustrates in an abstract way how an index register and a data register are used to access indexed registers. Index register 32 is accessed when the CPU writes to I/O address 22, while data register 36 is accessed when the CPU accesses I/O address 23. Data register 36 does not have to be a physical register, since it merely appears to the CPU to be an I/O port.
During a first cycle, the CPU executes the output instruction out(indx_reg, indx), which sends the address "indx_reg" (22 hex) out on the address bus, and sends the index "indx" as the data over the data bus. The address "indx_reg" selects index register 32, while the data "indx" is written into index register 32. This index is used to select one of the registers in indexed registers 20 for access, coupling the selected register to data register 36.
In the second access cycle, the CPU writes to data register 36 using the output instruction out(data_reg, data). The address "data_reg" (23 hex) is output on the address bus, while the data "data" is output to the data bus. This data is written into data register 36 and transferred to the selected register in indexed registers 20.
A read rather than a write can be performed in the second cycle using an input instruction. The input instruction in(data_reg) places the address "data_reg" (23 hex) on the address bus, which causes data register 36 to be accessed. Control logic detects that the read signal is active, so the controller chip first transfers the data from the register in indexed registers 20 selected by the index in index register 32 to data register 36, and then drives that data onto the data bus, where it is read by the CPU.
The indexed registers are typically byte-wide (8-bit) registers, with each byte register having a different index. The second access cycle does not have to immediately follow the first access cycle, since the controller chip's index-register access logic is not affected by memory cycles or by I/O cycles to other ports.
Multi-Threaded Operation--FIG. 5
FIG. 5 highlights the problem of index overwriting by multiple threads. Today's more complex PCs use multi-threaded operating systems that can execute two or more programs independently of each other. The threads may share the same CPU or execute on separate CPUs. The relative timing of instructions executed by one program in one thread can vary substantially compared to instructions executed by another thread.
When both threads access the same set of indexed registers, problems can occur. For example, both threads may be accessing graphics registers on a graphics controller chip. The threads could be reading indexed registers to determine what graphics modes and features are currently operating. Even when both threads only read indexed registers, problems can occur since the index must be written before any indexed register can be read.
Thread A reads data from an indexed register at index "indx1", while thread B is reading data from an indexed register at another index, index "indx2". Both threads are writing the index to port "indx_reg" and reading from port "data_reg". Thread A first writes index "indx1" to the index register with instruction output_8(indx_reg, indx1), which is an 8-bit write operation. The next instruction in thread A, its second cycle, reads the data register: instruction data1=input_8(data_reg) reads the data register and stores it as variable "data1".
Before thread A can execute the second instruction to read the data register, the CPU interrupts thread A and continues execution of thread B. When multiple CPUs are used, the same effect occurs when thread A loses arbitration of the I/O bus to thread B's CPU.
Thread B is at a point where another indexed register is to be read. Thread B first writes the index of the register thread B needs to read, "indx2", to the index register with the instruction output_8(indx_reg, indx2). The new index "indx2" is different from thread A's index "indx1", and thus overwrites the index register with thread B's index. Thread A is not notified by thread B that its index has been erased. Thread B's next instruction, its second cycle, reads the data register: instruction data2=input_8(data_reg). This reads the data register and stores it as variable "data2". Since thread B just wrote its index to the index register, thread B reads the intended register.
When thread A resumes, it executes the second cycle, reading the data register. Instruction data1=input_8(data_reg) reads the data register and stores it as variable "data1". However, the index used is "indx2" that was just written by thread B, not "indx1" that thread A wrote. Thus the data read is actually "data2" from index "indx2", the same data that thread B read. Thread A can later crash since it read the wrong graphics information and may make decisions about the graphics format that are not compatible with the actual graphics mode. Similar problems can occur with system logic chips, keyboard controllers, and disk controllers.
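The FIG. 5 failure can be reproduced deterministically by modeling each thread as a Python generator and choosing the interleaving by hand. This is a sketch under stated assumptions: `Chip`, `reader`, `run`, and the register contents 0xAA and 0xBB are hypothetical, and each `yield` marks the one point where this model allows a thread switch (real preemption can occur between any two instructions).

```python
INDX_REG, DATA_REG = 0x22, 0x23


class Chip:
    """Minimal index/data port model with hypothetical register contents."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.index = 0

    def output_8(self, port, value):
        if port == INDX_REG:
            self.index = value
        else:
            self.regs[self.index] = value

    def input_8(self, port):
        return self.index if port == INDX_REG else self.regs[self.index]


def reader(chip, indx, result, name):
    """A thread that reads one indexed register in two cycles."""
    chip.output_8(INDX_REG, indx)          # first cycle: write the index
    yield                                  # possible thread switch here
    result[name] = chip.input_8(DATA_REG)  # second cycle: read the data


def run(schedule, threads):
    """Advance the named threads one step at a time, in the given order."""
    for name in schedule:
        next(threads[name], None)


chip = Chip({1: 0xAA, 2: 0xBB})
result = {}
# FIG. 5 order: A writes indx1, B writes indx2, B reads, A reads.
run("ABBA", {"A": reader(chip, 1, result, "data1"),
             "B": reader(chip, 2, result, "data2")})
print(hex(result["data1"]), hex(result["data2"]))  # 0xbb 0xbb
```

Running the same threads with the schedule "AABB" gives thread A the intended 0xAA, which illustrates why the failure depends entirely on timing.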
A solution for writing indexed registers is to always write both the index register and the data register at the same time, using a 16-bit write cycle. Since the data register is usually the next byte above the index register, a 16-bit write cycle writes both the index and the data at once. The instruction output_16(indx_reg, (data<<8)+indx) shifts the data byte up by 8 bits into the upper byte, while the index occupies the lower byte. While this 16-bit operation is useful for writing data, it does not help for reading data: a read requires the index to be written and then the data register to be read, and a single low-level atomic I/O instruction can either read or write, but not both.
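The packing can be sketched in Python by modeling a 16-bit I/O write as one call that splits the word across two adjacent ports. The sketch assumes, as on x86, a little-endian bus where the low byte goes to the addressed port and the high byte to the next port; `Chip`, `output_16`, and `write_indexed_16` are hypothetical names for illustration.

```python
INDX_REG, DATA_REG = 0x22, 0x23


class Chip:
    """Index/data port model; registers are byte-wide."""

    def __init__(self):
        self.index = 0
        self.regs = [0] * 256

    def output_8(self, port, value):
        if port == INDX_REG:
            self.index = value & 0xFF
        elif port == DATA_REG:
            self.regs[self.index] = value & 0xFF

    def output_16(self, port, word):
        # One 16-bit write on a little-endian bus: low byte to `port`,
        # high byte to `port + 1`. In hardware this is a single bus cycle;
        # in this model it is one call with no thread-switch point inside.
        self.output_8(port, word & 0xFF)
        self.output_8(port + 1, (word >> 8) & 0xFF)


def write_indexed_16(chip, indx, data):
    """Write index and data together: output_16(indx_reg, (data<<8)+indx)."""
    chip.output_16(INDX_REG, (data << 8) + indx)


chip = Chip()
write_indexed_16(chip, 0x05, 0x7E)
print(hex(chip.index), hex(chip.regs[0x05]))  # 0x5 0x7e
```

There is no read counterpart: a 16-bit input from port 22 would return the current index and data together, but it cannot change the index within that same cycle.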
Restoring Index Not Always Effective--FIG. 6
FIG. 6 highlights the failure of a common solution for multi-threaded register reading. Each thread can include additional instructions that save the original index from the index register and then restore the original index to the index register after the thread has read another indexed register. This is effective when the thread is not interrupted during the save-and-restore sequence. However, if the thread is interrupted before it can restore the original index, failures can occur.
Thread A first reads and saves the original index in the index register with the instruction orig_indx1=input_8(indx_reg). Then thread A writes the index of the register it desires to read, "indx1", with an output_8 instruction.
Before thread A can read the data register, thread B executes instructions that read and write the same index register. Thread B stores the original index as orig_indx2, which is the index just written by thread A, "indx1". Then thread B writes the index of the register thread B desires to read, "indx2", with an output_8 instruction.
However, before thread B can read the data register and restore the original index, thread A regains control of the I/O bus and continues execution. Thread A reads the data register with the instruction data1=input_8(data_reg). Unfortunately, the register selected by the controller chip is the indx2 register, not the indx1 register, since the index register currently contains indx2. The desired index, indx1, was overwritten by thread B.
When thread A restores the original index, it writes back orig_indx1, the index that was in place before thread A wrote indx1. When thread B resumes execution, the wrong index is in the index register, because thread B wants to read from index indx2, not from orig_indx1. Thus both thread A and thread B read the wrong indexed register and the wrong data.
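The same generator-based approach can replay the FIG. 6 sequence. In this hypothetical sketch, `save_restore_reader` saves the index, selects its register, reads the data, and restores the saved index, with `yield` marking the switch points the model allows; register 0 (contents 0x55, an assumed value) holds whatever the index pointed at before either thread ran.

```python
INDX_REG, DATA_REG = 0x22, 0x23


class Chip:
    """Index/data port model with hypothetical register contents."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.index = 0

    def output_8(self, port, value):
        if port == INDX_REG:
            self.index = value
        else:
            self.regs[self.index] = value

    def input_8(self, port):
        return self.index if port == INDX_REG else self.regs[self.index]


def save_restore_reader(chip, indx, result, name):
    """Read one indexed register, saving and restoring the original index."""
    orig = chip.input_8(INDX_REG)          # save the original index
    chip.output_8(INDX_REG, indx)          # select the desired register
    yield                                  # possible thread switch
    result[name] = chip.input_8(DATA_REG)  # read the selected register
    yield                                  # possible thread switch
    chip.output_8(INDX_REG, orig)          # restore the original index


def run(schedule, threads):
    for name in schedule:
        next(threads[name], None)


def trial(schedule):
    chip = Chip({0: 0x55, 1: 0xAA, 2: 0xBB})  # index starts at 0
    result = {}
    run(schedule, {"A": save_restore_reader(chip, 1, result, "data1"),
                   "B": save_restore_reader(chip, 2, result, "data2")})
    return result, chip.index


r, idx = trial("ABAABB")  # FIG. 6 interleaving
print(hex(r["data1"]), hex(r["data2"]), idx)  # 0xbb 0x55 1
r, idx = trial("ABBBAA")  # thread B finishes before A resumes
print(hex(r["data1"]), hex(r["data2"]), idx)  # 0xaa 0xbb 0
```

In the first schedule both threads read the wrong register, while the second schedule succeeds, matching the observation that the failure depends only on when each thread is interrupted.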
If thread B were allowed to continue and read the data register and restore the original index, then no failure would have occurred since thread A's index would have been restored. However, in a completely arbitrary multi-tasking system, the exact timing of thread A and thread B cannot be guaranteed. Thus, while unlikely, the failure could occur. When both threads are spawned from the same event, such as from execution of a single multi-threaded program, the failure may be repeatable. Otherwise, such a failure can be extremely difficult to detect and capture in a laboratory.
Even when the situation illustrated in FIG. 6 does not occur, the additional saving and restoring of the original index is wasteful. I/O reads and writes of index registers are relatively slow, each requiring on the order of a hundred or more CPU clock cycles on fast PCs. For example, 10 to 15 PCI cycles are needed for each read or write. With a 33 MHz PCI bus and a 400 MHz CPU, each PCI cycle lasts about 12 CPU clocks, so the added save read and restore write together can cost about 360 CPU clock cycles. Thus it is desirable to avoid adding the save and restore instructions.
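One way to arrive at the 360-cycle figure, assuming the worst case of 15 PCI cycles per access and counting both the added save read and the added restore write, is:

```latex
\begin{align*}
\frac{400\,\text{MHz}}{33\,\text{MHz}} &\approx 12.1 \text{ CPU clocks per PCI cycle}\\
(10 \text{ to } 15) \text{ PCI cycles} \times 12.1 &\approx 121 \text{ to } 182 \text{ CPU clocks per access}\\
2 \times 182 &\approx 364 \approx 360 \text{ CPU clocks for one save plus one restore}
\end{align*}
```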
What is desired is a method to read and write indexed registers that is immune to multitasking interruption. It is desired to allow multiple threads to simultaneously read and write indexed registers without notifying each other. It is desired to allow threads to read and write indexed registers in a manner that is completely asynchronous to each other. Multi-tasking index-register access is desired that does not unnecessarily add slow index-register read and write operations. Higher-performance index-register accessing for multi-tasking is desired.