In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises a central processing unit (CPU) and supporting hardware necessary to store, retrieve and transfer information, such as communications buses and memory. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU is the heart of the system. It executes the instructions which comprise a computer program and directs the operation of the other system components.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Programs which direct a computer to perform massive numbers of these simple operations give the illusion that the computer is doing something sophisticated. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but doing it much faster. Therefore continuing improvements to computer systems require that these systems be made ever faster.
The overall speed of a computer system (also called the “throughput”) may be crudely measured as the number of operations performed per unit of time. Conceptually, the simplest of all possible improvements to system speed is to increase the clock speeds of the various components, and particularly the clock speed of the processor. E.g., if everything runs twice as fast but otherwise works in exactly the same manner, the system will perform a given task in half the time. Early computer processors, which were constructed from many discrete components, were susceptible to significant clock speed improvements by shrinking and combining components, eventually packaging the entire processor as an integrated circuit on a single chip. The reduced size made it possible to increase the clock speed of the processor, and accordingly increase system speed.
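The clock-speed arithmetic above can be illustrated with a minimal sketch. The function name, the operation count, and the clock rates are hypothetical values chosen for illustration, and the model assumes that everything other than the clock rate stays the same.

```python
def task_time_seconds(operations: int, ops_per_cycle: float, clock_hz: float) -> float:
    """Time to complete a task, assuming a fixed number of operations per clock cycle."""
    cycles = operations / ops_per_cycle
    return cycles / clock_hz


# Hypothetical workload: one billion simple operations, one operation per cycle.
OPERATIONS = 1_000_000_000

base_time = task_time_seconds(OPERATIONS, ops_per_cycle=1.0, clock_hz=1e9)
doubled_time = task_time_seconds(OPERATIONS, ops_per_cycle=1.0, clock_hz=2e9)

print(base_time)     # 1.0 second at the base clock
print(doubled_time)  # 0.5 second when the clock rate is doubled
```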
Many design improvements in addition to clock speed have increased the throughput of computer systems, but the demand for ever faster clock speeds remains.
The clock speed selected for a particular processor design can be no faster than the slowest operation to be performed in a single clock cycle. This in turn is limited by logic circuit gate delays and transmission path delays. Many earlier processor designs were capable of executing a complete simple instruction of the processor's instruction set within one clock cycle, although complex instructions often required multiple cycles. Even a simple instruction requires a substantial number of gate delays for sequentially decoding, moving data, performing logical operations, and so forth. These gate delays limited the clock speeds of such processor designs. In order to support higher clock speeds, most modern processors use some form of pipelining for executing instructions. A pipeline breaks down an instruction into multiple sequential sub-parts, or stages. With each clock cycle, an instruction proceeds to the next stage of the pipeline. By thus breaking each instruction into multiple stages, the number of things which are done at each stage is reduced, meaning that the number of sequential gate delays of logic required for each stage is less than required for a complete instruction. A pipelined design therefore supports higher clock speeds by reducing the number of gate delays which must be accommodated in a clock cycle, although at a cost of additional hardware complexity.
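The relationship between gate delays per stage and the achievable clock period may be sketched as follows. This is a simplified model under stated assumptions, not a description of any particular processor: the per-gate delay, the stage breakdown, and the latch overhead term are all hypothetical values introduced for illustration.

```python
def min_clock_period_ps(stage_gate_delays, gate_delay_ps, latch_overhead_ps=0.0):
    """The clock period can be no shorter than the slowest stage.

    stage_gate_delays: sequential gate delays required in each stage.
    latch_overhead_ps: assumed per-stage cost of pipeline latches (hypothetical).
    """
    return max(stage_gate_delays) * gate_delay_ps + latch_overhead_ps


GATE_DELAY_PS = 20.0  # assumed delay of one logic gate, in picoseconds

# Unpipelined: a complete simple instruction needs 40 sequential gate delays.
unpipelined_ps = min_clock_period_ps([40], GATE_DELAY_PS)

# Pipelined: the same 40 gate delays split unevenly across four stages.
stages = [8, 10, 12, 10]
pipelined_ps = min_clock_period_ps(stages, GATE_DELAY_PS, latch_overhead_ps=40.0)

# Each instruction now passes through every stage, one stage per cycle.
pipelined_latency_ps = len(stages) * pipelined_ps

print(unpipelined_ps)        # 800.0
print(pipelined_ps)          # 280.0
print(pipelined_latency_ps)  # 1120.0
```

In this sketch the pipelined clock period is set by the 12-gate stage rather than by the full 40-gate instruction, so the clock can run faster, while a single instruction takes longer from start to finish because it must traverse all four stages.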
While pipelining has substantially reduced the number of logic gate delays in each clock cycle, another major limitation on processor clock speed which has assumed a greater significance is the propagation delay inherent in the physical size and layout of processor chips. Typical modern clock speeds are so fast that it becomes difficult to propagate a signal from one part of the processor chip to a relatively distant part within a single clock cycle. If careful attention is paid to the layout, it may be possible to avoid many long signal paths, but it is unlikely that all long paths can be eliminated by good design. Layout becomes increasingly difficult as clock speeds increase and processors become more complex. It may be necessary to accept that some signals will require multiple cycles to propagate within the chip. But if this concession is made routinely, the benefit of faster clock speeds is largely lost.
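The effect of a shrinking clock period on a fixed signal path can be sketched with a simple linear model. The wire delay per millimeter and the path length below are hypothetical, and real on-chip wire delay is not strictly linear in distance, so this is only a rough illustration.

```python
import math


def cycles_to_propagate(distance_mm: float, wire_delay_ps_per_mm: float,
                        clock_period_ps: float) -> int:
    """Whole clock cycles needed for a signal to cross a path (linear delay model)."""
    path_delay_ps = distance_mm * wire_delay_ps_per_mm
    return max(1, math.ceil(path_delay_ps / clock_period_ps))


PATH_MM = 5.0               # assumed length of a long on-chip signal path
WIRE_DELAY_PS_PER_MM = 100.0  # assumed wire delay, in picoseconds per millimeter

slow_clock_cycles = cycles_to_propagate(PATH_MM, WIRE_DELAY_PS_PER_MM, 1000.0)  # 1 GHz
fast_clock_cycles = cycles_to_propagate(PATH_MM, WIRE_DELAY_PS_PER_MM, 250.0)   # 4 GHz

print(slow_clock_cycles)  # 1
print(fast_clock_cycles)  # 2
```

The same physical path that fits within one cycle at the slower clock requires two cycles at the faster clock, even though nothing about the layout has changed.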
Among the critical paths involved in processing data is the retrieval of data from registers within the processor. The very purpose of registers is to hold data temporarily in a location where it can be retrieved with the highest speed. In most processor designs, this means register data is accessible in a single clock cycle. However, as processor designs become more complex and include larger register files, it becomes difficult to propagate data across the physical distance between the registers and certain functional logic within a single clock cycle. Support for hardware multithreading, which typically means that the processor contains multiple sets of registers, each supporting a respective thread, further increases the required size of register files. At the same time, increasing clock speeds provide less time to propagate data from the registers to the functional logic.
It would, of course, be possible to allow multiple clock cycles for register access, but since register access forms such a critical part of the functions performed by the processor, this is likely to significantly affect processor performance, and would defeat the purpose of faster clock speeds. As the number of pipeline stages increases, more registers are required to hold intermediate results, further defeating efforts to improve clock speed.
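The cost of multi-cycle register access can be illustrated with a deliberately pessimistic sketch. It assumes, hypothetically, that every instruction reads a register, that the base rate is one cycle per instruction, and that an extra register-access cycle cannot be overlapped with other work; actual designs may hide part of this cost.

```python
def time_per_instruction_ps(clock_period_ps: float, base_cycles: float,
                            register_stall_cycles: float) -> float:
    """Average time per instruction when register access adds stall cycles."""
    return clock_period_ps * (base_cycles + register_stall_cycles)


# Slower clock, single-cycle register access.
slow_single = time_per_instruction_ps(500.0, base_cycles=1.0, register_stall_cycles=0.0)

# Clock twice as fast, register access still fits in one cycle.
fast_single = time_per_instruction_ps(250.0, base_cycles=1.0, register_stall_cycles=0.0)

# Clock twice as fast, but register access needs a second cycle on every instruction.
fast_multi = time_per_instruction_ps(250.0, base_cycles=1.0, register_stall_cycles=1.0)

print(slow_single)  # 500.0
print(fast_single)  # 250.0
print(fast_multi)   # 500.0
```

Under these assumptions, doubling the clock speed while allowing two cycles for register access yields the same time per instruction as the original slower design, so the benefit of the faster clock is lost.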
As the demand for ever faster and more capable processors grows, it is likely that the challenges of intra-processor signal propagation, and in particular signal propagation involving register access, will increase. It is therefore desirable to find improved processor design techniques which will support increased clock speeds as well as larger and more complex processors.