A modern computer system typically comprises a central processing unit (CPU) and supporting hardware necessary to store, retrieve and transfer information, such as communications buses and memory. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU is the heart of the system. It executes the instructions which comprise a computer program and directs the operation of the other system components.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Programs which direct a computer to perform massive numbers of these simple operations give the illusion that the computer is doing something sophisticated. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but doing it much faster. Therefore continuing improvements to computer systems require that these systems be made ever faster.
The overall speed of a computer system (also called the throughput) may be crudely measured as the number of operations performed per unit of time. Conceptually, the simplest of all possible improvements to system speed is to increase the clock speeds of the various components, and particularly the clock speed of the processor(s). For example, if everything runs twice as fast but otherwise works in exactly the same manner, the system will perform a given task in half the time. Early computer processors, which were constructed from many discrete components, were susceptible to significant speed improvements by shrinking component size, reducing component number, and eventually, packaging the entire processor as an integrated circuit on a single chip. The reduced size made it possible to increase the clock speed of the processor, and accordingly increase system speed.
Despite the enormous improvement in speed obtained from integrated circuitry, the demand for ever faster computer systems has continued. Hardware designers have been able to obtain still further improvements in speed by greater integration (i.e., increasing the number of circuits packed onto a single chip), by further reducing the size of circuits, and by various other techniques. However, designers can see that physical size reductions cannot continue indefinitely, and there are limits to their ability to continue to increase clock speeds of processors. Attention has therefore been directed to other approaches for further improvements in overall speed of the computer system.
Without changing the clock speed, it is possible to improve system throughput by using multiple processors. The modest cost of individual processors packaged on integrated circuit chips has made this approach practical. However, one does not simply double a system's throughput by going from one processor to two. The introduction of multiple processors to a system creates numerous architectural problems. For example, the multiple processors will typically share the same main memory (although each processor may have its own cache). It is therefore necessary to devise mechanisms that avoid memory access conflicts, and assure that extra copies of data in caches are tracked in a coherent fashion. Furthermore, each processor puts additional demands on the other components of the system such as storage, I/O, memory, and particularly, the communications buses that connect various components. As more processors are introduced, there is greater likelihood that processors will spend significant time waiting for some resource being used by another processor.
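The waiting effect described above can be illustrated with a crude simulation. This is a minimal sketch under stated assumptions, not a model of any particular system: the function name `simulate`, the fixed number of private compute cycles per operation, the one-cycle bus transfer, and the lowest-index-wins priority rule are all hypothetical choices made for illustration.

```python
def simulate(num_processors: int, compute_cycles: int, total_cycles: int) -> int:
    """Count operations completed when every operation needs the shared bus once.

    Hypothetical model: each processor computes privately for `compute_cycles`
    cycles, then needs one cycle on a bus that serves one processor per cycle.
    """
    # Private compute cycles left before each processor needs the bus.
    remaining = [compute_cycles] * num_processors
    completed = 0
    for _ in range(total_cycles):
        bus_free = True
        for p in range(num_processors):
            if remaining[p] > 0:
                # Private computation proceeds in parallel on all processors.
                remaining[p] -= 1
            elif bus_free:
                # Assumed policy: the lowest-numbered waiting processor wins.
                bus_free = False
                completed += 1
                remaining[p] = compute_cycles
            # Otherwise the processor stalls this cycle, waiting for the bus.
    return completed


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, simulate(n, compute_cycles=3, total_cycles=400))
```

With three compute cycles per operation and 400 simulated cycles, one processor completes 100 operations and four processors complete 397, but eight processors also complete only 397: once the shared bus is busy on every cycle, additional processors in this model simply wait.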
All of these issues and more are known by system designers, and have been addressed in one form or another. While perfect solutions are not available, improvements in this field continue to be made.
Of particular interest herein is the design of communications buses. In simple computer systems, all major components such as processor, memory, storage controllers, and I/O are connected on a single multi-drop communications bus. Physically, such a multi-drop bus is a common set of parallel conductors, and each component is connected to these conductors through logic drivers or gates. The architecture of such a bus permits any arbitrary component connected to the bus to send data to any other arbitrary component connected to the bus, although it is not necessarily the case that all possible combinations are actually used. Since only one component may send data at any time, the component sending data must first obtain control of the bus, a process known as arbitration. The bus typically has an address portion for specifying the receiving device(s), and a data portion for specifying the data being transferred. It may also have various control lines.
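The arbitration and address/data behavior of a multi-drop bus can be sketched in software as follows. This is an illustrative model only: the class names `Device` and `MultiDropBus`, the one-transfer-per-cycle granularity, and the fixed-priority rule (lowest sender address wins) are assumptions introduced here, and real buses may arbitrate quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Device:
    """A component attached to the shared bus (hypothetical model)."""
    address: int
    inbox: List[int] = field(default_factory=list)


class MultiDropBus:
    """Single shared bus: at most one sender per cycle, chosen by arbitration."""

    def __init__(self) -> None:
        self.devices: Dict[int, Device] = {}
        # Pending requests as (sender address, receiver address, data word).
        self.requests: List[Tuple[int, int, int]] = []

    def attach(self, device: Device) -> None:
        self.devices[device.address] = device

    def request(self, sender: int, receiver: int, data: int) -> None:
        """A device asks for control of the bus to send one data word."""
        self.requests.append((sender, receiver, data))

    def cycle(self) -> Optional[int]:
        """Run one bus cycle and return the winning sender's address, if any."""
        if not self.requests:
            return None
        # Assumed arbitration policy: fixed priority, lowest sender address wins.
        winner = min(self.requests, key=lambda r: r[0])
        self.requests.remove(winner)
        sender, receiver, data = winner
        # The address portion selects the receiver; the data portion carries the word.
        self.devices[receiver].inbox.append(data)
        return sender


if __name__ == "__main__":
    bus = MultiDropBus()
    for addr in (0, 1, 2):
        bus.attach(Device(addr))
    bus.request(2, 0, 0xAB)
    bus.request(1, 0, 0xCD)
    print(bus.cycle(), bus.cycle(), bus.devices[0].inbox)
```

Because both requests are raised at the same time, the transfer takes two cycles: the losing sender must wait until the bus is released, which is the serialization the paragraph above describes.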
The clock speed at which a multi-drop bus operates is limited by the number of attached devices, their physical configuration with respect to one another, the speed at which individual devices operate, and other factors. For this reason, many computer systems have multiple buses. In particular, processors and memory may be coupled to a relatively high-speed bus, while storage and I/O devices may be coupled to a slower bus. Since processors and memory typically require a higher speed, and are physically close enough to support higher speed bus operation, isolation of processors and memory from the lower speed devices such as storage and I/O by using a special processor-memory bus supports bus operation at higher speed and improves system performance.
However, the demand for increased system throughput continues. It is desirable to increase the number of processors in a computer system to increase system throughput. However, the high-speed multi-drop processor-memory bus was intended for a relatively small number of attached components. As the number of processors attached to such a bus increases, it becomes difficult or impossible to operate the bus at the higher clock speeds necessary to support communication among the various components. Moreover, the simple creation of wider buses or of additional (parallel) buses is not always a practical solution. Wider or additional buses mean that each processor must have additional I/O pins, where the number of I/O pins is already extremely constrained.
Some designers have attempted to address this problem using hierarchical buses, in which each processor is assigned to a node, all processors within a node being on the same local bus coupled to a node controller, wherein the node controller handles communications with devices in other nodes through a separate remote bus. However, these designs require a great deal of complexity on the part of the node controller, with attendant cost and collateral issues.
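The path a message takes in such a hierarchical arrangement can be sketched as below. This is a hypothetical illustration: the `ProcessorId` type, the `route` function, and the hop labels are names invented here, and the sketch deliberately omits the coherency and buffering work that makes real node controllers complex.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProcessorId:
    """Identifies a processor by its node and its position within the node."""
    node: int
    index: int


def route(src: ProcessorId, dst: ProcessorId) -> List[str]:
    """Return the sequence of buses and controllers a message traverses."""
    if src.node == dst.node:
        # Processors in the same node communicate over their shared local bus.
        return [f"local-bus-{src.node}"]
    # Traffic between nodes passes through both node controllers and the remote bus.
    return [
        f"local-bus-{src.node}",
        f"node-controller-{src.node}",
        "remote-bus",
        f"node-controller-{dst.node}",
        f"local-bus-{dst.node}",
    ]


if __name__ == "__main__":
    print(route(ProcessorId(0, 1), ProcessorId(0, 3)))
    print(route(ProcessorId(0, 1), ProcessorId(2, 0)))
```

In this sketch a transfer within a node touches one bus, while a transfer between nodes touches five elements, two of which are node controllers; that extra mediation is where the complexity noted above arises.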
There is a need for an alternative high-speed communication path architecture in a computer system for supporting communication among larger numbers of processors and memory.