This invention relates to digital computers, and more particularly relates to bus arbitration in computer systems which employ a plurality of processing units.
A digital computer consists of a set of functional components of at least three basic types: memory units, input/output (I/O) devices, and central processing units (CPUs). Memory modules typically consist of N words of M bits. Each word is assigned a unique numerical address (0, 1, . . . N-1). Memory words can be written or read by other modules in the system. I/O modules are functionally similar to memory modules, in that they can support both read and write operations. Disk drives, tape drives, printers, display screens, and modems are all examples of I/O devices which are controlled by an I/O module. Central processing units are typically the focal point of activity in a computer system. A CPU reads instructions and data from memory and I/O modules, performs logical or numerical manipulations on data as directed by instructions, and writes data to memory and I/O modules after processing. In addition, CPU modules may be used to control and synchronize the overall system operation, and can receive interrupt signals from other system components. As with memory and I/O modules, a computer system may include one or more independent CPU modules.
Practical and useful operation of a computer involves the communication of data and synchronization/control signals between all of the various components which comprise it. The collection of paths which logically connect the functional units of a computer together and enable the communication of information between them is called an interconnection structure.
An interconnection structure should be able to support information transfers which are included in any of the following categories:
Memory to CPU transfers: A CPU reads an instruction or data from memory
CPU to Memory transfers: A CPU writes data to memory
I/O device to CPU transfers: A CPU reads data from an I/O device
CPU to I/O device transfers: A CPU sends data to an I/O device
I/O device transfer to or from Memory: An I/O device exchanges data directly with memory, without going through the CPU (called Direct Memory Access or DMA)
In addition to supporting these transfers, the interconnection structure is commonly employed to carry control signals used to initiate the above transfers by the various functional units, and also to provide electrical power to the units.
The specific organization of an interconnection structure is a fundamental consideration in the design of computer systems, and a number of different implementation strategies exist in the art. The decision to employ a particular design approach depends on several factors, including the intended application of the system, the number and type of functional units that can be included in the system, and the performance characteristics desired of the system.
One of the most common interconnection strategies used in the design of both single and multiple processor computers is a "shared bus" architecture, in which the functional units of the system are connected by means of a common collection of conductive lines called a bus. In such an organization, only one module at a time can exert control over the use of a shared bus, and contention among the units which require the use of a bus must be resolved in some manner. With a centralized method of arbitration, a single hardware device, referred to as a bus controller, is responsible for allocating time on the bus among all units which may wish to use it. Alternatively, the synchronization and control logic associated with the use of a bus can be distributed equally among the various units which are interfaced to the bus, so that bus controller status can be transferred between many of the units in the system.
Whether a bus is implemented with a central control unit or with a distributed control arrangement, the implementation of a bus in a computer system can be additionally characterized by a number of other features. Of primary concern is the arbitration policy employed, which determines how access to the bus is obtained, particularly in the case that two or more units interfaced to the bus request access to the bus at once. An important feature of bus implementation schemes related to the arbitration policy is the synchronization mechanism, which defines how units interfaced to the bus request access to the bus. Another characteristic of a bus is its definition of information transfer protocols, that is, the standardization of information transfer sequencing and formatting, so that the various units interfaced to the bus can appropriately interpret a variety of information carried on the bus lines. The definition of transfer protocols is closely related to the number and designation of the physical conductive lines which comprise a bus.
Many of the various techniques of bus arbitration known in the art can be broadly classified according to whether they employ a fixed or dynamic priority assignment among the units interfaced to a bus. In a fixed priority policy of arbitration, each unit that will participate in bus arbitration is assigned a certain priority level at start-up or configuration time. Whenever two or more units are in contention for use of the shared bus, access is unconditionally granted to the unit having the highest fixed level of priority. This approach is often implemented using a scheme called "daisy chaining," in which all units are assigned static priorities according to their locations along a common bus request line. Any unit which requires the bus asserts a request on the common request line. The bus arbitration mechanism polls each unit on the bus in order of priority until encountering a device which has requested use of the bus, and that device is granted access to the bus.
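The polling behavior of a daisy chain can be illustrated with a minimal sketch. The function name daisy_chain_grant, the list-of-booleans representation of the request line, and the convention that index 0 is the position nearest the arbiter (and therefore highest in priority) are assumptions made purely for illustration.

```python
def daisy_chain_grant(requests):
    """Return the position of the unit granted the bus, or None.

    requests[i] is True when the unit at chain position i is asserting
    the common request line. Position 0 is assumed (hypothetically) to
    be nearest the arbiter and thus to hold the highest fixed priority.
    """
    # Poll the units in order of priority and stop at the first requester.
    for position, requesting in enumerate(requests):
        if requesting:
            return position
    return None  # no unit is requesting the bus


# Units 1 and 2 both request; unit 1 wins because it is earlier in the chain.
winner = daisy_chain_grant([False, True, True])
```

In this sketch the grant depends only on position along the chain, which is what makes the priority assignment static.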
Although simple to implement, fixed priority arbitration schemes are often considered unacceptable, especially in multi-processor systems, since repeated bus access requests from a high-priority unit can prevent lower priority units from ever obtaining access to the bus. Such a condition is called starvation, and can occur in any scheme which involves a strictly static priority assignment. Static priority schemes are said to be nonsymmetric, since they tend to favor certain arbitrating units (ones with higher priority) and neglect others.
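Starvation under a strictly static priority assignment can be demonstrated with a small simulation. The sketch below is illustrative only; the function name count_fixed_priority_grants and the request history format are hypothetical, and lower unit numbers are assumed to have higher priority.

```python
def count_fixed_priority_grants(num_units, request_history):
    """Count how many arbitration rounds each unit wins.

    request_history is a sequence of rounds; each round is a list of
    booleans, one per unit, that is True when that unit requests the bus.
    Lower unit numbers are assumed to hold higher fixed priority.
    """
    grants = [0] * num_units
    for requests in request_history:
        for unit in range(num_units):
            if requests[unit]:
                grants[unit] += 1
                break  # only the highest-priority requester is granted
    return grants


# Both units request the bus in every one of 100 rounds.
history = [[True, True]] * 100
print(count_fixed_priority_grants(2, history))  # prints [100, 0]
```

The lower-priority unit is never granted the bus for as long as the higher-priority unit keeps requesting it.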
In an arbitration scheme called fixed time slicing (FTS) or time division multiplexing (TDM), a static priority assignment is enhanced with a method for ensuring that no unit is starved. This is accomplished by dividing the available bus time into fixed-size intervals, and then sequentially offering these intervals to each device in a round-robin fashion in order of priority. Should a selected device not elect to use its time slice, the time slice remains unused by any device. This scheme exhibits the desired property of symmetry since no arbitrating unit is given preference over any other. Although the maximum wait time for access to the bus is bounded (hence no starvation), devices in an FTS scheme suffer from a generally high average wait time regardless of bus loading.
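A minimal sketch of fixed time slicing follows. It assumes, for illustration, that slot t is owned by unit t modulo the number of units; the name fts_schedule and the data layout are hypothetical.

```python
def fts_schedule(num_units, requests_per_slot):
    """Return, for each time slot, the unit that uses it (or None).

    Slot t is offered only to unit (t % num_units). If that unit is not
    requesting the bus during its slot, the slot goes unused, even when
    other units are waiting.
    """
    usage = []
    for slot, requests in enumerate(requests_per_slot):
        owner = slot % num_units
        usage.append(owner if requests[owner] else None)
    return usage


# Only unit 0 ever requests the bus, yet it must wait for its own slot.
print(fts_schedule(3, [[True, False, False]] * 6))
# prints [0, None, None, 0, None, None]
```

Even on an otherwise idle bus, unit 0 obtains only one slot in three, which illustrates why the average wait time stays high regardless of bus loading.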
Dynamic priority assignments eliminate the problem of starvation by allowing the assignment of priority to change during the course of system operation. Dynamic priority schemes are said to be symmetric if the algorithm which determines the periodic reassignment of priorities does not tend to give preference to one particular arbitrating unit. A least-recently-used (LRU) algorithm, for example, gives the highest reassigned priority to that unit which has not used the bus for the longest interval. Another dynamic priority assignment scheme, called a rotating daisy chain (RDC), reassigns priority to each unit according to its distance from the winner of the previous arbitration.
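The two dynamic schemes named above can be sketched as follows. The function names, the use of a last-used timestamp per unit, and the assumption that the unit nearest "downstream" of the previous winner receives the highest priority in the rotating daisy chain are all illustrative choices, not details taken from any particular implementation.

```python
def lru_grant(requests, last_used):
    """Grant the bus to the requester that has been idle the longest.

    last_used[u] is the (hypothetical) time at which unit u last held
    the bus; a smaller value means a longer time without the bus.
    """
    candidates = [u for u, requesting in enumerate(requests) if requesting]
    if not candidates:
        return None
    return min(candidates, key=lambda u: last_used[u])


def rdc_grant(requests, last_winner):
    """Rotating daisy chain: priority falls with distance from last_winner.

    The unit immediately after the previous winner is polled first, and
    the previous winner itself is polled last (assumed convention).
    """
    n = len(requests)
    for distance in range(1, n + 1):
        unit = (last_winner + distance) % n
        if requests[unit]:
            return unit
    return None


# All three units request; unit 1 has gone longest without the bus.
lru_winner = lru_grant([True, True, True], last_used=[5, 2, 9])
# Units 0 and 1 request; unit 0 won last time, so unit 1 now wins.
rdc_winner = rdc_grant([True, True, False], last_winner=0)
```

In both functions the winner of one arbitration moves to the back of the ordering for the next, so no single unit is persistently favored.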
One additional arbitration method, called first-come-first-serve (FCFS) arbitration, is not readily categorized as either a static or a dynamic priority scheme. With FCFS arbitration, requests are honored in the order received. FCFS arbitration is symmetric since it favors no particular processor, but is difficult to implement because the order of received bus requests must be recorded. Furthermore, in high bus load systems, two requests may arrive at the arbitration controller within a sufficiently small period of time that the arrival order cannot be precisely determined, leaving FCFS arbitration potentially vulnerable to starvation problems.
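The bookkeeping that FCFS arbitration requires can be sketched with a simple queue. The class name FcfsArbiter and its methods are hypothetical, and the sketch assumes that requests arrive one at a time so that their order is unambiguous, which, as noted above, real hardware cannot always guarantee.

```python
from collections import deque


class FcfsArbiter:
    """Honor bus requests strictly in the order in which they were received."""

    def __init__(self):
        self._pending = deque()  # oldest request at the left end

    def request(self, unit):
        # Record the request only once; a unit already waiting keeps its place.
        if unit not in self._pending:
            self._pending.append(unit)

    def grant(self):
        # Grant the bus to the oldest pending request, if there is one.
        return self._pending.popleft() if self._pending else None


arbiter = FcfsArbiter()
arbiter.request(2)
arbiter.request(0)
arbiter.request(2)       # duplicate; unit 2 keeps its original position
first = arbiter.grant()  # unit 2, which asked first
second = arbiter.grant() # unit 0
```

The queue is the recorded arrival order referred to above; it is this state, absent from priority-based schemes, that makes FCFS comparatively costly to implement.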
Since central processing units are usually the most active components in a computer system, systems which utilize a plurality of processors have an especially critical need for efficient allocation of bus access. In many configurations, two or more processing units share access to other system resources, including I/O devices and main memory, via the common system bus. During the execution of a single machine-level CPU instruction, a processor may need to make a large number of separate memory accesses (one for fetching an opcode, one or more for fetching operand specifiers, one or more for fetching operands, one or more for storing results). Contention for bus access among multiple processors can therefore be very pronounced, and may not be adequately resolved using a simple arbitration scheme.
Bus loading and contention can be significantly reduced by providing processing units with cache memories, making copies of recently accessed portions of the shared main memory available to the processors without using the system bus. The percentage of memory accesses which are successfully serviced by the cache without requiring a main memory access is called the cache "hit rate," and this rate is determined in part by the size of the cache and by the replacement policy employed by the cache. A cache hit rate of 90% can reduce the number of bus requests for each processor by a factor of ten, since only the remaining 10% of accesses must go to main memory over the bus. In the worst case, however, system performance may still be greatly diminished if two or more processors are repeatedly in contention for use of the bus. This is especially true if the process of arbitration itself consumes a large percentage of the processors' time, which could otherwise be spent performing useful computations.
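The arithmetic behind the factor-of-ten figure can be checked with a short calculation. The sketch assumes, as a simplification, that every cache miss produces exactly one bus request and that hits produce none; the function names are hypothetical. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def bus_request_fraction(hit_rate):
    """Fraction of memory accesses that still require the bus (the miss rate)."""
    return 1 - hit_rate


def bus_request_reduction(hit_rate):
    """Factor by which bus requests fall relative to having no cache.

    Undefined for a hit rate of exactly 1, since no bus requests remain.
    """
    return 1 / bus_request_fraction(hit_rate)


hit_rate = Fraction(90, 100)
print(bus_request_fraction(hit_rate))   # prints 1/10
print(bus_request_reduction(hit_rate))  # prints 10
```

Under the same assumption, the reduction factor is sensitive to small changes in hit rate: raising the hit rate from 90% to 95% doubles the factor from ten to twenty.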
Consider, for example, the case when two processing units, CPU1 and CPU2, each have several outstanding memory requests which cannot be serviced by their respective cache memories. If CPU1 has a higher static priority level than CPU2, CPU2 could remain unserviced for an unacceptably long period of time. Furthermore, even if the arbitration scheme provides a means for preventing CPU2 from being starved, so that CPU2 could interrupt CPU1's bus access, CPU1 could be allowed to immediately rearbitrate for bus access. Since CPU1 is assumed to have higher priority than CPU2, CPU1's request for bus access would then be granted, potentially interrupting CPU2's computational flow before it has satisfied all of its outstanding requests. CPU2 must then rearbitrate for bus access in order to resume processing. In this way, the two CPUs could prevent each other from operating efficiently, and both CPUs could spend a majority of their time participating in relatively unproductive arbitration. This undesirable phenomenon is called "thrashing," and, when unchecked, can occur in both static and dynamic priority arbitration schemes.
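The cost of thrashing can be estimated with a deliberately simplified model. The sketch below assumes two CPUs with equal numbers of outstanding transfers, a fixed arbitration cost in bus cycles each time a CPU gains the bus, and a fixed cost per transfer; the function name, parameter names, and cycle counts are all hypothetical and chosen only for illustration.

```python
def arbitration_overhead(transfers_per_cpu, transfers_per_tenure,
                         arb_cycles, transfer_cycles=1):
    """Fraction of total bus cycles spent on arbitration for two CPUs.

    transfers_per_tenure is how many transfers a CPU completes each time
    it wins the bus before being displaced. A value of 1 models the
    thrashing case, in which each CPU is interrupted after every transfer.
    """
    # Ceiling division: number of times each CPU must win the bus.
    tenures_per_cpu = -(-transfers_per_cpu // transfers_per_tenure)
    total_arb = 2 * tenures_per_cpu * arb_cycles
    total_transfer = 2 * transfers_per_cpu * transfer_cycles
    return total_arb / (total_arb + total_transfer)


# Thrashing: each CPU is displaced after a single transfer.
thrashing = arbitration_overhead(8, 1, arb_cycles=2)
# No thrashing: each CPU finishes all eight transfers in one tenure.
settled = arbitration_overhead(8, 8, arb_cycles=2)
```

With these assumed costs, about two thirds of all bus cycles go to arbitration in the thrashing case, against one fifth when each CPU is allowed to complete its outstanding requests.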
In response to the configuration and operational considerations discussed above, techniques of optimization based on these more subtle operational requirements can be employed within a given arbitration scheme. A bus implementation may, for example, selectively enforce preferential policies which take advantage of the variability of bus requirements or of computational abilities among the units to be interfaced to it, in order to decrease the overhead associated with arbitration or otherwise increase overall system performance.
It is accordingly an object of this invention to provide more efficient and equitable allocation of bus access among a variety of units which may be interfaced to the bus, and among the various modes of data transfer (ranging from standard programmed I/O to DMA) which the bus must support.
It is a further object of this invention to provide a flexible bus implementation which recognizes the special resource requirements associated with the inclusion of multiple processors within a computer system.