Ever-increasing demand for high-throughput data processing systems has caused computer designers to develop sophisticated multi-processor designs. Initially, additional processors were provided to improve the overall bandwidth of the system. While the additional processors provided some level of increased performance, it became evident that further improvements were necessary.
One way to improve system performance involves the use of partitioning. Partitioning refers to the allocation of the system's data processing resources to a number of predefined “partitions”. Each partition may operate independently from the other partitions in the system. Using partitioning, a number of parallel tasks may be executed independently within the system. For example, a first portion of the system resources may be allocated to a first partition to execute a first task while a second portion of the system resources may be allocated to a second partition to execute a second task.
System resources may be allocated to partitions by a system controller. The controller allocates resources to the various partitions based on the task being performed by each partition. For example, a large task may require more system resources than a small task. A system controller may therefore add resources to the partition servicing the large task, and may remove resources from a partition servicing a smaller task, thereby increasing the efficiency of the overall system.
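The reallocation described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular system: the names `Partition`, `SystemController`, `move_unit`, and the unit labels `cpu0` through `cpu2` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A named group of resources that executes one task independently."""
    name: str
    units: set = field(default_factory=set)


class SystemController:
    """Hypothetical controller that tracks which units belong to which partition."""

    def __init__(self, partitions):
        self.partitions = {p.name: p for p in partitions}

    def move_unit(self, unit, source, target):
        """Remove `unit` from the source partition and add it to the target."""
        src = self.partitions[source]
        dst = self.partitions[target]
        if unit not in src.units:
            raise ValueError(f"{unit} is not in partition {source}")
        src.units.remove(unit)
        dst.units.add(unit)


# Assumed starting layout: the small task holds two units, the large task one.
small = Partition("small_task", {"cpu0", "cpu1"})
large = Partition("large_task", {"cpu2"})
controller = SystemController([small, large])

# Shift one unit from the small task to the large task.
controller.move_unit("cpu1", "small_task", "large_task")
```

The sketch deliberately ignores what must happen to the units themselves during the move; that question is the subject of the static and dynamic partitioning discussion that follows.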
In many cases involving the allocation of resources to a partition, one or more of the other units already residing within the partition must be stopped before the allocation operation can be completed. This type of partitioning, known as “static partitioning”, involves halting normal processing activities and, in some cases, stopping one or more system clocks, before a unit may be added to, or removed from, a partition. This is necessary to ensure that when the partition begins executing with the additional resources, all resources are in a consistent, known state.
In more recent years, strides have been made to allow “dynamic partitioning” operations to occur for some system configurations. Dynamic partitioning operations allow resources to be allocated to, or de-allocated from, a partition without requiring that all processing activities occurring within that partition be stopped. Dynamic partitioning activities may be performed while the system clocks are running. Dynamic partitioning is more efficient, since processing is allowed to continue while the partitioning operation is occurring.
A major step in dynamic resource allocation was to provide input/output subchannels with the capability of dynamic allocation, as taught in U.S. Pat. No. 4,437,157, issued to Witalka et al. Logical file designations for peripheral devices are suggested by U.S. Pat. No. 5,014,197, issued to Wolf. Similarly, U.S. Pat. No. 4,979,107, issued to Advani et al., suggests logical assignment of peripheral subsystem operating parameters.
The capability to reconfigure has been used in a number of systems applications. U.S. Pat. No. 4,070,704, issued to Calle et al., provides a bootstrap program with the capability to change the initial load peripheral device upon determination of a failure in the primary loading channel. Perhaps the most often stated purpose for reconfiguration is to provide some degree of fault tolerance. U.S. Pat. No. 4,891,810, issued to de Corlieu et al., and U.S. Pat. No. 4,868,818, issued to Madan et al., suggest system reconfiguration for that reason. A related but not identical purpose is found in U.S. Pat. No. 4,888,771, issued to Benignus et al., which reconfigures for testing and maintenance.
The capability to reconfigure a data processing system can support centralized system control, as found in U.S. Pat. No. 4,995,035, issued to Cole et al. A current approach involves the assignment of logical names for resources, as found in U.S. Pat. No. 4,245,306, issued to Besemer et al., and U.S. Pat. No. 5,125,081, issued to Chiba. An extension of the capability to identify resources by logical names is a virtual system in which the user need not be concerned with physical device limitations, such as that suggested in U.S. Pat. No. 5,113,522, issued to Dinwiddie, Jr. et al.
Although some strides have been made in the ability to dynamically partition units in some system configurations, other system configurations have not readily permitted dynamic partitioning activities. For example, in data processing systems in which multiple units such as processors share a common resource such as a bus, it has been difficult to allow units to dynamically enter into, or be removed from, a running partition. The difficulties are largely related to the fact that arbitration activities must be synchronized to prevent multiple units from inadvertently attempting to acquire access to the shared resource simultaneously, and prior art systems could synchronize arbitration activities only after processing was halted.
For example, one way to address the type of problems described above involves halting all requests being made to the shared resource. In one embodiment, an operating system prevents all processors from making any further requests to a shared bus. Another unit is then selected for addition to the partition. The newly-added unit and all halted units are initialized to a common state that will allow execution to be resumed in an orderly fashion. This common state will indicate which unit will first acquire access to the bus. This state may also, in some cases, determine the priority scheme that will be used to grant access to the bus.
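The halt-and-reinitialize sequence described above can be illustrated with a small Python model. It is a sketch under stated assumptions only: the `SharedBus` class, its round-robin priority scheme, the `add_unit_static` method, and the unit names `p0` through `p2` are hypothetical and are not drawn from any of the cited systems.

```python
class SharedBus:
    """Hypothetical shared bus with round-robin arbitration among its units."""

    def __init__(self, units):
        self.units = list(units)
        self.next_index = 0   # arbitration state: which unit is granted next
        self.halted = False

    def grant(self):
        """Grant the bus to the next unit in round-robin order."""
        if self.halted:
            raise RuntimeError("bus requests are halted")
        unit = self.units[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.units)
        return unit

    def add_unit_static(self, new_unit):
        """Add a unit by halting, adding, reinitializing, and resuming."""
        self.halted = True            # 1. prevent further requests to the bus
        self.units.append(new_unit)   # 2. bring the new unit into the partition
        self.next_index = 0           # 3. common state: units[0] is granted first
        self.halted = False           # 4. resume normal processing


bus = SharedBus(["p0", "p1"])
first = bus.grant()                   # arbitration has advanced past p0
bus.add_unit_static("p2")             # all requests stop while p2 is added
order = [bus.grant() for _ in range(3)]
```

In this model the cost of the approach is the interval during which `halted` is true: no unit can obtain the bus until every unit, old and new, agrees on the same arbitration state.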
As may be appreciated, halting bus activities to add a unit to, or remove a unit from, an existing partition results in lost processing throughput, since normal processing activities must be temporarily suspended. One way to address this problem involves limiting the number of units using a shared resource. For example, assume that, at most, two units are coupled to a bus. Further assume that a running partition includes one of these units that is making requests to the bus. The second unit may be added to this partition without the need to synchronize any arbitration activities, since the second, newly added unit is guaranteed the right to obtain access to the bus after the other unit relinquishes control over that resource. In this instance, it is unnecessary to halt the partition to add another unit.
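The two-unit case can be modeled as follows. This is a deliberately simplified sketch: the `TwoUnitBus` class and its `join`, `request`, and `release` methods are hypothetical, and the model handles requests one at a time rather than reproducing hardware-level timing.

```python
class TwoUnitBus:
    """Hypothetical bus to which at most two units may be coupled."""

    def __init__(self, first_unit):
        self.members = [first_unit]
        self.owner = None   # unit currently holding the bus, if any

    def join(self, unit):
        """Add the second unit without halting the running unit."""
        if len(self.members) >= 2:
            raise ValueError("this bus supports at most two units")
        self.members.append(unit)

    def request(self, unit):
        """Grant the bus if it is free; otherwise the requester waits."""
        if unit not in self.members:
            raise ValueError(f"{unit} is not coupled to the bus")
        if self.owner is None:
            self.owner = unit
            return True
        return False

    def release(self, unit):
        """Relinquish the bus if `unit` currently holds it."""
        if self.owner == unit:
            self.owner = None


bus = TwoUnitBus("p0")
p0_granted = bus.request("p0")        # the running unit holds the bus
bus.join("p1")                        # added while p0 is still active
p1_while_busy = bus.request("p1")     # must wait for p0
bus.release("p0")
p1_after_release = bus.request("p1")  # only one possible next owner
```

Because only one other unit can ever be waiting, there is no ordering among waiters to agree on, which is why no synchronization step appears in `join`. With three or more units, the choice of the next owner would again depend on shared arbitration state.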
As is evident from the example above, limiting the number of units that have access to a shared resource may allow dynamic partitioning to be completed without stopping normal processing activities. However, this solution is not acceptable for larger-scale systems that allow more than two units to share a resource such as a bus. What is needed, therefore, is an improved system and method for performing dynamic partitioning activities that address the foregoing problems and challenges.