Given the continually increasing reliance on computers in contemporary society, computer technology has had to advance on many fronts to keep up with increased demand. One particular subject of significant research and development efforts is parallelism, i.e., the performance of multiple tasks in parallel.
A number of computer software and hardware technologies have been developed to facilitate increased parallel processing. From a hardware standpoint, computers increasingly rely on multiple microprocessors to provide increased workload capacity. Furthermore, some microprocessors have been developed that support the ability to execute multiple threads in parallel, effectively providing many of the same performance gains attainable through the use of multiple microprocessors. In some instances, the provision of parallel components also assists in improving system reliability from the standpoint that parallel components may provide redundancy to enable one such component to take over for another component that has failed. From a software standpoint, multithreaded operating systems and kernels have been developed, which permit computer programs to concurrently execute in multiple threads so that multiple tasks can essentially be performed at the same time.
In addition, some computers implement the concept of logical partitioning, where a single physical computer is permitted to operate essentially like multiple and independent “virtual” computers (referred to as logical partitions), with the various resources in the physical computer (e.g., processors, memory, input/output devices) allocated among the various logical partitions. Each logical partition executes a separate operating system, and from the perspective of users and of the software applications executing on the logical partition, operates as a fully independent computer.
While parallelism effectively increases system performance by virtue of the ability to perform multiple tasks at once, one side effect of parallelism is increased system complexity due to the need to synchronize the operation of multiple concurrent processes or threads, particularly with regard to data structures and other system resources that are capable of being accessed by multiple processes or threads. Separate processes or threads that are capable of accessing specific shared data structures are typically not aware of the activities of other threads or processes. As such, a risk exists that one thread might access a specific data structure in an unexpected manner relative to another thread, creating indeterminate results and potential system errors.
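The risk described above can be illustrated with a minimal sketch. The example below is not drawn from any particular system; the function names and the shared counter are hypothetical. The first function replays, step by step, one interleaving in which two threads each read a shared value before either writes it back, so that one update is lost. The second function performs the same increments under a lock, which is one conventional way of synchronizing access.

```python
import threading


def interleaved_without_sync():
    """Replay one unlucky interleaving of two unsynchronized increments.

    Both hypothetical threads read the shared value before either writes,
    so the second write overwrites the first and one increment is lost.
    """
    shared = {"value": 0}
    a_seen = shared["value"]      # thread A reads 0
    b_seen = shared["value"]      # thread B reads 0, unaware of thread A
    shared["value"] = a_seen + 1  # thread A writes 1
    shared["value"] = b_seen + 1  # thread B also writes 1
    return shared["value"]


def incremented_with_lock(threads=2, iterations=1000):
    """Perform the same kind of increment from real threads under a lock."""
    shared = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(iterations):
            with lock:  # only one thread at a time may read-modify-write
                shared["value"] += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return shared["value"]
```

In the unsynchronized interleaving, two increments leave the value at one rather than two; with the lock, every increment is preserved.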
One particular application where synchronization may be required is in the area of enabling multiple processors in a computer to access and control a hardware device that is only capable of being accessed by one processor at a time. Particularly when the processors and/or the operating systems running thereon are multithreaded, such that multiple applications running on the processors are capable of accessing a hardware device at the same time, switching control over a hardware device from one processor to another, and in particular, determining when all of the applications on a processor are no longer accessing a hardware device, can be problematic.
In some conventional designs, counters have been used to moderate the control of hardware devices between a plurality of processors, when the possibility exists that multiple applications on a particular processor may be able to use a hardware device at the same time. With a counter, each time a device is to be used, the count is incremented, and when the device is released, the count is decremented. A common problem associated with using a counter, however, is the possibility that the count never returns to zero, e.g., when an application fails or locks up before it can decrement the counter. When this occurs, the processor on which the application was running retains control of the hardware device indefinitely, until recovery measures are taken.
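The counter-based approach and its failure mode can be sketched as follows. This is an illustrative model only, assuming a hypothetical `UsageCounter` class and two hypothetical applications; it does not represent any specific conventional implementation.

```python
class UsageCounter:
    """Hypothetical per-processor counter tracking use of a hardware device."""

    def __init__(self):
        self.count = 0

    def acquire(self):
        # Called each time an application begins using the device.
        self.count += 1

    def release(self):
        # Called each time an application finishes using the device.
        self.count -= 1

    def can_hand_over(self):
        # Control may pass to another processor only when no user remains.
        return self.count == 0


def well_behaved_app(counter):
    counter.acquire()
    counter.release()


def failing_app(counter):
    counter.acquire()
    # The application fails before it reaches its release() call.
    raise RuntimeError("application failed while using the device")


counter = UsageCounter()
well_behaved_app(counter)
handover_before = counter.can_hand_over()

try:
    failing_app(counter)
except RuntimeError:
    pass  # the failure is noticed, but nothing decrements the counter

handover_after = counter.can_hand_over()
```

After the failing application runs, the count remains at one, so the device appears to be in use indefinitely even though no application is actually using it.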
In other conventional designs, device drivers have been used to synchronize access to their associated hardware devices, or alternatively, separate management software has been used to manage multiple hardware devices. In some instances, however, certain hardware devices and/or the hardware that is used to interface with such hardware devices may not be represented by a device driver on a processor. Furthermore, the hardware may not support a locking capability. In addition, counters, device drivers and management software typically do not support both shared and exclusive access.
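For readers unfamiliar with the distinction between shared and exclusive access, the following minimal sketch models the two modes. It is an assumption-laden illustration, not a description of any conventional design or of any particular hardware; the class `AccessState` and its method names are hypothetical. Any number of users may hold shared access at once, whereas exclusive access is granted only when no other user holds the resource in either mode.

```python
class AccessState:
    """Hypothetical bookkeeping for shared versus exclusive access."""

    def __init__(self):
        self.shared_holders = 0
        self.exclusive_held = False

    def try_shared(self):
        # Shared access is refused only while an exclusive holder exists.
        if self.exclusive_held:
            return False
        self.shared_holders += 1
        return True

    def try_exclusive(self):
        # Exclusive access requires that nobody else holds the resource.
        if self.exclusive_held or self.shared_holders > 0:
            return False
        self.exclusive_held = True
        return True

    def release_shared(self):
        if self.shared_holders > 0:
            self.shared_holders -= 1

    def release_exclusive(self):
        self.exclusive_held = False
```

A plain usage count, by contrast, records only how many users exist and cannot express that one user requires the resource to itself.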
One particular design for which the aforementioned access control techniques are insufficient is a multi-user computer design that utilizes redundant service processors to aid in the initialization and management of the computer, e.g., to perform a boot up process and to perform a number of monitoring, recovery, reporting and other tasks in connection with Reliability, Availability and Serviceability (RAS). When multiple service processors are used, a shared communication path, e.g., a multiplexer, may be provided between the service processors and the system processors, memory and various other peripheral devices comprising the Central Electronics Complex (CEC) of the computer. In such a scenario, only one service processor is able to communicate through the multiplexer at a time, so some mechanism is required to control which service processor is permitted to use the multiplexer at any point in time.
Control of the usage of the multiplexer by a service processor is complicated by the fact that multiple shared hardware devices may be accessible to a service processor via the multiplexer, and furthermore, multiple applications running on the service processor may be accessing different shared hardware devices at the same time. Typically, shared hardware devices are represented in a service processor using device drivers; however, in many designs, the multiplexer itself is not known to the operating system of the service processor as a device and therefore is not represented by a device driver. Consequently, a problem exists as to how to control use of the multiplexer and to prevent the multiplexer from being given up when in use.
Counters, as discussed above, are not optimal due to the inability to support exclusive and shared access, as well as due to the possibility of one application hanging and failing to decrement a counter. Device drivers and/or management software are also not optimal, again in part due to the inability to support exclusive and shared access. Furthermore, each device driver requires custom programming, and when multiple hardware devices are connected to a service processor via a multiplexer, no device driver is available to manage access to the multiplexer itself. From the standpoint of management software, service processors typically run an embedded operating system and have extremely limited onboard memory, so the inclusion of management software would reduce the nonvolatile and runtime memory available for other applications needed by the service processor.
Therefore, a significant need continues to exist for an improved manner of managing access to a shared hardware device by multiple processors.