Recent years have seen continued development in the computer field. In that regard, considerable effort has been directed to multi-processor computing systems. Such systems involve a plurality of processors or function units capable of independent operation to process separate tasks in parallel. Usually, the tasks relate to a specified job. Typically, a multi-processor computing system includes a plurality of computational units, a memory, a control unit and at least one input-output processor.
High performance computer systems may utilize multiple processors to increase processing power. Processing workloads may be divided and distributed among the processors, thereby reducing execution time and increasing performance. For example, some computer systems are now provided with processors that include multiple processing cores, each of which may be capable of executing multiple execution threads.
Similarly, single-core and/or multi-core computer systems may be combined into multiprocessor computer systems, which are often used in computer servers. One architectural model for high performance multiple processor computer systems is the cache coherent Non-Uniform Memory Access (ccNUMA) model. Under the ccNUMA model, system resources such as processors and random access memory may be segmented into groups referred to as Locality Domains, also referred to as “nodes” or “cells”. Another architectural model for high performance multiple processor computer systems is the distributed memory computing model, in which nodes are interconnected with each other by a high performance interconnect or by Ethernet. In both models, each node may comprise one or more processor cores and physical memory. A processor core in a node may access the memory in its own node, referred to as local memory, as well as memory in other nodes, referred to as remote memory.
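The distinction between local and remote memory can be illustrated with a minimal sketch. The `Node` class, the `classify_access` helper, the core numbering and the address ranges below are all hypothetical and chosen only for illustration; they do not describe any particular hardware or operating system interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical locality domain: a group of cores plus the memory they own."""
    node_id: int
    cores: tuple       # identifiers of the processor cores in this node
    mem_range: range   # physical addresses backed by this node's memory


def node_of_core(nodes, core):
    """Return the node that contains the given core."""
    for node in nodes:
        if core in node.cores:
            return node
    raise ValueError(f"unknown core {core}")


def node_of_address(nodes, address):
    """Return the node whose physical memory backs the given address."""
    for node in nodes:
        if address in node.mem_range:
            return node
    raise ValueError(f"unmapped address {address}")


def classify_access(nodes, core, address):
    """Label an access 'local' if the core and the memory share a node, else 'remote'."""
    same_node = node_of_core(nodes, core).node_id == node_of_address(nodes, address).node_id
    return "local" if same_node else "remote"


# Two illustrative nodes, each with two cores and 1024 addresses of memory.
nodes = [
    Node(node_id=0, cores=(0, 1), mem_range=range(0, 1024)),
    Node(node_id=1, cores=(2, 3), mem_range=range(1024, 2048)),
]

local_case = classify_access(nodes, core=0, address=100)
remote_case = classify_access(nodes, core=0, address=1500)
```

In this sketch, core 0 reaching address 100 stays within node 0 and is classified as local, whereas the same core reaching address 1500 crosses into node 1 and is classified as remote.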
Multi-processor computer systems may be partitioned into a number of elements, also called cells or virtual machines. Each cell includes at least one, and more commonly a plurality of, processors. The various cells in a partitioned computer system may run different operating systems, if desired.
Generally, in multi-processor computers, tasks are scheduled by a task scheduler. A task scheduler is a device that determines the priority and order of execution of several simultaneous task requests and gives the “winning” task a signal to proceed.
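The arbitration performed by a task scheduler can be sketched as a priority queue. The following is an illustrative sketch only: the `TaskScheduler` class, its method names, the task names and the convention that a lower number means a higher priority are all assumptions made for this example, not features of any specific scheduler.

```python
import heapq
from itertools import count


class TaskScheduler:
    """Toy scheduler that grants pending requests in priority order."""

    def __init__(self):
        self._pending = []       # min-heap of (priority, arrival, name)
        self._arrival = count()  # tie-breaker: earlier requests win

    def request(self, name, priority):
        """Register a task request; lower number means higher priority (assumed)."""
        heapq.heappush(self._pending, (priority, next(self._arrival), name))

    def grant_next(self):
        """Return the name of the winning task, or None if nothing is pending."""
        if not self._pending:
            return None
        _, _, name = heapq.heappop(self._pending)
        return name


scheduler = TaskScheduler()
scheduler.request("log-rotation", priority=5)
scheduler.request("interrupt-handler", priority=0)
scheduler.request("batch-job", priority=5)

# Drain the queue to observe the order in which tasks are signalled to proceed.
grant_order = [scheduler.grant_next() for _ in range(3)]
```

Here the request with the highest priority is granted first, and requests of equal priority are granted in arrival order, which is one simple way to resolve several simultaneous requests deterministically.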
The components in a multi-processor system are prone to errors and/or failures. Self-healing actions, such as dynamic processor resiliency for processor-related errors and dynamic memory resiliency for memory-related errors, are performed by diagnostic agents running on the operating system. However, some of these self-healing actions come at a cost in performance.
Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.