Some multiprocessor systems exist today that have been designed to offer increased reliability using paired microprocessor cores. An exemplary system, which has been used to achieve industry-leading reliability, is described by Timothy J. Slegel et al., “IBM's S/390 G5 Microprocessor Design,” IEEE Micro, March 1999. However, this prior art design is based on an approach that completely duplicates the I (Instruction) unit and E (Execution) unit of the core. On every clock cycle, signals coming from these duplicated units, including instruction results, are cross-compared in an R (Reliability) unit and the L1 cache. If the signals do not match, hardware error recovery is invoked. This checking scheme solves the problems associated with traditional checking, although at an additional cost in die area.
While this design approach has offered high reliability, the duplicated resources cannot be used for independent computation even when high reliability is not required. Some classes of applications, however, offer natural resilience, and it is advantageous to enable systems to deliver higher performance when executing such algorithms. Examples of such algorithms are digital content creation and graphics processing, where deviations from the numerically correct results are not noticed by viewers, and convergence-based algorithms, wherein a corrupted numeric value may increase the runtime but does not impact the correctness of the final result.
Thus, for example, a soft error occurring at a low-order mantissa bit of a convergence-based computation may cause one or two additional iterations to be performed, but making twice the number of cores available to the application will still result in an overall speedup.
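The resilience of a convergence-based algorithm to such a soft error can be illustrated with a minimal sketch. The sketch assumes Newton's iteration for a square root as a stand-in convergence-based algorithm and injects a single bit flip into the mantissa of an intermediate value; the function names `flip_mantissa_bit` and `newton_sqrt`, the tolerance, and the choice of algorithm are hypothetical and are not taken from the systems described above.

```python
import math
import struct


def flip_mantissa_bit(x: float, bit: int = 0) -> float:
    """Return x with one mantissa bit toggled (bit 0 is the lowest-order bit)."""
    if not 0 <= bit < 52:
        raise ValueError("an IEEE 754 double has mantissa bits 0..51")
    # Reinterpret the double as a 64-bit integer, toggle the bit, convert back.
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def newton_sqrt(a, tol=1e-12, fault_at=None, bit=0, max_iter=100):
    """Newton iteration for sqrt(a), with an optional simulated soft error.

    If fault_at is given, the iterate produced in that iteration is
    corrupted once by toggling the selected mantissa bit.
    Returns (estimate, iterations_used).
    """
    x = a
    for i in range(1, max_iter + 1):
        x_new = 0.5 * (x + a / x)
        if i == fault_at:
            x_new = flip_mantissa_bit(x_new, bit)  # simulated soft error
        # Converged when successive iterates agree to the relative tolerance.
        if abs(x_new - x) <= tol * abs(x_new):
            return x_new, i
        x = x_new
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    clean, clean_iters = newton_sqrt(2.0)
    faulty, faulty_iters = newton_sqrt(2.0, fault_at=3, bit=0)
    print("fault-free:", clean, "after", clean_iters, "iterations")
    print("with fault:", faulty, "after", faulty_iters, "iterations")
    print("reference: ", math.sqrt(2.0))
```

Because each iteration corrects the error left by the previous one, the corrupted run still converges to the same result within the tolerance, and the only possible cost is extra iterations. Under the idealized assumption that the work scales perfectly across cores, a run that needs N + k iterations instead of N on twice as many cores has a relative throughput of 2N / (N + k), which exceeds 1 whenever k < N.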
A single system may be used to execute both resilient programs (e.g., financial forecasting and simulation) and programs requiring high accuracy (e.g., financial transactions), either simultaneously or at different times. A single application may also consist of some components that require high reliability and other components that are naturally resilient.
FIG. 1 shows a prior art multiprocessor system 10 including multiple processor cores 12a, . . . , 12n (such as embedded on a single chip or System on Chip (SoC)) interfaced with system components 15 comprising, for example, a memory nest, an interrupt controller, etc. Each core 12a, . . . , 12n communicates with the system components, e.g., by receiving respective input signals 20a, . . . , 20n, and sending output signals 25a, . . . , 25n.
U.S. Pat. No. 7,065,672 entitled “Apparatus and methods for fault-tolerant computing using a Switching Fabric” describes a prior art computer system having a switching fabric that communicates transactions asynchronously between data processing elements and a target processor. While this reference describes a method for determining correct execution, voting is performed between a plurality of processors; the processors are not intended to be used independently, and are not shown to be independently usable, for lack of switching fabric access. Furthermore, this prior art configuration is dependent upon the features of asynchronous switching networks and the operation of peripheral devices.
Current fault-tolerant systems do not enable both processors to provide independent operation when computational processes are naturally resilient, nor do they enable pairwise execution and checking when they are not.
It would be highly desirable to provide a system and method having a pairing facility that enables selective pairing of microprocessors for highly reliable (fault-tolerant) implementations under software control, and that further enables the scheduling of selected cores for pairing.
It would be further highly desirable to schedule threads such that software-resilient threads execute on throughput-optimized hardware configurations, and threads requiring hardware resilience (“hardware-resilient threads”) execute on reliability-optimized hardware configurations.