Fault tolerance is becoming an increasingly important subject in hardware design. Safety-critical systems, in which neither errors nor performance losses can be tolerated, and resilient systems, in which freedom from errors is highly desirable but a loss of performance is acceptable as long as the application can be saved, naturally require the underlying hardware platform to have higher fault tolerance. However, higher fault tolerance is also becoming increasingly important in standard applications. Semiconductor devices are becoming increasingly hard to fabricate as transistor sizes decrease. As the transistor size decreases, each atom placed in a wrong position constitutes a larger share of the total transistor, leaving the transistor deformed or, in the best case, slightly larger than it was supposed to be. The transistor therefore has slightly different electrical properties, which introduces flaws into the device. This means that it is difficult to guarantee that a given device will work in a certain way. The technology is therefore greatly in need of ways to monitor a semiconductor device and correct faults as they occur.
The article by Mange et al.: Towards Robust Integrated Circuits—The Embryonic Approach; Proceedings of the IEEE, IEEE, New York, US, vol. 88, no. 4, 1 Apr. 2000 discloses an approach towards the development of very large-scale integrated circuits capable of self-repair and self-replication. Self-repair is defined as partial reconstruction in case of a minor fault, and self-replication is defined as complete reconstruction of the original device in case of a major fault. The document applies a four-level hierarchy of embryonics: the population level, the organismic level, the cellular level and the molecular level.
U.S. Pat. No. 5,931,959 describes a fault-tolerant multiprocessor system for providing hardware-based flexible fault tolerance. The system comprises a plurality of computing modules connected to a plurality of result memories, which are connected through a memory interface to a reconfigurable logic and switching device. The reconfigurable logic and switching device can be programmed to supply both traditional fault tolerance functions, which require bit-for-bit exactness, and more sophisticated fault tolerance functions, which use more complicated algorithms that check whether outputs meet user-defined reasonableness criteria.
U.S. Pat. No. 6,874,108 describes a method of fault-tolerant operation of a field programmable gate array. The method works by identifying a faulty resource in a signal path, reconfiguring said signal path to exclude said faulty resource, estimating a propagation delay caused by reconfiguring said signal path, and adjusting the system clock if the estimated propagation delay is greater than a critical path propagation delay.
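The four steps of such a method can be illustrated with a minimal Python sketch. It is not taken from the cited patent: the function name `reconfigure_path`, the additive per-resource delay model, the single spare resource and the rule of stretching the clock period to the estimated delay are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def reconfigure_path(
    path: List[str],
    faulty: str,
    spare: str,
    delay_ns: Dict[str, float],
    critical_path_ns: float,
    clock_period_ns: float,
) -> Tuple[List[str], float]:
    """Swap a faulty resource for a spare and stretch the clock if needed.

    All names and the delay model are hypothetical; delays are assumed to
    add up along the path.
    """
    # Reconfigure the signal path so that it excludes the faulty resource.
    new_path = [spare if resource == faulty else resource for resource in path]
    # Estimate the propagation delay of the reconfigured path.
    estimated_ns = sum(delay_ns[resource] for resource in new_path)
    # Adjust the clock only if the new path is slower than the critical path.
    if estimated_ns > critical_path_ns:
        clock_period_ns = max(clock_period_ns, estimated_ns)
    return new_path, clock_period_ns


# Example with made-up delays: the spare is slower than the resource it replaces.
delays = {"a": 1.0, "b": 2.0, "c": 1.5, "spare": 4.0}
new_path, new_period = reconfigure_path(
    ["a", "b", "c"], "b", "spare", delays,
    critical_path_ns=5.0, clock_period_ns=5.0,
)
```

In this example the rerouted path is slower than the assumed critical path, so the clock period is lengthened; with a sufficiently fast spare the clock would be left unchanged.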
U.S. Pat. No. 7,142,953 describes a digital processing system capable of being remotely configured for use in space vehicles. The digital processing system comprises a field programmable gate array for performing the processing tasks, a receiver on said space vehicles for receiving commands, and a field programmable gate array configuring unit coupled to said receiver for reconfiguring said field programmable gate array according to said commands, thereby enabling the remote configuration of said field programmable gate array.
U.S. Pat. No. 7,343,579 describes a reconfigurable adaptive computing system capable of dynamically changing the way input signals are processed during runtime. The computing system comprises a detector for generating an environmental signal based on a detected environmental condition, and a controller for receiving the environmental signal and selecting, based on that signal, a processing configuration from a plurality of processing configurations.
Currently, the problem is addressed by introducing static redundancy into the system. The redundancy is static in the sense that it is simply an extra copy of the functionality already implemented. The multiple copies of the system all perform the same function, and a comparator or voter compares the outputs of the copies. The voter ensures that the output produced by the majority of the copies is the one selected as the valid output. This is a stable solution, but it has its downsides, for instance:
    Performance degradation: The comparator or voter circuit introduces extra delay into the system;
    Increased power and area overhead: Having x copies of the same device multiplies the area and power needed by x;
    Comparator or voter failures: If the fault-checking circuit itself fails, it should be possible to detect the failure, and there should be a way to handle it.
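The voting principle described above can be sketched in a few lines of Python. This is only an illustration of majority voting in software, not a description of any particular hardware voter; the function name `majority_vote` and the choice to return `None` when no strict majority exists are assumptions.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of the copies.

    Returns None if there are no outputs or no value has a strict majority
    (an assumed convention for signalling an unresolved vote).
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


# Three redundant copies of the same function, one of which is faulty.
def healthy_copy(x: int) -> int:
    return x + 1


def faulty_copy(x: int) -> int:
    return x + 2  # hypothetical fault: wrong result


voted = majority_vote([healthy_copy(4), healthy_copy(4), faulty_copy(4)])
```

Here the two healthy copies outvote the faulty one. The sketch also makes the downsides listed above visible: every result passes through the voter before it can be used, the function is evaluated three times, and nothing checks the voter itself.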
Furthermore, the solution described above primarily applies to Application Specific Integrated Circuits (ASICs), because reconfigurable hardware platforms such as Field Programmable Gate Arrays (FPGAs) already introduce extra overhead because of their reconfigurability. Of course, the same techniques could be applied to reconfigurable hardware platforms, but this would further increase the overhead in terms of area, power and speed.
It remains a problem to improve reconfigurable, fault-tolerant systems.