1. Field of the Invention
This invention relates to power loss detection in multiple power domains, and more particularly to power loss detection of failing power domains by operational power domains, and to disabling interfaces between the failed power domain and operational power domains which communicate with the failed power domain.
2. Description of the Prior Art
High reliability and minimal system "down" time are vitally important in modern digital systems. One approach to increasing the reliability of a computer system is to utilize redundant "power domains", as described in copending patent application Ser. No. 08/172,661, filed on Dec. 23, 1993, and entitled "Fault Tolerant Clock Distribution System". Such a system utilizes multiple power domains, which are electrically isolated voltage planes, in order to allow circuitry residing in one power domain to remain operational upon the loss of voltage to another power domain. Each power domain can contain substantially equivalent or identical circuitry, including memory modules, so that the loss of one power domain will not result in a system failure or a loss of data. In such a multiple power domain system, communications are carried on between the power domains so that no attempt is made to access the circuitry and/or the memory modules in a failed power domain. The present invention provides cross-monitoring of the voltage in one power domain by circuitry residing in another power domain, so that an ensuing voltage loss may be detected early, and error-free operation may continue in the remaining power domain(s).
The use of redundant memories storing duplicates of the same data is referred to in U.S. Pat. No. 5,295,258, by Jewett et al., issued Mar. 15, 1994. Such a system utilizes redundant circuitry to prevent inoperability of the system upon a circuit failure. The Jewett et al. reference also utilizes redundant power supplies to increase fault tolerance. However, the Jewett et al. reference does not utilize redundant power domains as in the present invention. The redundancy in supplying voltage is limited to redundancy of power supplies, as depicted in FIG. 13 of Jewett et al. Therefore, the redundant circuitry in Jewett et al. is powered by both power supplies, rather than being powered by separate, isolated voltage buses as in the present invention. In the present invention, each power domain is individually monitored by another power domain for loss of voltage, and if one power domain fails, another will continue to be fully operational. No voting scheme is necessary with redundant power domains; what is required instead is monitoring for a faulty power domain, along with disabling further activity associated with the failed power domain.
Another type of recovery circuit for redundant circuitry is shown in U.S. Pat. No. 5,212,797, by Miyake et al., issued May 18, 1993. The redundant Central Processing Units (CPUs) in Miyake et al. each have an associated voltage monitoring circuit to detect when its supply voltage has dropped below a threshold voltage. Upon recognition of such a voltage drop, the corresponding CPU is placed in a standby state, or in other words, stops normal CPU operation. The present invention likewise monitors for a voltage below a threshold voltage. However, the voltage monitoring of the present invention monitors for voltage degradation on a different power domain, so that activity in the failing power domain can be disabled while normal operation of the system continues in the operational power domain. The present invention thus allows the system to continue to operate without having to stop normal operation by entering a "standby" state.
The present invention was developed in order to overcome power loss problems associated with the use of redundant circuitry. Logic circuit or memory redundancy is not fully fault tolerant if a power loss or a voltage bus short circuit can cause all of the circuitry to fail. This single point of failure is overcome through the use of redundant power domains, and by placing redundant circuitry on the redundant power domains. The loss of one power domain will not result in a loss of circuit operation or in a loss of data, since the remaining operational power domain(s) will continue to supply voltage to the redundant circuits or memories. However, in order to effectively use redundant power domains, an ensuing power loss in a power domain must be recognized, so that no further activity is directed to the failed power domain, and no further communications between operational power domains and the failed power domain take place. The present invention provides such power domain failure recognition and recovery. Each power domain has voltage monitoring circuitry which is monitored by circuitry in a different power domain. This is necessary because the circuitry in a failing power domain cannot be relied upon to accurately monitor itself; that circuitry will fail as the voltage supplying it degrades. When the circuitry within a power domain recognizes that the voltage in another power domain has fallen below an acceptable threshold, activity begins within the operational power domain to generate a latched signal indicating such a power loss, to generate an interrupt to a processing unit to begin any recovery action, and to disable further reading and writing to memory devices in the failing power domain by logically removing those memory devices from the system configuration.
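The detection sequence just described can be illustrated with a minimal software sketch. It is only a behavioral model under stated assumptions, not the circuitry of the invention: the names `PowerDomain`, `CrossMonitor`, and `sample`, the threshold value, and the representation of the system configuration as a set of domain names are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field

# Hypothetical threshold; the source does not give a numeric value.
THRESHOLD_VOLTS = 4.5


@dataclass
class PowerDomain:
    name: str
    voltage: float = 5.0  # assumed nominal supply, for illustration only


@dataclass
class CrossMonitor:
    """Models circuitry that resides in `observer` and watches `watched`."""
    observer: PowerDomain
    watched: PowerDomain
    # Domains whose memory devices are currently part of the configuration.
    configured: set = field(default_factory=set)
    power_loss_latched: bool = False
    interrupts: list = field(default_factory=list)

    def sample(self) -> bool:
        """Check the watched domain once; return the latched status."""
        below = self.watched.voltage < THRESHOLD_VOLTS
        if below and not self.power_loss_latched:
            # 1. Latch the power-loss indication (it stays set afterwards).
            self.power_loss_latched = True
            # 2. Raise an interrupt so a processing unit can begin recovery.
            self.interrupts.append(f"power loss in {self.watched.name}")
            # 3. Logically remove the failing domain's memory devices.
            self.configured.discard(self.watched.name)
        return self.power_loss_latched


domain_a = PowerDomain("A")
domain_b = PowerDomain("B")
monitor = CrossMonitor(observer=domain_a, watched=domain_b,
                       configured={"A", "B"})

domain_b.voltage = 3.0   # supply in domain B degrades
monitor.sample()         # latches, interrupts, and removes B's memories
```

Because the indication is latched, a later sample taken after the voltage recovers still reports the power loss and raises no second interrupt, which mirrors the requirement that no further activity be directed to the failed power domain.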
The memory in the operational power domain, which stores a duplicate of the data held in the memory in the failed power domain, will remain operational, and no data loss occurs. At this point, the memory in the operational power domain no longer has a backup memory, since its backup memory became inoperable at the time of the loss of power in the failed power domain. The processing unit can then direct the data in the operational memory to be redirected to another memory in a power domain which still has an operational backup memory in yet another power domain. The present invention therefore allows a redundant memory system to experience only minimal periods of non-redundant activity, while allowing normal memory operation to continue.
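The recovery step, in which data left without a backup is redirected to a memory that still has one, might be modeled as follows. This is a simplified sketch under assumptions: the `MirroredPair` type, the `restore_redundancy` function, and the use of a dictionary to stand in for memory contents are hypothetical, and the sketch assumes the target memory has room and that addresses do not collide.

```python
from dataclasses import dataclass, field


@dataclass
class MirroredPair:
    """Two memories in different power domains holding duplicate data."""
    domains: tuple                      # names of the two power domains
    data: dict = field(default_factory=dict)

    def is_redundant(self, failed: set) -> bool:
        # Redundant only if neither of its domains has failed.
        return not (set(self.domains) & failed)


def restore_redundancy(pairs: list, failed: set) -> bool:
    """Move data from pairs that lost a copy onto a still-redundant pair.

    Returns False when no fully redundant pair remains to receive the data.
    """
    healthy = [p for p in pairs if p.is_redundant(failed)]
    if not healthy:
        return False
    target = healthy[0]                 # assumed policy: first healthy pair
    for pair in pairs:
        survivors = set(pair.domains) - failed
        if survivors and not pair.is_redundant(failed):
            # The surviving copy is the only one left; re-home its data.
            target.data.update(pair.data)
            pair.data.clear()
    return True


pairs = [
    MirroredPair(("A", "B"), {"addr0": "payload"}),
    MirroredPair(("C", "D"), {"addr1": "other"}),
]
restore_redundancy(pairs, failed={"B"})  # data from A/B now lives on C/D
```

In this model the window of non-redundant operation is the interval between the power loss and the completion of `restore_redundancy`, which corresponds to the minimal period of non-redundant activity noted above.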