The present invention relates to a system for initializing a distributed computer system and to a method for initializing a distributed computer system.
There is a growing demand for high performance computer systems. Many modern computer systems have a distributed architecture, in which a plurality of computers are coupled to each other via a common communication medium. (See: U.S. Pat. No. 5,887,143 of Saito et al. and the references mentioned therein).
A premium is placed on the reliability of a computer system, especially when the computer system handles safety critical applications such as "brake by wire" applications in vehicles.
The reliability of such a computer system can be enhanced by using fail silent computer nodes, and by synchronizing the computer nodes that share the common communication medium. (See: U.S. Pat. No. 4,866,606 of Kopetz, and U.S. Pat. No. 5,694,542 of Kopetz). A fail silent computer node either produces the correct result or does not produce any result at all. Building fail silent computer nodes is well known in the art (see: U.S. Pat. No. 5,694,542 of Kopetz, and the references mentioned therein).
A premium is also placed on the availability of each of the fail silent computer nodes. Thus, there is a need to initialize as many fail silent computer nodes as possible, in a fast manner, even in noisy environments. Furthermore, there is a need to initialize fail silent computer nodes in a fast manner both when the computer system is started up and when the system is already working and a portion of the computer system that has been shut down needs to be started up.
In known circuits it is difficult to reconcile these contrasting requirements in an optimum way. For example, U.S. Pat. No. 5,694,542 of Kopetz describes a computer system comprising a plurality of fail silent units, each fail silent unit having a plurality of fail silent computer nodes. The fail silent units are coupled to two parallel buses. The fail silent computer nodes are initialized upon reception of an initialization word (I-message). After a fail silent computer node is synchronized, it sends data frames that differ from the I-message (i.e., N-messages).
A startup timeout parameter is stored within each of the fail silent computer nodes. This parameter determines a period starting from the power up of the computer system and ending with the transmission of the I-message. Thus, after power up a computer node waits until the startup timeout has elapsed and then sends the I-message. A disadvantage of this solution is that the fail silent computer node that sends the I-message (i.e., the sender computer) does not check whether other I-messages or N-messages are simultaneously being sent via the communication medium. Thus, a collision of data frames can occur. A further disadvantage of the prior solution is that the startup timeout parameter is not adapted to various initialization scenarios, such as a noisy communication medium or initialization in the presence of other I-messages or even N-messages.
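The collision risk described above can be illustrated with a minimal sketch. It is a simplified model and not taken from the cited patent: the function name `find_collisions`, the node names, the shared power-up instant and the fixed `frame_duration` are all assumptions made for illustration. Each node sends its I-message when its own startup timeout elapses, without listening first, and two transmissions collide when they overlap in time.

```python
from itertools import combinations


def find_collisions(startup_timeouts, frame_duration):
    """Return pairs of node names whose I-message transmissions overlap.

    Assumed model: all nodes power up at time 0, each node starts sending
    when its startup timeout elapses, and each I-message occupies the
    medium for frame_duration time units. No node checks the medium first.
    """
    collisions = []
    nodes = sorted(startup_timeouts.items())
    for (name_a, start_a), (name_b, start_b) in combinations(nodes, 2):
        # Two equal-length frames overlap when their start times
        # are less than one frame duration apart.
        if abs(start_a - start_b) < frame_duration:
            collisions.append((name_a, name_b))
    return collisions


# Hypothetical timeout values, in arbitrary time units.
timeouts = {"node_a": 10, "node_b": 12, "node_c": 30}
print(find_collisions(timeouts, frame_duration=5))
# prints [('node_a', 'node_b')]
```

In this sketch the frames of node_a and node_b overlap because their timeouts differ by less than one frame duration, whereas node_c sends alone.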
The reliability of such a system can be enhanced by sending an I-message only if the communication medium is silent. Thus, collisions are avoided. A disadvantage of this solution is that in a noisy environment the computer system will not be initialized.
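The trade-off of the send-only-when-silent approach can be sketched as follows. This is an illustrative model under stated assumptions, not the method of any cited reference: time is divided into slots, the medium is represented as a list of booleans (True meaning that activity, whether a frame or noise, was observed in that slot), and the function name `first_send_slot` and the parameter `required_silent_slots` are hypothetical.

```python
def first_send_slot(medium_busy, required_silent_slots=1):
    """Return the slot index at which a node would send its I-message.

    Assumed model: the node sends in the slot immediately following
    required_silent_slots consecutive silent slots. Noise is treated
    the same as traffic, since the node only senses activity.
    Returns None if the medium is never silent for long enough.
    """
    silent_run = 0
    for slot, busy in enumerate(medium_busy):
        if busy:
            silent_run = 0  # any activity restarts the silent window
        else:
            silent_run += 1
            if silent_run >= required_silent_slots:
                return slot + 1
    return None


# A medium that becomes quiet: the node sends after two silent slots.
print(first_send_slot([True, False, False, True], required_silent_slots=2))
# prints 3

# A persistently noisy medium: the node never sends, so it never initializes.
print(first_send_slot([True] * 10, required_silent_slots=2))
# prints None
```

The second call shows the stated disadvantage: when noise keeps the medium from ever appearing silent, no I-message is sent.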
Accordingly, there is a need for an improved system and method for initializing a distributed computer system, and for providing a computer system which is both reliable and available.