1. Field of the Invention
The field of the invention is methods and systems for providing independent clock failover for scalable blade servers, and blade server chassis having independent clock failover capability.
2. Description of Related Art
In today's scalable computer systems, such as the IBM® eServer® xSeries®, the clock domains of different server blades are typically asynchronous. Asynchronous clock domains often create problems for software running on multiple server blades because timestamp information derived from clock signals in different clock domains is often inconsistent. The following two examples illustrate the problems created by asynchronous clock domains.
The first example is a scenario in which a task of a software application migrates between two server blades during execution. While executing on the first server blade, the task may read the time counter. After migrating to the second server blade, the task may read the time counter again. The task may then perform some action based on the elapsed time between the two readings. Because the two server blades may operate in asynchronous clock domains, the second reading may appear to the task to have occurred before the first, and the task may therefore generate an error. Such errors may be costly to debug and waste computer resources.
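The first example can be illustrated with a short Python sketch. It is a simplified, hypothetical model, not part of any described system: each blade's time counter is assumed to tick once per microsecond but to start from a different, arbitrary offset, which is one way the counters of two asynchronous clock domains can disagree. The class `Blade` and the function `elapsed_ticks` are illustrative names only.

```python
"""Hypothetical illustration: a task reads a time counter on one server
blade, migrates, and reads the counter again on a second blade whose
clock domain is not synchronized with the first."""

from dataclasses import dataclass


@dataclass
class Blade:
    name: str
    offset: int  # assumed counter value at real time 0, in ticks

    def read_counter(self, real_time_us: int) -> int:
        """Local counter value; one tick per microsecond is assumed."""
        return self.offset + real_time_us


def elapsed_ticks(first_blade: Blade, second_blade: Blade,
                  t_first_us: int, t_second_us: int) -> int:
    """Elapsed time as the task computes it: second read minus first."""
    return (second_blade.read_counter(t_second_us)
            - first_blade.read_counter(t_first_us))


blade_a = Blade("blade-A", offset=5_000)
blade_b = Blade("blade-B", offset=1_000)

# 200 us of real time pass between the two reads...
delta = elapsed_ticks(blade_a, blade_b, t_first_us=100, t_second_us=300)

# ...but the task sees a negative interval, as if time ran backwards.
print(delta)  # -3800
if delta < 0:
    print("error: second timestamp precedes first")
```

Although 200 microseconds of real time elapse between the two readings, the task computes an interval of -3800 ticks, which is the kind of apparent reversal that can cause the task to generate an error.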
The second example is a scenario in which an application log records the events of a software application executing on server blades in different clock domains. Because the local clock signals of the server blades may operate at slightly different frequencies, the system clocks that track the time of day on each server blade drift apart over time. Logging events with such unsynchronized system clocks can cause confusion because events may be recorded in the application log out of order.
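The drift described in the second example can be quantified with a small Python sketch. The oscillator errors used here (one blade 50 parts per million fast, the other 50 parts per million slow) are assumed values chosen for illustration, and both clocks are assumed to have been set exactly at time zero and never corrected. The function name `local_time` is hypothetical.

```python
"""Hypothetical illustration: drifting system clocks on two server blades
can record events in the wrong order in a shared application log."""

SECONDS_PER_DAY = 86_400


def local_time(true_seconds: float, ppm_error: float) -> float:
    """Time of day reported by a clock whose oscillator is off by
    `ppm_error` parts per million, assuming the clock was set exactly
    at true time 0 and never corrected afterwards."""
    return true_seconds * (1.0 + ppm_error * 1e-6)


# Assumed oscillator errors: blade A runs 50 ppm fast, blade B 50 ppm slow.
PPM_A, PPM_B = 50.0, -50.0

# How far apart the two time-of-day clocks are after one day (~8.64 s).
skew = local_time(SECONDS_PER_DAY, PPM_A) - local_time(SECONDS_PER_DAY, PPM_B)

# Event X happens on blade A one second BEFORE event Y happens on blade B.
t_x, t_y = SECONDS_PER_DAY, SECONDS_PER_DAY + 1
log = sorted([
    (local_time(t_x, PPM_A), "X on blade A"),
    (local_time(t_y, PPM_B), "Y on blade B"),
])

print(round(skew, 2))               # 8.64
print([event for _, event in log])  # ['Y on blade B', 'X on blade A']
```

Under these assumptions, a combined error of 100 parts per million accumulates to roughly 8.64 seconds per day, so an event that actually occurred later on the slow blade is logged ahead of an earlier event on the fast blade.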
One current solution to the two problems above is for software developers to obtain the time through a system call to an operating system application programming interface (‘API’) that uses a global clock signal fanned out to all the server blades, instead of using the local counter provided by each server blade's processor. Such a global clock signal allows the server blades to operate in the same clock domain. The drawback of this solution is that all software applications must be recoded to use the system call. A further disadvantage is that the system call may offer lower resolution than the processor's local performance counter.
Another solution to the problems described above places all the server blades of a blade server chassis into a single clock domain governed by one global clock signal. If the clock input of one of the server blades fails, a redundant global clock signal provides a second clock domain into which all the server blades of the chassis are switched. This solution keeps all of the server blades synchronized, but it forces every server blade in the chassis to switch clock domains regardless of which server blade experienced the clock failure. Because the server blades typically do not all execute the same software application, this solution often forces some server blades to switch clock domains unnecessarily.