Large-scale computer systems often host multiple applications that support many users. Example applications include database applications, file servers, and software services. Some large-scale computer systems are controlled and managed by operators. The users are those who rely on the services provided by a system, and the operators are those responsible for keeping the systems operational, providing first-level support for application, database, and network exceptions or requests, and escalating any issues to the appropriate support personnel.
Many products have been developed to assist operators in performing their designated tasks. For example, the Single Point Operations (SPO) product from Unisys is a LAN-based arrangement, including applications running on workstations connected to the LAN, that supports various operations scenarios. Specifically, the SPO arrangement supports operations of multiple systems by a single operator at one workstation, operations of a single system by multiple operators, and various alarm and automation functions.
The SPO product includes applications, for example alarm and status applications, that are driven by event reports, which are generated by various automation components of the SPO product. The event reports are forwarded to the applications by another SPO component called the SPO server. Large data centers use multiple SPO servers to distribute the load of processing event reports and as a means of providing a fail-over SPO server should another SPO server fail.
The manner in which event reports are processed by a SPO server impacts operational flexibility. A SPO server transmits event reports only to SPO applications on connected workstations. This limits the ability to configure certain fail-over capabilities in a SPO environment. In a SPO environment, a “secondary” SPO server can take over for a primary SPO server if the primary SPO server fails. In that event, the secondary SPO server connects to the computer system formerly controlled by the primary SPO server. Because a SPO server sends event reports only to SPO applications, and not to other SPO servers, the secondary SPO server does not have available the history of event reports generated prior to failure of the primary SPO server. In other words, the context of the controlled system, for example, status and alarm information, is lost when the new SPO server takes over.
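The loss of context described above can be illustrated with a minimal sketch. It is not SPO code: the class names (EventReport, Application, Server), the method names, and the host name "host-A" are hypothetical and chosen only for illustration. The sketch assumes a server that forwards each event report to its connected applications and to nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class EventReport:
    system: str  # hypothetical name of the controlled system
    text: str    # hypothetical alarm or status text


@dataclass
class Application:
    """Stand-in for an alarm or status application on a workstation."""
    name: str
    received: list = field(default_factory=list)

    def on_event(self, report: EventReport) -> None:
        self.received.append(report)


class Server:
    """Stand-in for a server that forwards reports only to applications."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.applications: list[Application] = []

    def connect(self, app: Application) -> None:
        self.applications.append(app)

    def forward(self, report: EventReport) -> None:
        # Reports go to connected applications only; no peer server sees them.
        for app in self.applications:
            app.on_event(report)


primary = Server("primary")
secondary = Server("secondary")
primary_alarm = Application("alarm application on primary")
secondary_alarm = Application("alarm application on secondary")
primary.connect(primary_alarm)
secondary.connect(secondary_alarm)

# Reports generated while the primary server is still running.
primary.forward(EventReport("host-A", "disk alarm raised"))
primary.forward(EventReport("host-A", "status: degraded"))

# The primary server fails; the secondary takes over and sees only new reports.
secondary.forward(EventReport("host-A", "status: recovering"))
```

In this sketch the application behind the secondary server holds only the report generated after the takeover, so the earlier alarm and status reports for the controlled system are unavailable to it.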
A method and system that address these and other related problems are therefore desirable.