As computers have become an integral part of almost all facets of business operation, they have also become increasingly interconnected. The recent proliferation of intranets is merely a step in the progression from LANs to WANs and beyond. As a result, the use of the global Internet to connect far-flung computer groups, servers and workstations is becoming the norm, rather than the exception.
As has always been the case with any office equipment, there is a need to monitor and maintain the operation of computer workstations. In the early days of computers, this usually required the direct attention of MIS personnel, who would perform diagnostics on each individual workstation. LANs allowed for some integration of the monitoring functions, with centralized computers having the capability to collect limited performance information for computers directly hardwired to the central computer.
More recently, some functionality for the monitoring of performance and error states in interconnected workstations has been incorporated directly into the operating systems that allow for the interconnectivity. For example, various versions of Windows, such as Windows for Workgroups and Windows NT, record any error messages in message logs on the individual workstations. These messages are stored in cryptic form, typically as references to more detailed error information contained within the application that generated the error in the first place. In the case of Windows applications, the information needed to "decode" the error messages is contained in the individual message dynamic link libraries ("DLLs") associated with the application that initially caused the error or system event.
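The relationship between a logged reference and its message DLL can be illustrated with a minimal sketch. It assumes a log record that holds only a source name, a numeric message identifier and a few run-time values, and it models each application's message DLL as an in-memory lookup table. The names `LogRecord`, `MESSAGE_TABLES` and `decode`, and the sample message text, are hypothetical and do not correspond to any actual Windows interface.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LogRecord:
    source: str                # application that raised the event
    message_id: int            # reference into that application's message table
    inserts: Tuple[str, ...]   # run-time values substituted into the template


# Hypothetical stand-in for the per-application message DLLs.
MESSAGE_TABLES: Dict[str, Dict[int, str]] = {
    "DiskService": {7: "Bad block detected on device {0}."},
    "PrintSpooler": {12: "Document {0} failed to print on {1}."},
}


def decode(record: LogRecord,
           tables: Dict[str, Dict[int, str]] = MESSAGE_TABLES) -> str:
    """Resolve a cryptic log reference into readable text, if possible."""
    table = tables.get(record.source)
    if table is None or record.message_id not in table:
        # Without the matching message table the record stays undecipherable.
        return f"<undecoded: source={record.source} id={record.message_id}>"
    return table[record.message_id].format(*record.inserts)
```

In this sketch, a record from a source whose table is not available yields only the raw reference, which mirrors the situation of a reader who holds the log but not the corresponding message DLL.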
In the Windows NT environment, administrative tools allow for a server connected to a plurality of workstations to collect all of the error messages in a centralized error log. This allows for a more centralized monitoring of those workstations. A system administrator need only review the single central error log, rather than the log of each individual workstation, to discover if there are any problems. Unfortunately, this approach centrally monitors only the handful of workstations directly connected to the server. The business realities of today demand practically global reach for any large corporation that wishes to survive. For example, in the financial services industry, this often means branch offices in countries and cities separated by large distances, potentially thousands of miles. Having a system administrator at each remote location is redundant and wasteful, as most systems will operate normally most of the time. For a single administrator to log in to various remote groups is also difficult and time-consuming. The different time zones also make it difficult for any single administrator or group of administrators to personally monitor, even if accomplished remotely, all of an organization's workstations from a central location.
It is, of course, known that servers and even workstations may be remotely accessed by various means. Whether through phone lines, the Internet, satellites, etc., it is possible for an operator to access the files, including the error logs, on remote workstations from a central location. However, if an administrator wanted to use these methods to monitor all of an organization's workstations, it would require connecting to each one individually, retrieving the error logs and then scanning through the logs for important messages. Since everything from a major application failure to a momentary disk access problem is stored in the error logs, this task becomes a near impossibility due to the sheer volume of messages, among other problems.
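The scanning step described above can be sketched as a simple severity filter. This is only an illustration under stated assumptions: each log entry is modeled as a (severity, text) pair, the three severity levels are hypothetical, and the logs are assumed to have already been retrieved from each workstation. The names `Severity`, `important` and `scan_all` are invented for the example.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Severity(IntEnum):
    # Hypothetical levels; real logs may classify events differently.
    INFO = 1
    WARNING = 2
    ERROR = 3


Entry = Tuple[Severity, str]


def important(entries: List[Entry],
              threshold: Severity = Severity.ERROR) -> List[Entry]:
    """Keep only the entries at or above the given severity."""
    return [entry for entry in entries if entry[0] >= threshold]


def scan_all(logs_by_host: Dict[str, List[Entry]],
             threshold: Severity = Severity.ERROR) -> Dict[str, List[Entry]]:
    """Scan every retrieved log; omit hosts with nothing important."""
    result: Dict[str, List[Entry]] = {}
    for host, entries in logs_by_host.items():
        hits = important(entries, threshold)
        if hits:
            result[host] = hits
    return result
```

Even with such a filter, the administrator in this scenario must still connect to every workstation and transfer every log before any filtering can take place, so the volume problem is reduced only after the expensive retrieval has already happened.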
Even if the administrator could directly connect, it would be difficult to immediately understand the cryptic error logs. Without knowing or accessing the specific DLL on the same machine that generated the error message, the error may be undecipherable. Today, with program updates and bug fixes a constant reality, it is difficult to track the version of a given message DLL on a given machine without accessing those message DLL files as well. Obviously, this compounds the task for the administrator.
At least one program does exist for automating some of these monitoring tasks. The Tivoli Management Environment, currently available through IBM, includes a component called Tivoli/Sentry. This component, while active at a central location, has the capability to automatically access a server at a remote location to retrieve its error log, which the server has gathered from the workstations connected to it.
Tivoli also includes limited functionality to automatically respond to certain errors at the server level. When a workstation forwards an error message to its immediate network server, that server may be preprogrammed with a response action based on certain events. Critical events are also transmitted to a central location for processing and generation of a corrective action, which is sent back to the remote server.
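The two-tier handling described above, with a preprogrammed response at the server and referral of critical events to a central location, can be sketched as follows. This is not Tivoli's interface; it is a minimal illustration in which an event is a plain dictionary with a `code` and an optional `critical` flag, the server's preprogrammed responses are a lookup table, and the central location is represented by a caller-supplied function. All names, including `handle_at_server` and `ask_central`, are hypothetical.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]


def handle_at_server(event: Event,
                     local_rules: Dict[str, str],
                     ask_central: Callable[[Event], str]) -> List[str]:
    """Return the actions the server should carry out for one event."""
    actions: List[str] = []

    # Preprogrammed response at the server level, if one exists.
    local_action = local_rules.get(str(event["code"]))
    if local_action is not None:
        actions.append(local_action)

    # Critical events are also referred to the central location, which
    # generates a corrective action and sends it back to the server.
    if event.get("critical", False):
        actions.append(ask_central(event))

    return actions
```

In this sketch, an event with no matching rule and no critical flag produces no action at all, which reflects the limited scope of a purely rule-based response.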