1. Field of the Invention
This invention relates in general to server reliability, and more particularly to a method, apparatus and program storage device for performing a remote power reset at a remote server through a network connection.
2. Description of Related Art
Distributed computing systems are generally well known. Such systems allow communications between application programs hosted on numerous computer workstations. There are numerous types of distributed computing systems, often classified by the geographical extent of their communication capability. Terms used to classify the geographical breadth of distributed computing systems are, for example: local area networks (LANs), metropolitan area networks (MANs), and wide area networks (WANs).
Many of the more popular distributed computer systems employ a file server (“server”). A host within the server manages files or data. Servers are particularly beneficial in allowing workstations fast access to files stored by the server. An important aspect of maintaining host functions within a server is to manage the host from a site remote from the host and, more specifically, to manage the server and/or host from a site remote from the server. Recent trends have seen a steady increase in the number of servers used in business. Servers are now commonly deployed at each location of a business entity, rather than a single centralized mainframe being employed at one location. Unfortunately, the funds available to administer many servers at disparate locations are decreasing. While the data placed on these servers is considered critical to the business, there remain insufficient means for ensuring their proper operation from a single service site. Expecting an administrator to travel to remote server sites to fix a problem is not only impractical but also quite costly, given the expense associated with server downtime.
Many operating systems, or applications associated with those operating systems, allow access to a host from a remote site. This is often referred to as a “virtual terminal”. A virtual terminal, while not physically connected to the host, nonetheless allows remote control of certain operations of the host. Products have attempted to address some of the issues involved in managing a network of distributed servers from a single, remote site. These products allow, inter alia, an administrator to be alerted to a remote server failure and to access certain information provided on the server console.
In a networked system, different processes may communicate with each other. For example, each process that wants to communicate with another process may identify itself to the TCP/IP protocol suite by one or more ports. Sockets using the TCP protocol are either active or passive. Active sockets initiate connections to passive sockets. By default, TCP sockets are created active. To create a passive socket, the socket is bound to an IP address and port with the bind( ) system call, and the listen( ) system call is then used to tell the kernel to start listening for incoming connections to that IP address and port. The accept( ) call returns control to the program when an incoming connection arrives on the designated TCP port, yielding a new socket over which data can then be exchanged.
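As an illustration of the active/passive socket distinction described above, the following is a minimal sketch in C. It assumes a POSIX system and the IPv4 loopback address; the helper names open_passive and open_active are hypothetical and are not part of the described invention. The passive side follows the socket( ), bind( ), listen( ) sequence, while the active side simply calls connect( ).

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in an IPv4 loopback address for `port` (given in host byte order). */
static void loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(port);
}

/* Hypothetical helper, passive side: socket() -> bind() -> listen().
 * Port 0 lets the kernel choose a free ephemeral port; the port actually
 * bound is reported through *bound_port.
 * Returns the listening descriptor, or -1 on error. */
int open_passive(unsigned short port, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0); /* new TCP socket, active by default */

    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0    /* attach IP/port */
        || listen(fd, 8) < 0                                   /* socket is now passive */
        || getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical helper, active side: socket() -> connect() to 127.0.0.1:port.
 * Returns the connected descriptor, or -1 on error. */
int open_active(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would then block in accept(listener, NULL, NULL), which returns a new connected descriptor once a client's connection is established; only after that does the server read( ) whatever data the client sends.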
When a server runs out of virtual memory because of application memory consumption, the server may freeze or hang. In theory, all programs that are still running and that do not require additional computer resources will continue to run. The basic functionality of some components may be guaranteed by pinning a program/process to memory. Pinning generally refers to the ability of pages to remain in main memory rather than being swapped out, typically by the computer operating system, so that those memory pages are kept in real memory at all times. However, a program/process that is not pinned to memory (normally it is not) competes with other programs for memory resources, and as soon as it requires a new resource, e.g., memory, it will fail. When programs fail in this way, new users can no longer log in to the affected server, programs cannot be restarted, and so on. This situation is similar to one in which a user must press “ctrl+alt+del” in the Windows operating system.
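The passage does not name a pinning mechanism, and pinning is operating-system specific. On POSIX systems, one common way to keep pages resident in real memory is mlock( ); the sketch below assumes that mechanism and uses the hypothetical helper names alloc_pinned and free_pinned. Pinning can fail when the process lacks privilege or would exceed its locked-memory limit (for example RLIMIT_MEMLOCK on Linux), so a caller must handle a NULL return.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate `size` bytes on a page boundary and ask the
 * kernel to keep those pages resident in RAM (not swapped out).
 * Returns the buffer, or NULL with errno set if allocation or mlock() fails
 * (typically EPERM or ENOMEM when the locked-memory limit is too low). */
void *alloc_pinned(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    int rc;

    if (page <= 0) {
        errno = EINVAL;
        return NULL;
    }
    rc = posix_memalign(&buf, (size_t)page, size);
    if (rc != 0) {
        errno = rc;            /* posix_memalign reports errors by return value */
        return NULL;
    }
    if (mlock(buf, size) != 0) {
        int saved = errno;     /* keep mlock's error across free() */
        free(buf);
        errno = saved;
        return NULL;
    }
    return buf;
}

/* Hypothetical helper: unpin and release a buffer from alloc_pinned(). */
void free_pinned(void *buf, size_t size)
{
    munlock(buf, size);
    free(buf);
}
```

Memory pinned this way is unavailable to other processes, which is why pinning is usually reserved for a few critical components rather than applied to every program.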
Accordingly, the only way to restore such a server is to reboot it by resetting its power. Rebooting by resetting the power becomes a serious problem for remotely located servers. To reset power remotely, additional hardware is normally required, or the server must have built-in hardware features, such as certain models of IBM pSeries™ servers. However, in both cases, additional communication equipment must be installed.
It is certainly beneficial to allow remote control of certain server functions. Any downtime caused by server failure is probably the most costly time involved in running a distributed computer system. If a server hangs, for example, file access is often lost and business records are temporarily inaccessible until the server is reset. A true benefit would result if an administrator located remotely from the server could initiate a requested action at the remote server through a network connection.
It can be seen then that there is a need for a method, apparatus and program storage device for performing a remote power reset at a remote server through a network connection.