1. Field of the Invention
This invention relates to a distributed computing system, and more particularly to a host computer ("host") which can be accessed from a terminal not physically connected to the host and located at a site remote from the host. The remote terminal can access a stored sequence of video screens, such as a sequence of video screens occurring during host reset or failure operations. The sequence of video screens can then be replayed by a computer administrator located at the remote terminal. Remote access to those video screens allows the administrator to determine how a host operating system is responding to a reset, or possible reasons why the host system failed. Provided with the host is a printed circuit board ("PCB") which can be inserted into a backplane which accommodates the host. The PCB comprises a processor and memory for storing the sequence of video screens even when power is lost to the host.
2. Background of the Relevant Art
Distributed computing systems are generally well known. Such systems allow communications between application programs hosted on numerous computer workstations. There are numerous types of distributed computing systems, often classified by the geographical extent of their communication capability. Terms used to classify the geographical breadth of distributed computing systems are, for example: local area networks (LANs), metropolitan area networks (MANs), and wide area networks (WANs).
Many of the more popular distributed computer systems employ a file server ("server"). Files, or data, are managed by a host within the server. Servers are particularly beneficial in allowing workstations fast access to files stored by the server. Accordingly, file servers embody a host computer which responds to an operating system program (a popular operating system being, e.g., Windows NT.RTM.) not only to orchestrate the files, but also to maintain file security, file backup, etc.
An important aspect of maintaining host functions within a server is to manage the host from a site remote from the host and, more specifically, to manage the host at a site remote from the workstations physically linked to the server. Recent trends have seen a steady increase in the number of servers used in business. Nowadays, servers are liberally used, possibly at each location of a business entity, rather than employing a centralized mainframe at one location. Unfortunately, the funds available to administer many server hosts located at disparate locations are decreasing. While data placed on these servers is considered critical to the business, there remain insufficient means for ensuring their proper operation from a single service site. An expectation that an administrator travel to remote server sites to fix a problem is not only impractical but quite costly given the expense associated with server downtime.
Many operating systems, or applications associated with those operating systems, allow access to the host from a remote site often called a "virtual terminal". A virtual terminal, while not physically connected to the host, nonetheless allows remote control of certain operations of the host. Products such as COMPAQ Server Manager/R.RTM. ("SMR") and COMPAQ Insight Manager.RTM. ("CIM"), obtainable from Compaq Computer Corp., have attempted to address some of the issues involved in managing a network of distributed servers from a single, remote site. These products allow, inter alia, an administrator to be alerted as to a remote server failure, to reset the server from the remote site, and to access certain information provided on the server console.
It is certainly beneficial to allow remote control of certain server functions, especially those needed to reset one or more servers within a network of servers. Any downtime caused by server failure is probably the most costly time involved in running a distributed computer system. The causes of server failure, often termed a server host "crash", are numerous. Any number of malfunctions or design flaws associated with the server hardware, server operating system or application programs running on the server may account for a server crash. If a server crashes, then file access is often lost and business records are inaccessible until the cause of failure is fixed.
A true benefit would result if an administrator located remote from the server could do more than be alerted to, and then reset, a failed server. In particular, it would be advantageous for the administrator to determine the cause of server failure so that he/she can possibly prevent future failures before they occur. Prevention of failure is as important as, if not more important than, resetting a server that has crashed.
The cause of a failure is generally displayed on the server console at the time the server crashes. Moreover, irregularities in the server host hardware or operating system software can be detected upon reset (or "boot"). Those irregularities can lead to future failure if not attended to by the administrator. Accordingly, it would be beneficial to gain access to what is displayed on the server host console not only during server reset (or failure) but also leading up to server reset/failure. Information within the video screens (more particularly the sequence of video screens) displayed on the server console during server failure or reset would help remotely located administrators diagnose (and hopefully fix) an existing server failure or potential failure.
The video screens resulting from a reset or failure of the server comprise a sequence of video screen changes displayed on the host server console by the operating system, system basic input output system ("BIOS"), server application program or other system software. Capture of two screen change sequences is of particular interest to a server administrator. In order to fix an existing failure or prevent a future failure, it would be beneficial that the administrator be given the sequence of screen changes prior to server failure as well as the sequence of screen changes following a reset. Examples of server failure screens displayed on the server console are the Microsoft Corp. Windows NT.RTM. "blue screens" and the Novell Corp. NETWARE.RTM. ABEND messages, which appear on the server console when the respective operating system crashes. These screens provide information such as processor fault indicia, system software routine addresses, and pertinent system memory contents. Upon reset of the server, the power on self test ("POST") code, associated with the aforementioned operating systems, typically performs some system diagnostics and displays information regarding any detected failures on the server console screen. Hence, a means for capturing such sequences and replaying them at a remote management site is desired.
In addition to hardware and software problems, a server can also fail if power to the server is halted. Unfortunately, if power is halted, any record of the screen changes that occurred prior to failure will be lost. A server is therefore needed which employs a mechanism for saving reset and failure screen changes even when power to the server is lost. The stored screen changes may then be beneficially read at a future date by a remotely situated administrator. The desired mechanism is therefore one which can maintain screen information during power loss, and can selectively forward power only to critical units within the mechanism. Accordingly, a mechanism is needed which is preferably embodied upon a PCB mountable within a server chassis. The PCB is desirably connected to the server host and includes media for storing screen information output from the host, and for maintaining that information even when server power is discontinued.
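The capture-and-replay behavior desired above can be illustrated with a minimal sketch. It assumes text-mode screens represented as tuples of row strings and a small in-memory buffer; the `ScreenRecorder` class, its `capacity` parameter, and the sample screen text are hypothetical names chosen for illustration only. Retention of the buffer through power loss would depend on the storage media of the PCB and is not modeled here.

```python
from collections import deque


class ScreenRecorder:
    """Hypothetical recorder keeping the most recent distinct text screens."""

    def __init__(self, capacity=4):
        # A bounded buffer: once full, the oldest screen is dropped first.
        self._screens = deque(maxlen=capacity)

    def capture(self, screen):
        """Store `screen` (rows of text) only if it differs from the last stored screen."""
        screen = tuple(screen)
        if self._screens and self._screens[-1] == screen:
            return False  # no screen change, nothing to record
        self._screens.append(screen)
        return True

    def replay(self):
        """Return the stored screens, oldest first, for viewing at a remote terminal."""
        return list(self._screens)


recorder = ScreenRecorder(capacity=2)
recorder.capture(["Starting services..."])
recorder.capture(["Starting services..."])  # unchanged screen, ignored
recorder.capture(["*** STOP: fatal error"])
recorder.capture(["POST: memory test"])

for screen in recorder.replay():
    print("\n".join(screen))
```

Storing a screen only when it differs from the previous one keeps the buffer focused on screen changes, so a limited amount of memory covers a longer period leading up to a failure or reset.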
Communication between a remote site and a server is typically performed via text-based connection protocols, generally known in the industry as American National Standards Institute ("ANSI") terminal emulation protocols. Although terminal emulation protocols provide a certain level of functionality, it is desirable that other protocols also be supported, in particular a point-to-point protocol ("PPP") communications link between the server and the remote site, which enables application layer protocols such as simple network management protocol ("SNMP"), a protocol for communication of server management information. If a server is to include a PCB embodying a system for communicating with the remote site using a plurality of communications protocols, then it is desirable that sub-systems upon the PCB determine which of the supported protocols (i.e., text-based, PPP, etc.) the remote site is using as it communicates with the server.
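One way such protocol determination might work is sketched below. The sketch assumes PPP in HDLC-like framing (RFC 1662), in which a frame opens with the flag byte 0x7E followed by the all-stations address byte 0xFF, and it treats any other non-empty input as a text-based terminal session. The function name `detect_protocol` and its return labels are hypothetical, and this heuristic is an illustration rather than the method of any particular product.

```python
PPP_FLAG = 0x7E     # frame delimiter in HDLC-like framing (RFC 1662)
PPP_ADDRESS = 0xFF  # all-stations address byte that follows the flag


def detect_protocol(first_bytes: bytes) -> str:
    """Guess which protocol the remote site is using from the first bytes received."""
    # Look for the flag/address pair that opens an uncompressed PPP frame.
    for i in range(len(first_bytes) - 1):
        if first_bytes[i] == PPP_FLAG and first_bytes[i + 1] == PPP_ADDRESS:
            return "ppp"
    if not first_bytes:
        return "unknown"  # nothing received yet, so no decision can be made
    return "text"         # assumed fallback: a text-based terminal session


# Start of a PPP LCP frame (control byte 0x03 escaped as 0x7D 0x23).
print(detect_protocol(bytes([0x7E, 0xFF, 0x7D, 0x23, 0xC0, 0x21])))
# A carriage return typed at a terminal emulator.
print(detect_protocol(b"\r"))
```

Inspecting the incoming bytes lets a single communications port serve both kinds of remote site without requiring the administrator to select the protocol in advance.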