A “help desk” is well known today, and typically comprises a skilled person with a telephone to receive calls from customers regarding computer related problems, and a (computer) workstation to assist in resolving the problems. The workstation typically includes a web browser or other portal program to access the applications for which the customers often have problems. Both the help desk workstation and the customer may access the applications from remote servers. The help desk workstation is also typically equipped with a program tool to access a problem ticketing system. The help desk personnel may be structured hierarchically, with the lowest skilled (“first level”) support people sitting at the help desk workstation to interface directly with the customers, and higher skilled (“second level”) support people available elsewhere to interface to the first level support people and answer difficult questions that the first level support people cannot. Typically, customers are allowed to call (telephone) a help desk when they are having a problem with an application running on their own computer or on the remote server. The first level support person at the help desk then attempts to troubleshoot the problem based on personal experience/knowledge and application support documentation, with or without assistance from the second level support people. Sometimes, even the second level support people cannot solve the problem. In such a case, the first or second level support person can call a technical expert for the application or system in question. This technical expert is typically a computer scientist or programmer responsible for developing and/or maintaining the application or system.
Large scale server operations, such as a large web hosting system or large data processing system, typically require extensive help desk and technical support. Ideally, the first level support person can quickly solve the customer's problem without involving either the second level support person or the technical expert. In such a case, the help desk will incur the cost of only one person. Also, the first level support person is generally lower paid than the second level support person. While the involvement of the second level support person adds to the cost, this is still preferable to involving the technical expert. Generally, the technical expert is much higher paid than even the second level support person and has other duties. So, it is preferable to minimize the role of the technical expert in help desk support. Also, solving problems without involving second level support people or technical experts expedites resolution of the problem.
The application support documentation is key to effective help desk support. The (known) application support documentation typically comprises names of servers that run the application, data flows and protocols, technical contacts, URLs, file system directories used by the application, and test login procedure. During server and application development and deployment, the development and deployment personnel often neglect to document critical aspects of the architecture and implementation for the application support documentation. Also, developers and steady state support personnel often modify the application and servers over time to include new backend databases, new connectivity and new uses, and fail to update the application support documentation. This causes additional deficiencies and inaccuracies in the application support documentation. Consequently, many applications and servers lack documentation to guide the help desk people to perform test procedures required to troubleshoot and correct the customers' problems. Such deficiencies and inaccuracies in application support documentation may prolong outages and cause excessive numbers of calls from the first level support person to the second level support people and technical experts. Also, the lack of application support documentation compounds the effort and cost required to solve a customer's problem.
Configuration information needed by the help desk personnel is often stored within a server. However, the help desk personnel may be prevented from accessing or understanding configuration information in a server. For example, access to configuration files within the server may be limited to people with “super user”, “root” or “administrator” privilege level. This is because the configuration affects overall operation of the applications, and may include user IDs and passwords. The help desk people typically lack such a high privilege level. Even when a help desk person has permission to (remotely) access configuration information within a server, the help desk person may not understand the format of the information because it may be designed for an application to read. Also, the helpdesk person may cause additional damage if the help desk person is not properly trained or follows an improper procedure.
Oftentimes, an application (such as a web site application) will be “down” because a single backend database used by the application does not respond. The backend database itself may be down or the communication link between the application and the backend database may be down. Because help desk people (first and second level) may not be familiar with the server architecture, including the backend databases, they may erroneously think the application or the server on which the application is running is malfunctioning whereas the problem is actually with the backend database or its connection to the application. In such a case, the help desk people will not be able to correct the problem, and may even call a developer of the application or the server on which it is running for technical support. Typically, the developer of the application or server on which it is running will not know the identity or state of the backend databases, and cannot solve the problem. In such a case, the time and effort of first and second level support people and one or more technical experts will be wasted. Without identifying the backend database as the problem, the support people may not even know the proper systems administrator to call to trouble shoot the problem.
The hardware, software or network components used in supporting an application may themselves incorporate the capability of providing status and diagnostic information in response to an inquiry or test. The simplest such inquiry is the common TCP/IP network “ping”, “traceroute” or “netstat” inquiry, however existing hardware, software, and network components also provide more detailed status and diagnostic information than is available via “ping”, “traceroute” and “netstat”. For example, a DCE/DFS file system includes the ability to query the ability of file servers to serve files, all the way through the software stack. The ability to initiate such tests or queries for status data is a significant aid in determining the source of a problem. A skilled operator entering a command or set of commands on the server being queried typically initiates inquiries for this data or a remote server connected via a network. Multiple steps, including login/authentication, setting environment or software debug parameters, command initiation, and output parsing, may be required in order to obtain the results of a status inquiry. The operator then interprets the results and determines whether the queried component is functioning within normal parameters. A help desk operator may be unaware of the availability of this diagnostic data, unable to properly initiate the queries required to access it, or unable to properly interpret the results of such a query. In such a case, the help desk operator may not be able to determine if, or where, a problem exists and will need to call in technical experts to properly initiate status queries and interpret the results of such queries.
US Patent Publication US 2003/0149919 A1 discloses systems and methods for diagnosing faults in computer networks. A topology mapper provides a network topology including the location of key services (such as e-mail, DNS, web server). The system uses the network topology to dynamically generate a thorough traceroute using a path-tracing algorithm. A fault diagnosis engine diagnoses a fault related to the communications network. The network management system also includes a help desk.
U.S. Pat. No. 6,353,446 discloses a program to assist a service person in managing an enterprise network. Network visibility software provides access to information from both local and remote client/server networks, providing a central control point from which to manage traffic on distributed networks. Network visibility software assists in explaining possible causes for network problems, collects expert analysis data automatically based on user-specified time intervals and data parameters, learns network configurations continuously, shows breakdown of network protocol activity automatically, and displays network errors. Nevertheless, further improvements are desired to assist help desk people troubleshoot and correct problems with applications and servers.
Other systems for mapping network topology were known that proactively map how networks are connected and where IP addresses reside within networks. These systems test for network connectivity of servers and other computer devices, and generate lists of what server or other computer device is connected to what network at the TCP/IP layer of the network stack. For example, the mapping will indicate on which subnet each server or other computer device resides. Other systems gather network statistics from a network interface to determine if there are any TCP/IP connections.
With the current systems, much of the information is gained from monitoring systems that track network activities and issue alerts when an event triggers a particular situation or condition. The system operator or user (help desk person) is made aware of the alert. However, the current systems do not automated and do not provide details about the nature of the alert or system location where the event occurred that triggered the alert.
There remains a need for a method and system that will generate and display information about system activities for the purpose of troubleshooting system problems. The method and system should be able to provide information about the system location of a problem and the computing equipment related to the problem.