1. Field of the Invention
The present invention relates to information processing technology. More particularly, the present invention relates to a system and method that allow system operators to collaborate and coordinate their efforts using an action diary.
2. Description of the Related Art
One of the highest priorities of information technology (IT) organizations responsible for managing mission-critical computing environments is to ensure that problems, as well as conditions that could lead to problems, are handled in a timely and efficient manner. Events may come from a variety of sources. Examples include events that occur: (1) when a link to another computer system goes down, (2) when a router used for routing information goes down, (3) when a database is down, (4) when the system processor is maximized, or “pegged,” for an extended period, (5) when a disk is full, (6) when one or more applications that make up a critical business function (e.g., order entry) go down, (7) when a critical application program's performance degrades beyond an acceptable level, and (8) when a host computer is going down.
As used herein, a “business system” serves the needs of the organization's critical functions, such as order entry, marketing, accounts receivable, and the like. A business system may span several dissimilar types of computers and be distributed throughout many geographical locations. A business system, in turn, is typically based upon several application programs. An application program may also span several dissimilar types of computers and be distributed throughout a network of computer systems.
An application typically serves a particular function that is needed by the business system. An individual application program may, or may not, be critical to the business system depending upon the role the application program plays within the overall business system. Using networked computers, an application may span several computer systems. In an Internet commerce system, for example, an application program that is part of the company's order processing business system may be responsible for serving web pages to users browsing the company's online catalog. This application may use several computer systems in various locations to better serve the customers and provide faster response to customer inquiries.
The application may use some computers running one type of operating system, for example a UNIX-based operating system such as IBM's AIX® operating system, while other computer systems may run another type of server operating system such as Microsoft's Windows NT® Server operating system. Individual computer systems work together to provide the processing power needed to run the business systems and application programs. These computer systems may be mainframes, mid-range systems, workstations, personal computers, or any other type of computer that includes at least one processor and can be programmed to provide processing power to the business systems and applications.
Computer systems, in turn, include individual resources that provide various functionality to the computer systems. For example, a modem is an individual resource that allows a computer system to link to another computer system through a communication network. A router is another individual resource that routes electronic messages between computer systems. Indeed, even an operating system is an individual resource to the computer system, providing instructions to the computer system's one or more processors and facilitating communication between the various other individual resources that make up the computer system. Events, as described herein, may affect an entire business system, an application program, a computer system, or an individual resource depending upon the type of event that occurs.
The number and types of events that may occur vary widely from system to system based upon the system characteristics, load, and desired use of the system. A business system providing content from an Internet site may experience different events than a business system used to process the company's payroll. However, many events between dissimilar systems overlap. For example, many computer systems experience problems when the disk space is full and many computer systems experience problems when the system's processor is pegged. The types of problems these events cause, however, will vary depending upon the types of work that the business system is expected to perform.
In the Internet site example, a pegged processor is likely to cause applications interfacing with Internet users to become stalled or unusable and transaction throughput to stall or become exceedingly slow. In the corporate payroll system, the same pegged processor may cause critical software applications that make up the payroll application to stall or become exceedingly slow. The causes of the pegged processor may also differ depending upon the usage of the computer. An Internet server's processor may become pegged due to receiving more requests from Internet users than can be handled. The corporate payroll system's processor may become pegged due to multiple processor-intensive business applications running simultaneously on the system.
Computers are often linked to one another using a network, such as a local area network (LAN), wide area network (WAN), or other types of networks such as the Internet. By linking computers, one computer can use resources owned by another computer system. These resources can include files stored on nonvolatile storage devices and resources such as printers. Smaller computers used by an individual (client computers) are often linked to more powerful computers, called servers, that provide large file systems, larger processing capabilities, and resources not typically found on client computers. Servers may be larger PCs, workstations, or mainframe computer systems.
As computer technology continues to proliferate in society and organizations in particular, computer systems and networks likewise increase in complexity. Computing power in an organization is often distributed with servers residing in multiple locations. Operators utilize diagnostic tools and other system tools to remotely learn of conditions occurring in remote systems and to remotely correct those conditions using their available knowledge and resources. Operators may also be distributed, with some operators residing at one location while others reside at a different location. In addition, operators in smaller organizations may be “on call” during weekends and non-business hours. In these environments, an operator may receive a page or call at home regarding a system problem. Using networking technology, the operator preferably logs on to the computer system from a PC at his or her home and resolves the problem from home rather than taking the additional time to travel to the office to handle the problem.
When handling a problem, system operators use various methods to record system problems and the corresponding solutions to those problems. One method used is to keep a computer or paper based system log. The operator writes or types the situations that occur on the computer system, what was done to correct the situation, and the result or outcome of their efforts. When a condition is discovered in the computer system, the operators review their logs for notes concerning any previous occurrences of the condition. If a previous entry is found, the operators use the recorded solution in order to attempt to resolve the condition.
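The manual lookup described above can be illustrated with a minimal sketch. The sketch assumes a computer based log in which each entry records the condition, the action taken, and the outcome. The names LogEntry and find_previous_solution are hypothetical and are used only for illustration; they are not taken from any particular logging tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogEntry:
    """One operator note (hypothetical structure, for illustration only)."""
    condition: str  # the situation that occurred on the computer system
    action: str     # what was done to correct the situation
    outcome: str    # the result of the operator's efforts


def find_previous_solution(log: List[LogEntry], condition: str) -> Optional[LogEntry]:
    """Return the most recent entry recorded for the same condition, if any."""
    wanted = condition.strip().lower()
    # Walk the log from newest to oldest so the latest note wins.
    for entry in reversed(log):
        if entry.condition.strip().lower() == wanted:
            return entry
    return None


# Example log, oldest entry first.
log = [
    LogEntry("disk full", "removed old temp files", "resolved"),
    LogEntry("processor pegged", "stopped runaway batch job", "resolved"),
    LogEntry("disk full", "archived old logs", "resolved"),
]

previous = find_previous_solution(log, "Disk Full")
```

In this sketch the match is an exact, case-insensitive comparison on the condition text, which mirrors an operator scanning notes for the same wording. A condition described with different wording would not be found, which is one reason such logs are difficult to rely upon.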
One challenge with paper based logs is that they are maintained in one location. As discussed above, operators are often remote from both each other and the systems that they maintain. An operator who receives a call at home is unable to access a paper based log unless a copy is maintained at his or her home. Operators who are remote from one another are unable to view each other's notes without faxing or otherwise transmitting the paper based information.
Computer based logs pose a further challenge in that they often have difficulty accurately capturing all of the data relating to the problem. Operators often rely upon memory to reconstruct the problem that was encountered and solved. Relying upon memory introduces errors, as certain details may be forgotten or inaccurately transcribed. In addition, while a computer based log is typically easier to access remotely, many manual steps must still be employed to capture the data and launch corrective actions. Operators are also often too busy to maintain the logs effectively. Outdated solutions may remain in the log, causing confusion amongst newer operators who do not know that conditions have changed. Furthermore, if multiple operators are working on the same problem, multiple updates to the log may occur, causing further problems with the computer based log. One of the greatest challenges is tying log notes to the event, tying the event to the problem, and tying the solution to the problem. A related challenge is capturing solution practices and identifying whether a practice is outdated, is solid, or is the best available practice but not yet solid.
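The linkage that such logs lack, namely notes tied to an event, an event tied to a problem, and solutions tied to the problem along with the status of each practice, can be pictured with the following minimal sketch. All class and field names are hypothetical and chosen only for illustration; the sketch is not a description of any existing log tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PracticeStatus(Enum):
    """The three practice states named in the text (labels are assumptions)."""
    OUTDATED = "outdated"
    BEST_AVAILABLE = "best available, not yet solid"
    SOLID = "solid"


@dataclass
class Solution:
    description: str
    status: PracticeStatus


@dataclass
class Problem:
    name: str
    solutions: List[Solution] = field(default_factory=list)

    def usable_solutions(self) -> List[Solution]:
        """Hide outdated practices so newer operators are not misled."""
        return [s for s in self.solutions if s.status is not PracticeStatus.OUTDATED]


@dataclass
class Event:
    source: str
    message: str
    problem: Problem                                 # event tied to the problem
    notes: List[str] = field(default_factory=list)   # notes tied to the event


disk_full = Problem(
    "disk full",
    [
        Solution("delete temporary files by hand", PracticeStatus.OUTDATED),
        Solution("run the archive job", PracticeStatus.SOLID),
    ],
)
event = Event("server-a", "disk usage at 100%", disk_full)
event.notes.append("archive job freed sufficient space")
```

With explicit references of this kind, an operator starting from an event can reach the associated problem and only those solutions that are not marked outdated. A manually maintained log provides none of these links unless the operator records them by hand.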
What is needed, therefore, is a system and method to assist operators in addressing system conditions and problems using an object based action diary.