As information technology continues to increase in complexity, problem management costs will escalate as the frequency of support incidents rises and the skill set requirements for human analysts become more demanding. Conventional problem management tools are designed to reduce costs by increasing the efficiency of the humans performing these support tasks. This is typically accomplished by at least partially automating the capture of trouble ticket information and by facilitating access to knowledge bases. While useful, this type of automation has reached the point of diminishing returns because it fails to address the fundamental weakness in the support model itself: its dependence on humans.
Table 1 illustrates the distribution of labor costs associated with incident resolution in the conventional, human-based support model. The data shown is provided by Motive Communications, Inc. of Austin, Tex. (www.motive.com), a major supplier of help desk software. The highest cost items are those associated with tasks that require human analysis and/or interaction (e.g. Diagnosis, Investigation, Resolution).
TABLE 1

Support Tasks                                              % Labor Cost

Simple and Repeated Problems (30%)
  Desktop Configuration (User inflicted)                        4%
  Desktop Environment (Software malfunction)                    9%
  Networking and Connectivity                                   7%
  How To (questions)                                           10%

Complex & Dynamic Problems (70%)
  Triage (Identify user and support entitlement)                7%
  Diagnosis (Analyze state of machine)                         11%
  Investigation (Find the source of the problem)               35%
  Resolution and Repair (Walk user through the repair)         18%
Conventional software solutions for automated problem management endeavor to decrease these costs and add value across a wide range of service levels. Forrester Research, Inc. of Cambridge, Mass. (www.forrester.com) provides a useful characterization of these service levels. Forrester Research divides conventional automated computer support solutions into five service levels, including: (1) Mass-Healing—solving incidents before they occur; (2) Self-Healing—solving incidents when they occur; (3) Self-Service—solving incidents before a user calls; (4) Assisted Service—solving incidents when a user calls; and (5) Desk-side Visit—solving incidents when all else fails. According to Forrester, the cost per incident using a conventional self-healing service is less than one dollar. However, the cost quickly escalates, reaching more than three hundred dollars per incident if a desk-side visit is eventually required.
The objective of the Mass-Healing level is to solve incidents before they occur. In conventional systems, this objective is achieved by making all PC configurations the same or, at a minimum, by ensuring that a problem found on one PC cannot be replicated on any other PC. Conventional products typically associated with this service level consist of software distribution tools and configuration management tools. Security products such as anti-virus scanners, intrusion detection systems, and data integrity checkers are also considered part of this level since they focus on preventing incidents from occurring.
The conventional products that attempt to address this service level operate by constraining the managed population to a small number of known good configurations and by detecting and eliminating a relatively small number of known bad configurations (e.g., virus signatures). The problem with this approach is that it assumes that: (1) all good and bad configurations can be known ahead of time; and (2) once known, they remain relatively stable. As the complexity of computer and networking systems increases, the stability of any particular node in the network tends to decrease. Both the hardware and the software on any particular node are likely to change frequently. For example, many software products are capable of automatically updating themselves using software patches accessed over an internal network or the Internet. Since there are an infinite number of good and bad configurations and since they change constantly, these conventional products can never be more than partially effective.
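The limitation described above can be illustrated with a minimal sketch. It assumes a configuration can be represented as a small dictionary of settings; the setting names, the values, and the classify function are hypothetical and are not taken from any conventional product. The sketch shows only that a checker built on fixed lists of known good and known bad configurations has nothing to say about a configuration that appears on neither list, such as one produced by an automatic software update.

```python
from enum import Enum


class Verdict(Enum):
    GOOD = "good"
    BAD = "bad"
    UNKNOWN = "unknown"


def classify(config, known_good, known_bad):
    """Classify a configuration against fixed lists of known states.

    Each configuration is reduced to a hashable fingerprint so it can be
    looked up in the known-good and known-bad sets.
    """
    fingerprint = frozenset(config.items())
    if fingerprint in known_bad:
        return Verdict.BAD
    if fingerprint in known_good:
        return Verdict.GOOD
    # Neither list mentions this state, so the checker cannot decide.
    return Verdict.UNKNOWN


# Hypothetical settings, for illustration only.
baseline = {"os_patch": "sp1", "browser": "6.0"}
infected = {"os_patch": "sp1", "browser": "6.0-tampered"}

known_good = {frozenset(baseline.items())}
known_bad = {frozenset(infected.items())}

# An automatic update changes one setting on an otherwise healthy node.
updated = dict(baseline, browser="6.1")

print(classify(baseline, known_good, known_bad))  # Verdict.GOOD
print(classify(infected, known_good, known_bad))  # Verdict.BAD
print(classify(updated, known_good, known_bad))   # Verdict.UNKNOWN
```

Every new patch level or application version adds another state that must be placed on one of the lists before the checker can classify it, which is why the lists can never be complete.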
Further, virus authors continue to develop increasingly sophisticated viruses. Conventional virus detection and eradication software depends on the ability to identify a known pattern in order to detect and eradicate a virus. However, as the number and complexity of viruses increase, the resources required to maintain a database of known viruses and fixes for those viruses, combined with the resources required to distribute the fixes to the population of nodes on a network, become overwhelming. In addition, a conventional PC utilizing a Microsoft Windows operating system includes over 7,000 system files and over 100,000 registry keys, all of which are multi-valued. Accordingly, for all practical purposes, an infinite number of good states and an infinite number of bad states may exist, making the task of identifying the bad states more complicated.
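A rough calculation suggests why the number of states is effectively infinite. The sketch below uses the file and registry key counts given above and adds one assumption that is not in the text: each item can take only two values. Since the items are described as multi-valued, this is a deliberately conservative lower bound. The function name is hypothetical.

```python
import math

SYSTEM_FILES = 7_000       # "over 7,000 system files" (from the text)
REGISTRY_KEYS = 100_000    # "over 100,000 registry keys" (from the text)
STATES_PER_ITEM = 2        # assumption: a lower bound of two values per item


def decimal_digits_of_state_count(items, states_per_item):
    """Return the number of decimal digits in states_per_item ** items.

    Logarithms are used so the (very large) count itself never has to be
    written out.
    """
    return math.floor(items * math.log10(states_per_item)) + 1


items = SYSTEM_FILES + REGISTRY_KEYS
digits = decimal_digits_of_state_count(items, STATES_PER_ITEM)
print(f"{items} items -> a configuration count with {digits} decimal digits")
```

Under these assumptions the count of possible configurations is a number more than 32,000 digits long, far beyond what any database of known patterns could enumerate.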
The objective of the Self-Healing level is to sense and automatically correct problems before they result in a call to the help desk, ideally before the user is even aware that a problem exists. Conventional Self-Healing tools and utilities have existed since the late 1980s, when Peter Norton introduced a suite of PC diagnostics and repair tools (www.symantec.com). They also include utilities that allow a user to restore a PC to a restore point set prior to the installation of a new product. However, none of the conventional tools works well under real-world conditions.
One fundamental problem of these conventional tools is the difficulty in creating a reference model with sufficient scope, granularity, and flexibility to allow “normal” to be reliably distinguished from “abnormal”. Compounding the problem is the fact that the definition of “normal” must constantly change as new software updates and applications are deployed. This is a formidable technical challenge and one that has yet to be conquered by any of the conventional tools.
The objective of the Self-Service level is to reduce the volume of help desk calls by providing a collection of automated tools and knowledge bases that enable end users to help themselves. Conventional Self-Service products consist of “how to” knowledge bases and collections of software solutions that automate low risk, repetitive support functions such as resetting forgotten passwords. These conventional solutions have a significant downside in that they increase the likelihood of self-inflicted damage. For this reason they are limited to specific types of problems and applications.
The objective of the Assisted Service level is to enhance human efficiency by providing an automated infrastructure for managing a service request and by providing capabilities to remotely control a personal computer and to interact with end users. Conventional Assisted Service products include help desk software, online reference materials, and remote control software.
While the products at this service level are perhaps the most mature of the conventional products and solutions described herein, they still fail to fully meet the requirements of users and organizations. Specifically, the ability of these products to automatically diagnose problems is severely limited in both the types of problems that can be correctly identified and the accuracy of the diagnosis (often multiple choice).
A Desk-Side Visit becomes necessary when all else fails. This service level includes any "hands-on" activities that may be necessary to restore a computer that cannot be diagnosed or repaired remotely. It also includes tracking and managing these activities to ensure timely resolution. Of all the service levels, this level is most likely to require significant time from highly trained, and therefore expensive, human resources.
Conventional products at this level consist of specialized diagnostic tools and software products that track and resolve customer problems over time and potentially across multiple customer service representatives.
Thus, a paradigm shift is needed to significantly reduce support costs. This shift will be characterized by the emergence of a new support model in which machines serve as the primary agents for making decisions and initiating actions.