Troubleshooting a problem in a complex system comprising interconnected elements can be difficult. In computing, for example, a computer application that receives data from a data network may be operating slowly. There may be many different possible causes of such slowness, and discovering the root cause of the slowness can be difficult. Many other types of interconnected systems exist in many different fields or domains, in which it can be similarly difficult to troubleshoot a problem.
Typically, an analyst, such as a system engineer, may be called upon to troubleshoot a complex system exhibiting a problem. However, the troubleshooting process becomes increasingly intractable and time-consuming as the systems analyzed become more complex, especially if the sources of reported information are imperfect or limited, or if the various elements of an inter-related system exist at different system levels, have different scope, or the like.
Automated tools exist to aid the analyst in troubleshooting a complex system exhibiting symptoms that indicate a problem exists. Those tools generally use methods that filter according to similar symptoms, or correlate symptoms with known causes, or learn patterns of symptoms and correlate them with predetermined causes, or use a code book containing a set of rules for determining a root problem of symptoms. However, if a symptom experienced by a particular element has as its root cause a problem that exists on another, perhaps far removed and distantly related, element, these approaches may not be sufficient. Furthermore, the same root cause may result in many different symptoms in many different inter-related elements of the system, some of which symptoms may not have been anticipated or experienced before. It may be difficult or impossible to determine precisely, using existing practices, the root cause of one or more symptoms.
In addition, for organizational or analytical convenience, different system elements may be regarded as belonging to different “planes,” each plane representing some characteristic that the elements of that plane have in common. For example, for a computer application experiencing slowness, system elements might be divided into a network plane comprising network elements such as routers, switches, and communication links; a computing plane comprising computing elements such as servers and clusters of servers; and an application plane comprising databases, served applications such as web applications, and the like. Analyzing a system is even more difficult if inter-related elements experiencing symptoms exist in different planes of the system. What is needed is a different, more capable approach to troubleshooting problems in these types of complex systems.
Continuation-passing style (CPS) programming is a style of computer programming in which an object, operation, or routine is provided with an explicit “continuation” that it invokes as the next operation within a program, and to which it passes its own results. When a routine calls a subroutine, the routine may explicitly pass to the subroutine a continuation function directing the subroutine to a next step when the subroutine finishes. The continuation may be merely a direction to return the result to the calling routine. However, if, for example, the subroutine call was the last step in the calling routine before processing jumps to another routine or returns to a higher level routine, then the calling routine may pass its own jump or return destination to the called subroutine as the continuation. Processing can then continue directly from the called subroutine to the higher level routine or the jump destination, bypassing a return to the calling routine. In other instances, the continuation may be a fixed argument of the invoking operation, or may itself be computed or chosen by the invoking operation.
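By way of illustration only, the continuation-passing style described above might be sketched in Python as follows. The function names (`square_cps`, `add_cps`, `sum_of_squares_cps`) and the arithmetic task are hypothetical and chosen purely for the example. Each routine takes a continuation `k` as its last argument and hands its result to `k` rather than returning it to its caller. The final step of `sum_of_squares_cps` passes its own continuation directly to `add_cps`, so the sum is delivered to the higher level continuation without being routed back through `sum_of_squares_cps`. (Python does not eliminate tail calls, so the bypass here is logical rather than a saving of stack frames.)

```python
def square_cps(x, k):
    """Square x and pass the result to the continuation k."""
    return k(x * x)


def add_cps(a, b, k):
    """Add a and b and pass the sum to the continuation k."""
    return k(a + b)


def sum_of_squares_cps(a, b, k):
    """Compute a*a + b*b in continuation-passing style."""
    # Each lambda is the explicit "next step" for the subroutine it is given to.
    # The last call hands over this routine's own continuation k, so add_cps
    # delivers the result straight to k instead of back to this routine.
    return square_cps(a, lambda a2:
           square_cps(b, lambda b2:
           add_cps(a2, b2, k)))


# The simplest continuation merely returns the result to the original caller.
result = sum_of_squares_cps(3, 4, lambda r: r)
```

Here `result` is 25. Passing a different continuation, for example one that appends the value to a list, changes what happens next without any change to the routines themselves.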
Consistent use of continuations whenever control of a process transfers from one routine to another makes the flow of control within the overall program explicit, which can assist the programmer both in defining and in tracking that flow. Continuations can also be used in troubleshooting to trace symptoms to their root cause, or to determine what additional information may be needed to determine a root cause among a plurality of possibilities.
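As a minimal sketch only, and not a description of any particular implementation, the use of continuations in tracing a symptom might look like the following Python. The dependency map `DEPENDS_ON`, the element names, the `trace` routine, and the two continuations `on_root` and `on_need_info` are all assumptions made for the example. The routine follows a symptomatic element to the elements it depends on; it invokes `on_root` when no dependency is itself symptomatic, and invokes `on_need_info` when the status of a dependency is unknown and further information is required.

```python
# Hypothetical dependency map: each element lists the elements it depends on.
DEPENDS_ON = {
    "web_app": ["db_server"],
    "db_server": ["switch_1"],
    "switch_1": [],
}


def trace(element, observed, on_root, on_need_info, path=()):
    """Follow a symptom from `element` toward a candidate root cause.

    observed maps an element to True (symptomatic) or False (healthy);
    an element missing from the map has unknown status.
    """
    path = path + (element,)
    for dep in DEPENDS_ON.get(element, []):
        status = observed.get(dep)
        if status is None:
            # Not enough information: hand control to the "need info" continuation.
            return on_need_info(dep, path)
        if status:
            # The dependency is also symptomatic: continue the trace from there.
            return trace(dep, observed, on_root, on_need_info, path)
    # No dependency is symptomatic, so this element is the candidate root cause.
    return on_root(element, path)


full = trace(
    "web_app",
    {"web_app": True, "db_server": True, "switch_1": True},
    on_root=lambda e, p: ("root", e, p),
    on_need_info=lambda e, p: ("need", e, p),
)
partial = trace(
    "web_app",
    {"web_app": True, "db_server": True},
    on_root=lambda e, p: ("root", e, p),
    on_need_info=lambda e, p: ("need", e, p),
)
```

With complete observations the trace ends at `switch_1` as the candidate root cause; with the status of `switch_1` missing, control passes instead to the continuation that requests more information about that element.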