As computing systems grow in complexity, it becomes increasingly likely that the people who design or build a given computing system are not the same people who are ultimately responsible for its operation and maintenance. In this regard, “computing system” broadly refers to one or more computing devices configured to perform a plurality of computer-implemented transactions (e.g., an e-commerce web server running point of sale transactions). A result of this growing divide between design and maintenance is that when problems arise with the computing system, solutions that may be readily apparent to the system creators are often not apparent to those maintaining the computing system.
Commercially available analytics products can predict that an event (e.g., a server crash) is going to occur based upon prior performance and past events. For example, a manufacturer of a product may decide that 3 errors in 1,000,000 is acceptable, that 300 errors in 1,000,000 warrants closer monitoring, and that at 3,000 errors in 1,000,000 it can be predicted, based on what has occurred in the past, that a computing system is about to experience a hard failure and an associated server crash.
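By way of illustration only, the threshold-based prediction described above can be sketched as follows. This is a minimal sketch and not the method of any particular analytics product. The function name, the status labels, the use of transactions as the denominator, and the treatment of each figure as a lower bound for its band are all assumptions made for the example; only the 3, 300, and 3,000 per 1,000,000 figures come from the text.

```python
# Hypothetical thresholds, expressed as errors per 1,000,000 transactions.
# The text calls 3 per million acceptable, so anything below the watch
# threshold is treated as acceptable here (an assumption).
WATCH_PER_MILLION = 300
FAILURE_PER_MILLION = 3_000


def classify_error_rate(errors: int, transactions: int) -> str:
    """Map an observed error count to a coarse status label."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    # Normalize the raw count to a per-million rate.
    per_million = errors * 1_000_000 / transactions
    if per_million >= FAILURE_PER_MILLION:
        return "failure_predicted"
    if per_million >= WATCH_PER_MILLION:
        return "watch"
    return "acceptable"
```

A rule of this kind relies entirely on historical error rates, so it can flag only failure modes that have been observed before.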
While some analytics products are able to make meaningful predictions along these lines, they struggle with more complex problems and with problems whose solutions are not immediately apparent. For such problems, the causal relationship between a problem and unpredicted symptoms of that problem can be difficult to ascertain. That is, a detected problem in a given computing system may only be a tangential symptom of an otherwise undetected root cause, and without the expert input of the system creators, identifying the root cause can be quite challenging and time-consuming.