Among all programming activities, debugging remains the most common and the most costly. A study by the U.S. National Institute of Standards and Technology (NIST) found that software engineers in the U.S. spend 70-80% of their time testing and debugging, with the average error taking 17.4 hours to find and fix (Tassey, G., “The economic impacts of inadequate infrastructure for software testing.” National Institute of Standards and Technology, RTI Project Number 7007.011, 2002). The engineers surveyed blamed inadequate testing and debugging tools.
One reason for this might be that the feature sets of commercial debugging tools have changed little in the past 30 years: programmers' only tools for finding errors are still breakpoints, code-stepping, and print statements.
Research describes debugging as an exploratory activity aimed at investigating a program's behavior, involving several distinct and interleaved activities:
Hypothesizing what runtime actions caused failure;
Observing data about a program's runtime state;
Restructuring data into different representations;
Exploring restructured runtime data;
Diagnosing what code caused faulty runtime actions; and
Repairing erroneous code to prevent such faulty runtime actions.
Current debugging tools support some of these activities, while hindering others. For example, breakpoints and code-stepping support observation of control flow, but hinder exploration and restructuring; visualization tools help restructure data, but hinder diagnosis and observation.
There have been many attempts to design more useful debugging paradigms and tools, including automatic debugging, relative debugging, program slicing, and visualizations. For example, Lencevicius et al. discuss Query-Based Debugging [Lencevicius, R., Holzle, U., and Singh, A. K., “Dynamic query-based debugging of object-oriented programs,” Journal of Automated Software Engineering, 10(1), 2003, 367-370], in which programmers form textual queries about objects' runtime relationships. However, this approach forces programmers to guess what relationships might exist, and requires them to learn an unfamiliar query language. Briggs et al. discuss a task timeline [Briggs, J. S., et al., “Task time lines as a debugging tool,” ACM SIGAda Ada Letters, XVI(2), 1996, 50-69] for debugging distributed Ada programs. Their visualization highlights a dynamic slice, but it does not relate runtime actions to code. Zeller's work on cause-effect chains and the AskIgor debugger [Zeller, A., “Isolating cause-effect chains from computer programs,” International Symposium on the Foundations of Software Engineering, 2002, Charleston, S.C., 1-10] is a related diagnosis tool. However, Zeller's approach requires both a failed and a successful execution of the program.
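To make the query-based idea concrete, the following is a minimal sketch (in Python, not the actual QBD system or its query syntax) of how a debugger might maintain a registry of live objects and evaluate a programmer-written predicate over them. The names `Registry`, `track`, `query`, and `Account` are illustrative assumptions, not part of Lencevicius et al.'s tool.

```python
# A minimal sketch of the idea behind query-based debugging: the debugger
# tracks live objects and evaluates a programmer-written predicate over
# them on demand. All names here are illustrative, not the real QBD API.

class Registry:
    def __init__(self):
        self.objects = []

    def track(self, obj):
        """Register an object so later queries can inspect it."""
        self.objects.append(obj)
        return obj

    def query(self, cls, predicate):
        """Return tracked instances of cls for which the predicate holds."""
        return [o for o in self.objects
                if isinstance(o, cls) and predicate(o)]


class Account:
    def __init__(self, balance):
        self.balance = balance


registry = Registry()
a = registry.track(Account(100))
b = registry.track(Account(-5))

# "Which accounts have a negative balance?" -- the kind of runtime
# relationship a programmer would express as a query while debugging.
suspects = registry.query(Account, lambda acc: acc.balance < 0)
```

Even in this toy form, the limitation noted above is visible: the programmer must already suspect that a negative balance exists before writing the query.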
Yet few of these techniques have been shown to be usable, let alone to reduce debugging time. This is because debugging always begins with a question, and to use existing tools, programmers must struggle to map their strategies for answering that question onto the tools' limited capabilities. Furthermore, none of these tools supports hypothesizing. If a programmer has a weak hypothesis about the cause of a failure, any implicit assumptions about what did or did not happen at runtime will go unchecked. Not only do these unchecked assumptions make debugging take longer, but studies have shown that many errors stem from the false assumptions programmers formed while debugging earlier errors.
In two studies of both experts' and novices' programming activity, programmers' questions at the time of failure were one of two types: “why did” questions, which assume the occurrence of an unexpected runtime action, and “why didn't” questions, which assume the absence of an expected runtime action. There were three possible answers:
1. False propositions. The programmer's assumption is false. The answer to “Why didn't this button's action happen?” may be that it did, but had no visible effect.
2. Invariants. The runtime action always happens (why did), or can never happen (why didn't). The answer to our button question may be that an event handler was not attached to an event, so it could never happen.
3. Data and control flow. A chain of runtime actions led to the program's output. For example, a conditional expression, which was supposed to fire the button's action, evaluated to false instead of true.
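The button example running through these three answer types can be sketched with a toy event system. The `Button` class, its `on_click` method, and the surrounding setup are hypothetical, chosen only to show how each answer type might arise in real code.

```python
# A toy event system illustrating the three answer types above.
# All names (Button, on_click, etc.) are illustrative assumptions.

class Button:
    def __init__(self):
        self.handler = None
        self.enabled = True
        self.clicks = 0

    def on_click(self, handler):
        self.handler = handler

    def click(self):
        self.clicks += 1
        # Answer type 2 (invariant): if no handler was ever attached,
        # the button's action can never happen.
        if self.handler is None:
            return
        # Answer type 3 (data and control flow): the action fires only
        # if this conditional evaluates to True.
        if self.enabled:
            self.handler()


log = []
button = Button()
button.on_click(lambda: log.append("saved"))
button.enabled = False   # a faulty upstream assignment
button.click()

# Answer type 1 (false proposition): the click itself *did* happen
# (button.clicks == 1), but it had no visible effect (log stays empty),
# so "why didn't my button's action happen?" rests on a false assumption
# about which step failed.
```

Tracing which of the three answers applies requires exactly the kind of runtime evidence (was a handler attached? what did the conditional evaluate to?) that breakpoints and print statements only expose indirectly.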
Therefore, the need exists for a new debugging technique which allows programmers to directly ask the questions they naturally want to ask and receive appropriate answers in response.