Spreadsheet languages, which are also known as form-based languages in some of the research literature, provide a declarative approach to programming, characterized by a dependence-driven, direct-manipulation working model (Ambler92). Users of spreadsheet languages create cells, and define formulas for those cells. These formulas reference values contained in other cells and use them in calculations. When a cell's formula is defined, the underlying evaluation engine calculates the cell's value and those of other affected cells (at least those that are visible to the user), and displays new results.
Spreadsheet languages include, as a subclass, commercial spreadsheet systems. These systems are widely used by end users, for a variety of computational tasks. The spreadsheet paradigm is also a subject of ongoing research. For example, there is research into using spreadsheet languages for matrix manipulation problems (Vichstaedt92), for providing steerable simulation environments for scientists (Burnett94), for high-quality visualizations of complex data (Chi97), and for specifying full-featured GUIs (Myers91).
Despite the end-user appeal of spreadsheet languages and the perceived simplicity of the paradigm, research shows that spreadsheets often contain faults. For example, in an early spreadsheet study, 44% of “finished” spreadsheets still had errors (Brown87). A more recent survey of other such studies reported errors in 38% to 77% of spreadsheets at a similar stage (Panko96). Of perhaps even greater concern, this survey also includes studies of “production” spreadsheets, those actually in use for day-to-ay decision-making, and these results ranged from 10.7% to 90% having errors. A possible factor in this problem is the unwarranted confidence creators of spreadsheets seem to have in the reliability of those spreadsheets (Wilcox97).
In spite of this evidence, no discussion is found in the research literature of techniques for testing spreadsheets. In fact, there has been only a little work on testing in other paradigms that follow declarative models. In the domain of functional and dataflow programming, there has been work on specification-based testing (e.g., (Kuhn97, Ouabd95)), but creators of spreadsheets rarely employ formal specifications. There has also been some recent research (Azem93, Belli95, Luo92) that considers problems of testing and reliability determination for logic programs written in Prolog. The logic paradigm is like the spreadsheet paradigm in that both are declarative, but several features of the logic paradigm, such as the bidirectional nature of unification and backtracking after failure, are so different from the spreadsheet paradigm that the testing techniques developed for Prolog cannot be applied to the spreadsheet paradigm.
On the other hand, there has been extensive research on testing imperative programs (e.g., (Clarke89, Duesterwald92, Frank193, Frank88, Harrold88, Hutchins94, Laski83, Ntafos84, Offutt96, Perry90, Rapps85, Rothermel97a, Weyuker86, Weyuker93, Wong95)), and it is in this body of work that the methodology presented in this document has its roots. However, significant differences exist between the spreadsheet and imperative programming paradigms, and these differences have important ramifications for testing methodologies. These differences can be divided into three classes.
The first class pertains to evaluation models. Evaluation of spreadsheets is driven by data dependencies that exist between cells, and spreadsheets contain explicit control flow only within cell formulas. Thus, spreadsheets are more appropriately tested using adequacy criteria that are data-dependence-based than criteria that are strictly control-flow-based. The dependence-driven evaluation model also implies that evaluation engines have flexibility in the scheduling algorithms and optimization devices they might employ to perform computations. A methodology for testing spreadsheets must be compatible with such mechanisms, and not rely upon particular evaluation orders or prevent optimizations based on value caching.
The second class of differences pertains to interactivity: spreadsheet systems are characterized by incremental visual feedback that is intertwined with the program construction process. The most widely-seen example of this is the “automatic recalculation” feature. This incremental visual feedback makes desirable use of testing methodologies that support an incremental input and validation process. For example, when a user changes a formula, the testing subsystem should provide feedback about how this affects the “testedness” of each visible portion of the program. This raises the issue of dealing with evolving spreadsheets while maintaining suitable response time.
The third class of differences pertains to users of spreadsheet languages. Imperative languages are most commonly used by professional programmers who are in the business of producing software. These programmers can be expected to know something about testing, and to place a high priority on doing a reasonably good job of testing. On the other hand, spreadsheet systems are used by a variety of users, many of whom are not professional programmers and have no interest in learning about formal testing methodologies. The goal is to provide at least some of the benefits of formal testing methodologies to these users.
Accordingly, a need remains for a suitable methodology for testing spreadsheets.