Spreadsheets are used in a variety of industries to organize, calculate, and present many types of information. For instance, spreadsheets have proven useful for capturing and organizing financial data in a form that is easier for people to understand and/or manipulate. However, spreadsheets often contain errors in their data and/or in the formulas contained in their cells. With respect to financial data, such errors can lead to significant financial loss.
Existing techniques for detecting errors in spreadsheets include: (i) explicitly defining a fixed set of rules that check the cells of a spreadsheet for common error patterns, (ii) using a programming language to infer the types of content in the cells of a spreadsheet and alerting a user to possible type violations, (iii) applying software engineering metrics to the underlying source code of a spreadsheet to identify a symptom in the source code that indicates a deeper problem (e.g., identifying a code “smell” associated with the spreadsheet), and (iv) checking whether a specific data value in a cell of a spreadsheet is an outlier. However, these existing techniques do not effectively handle false positives. That is, they often flag a possible error even though the data and/or the formula contained in the cell is what the user intended it to be.