This invention relates to interactive computer programs for doing quantitative analysis. It provides methods for the computation and interpretation of the numeric ranges of formula outputs.
Spreadsheet programs (such as Lotus 1-2-3) running on personal computers have become extremely popular for interactive numerical computations. They offer very convenient data entry and editing, and have many features for the interactive analysis of numerical data. It can display a matrix of cells. Each cell has a value based on some formula stored in the worksheet. The formula may just be a constant value, a sum of numbers in a specified column, or a more complicated mathematical expression depending on other cell values. A simple worksheet is shown in FIG. 1. The upper matrix shows the formulas and the lower matrix shows the values.
Much of the popularity of spreadsheet programs is based on their interactive nature. Any cell can be edited any time and the program automatically recalculates the dependent formulas and updates the cells values. Spreadsheets are very useful for what-if financial analysis. An analyst may wish to see the consequences of changes in data values. For example, he might wish to see how much sooner a loan will be paid off if he increases his monthly payment. Or he may wish to see how his income taxes may change if he makes various charitable contributions. In each case he can try several numbers and immediately see the effects.
A difficulty with spreadsheet programs is that they store numbers in a fixed precision, and have roundoff error in computations. For example, 1/3=0.3333 . . . is a repeated decimal but only sixteen or so digits can be reasonably stored. If only four digits are stored and that is then multiplied by 3, the result is 0.9999 rather than 1.0000 for a roundoff error of 0.0001. Complicating matters further is that spreadsheet ordinarily use a binary representation internally but display numbers in decimal form, so there are additional roundoff errors in the binary/decimal conversions. These are even harder for the user to understand because he is used to thinking only in decimal. For example, 10.sup.* 0.1 may not be exactly 1.0, because 0.1 is stored inexactly in binary.
These roundoff errors may seem relatively small, and they are, but a subsequent computation can increase its significance. For example, if cell A1 has the value 0.1 then the formula 1/(1-10.sup.* A1) could have a positive, negative, or infinite value depending on rounding subtleties. There is a total loss of significance.
Even if the user understood all these roundoff issues, it would still be difficult for him to predict the effect of roundoff errors. The chain of formulas linking the input to the output may not be apparent and is derived internally by the spreadsheet. The spreadsheet internally finds an ordering of the formulas, called natural order, which is based on the dependencies inherent in the formulas and not on their layout. (Pardo describes one way of finding a natural order in U.S. Pat. No. 4,398,249.)
Thus, when a spreadsheet user looks a numerical result computed by a spreadsheet, he has no idea how many of the digits are correct.
There are other types of numerical errors that occur in spreadsheets. There are arithmetic overflows, such as 10 10000, where the result would be outside the range of representable numbers. There are also mathematically undefined expressions, such as 0/0. Various other errors include function domain errors, such as the square root of -1, and references to cells which are undefined or unavailable.
In the prior art, numeric spreadsheet errors are usually lumped into two categories, labelled ERR and NA. NA is used for cells which are not available for some reason, and for formulas which depend on those cells. ERR is used for all the overflow and domain errors.
Another type of numeric error is the user error. The input numbers may be based on measurement or some other imperfect process, and hence susceptible to error. Such errors may be large or small, and it would seem that spreadsheets would be unable to handle them.
Thus spreadsheet computations have many sources of errors. These may be categorized as measurement, representational, computational, and domain errors.
Another drawback to spreadsheets is that doing a what-if analysis can be rather tedious. Often the user wants to know what the range of outputs will be if inputs are allowed to vary in a particular range. This is usually a trial-and-error process.
Although spreadsheets have no good way of dealing with roundoff error and indeterminate inputs, there are methods in the prior art which have been used in other contexts. One method is the use of probability distributions. This method uses a model where numbers are replaced by a probability distribution on the real line.
A variant of the probability model is the Monte Carlo model. The computations of probability distributions are too complicated, so they are simulated using random numbers. A drawback is that the simulation requires many repeated calculations. Using a small number of repetitions, like 3, means that bad luck can give a vary misleading answer. A large number, like 30, requires an unacceptable slowdown in recalculation time.
Another method is to use binary coded decimal (bcd) arithmetic. Cells values are stored as decimal numbers, so 0.1 is exact. This method eliminates rounding errors when adding up pennies, but does not address other kinds of errors.
Another method is the interval arithmetic model. Each number is replaced with an interval, with the end points being lower and upper bounds for the set of possible values. An operation on an interval has to give an interval large enough to include all real numbers obtained by applying that operation to numbers in each operand interval. For example, let [a,b] denote the set of real numbers in the range from a to b, inclusive. Then EQU [-1,2].sup.* [1,3]=[-3,6]
because if x is between -1 and 2, and y is between 1 and 3, then x.sup.* y must be between -3 and 6.
Implementing interval arithmetic in the presence of roundoff error is tricky. For example, addition may be defined as EQU [a,b]+[c,d]=[a+c, b+d]
This is correct as long as a+c and b+d are calculated exactly. But if there is roundoff error, as there usually is, then a+c must be rounded down and b+d rounded up in order for [a+c,b+d] to be strictly correct.
These models, as well as others, have been known for many years, but are not widely used. They are not supported by any popular high level programming languages or microprocessors, are technically difficult to implement, and they are much slower than ordinary floating point arithmetic. Furthermore, they involve mathematical subtleties that are difficult to explain.
There is prior art on directed rounding. The IEEE 754 standard for binary arithmetic specifies that processors be able to round up or round down, as well as the more usual round-to-nearest. There are many floating point chips which implement this standard. The standard also specifies binary floating point data formats including infinities and nans. (Nan is a shorthand for not-a-number.) Certain operations which cause overflows or other potentially misleading result generate exceptions, which may be masked or unmasked. An unmasked exception interrupts the processor immediately for special action. A masked exception merely sets a flag which can be queried later and produces an infinity or nan.
In the prior art, spreadsheet formulas are ordinarily translated into a polish form, also called p-code. P-code has the advantages that an interpreter can evaluate it efficiently without parsing, and that the original formula is easily reconstructed from the p-code. While it may seem more efficient to compile into the native code of the processor, it is not at all obvious how to achieve significant performance gains. How to compile formulas within programming languages is well-known to the experts, but spreadsheets are quite different because of their interactive nature, untyped cells, nonsequential execution, unscoped variables, dynamic memory management, multiple data types, and error propagation. Furthermore, it is not at all obvious how to make compiled formulas efficient with the sophisticated error handling of this invention.