When gathering variables to assemble a Dataset, some techniques have used a data preprocessor to transform raw data into usable data. The typical preprocessing operation works from a first data set, referred to as the original or "raw" data values, together with the list of transforms that have been applied to the data. Applying the transforms to the raw values produces the transformed variable values. These transformed variables can include variables that are unchanged from the raw values, variables whose raw values have been modified by the transforms, and newly created variables generated by the transforms.
A typical Dataset consists of three types of variables: raw variables, computed variables and independent variables. A raw variable is, in most cases, a process variable that was read into the Dataset from a data file. If a transform is applied to a raw variable, it is still considered to be a raw variable, but it has both raw and transformed values. A computed variable is a variable created by applying a transform that generates a new variable from existing ones, typically an equation. An independent variable is a variable created by applying a transform that generates new values without referencing any existing variables in the Dataset. Examples of independent variables are generated constants, row numbers and random numbers. A transform is an expression that specifies some modification, or "transform," of variables in a Dataset. The output or result of the expression can either be stored in place of an existing variable, replacing its values, or be stored as a new variable. More than one transform can be applied to the same variable. All transforms on the Dataset are kept in one ordered list. The typical syntax of a transform is: EQU (Outputs)=Function(Inputs)
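The structure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual preprocessor: the names `Transform`, `Dataset`, `apply`, and the example variables `temp`, `flow`, `ratio` and `row` are all hypothetical, and the convention that every function receives the row count as its first argument is a choice made only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Column = List[float]


@dataclass
class Transform:
    """One entry in the ordered list: (outputs) = function(inputs)."""
    outputs: Sequence[str]
    function: Callable[..., Sequence[Column]]
    inputs: Sequence[str] = ()


@dataclass
class Dataset:
    """Raw values plus one ordered list of transforms (hypothetical layout)."""
    raw: Dict[str, Column]
    transforms: List[Transform] = field(default_factory=list)

    def apply(self) -> Dict[str, Column]:
        # Work on a copy so the raw values are preserved alongside
        # the transformed values.
        values = {name: list(col) for name, col in self.raw.items()}
        n_rows = len(next(iter(self.raw.values())))
        # Transforms run strictly in list order.
        for t in self.transforms:
            args = [values[name] for name in t.inputs]
            results = t.function(n_rows, *args)
            for name, col in zip(t.outputs, results):
                values[name] = list(col)  # replace existing or create new
        return values


def clip_at_100(n_rows, temp):
    # Modifies a raw variable in place (it remains a raw variable).
    return ([min(v, 100.0) for v in temp],)


def divide(n_rows, a, b):
    # Creates a computed variable from two existing variables.
    return ([x / y for x, y in zip(a, b)],)


def row_numbers(n_rows):
    # Creates an independent variable: no existing variables are referenced.
    return (list(range(1, n_rows + 1)),)


ds = Dataset(raw={"temp": [90.0, 120.0], "flow": [180.0, 50.0]})
ds.transforms.append(Transform(["temp"], clip_at_100, ["temp"]))
ds.transforms.append(Transform(["ratio"], divide, ["flow", "temp"]))
ds.transforms.append(Transform(["row"], row_numbers))
values = ds.apply()
```

In this sketch `temp` keeps both its raw values (in `ds.raw`) and its transformed values (in `values`), `ratio` is a computed variable, and `row` is an independent variable.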
The internal architecture of existing spreadsheet software is either equation-based or based on a sequential conventional programming language.
In an equation-based system, each variable (or column, or in some cases even a single cell of a spreadsheet) has its own formula or list of formulas. The system keeps track of which columns reference other columns, so that the calculations in the overall spreadsheet are done in the correct order. For example, suppose A and B already exist in the spreadsheet and a new column X is created with the relationship X=A+B. Then, any time the value of either A or B changes, X is automatically recalculated. No matter what value is assigned to A or B, X always retains the relationship A+B. The user never has to be concerned with the order in which formulas are applied, or with maintaining the relationships among variables. Many commercial spreadsheets utilize an equation-based architecture.
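The equation-based behavior can be illustrated with a short Python sketch. It is an assumption-laden toy rather than the design of any particular spreadsheet: the class `EquationSheet` and its methods are hypothetical names, and recalculating every formula on every change is a simplification chosen for brevity. The dependency ordering uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


class EquationSheet:
    """Toy equation-based sheet: each variable owns its own formula."""

    def __init__(self):
        self.values = {}
        self.formulas = {}  # name -> (input names, function)

    def set_value(self, name, value):
        self.values[name] = value
        self.recalculate()

    def set_formula(self, name, inputs, fn):
        self.formulas[name] = (tuple(inputs), fn)
        self.recalculate()

    def recalculate(self):
        # Map each formula variable to the variables it references, then
        # evaluate so that every input is computed before its dependents.
        graph = {name: inputs for name, (inputs, _) in self.formulas.items()}
        for name in TopologicalSorter(graph).static_order():
            if name not in self.formulas:
                continue  # plain value, nothing to compute
            inputs, fn = self.formulas[name]
            if all(i in self.values for i in inputs):
                self.values[name] = fn(*(self.values[i] for i in inputs))


sheet = EquationSheet()
sheet.set_value("A", 1)
sheet.set_value("B", 2)
# Y is entered before X exists; entry order does not matter here.
sheet.set_formula("Y", ["X"], lambda x: x * 10)
sheet.set_formula("X", ["A", "B"], lambda a, b: a + b)
x_before, y_before = sheet.values["X"], sheet.values["Y"]

sheet.set_value("A", 5)  # X and Y are recalculated automatically
x_after, y_after = sheet.values["X"], sheet.values["Y"]
```

Changing A causes X, and then Y, to be recomputed in dependency order, so X always retains the relationship A+B regardless of the order in which the formulas were entered.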
On the other hand, in an architecture based on a sequential conventional programming language, there is one ordered list of formulas for the entire spreadsheet, and the formulas are calculated in the order in which they appear in the list. If one enters the formula X=A+B, it means only "add the values that A and B have at that moment and store the result in X." If a user enters the formula and subsequently changes the value of A or B, X is not affected.
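The sequential behavior can be sketched in Python as follows. This is a minimal illustration only: the function `run_sequential` and the tuple layout `(target, inputs, function)` are hypothetical choices made for this example.

```python
def run_sequential(statements):
    """Execute one ordered list of formulas, each exactly once, in order."""
    values = {}
    for target, inputs, fn in statements:
        # Each formula sees only the values that exist at this moment.
        values[target] = fn(*(values[name] for name in inputs))
    return values


program = [
    ("A", (), lambda: 1),
    ("B", (), lambda: 2),
    ("X", ("A", "B"), lambda a, b: a + b),  # X = A + B, using A == 1
    ("A", (), lambda: 5),                   # later change to A
]
result = run_sequential(program)

# Moving the change to A ahead of the formula for X gives a different X.
reordered = [program[0], program[1], program[3], program[2]]
result_reordered = run_sequential(reordered)
```

In the first list X is computed before A changes and is left untouched afterward; in the reordered list the same formulas yield a different X. The outcome depends entirely on the position of each formula in the list.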
The equation-based architecture has a problem when the variables have different date/time references and the user wishes to change those references. Since this type of architecture does not maintain one ordered list for all the transforms, transforms that merge the time bases of two variables (transforms which have implicit dependencies) cannot be used with it. Additionally, transforms that are required to change the date/time references of variables cannot be expressed as equations, since these types of transforms are fundamentally sequence-based operations.
The architecture based on a sequential conventional programming language is substantially more powerful than the equation-based architecture, and it has no problem handling variables with different date/time references. However, this type of architecture requires considerable attention and understanding from the user to ensure that the formulas are entered in the correct order to produce the desired results.