Spreadsheet software or applications are relatively well-known and useful tools. Typical spreadsheet applications simulate physical spreadsheets by capturing, displaying, and manipulating data arranged in rows and columns. At the intersection of columns and rows are cells. Cells within spreadsheets can contain: a number, a string, an error value, or a blank value. Furthermore, cells may also contain formulas that can operate on data in other cells and display the results. These formulas empower the user to create calculations and business logic that helps the user exploit the data.
The formulas import data from areas in the spreadsheet. To import the data into the operation, spreadsheets commonly designate a set of data on which the formula will operate. Unfortunately, current spreadsheets applications have some significant drawbacks when it comes to designating blocks of data for use in formulas.
In current spreadsheets, the designation of data can be done in a few different ways. First, users can employ “references” to cell “ranges.” A reference is a pointer to a set of data that uses the position, specified by the cell coordinates, of the data within the spreadsheet to designate the block of data. A cell range is one or more cells in a continuous block designated by its coordinates within the sheet. For example, in the Microsoft® Excel spreadsheet application, users may designate cells by the row number and the column letter. Users may designate a plurality of cells using an operator, like a colon, to indicate a range. The row numbers and column letters, with or without operators, function as the position coordinates for the data. Using a combination of row numbers, column letters, and operators, the user can designate a block of data within the spreadsheet. A typical formula in current spreadsheets might be “=AVERAGE(C6:C18),” which calculates the average of the data within a cell range from cell C6 to cell C18.
Next, users can employ a naming feature to refer to data having a “Name” or within a “Named Range.” The naming feature in some spreadsheet applications, such as the Microsoft® Excel spreadsheet application, allows users to create a name for a particular cell range or formula. In essence, the user chooses a cell range with certain coordinates. The user instructs the spreadsheet application to “name” that cell range. The spreadsheet application then links the name to the cell range having the certain coordinates. Once the user has created the name for the cell range, the user may use the name within formulas to refer to the data having the certain coordinates. For instance, a user can create a name called “Heights” for the cell range C6 to C18, and use that name to designate the cell range within a formula, such as “=AVERAGE(Heights).”
Next, a user can employ an Natural Language Formula (NLF). NLFs are a means of referring to data by the user-created labels located in cells near the data within the spreadsheet. A user can place a data label into a spreadsheet, such as a column or row header. The user can then use the label within a formula. The spreadsheet application determines which cells, in the spreadsheet, are described by the label, and resolves the described cells into a cell range to use in the formula. Thus, if a user had the values “Height”, “26”, “30”, “22”, “27”, etc. in cells C5:C18, the user could refer to “Height” within a formula. The spreadsheet application could resolve the required cell range to be cells C6 to C18.
Finally, a user can employ reference “functions.” The reference functions are not references, but processes that return a reference to data or a specified piece of data. The operation of the process to return the reference is adjusted or changed according to a set of arguments a user may enter when inputting the function. The arguments allow the spreadsheet application to determine which cells to return.
Two examples of the functions that return references are the “OFFSET” function and the “GETPIVOTDATA” function, offered in the Microsoft® Excel spreadsheet application. The OFFSET function returns a reference to a range that is a specified number of rows and columns from a cell or range of cells. The returned reference from the OFFSET function can be a single cell or a range of cells. The OFFSET function allows the user to enter arguments that specify the number of rows and the number of columns to be returned.
The “GETPIVOTDATA” function, on the other hand, is intended to make it easier for the user to refer to a single cell of data in a PivotTable® table. The data returned from the GETPIVOTDATA function doesn't change when the data, upon which the PivotTable® table is based, is refreshed (updated to incorporate new data values). The GETPIVOTDATA function finds data in a single cell by using the metadata of the PivotTable® table. Metadata are the “names” of the “items” and “fields” that correspond to the data, as opposed to the coordinates of the data. An “item” is computer-generated label within the PivotTable table that is similar to a row header or column header in a table. A “field” is a computer-generated label that may incorporate and/or describe one or more items. The field is similar to a header in a table that covers several rows or columns of data. As an example of the GETPIVOTDATA function, the user can ask for the data related to the height of John in a PivotTable® table by typing in the formula: “=GETPIVOTDATA(“Height”,Sheet1!$A$3,“Person”,“John”)” where “John” is an item in the field named “Person.” The GETPIVOTDATA function formula allows the data in the PivotTable® table to move to different cells without destroying the returned reference because the returned reference is based on the metadata of the cell and not the cell's location.
Significant drawbacks exist for all of the above methods of designating data in spreadsheet applications. All of the above methods are incapable of adjusting the designations to include added data because each of the methods are unable to interpret the user's intention to include the added data into a designation. When the cell range containing the user's data changes due to the addition of new data, formulas using the above methods for designating data must be manually changed to reflect the cell range change. If the designations are not updated, errors are introduced because the formulas no longer includes the necessary cells of data.
For example, if the user desired to obtain the average of data currently in column C, the user may enter the formula “=AVERAGE(C6:C18),” where all the data in column C is located in cells C6 to C18. The user would expect the formula to return the average of the data in column C. Now, if the user, either manually or by some operation, added data to column C in cells C19 to C22, the user may still expect the entered formula to return the average of all data in column C. Unfortunately, the formula would not operate as the user expects because the cell range in the formula would still only average the data within the cells C6 to C18, and thus, leave out the data in cells C19 to C22. These situations require the user to manually change the coordinates of the cell ranges in the formulas because spreadsheet applications cannot determine whether the user's expectation is always to include the new data in the reference. Thus, the user must manually edit the formula to read “=AVERAGE(C6:C22).” This editing is difficult or impossible when the user has numerous formulas with data that changes frequently. All types of references currently available (including references to cells by their coordinates, named ranges, NLFs, and functions that return references) suffer from the inability to update the cell range.
Further, an additional drawback of the naming feature, described above, is that the user must manually create and maintain each “Named Range.” For example, a user must first create a list of heights in cells C6 to C18 and then create a named range called “Heights,” which designates the data. Unfortunately, when the user updates the data, for example, if the user adds data in cells C19 to C22, the user must manually update the named range to include all cells from C6 to C22.
Still further, NLFs help with the some of the problems with named ranges, but introduce other problems. NLFs are often ambiguous. If two labels, within the spreadsheet, have the same name or if a label may describe data in a column and a row, the NLF may designate a cell range that differs from what the user would expect. For example, the NLF reference may point to the other cell range with the second label or to the row of data rather than the column of data as the user expects.
In addition, the association between the label input by the user and the cell range selected by the spreadsheet application is inferred at the time that the NLF (and not the label or data) is created or altered. Thus, a label used in some NLFs may point to one cell range while the same label used in other NLFs may point to a totally different cell range. For example, if the user entered the formula “=AVERAGE(Heights),” where the label “Heights” was in cell C5 and data was located in cells C6 to C18 (with no data in cell C19), the user would expect the formula to return the average of the data in the column that begins with the word “Heights.” Now, if the user, either manually or by some operation, added data to column C, in cells C19 to C22, the user may still expect “Heights” to refer to all data in the column that starts with “Heights.” Unfortunately, the formula would not function as the user expects because the cell range in the formula would still only average the data within the cells C6 to C18, and thus, leave out the data in cells C19 to C22. However, if the user were to enter exactly the same formula in another cell after adding the data in cells C19 to C22, the user would average the data in cells C6 to C22. Thus, the user can have two formulas with NLFs that look exactly the same, but one would return the correct result (equivalent to “=AVERAGE(C6:C22)”) while the other would result in an incorrect result (equivalent to “=AVERAGE(C6:C18)”). Thus, the user employing NLFs has the problem of not clearly knowing exactly what cells to which a formula refers.
Still further, functions that return references, like OFFSET and GETPIVOTDATA, have some other limitations. Functions that return references do not adjust to changes in the data in a spreadsheet. For instance, the functions that return references do not adjust when items are renamed or as items are moved. For example, a GETPIVOTTABLE function that returns a reference for the height of John in the following manner: “=GETPIVOTDATA(“Height”,Sheet1!$A$3,“Person”,“John”)” will not update if you change the name of that person from “John” to “Fred.”
Similarly, an OFFSET function may return a reference to a desired cell by finding a cell designated in the arguments and moving a number of rows and columns from the designated cell to the desired cell. If rows or columns are inserted or deleted between the designated cell and the desired cell or if the designated cell or the desired cell moves, the OFFSET function will return the incorrect reference. The OFFSET function cannot adjust the number of rows and columns moved in the spreadsheet to find the desired cell.
The GETPIVOTDATA function also can return a reference only to a single cell. For a user to refer to multiple cells, the user must enter many GETPIVOTDATA functions, which can be unwieldy. For example, the user may desire to average the data in column C of the results from a PivotTable® table. The data in column C may reside in cells C6 to C18. To use the GETPIVOTDATA function references that adjust if the data moves during a PivotTable® table refresh, the user must enter one GETPIVOTDATA function for every cell from C6 to C18. The user must create a string of arguments that might look like:“=AVERAGE(GETPIVOTDATA(“Height”,Sheet1!$A$3,“Person”,“John”),GETPIVOTDATA(“Height”,Sheet1!$A$3,“Person”,“Pam”),GETPIVOTDATA(“Height”,Sheet 1!$A$3,“Person”,“Bill”),GETPIVOTDATA(“Height”,Sheet1!$A$3,“Person”,“Fred”).”The formulas using functions that return references become extremely long and dense as the amount of data referenced grows. Creating and editing these formulas with functions that return references is tedious and time-consuming due to the length of the formulas