Spreadsheet applications are used by billions of users worldwide. The appeal of spreadsheet systems is two-fold. On the one hand, they can offer a flexible platform with almost no restrictions on how data is represented. For example, users can arrange data in ad hoc layouts. An example of such a representation is the hierarchical table shown in FIG. 1. In FIG. 1, the layout of the table provides an intuitive comparison of frequency distributions across different years and insurance types. On the other hand, spreadsheet systems can also offer tools to operate on the data, such as sorting, filtering, querying, and charting. These features can improve the extraction and analysis of essential information, which is of immense importance at a time when data mining is ubiquitous. Unfortunately, the tools provided by a spreadsheet application or system are limited to strictly formed relational tables, such as the example provided in FIG. 2C.
While a user can manually transform an ad hoc spreadsheet table into a relational table, doing so is a time-consuming and error-prone process, which may even result in the removal of essential relations or data. Furthermore, manually transforming a hierarchical table into a relational table is simply not realistic for large tables (e.g., tables with at least 25 rows and/or columns). Several approaches have previously been proposed to address these problems. One approach utilizes data cleaning tools that manipulate spreadsheet tables through predefined transformations. While such tools can provide a simple user interface, the user has to identify and select the transformations to be performed. Furthermore, the generality of the transformations and the need for multiple instructions make this task harder for users, often resulting in users failing to complete it.
Another approach utilizes domain-specific languages (DSLs), which can offer syntax specialized for manipulating tables. While such DSLs can shorten the data cleaning process, they require an end user to learn a programming language in order to use the DSL, which is a complex task for most end users. Yet another approach is to utilize automatic tools that rely on additional assumptions about the input table. One example of this approach is FlashRelate, which assumes that the user can convey how to change the layout of the table with a few examples. Another example of this approach is Senbazuru, which assumes that headlines are data pieces and that values are not related to one another. Thus, Senbazuru requires identification of the headline region in order to compute the relational tuples.
The above approaches, other than Senbazuru, place the burden of understanding how to remove the hierarchy on the user. Senbazuru may eliminate some of this burden, but it only addresses a specific class of tables: tables with related values, or with headlines that do not provide data, cannot be transformed by Senbazuru. In practice, many such tables are used by end users and database systems. Thus, there is a need for a technique that can transform a hierarchical table into a relational table and that (1) is seamless to an end user (e.g., can be performed at the click of a button), and (2) is capable of operating on complex tables with potentially large data sets.