Computer systems store and manage many business-related numerical values in tabular data structures. The term “numerical table” is used herein to refer to a tabular dataset that holds such numerical values. A typical numerical table is formed from a numerical value management area and one or more header areas. The numerical value management area stores the numerical values being managed, and the header areas indicate which items those numerical values represent. In other words, the numerical value management area stores numerical values of the items indicated by the header areas.
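The relationship between the two kinds of areas may be pictured with a minimal Python sketch. The class name `NumericalTable`, its field names, and the sample sales figures are hypothetical and serve only as an illustration; they are not taken from any of the techniques discussed herein.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NumericalTable:
    """Hypothetical model of a numerical table with two header areas."""
    column_header: List[str]   # header area describing each column
    row_header: List[str]      # header area describing each row
    values: List[List[float]]  # numerical value management area

    def lookup(self, row_item: str, column_item: str) -> float:
        """Return the value of the item pair indicated by the header areas."""
        r = self.row_header.index(row_item)
        c = self.column_header.index(column_item)
        return self.values[r][c]


# Illustrative data only: sales of two products over two quarters.
table = NumericalTable(
    column_header=["Q1", "Q2"],
    row_header=["Product A", "Product B"],
    values=[[120.0, 135.0], [80.0, 95.0]],
)
print(table.lookup("Product B", "Q1"))
```

In this sketch each numerical value is meaningful only in combination with the header entries that index it, which is why the areas have to be demarcated before the values can be analyzed.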
Numerical values stored in numerical tables may be used for the purpose of data mining. For example, a data mining process may analyze many numerical tables that accumulate a large amount of sales records of specific products, so that some statistics (e.g., trend of sales) can be obtained. In this case, the process involves the task of demarcating header areas and numerical value management areas in each source numerical table. Human workers could perform such demarcation tasks, but this solution is impractical when there are a large number of numerical tables.
In view of the above, several techniques have been proposed for automatically determining areas in a numerical table. For example, one proposed method produces documents as an output of document data stored in a database in tabular form. A document generation device implementing this method is also proposed. Another proposed method enables automatic extraction of relational data from spreadsheets.
Numerical tables may be analyzed in some other situations to find an area that satisfies specified conditions about numerical values in the tables. In this connection, one proposed method is designed to obtain an area that maximizes the sum of numerical values within a given array.
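As a simplified illustration of this kind of problem, the following Python sketch finds a contiguous segment with the maximum sum in a one-dimensional array, using the well-known Kadane's algorithm. This is a stand-in chosen for brevity and is not the range-query method of the cited document; the function name `max_sum_segment` is hypothetical.

```python
from typing import List, Tuple


def max_sum_segment(values: List[float]) -> Tuple[float, int, int]:
    """Return (best_sum, start, end) of a maximum-sum contiguous segment.

    The indices are inclusive. Kadane's algorithm, O(n) time.
    """
    if not values:
        raise ValueError("values must not be empty")

    best_sum = cur_sum = values[0]
    best_start = best_end = cur_start = 0

    for i in range(1, len(values)):
        if cur_sum < 0:
            # A negative running sum can only hurt; restart at position i.
            cur_sum = values[i]
            cur_start = i
        else:
            cur_sum += values[i]
        if cur_sum > best_sum:
            best_sum = cur_sum
            best_start, best_end = cur_start, i

    return best_sum, best_start, best_end


print(max_sum_segment([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # (6, 3, 6)
```

In the example call, the segment from index 3 to index 6 (values 4, -1, 2, 1) yields the maximum sum of 6.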
See, for example, the following documents:
Japanese Laid-open Patent Publication No. 11-175641
Zhe Chen, Michael Cafarella, “Automatic web spreadsheet data extraction”, SS@'13 Proceedings of the 3rd International Workshop on Semantic Search Over the Web, ACM, 2013-08-30
Kuan-Yu Chen, Kun-Mao Chao, “On the range maximum-sum segment query problem”, Discrete Applied Mathematics, 1 Oct. 2007, Volume 155, Issue 16, Pages 2043-2052
Most numerical tables use their numerical value management areas solely to hold numerical values, but some numerical tables include other kinds of data, such as character strings, in those areas. Similarly, header areas may include numerical values. The conventional techniques, however, are unable to extract numerical value management areas properly if some of their cells contain character strings. The conventional techniques may also fail to extract header areas properly if some of their cells contain numerical values.
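The difficulty may be illustrated with a deliberately simplistic Python sketch. The rule used here (a row belongs to the numerical value management area only if every cell after the first column parses as a number) is a hypothetical example of a purely type-based criterion; it is not the method of any of the cited documents, and the sample table is invented for illustration.

```python
from typing import List


def is_number(cell: str) -> bool:
    """Return True if the cell text parses as a floating-point number."""
    try:
        float(cell)
        return True
    except (TypeError, ValueError):
        return False


def naive_value_rows(table: List[List[str]]) -> List[int]:
    """Hypothetical rule: a row is a value row if all cells after column 0 are numeric."""
    return [
        i for i, row in enumerate(table)
        if all(is_number(cell) for cell in row[1:])
    ]


table = [
    ["Item", "2012", "2013"],     # header row whose cells are numerical values (years)
    ["Product A", "120", "135"],  # ordinary value row
    ["Product B", "N/A", "95"],   # value row containing a character string
]

print(naive_value_rows(table))  # [0, 1]
```

Under this rule the header row of years is mistaken for part of the numerical value management area, while the row containing "N/A" is excluded from it, which corresponds to the two failure cases described above.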