1. Field of the Invention
The present invention relates to a method, system, program, and data structure for pivoting column fields in a database table and, in particular, pivoting fields in multiple columns of an input table into fewer fields in an output table.
2. Description of the Related Art
Data records in a computer database are maintained in tables, which are a collection of rows all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows. A data warehouse is a large scale database including millions or billions of records defining business or other types of transactions or activities. Data warehouses contain a wide variety of data that present a coherent picture of business or organizational conditions over time. Various data analysis and mining tools are provided with the data warehouse to allow users to effectively analyze, manage and access large-scale databases to support management decision making. Data mining is the process of extracting valid and previously unknown information from large databases and using it to make crucial business decisions. In many real-world domains such as marketing analysis, financial analysis, fraud detection, etc, information extraction requires the cooperative use of several data mining operations and techniques. Further details of data warehousing and data transforms, are described in the IBM publications xe2x80x9cManaging Visual Warehouse, Version 3.1,xe2x80x9d IBM document no. GC26-8822-01 (IBM Copyright, January, 1998), which is incorporated herein by reference in its entirety.
Once the desired database tables have been selected and the data to be mined has been identified, transformations on the data may be necessary. In particular, many database tables may not have an optimal design for executing SQL queries. Many database programs, such as the IBM DB2 Universal Database program, provide numerous column functions, also referred to as set or aggregate functions. Column functions operate on a set of values (a column) and reduces a set of values in one or more columns from one or more rows to a single scalar value. Some column functions that perform operations on the rows in a column include average, count, maximum, minimum, standard deviation, sum, variance, etc. These column functions are particularly useful for performing statistical and other analysis on data in a column.
When an enterprise receives data in a spreadsheet or database format, the data may not be arranged in a table/column row format that is suited for application of column functions and other column based analysis, which is one of the more efficient types of SQL data analysis. For instance, data that the user may want to group together for applying column functions and other analysis may be spread out across different columns.
Thus, there is a need in the art for a method and system for transforming database tables in a manner that makes them more efficient to perform analysis.
To overcome the limitations in the prior art described above, preferred embodiments disclose a a method, system, and program for transforming at least one column in an input table. An input data table has multiple columns and rows. A determination is made of multiple data column names and multiple pivot columns in the input table: A row is generated in an output table for each data column name, having one data column name field including one data column name and at least one pivot column field including the value in one pivot column field in the input table, wherein each of the rows in the output table for each data column name has different values in at least one of the data column name field and pivot column field.
In further embodiments, each pivot column field in one row in the input table is copied to one pivot column field in one row in the output table. This causes the transfer of multiple pivot column fields in the input table to one pivot column in field in multiple rows in the output table.
In still further embodiments, for each pivot column field in the input table there is one row in the output table having one data column name field and pivot column field including the pivot column name and corresponding pivot column value in the input table, respectively.
In yet further embodiments, determining multiple pivot columns in the input table comprises determining at least one pivot group including at least one pivot column in the input table. In such case generating the at least one pivot column field in the output table row comprises generating a pivot group column in the output table row for each pivot group. Each pivot group column in the output table includes a value in one of the pivot column fields in the corresponding pivot group in the input table. This aggregates the input table pivot columns into one pivot group in one pivot group column in the output table.
Preferred embodiments provide a program for transforming aggregates of multiple column fields in an input table to multiple rows in one column of an output table. The preferred embodiments are particularly useful in situations where a database user wants to perform column functions, such as averaging, summing, maximum, standard deviation, etc., on values in fields that are spread over multiple columns of one or more rows. The SQL code needed to apply common column functions to fields spread across columns and rows could be quite cumbersome. Preferred embodiments pivot aggregates of fields dispersed through multiple columns and rows in an input table to fewer columns in an output table. By pivoting fields from multiple columns into a single column in the output table, typical database column function could be applied to the rows in the single column. This would allow the user to apply basic and straightforward SQL commands and column functions to perform the analysis of the fields.