1. Field of the Invention
The present invention relates to a method, system, program, and data structure for inverting rows and columns in a database table.
2. Description of the Related Art
Data records in a computer database are maintained in tables, which are a collection of rows all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows. A data warehouse is a large scale database including millions or billions of records defining business or other types of transactions or activities. Data warehouses contain a wide variety of data that present a coherent picture of business or organizational conditions over time. Various data analysis and mining tools are provided with the data warehouse to allow users to effectively analyze, manage and access large-scale databases to support management decision making. Data mining is the process of extracting valid and previously unknown information from large databases and using it to make crucial business decisions. In many real-world domains such as marketing analysis, financial analysis, fraud detection, etc, information extraction requires the cooperative use of several data mining operations and techniques. Further details of data warehousing and data transforms, are described in the IBM publications xe2x80x9cManaging Visual Warehouse, Version 3.1,xe2x80x9d IBM document no. GC26-8822-01 (IBM Copyright, January, 1998), which is incorporated herein by reference in its entirety.
Once the desired database tables have been selected and the data to be mined has been identified, transformations on the data may be necessary. In particular, many database tables may not have an optimal design for executing SQL queries. Many database programs, such as the IBM DB2 Universal Database program, provide numerous column functions, also referred to as set or aggregate functions. Column functions operate on a set of values (a column) and reduces a set of values in one or more columns from one or more rows to a single scalar value. Some column functions that perform operations on the rows in a column include average, count, maximum, minimum, standard deviation, sum, variance, etc. These column functions are particularly useful for performing statistical and other analysis on data in a column.
When an enterprise receives data in a spreadsheet or database format, the data may not be arranged in a table/column row format that is suited for application of column functions and other column based analysis, which is one of the more efficient types of SQL data analysis. For instance, data that the user may want to group together for applying column functions and other analysis may be spread out across different columns.
Thus, there is a need in the art for a method and system for transforming database tables in a manner that makes them more efficient to perform analysis.
To overcome the limitations in the prior art described above, preferred embodiments disclose a method, system, and program for transforming an input table. The input data table is comprised of multiple columns and rows. For each column, the data from one column in the input table is copied into one row in an output table to invert the input table.
In further embodiments, an input column name parameter indicates a column in the input table. The data in the indicated column is copied to the first row of the output table to form the column names in the output table. Further, the data for each column in the input table excluding the indicated column is copied into one row in the output table following the first row.
In yet further embodiments, a determination is made as to whether the data in the input table columns inverted into the rows in the output table form at least one output table column having a first and second data types, wherein the first and second data types are different. A determination is made as to whether the first and second data types in the output table column are in a promotion path, such that the first data type is capable of being promoted to the second data type. The data in the output table column having the first data type is promoted to the second data type if the first data type is capable of being promoted to the second data type.
Preferred embodiments provide a program for inverting the rows of an input table into columns in an output table. The preferred embodiments are particularly useful in situations where a database user wants to perform column functions, such as averaging, summing, maximum, standard deviation, etc., on values in fields that are spread over a operational changes may be made without departing from the scope of the present invention.
FIG. 1 illustrates a computing environment 2 in which preferred embodiments are implemented. The environment 2 includes a server 4 and client 6. The server 4 and client 6 would include an operating system, such as MICROSOFT WINDOWS 98 and WINDOWS NT, AIX, OS/390, OS/400, OS/2, and SUN SOLARIS,** and may be comprised of any suitable server and client architecture known in the art. The server 4 and client 6 include a database program 8a and 8b, wherein 8a comprises the server 4 side of the database program and 8b comprises the client 6 side. The server 4 and client 6 may communicate via any communication means known in the art, such as a telephone line, dedicated cable or network line, etc, using any protocol known in the art including TCP/IP network (e.g., an Intranet, the Internet), LAN, Ethernet, WAN, System Area Network (SAN), Token Ring, etc. Alternatively, there may be separate and different networks between the servers 4 and client 6.
The client/server database programs 8a, b, may be comprised of any client/server database program known in the art, such as DB2, Oracle Corporation""s ORACLE 8, Microsoft SQL Server,** etc. The database programs 8a and 8b are used to access operations and perform operations with respect to information maintained in one or more databases 10. The database(s) 10 would consist of multiple tables having rows and columns of data, e.g., tables 14 and 18. Further details of the architecture and operation of a database program are described in the IBM publications xe2x80x9cDB2 for OS/390: Administration Guide, Version 5xe2x80x9d IBM document no. SC26-8957-01 (Copyright IBM. Corp., June, 1997) and xe2x80x9cA Complete Guide to DB2 Universal Database,xe2x80x9d by Don Chamberlin (1998), which publications are incorporated herein by reference in its entirety.
**Microsoft, Windows, Windows NT are registered trademarks and Microsoft SQL Server is a trademark of Microsoft Corporation; DB2, AIX, OS/390, OS/400, and OS/2 are registered trademarks of IBM; and Oracle8 is a trademark of Oracle Corporation; and Solaris is a trademark of Sun Microsystem, Inc. single row. The SQL code needed to apply common column functions to fields in the same row could be quite cumbersome. Preferred embodiments invert the data in rows in an input table to a single column in an output table. By inverting the rows to columns, typical database column function can now be applied to the data in a single column, that was previously spread across different columns in the same row. This would allow the user to apply basic and straightforward SQL commands and column functions to perform the analysis of the fields.