The present invention relates to electronic data processing, and more specifically concerns new query operations for the manipulation of tables in relational databases and similar types of data-management software.
A database is a collection of data in an organized structure. A typical database is stored in a computer as a set of records each having a number of fields for holding data items of a particular kind, such as character strings, numbers, or pointers to data located somewhere else. A relational database comprises any number of rectangular tables. Each table has a set of records; each record is referred to as a row of its table. Each record in the same table has the same number of fields. (However, some fields in a record may hold no data, indicated by a NULL value.) The fields of a table form a set of columns, which may have designated names that are not part of the data itself. The records do not have external to identify them individually. Instead, they are accessed by a key consisting of the contents of some combination of the fields; that is, a relational database may be considered to be a software-implemented content-addressable memory.
A database management system (DBMS, or database system) is computer software for storing, maintaining, and searching the data in a database. A DBMS usually includes facilities for increasing performance, reliability, and integrity, such as indexes, logging, and record locking. It always includes one or more interfaces for finding particular data from the database and for presenting these queries to a search engine. The engine searches the database and returns to the user a result, usually in the form of a relational table, which matches the specifications of the query.
The most widespread interface for relational databases is Structured Query Language (SQL). Although many variants of this interface language exist, standard versions have been defined by the American National Standards Institute (ANSI) and the International Standards Organization (ISO). Most present commercial realizations of SQL follow these standard versions, although many of them include language constructs in addition to those defined in the standard, or at different levels of compliance.
Relational databases and relational query languages treat data as a set of rectangular tables. Yet many databases are conceptually multidimensional, based upon axes such as time {day, month, year}, locale {store, city, state}, category {product, product_group}, actor {clerk, department, division}, payment {cash, check, credit}, and so forth. A user often finds it useful to think of such data as a collection of collections, and may wish to view them from different perspectives. In the above example, one perspective is a collection of records, where each record represents a locale, and contains a collection of monthly sales data for that locale; another perspective sees a collection of records (i.e., rows of a table) where each denotes a particular point in time, and the fields of each record (i.e., the columns of the table) collect sales figures for the different categories.
From this point of view, the ability to transform a database table from one perspective to another--to rotate the dimensions of the data--would be a valuable addition to the conventional capabilities of a query language such as SQL. In this context, to rotate perspectives or dimensions means to interchange a dimension represented in a table as a set of columns with a dimension represented as a set of rows. Conventional relational DBMS products and standards offer no direct operation for rotating perspectives. Although it is possible to formulate SQL queries to achieve this effect indirectly, such queries are large, complex, error-prone, slow, and hard to optimize into efficient execution plans, even when parallel processing is available.
Some conventional spreadsheet software allows a user to interchange data in a user-selected rectangle of cells to be interchanged in the same way that a matrix-algebra "transpose" operation relocates a matrix element a.sub.ij to a.sub.ji. In the Pivot Table feature of the Excel.RTM. spreadsheet from Microsoft Corp., for example, a user selects a rectangle of cells, copies it into a temporary clipboard, points to a destination cell, and performs a "paste special" operation after selecting "transpose" from an options menu. With a suite of compatible application programs such as the Microsoft Office.RTM. suite of office programs from Microsoft Corp., a user may even select data from a database table in the Microsoft Access.RTM. database component of Microsoft Office, transfer it as a single object to the Excel component as a rectangle of spreadsheet cells, transpose the cells, then transfer the cells back into the Access database as a collection of records in the transposed format.
Transposing data items in this manner is both clumsy and functionally limited. Even for small databases, the invocation of another application program merely to carry out a single query is wasteful. For large databases, the conventional requirement that transposed data reside in memory renders this method impossible. For client/server architectures using host-based search engines, there is no way to connect to a spreadsheet program for performing the operation. In any environment, transposition via spreadsheet requires manual intervention, and thus does not permit a transposition to form an internal part of a query within a database program. Such external operations cannot participate in the sophisticated reformulation, rewriting, and other optimization procedures of conventional database-query processors and other search engines. On a more conceptual level, fundamental differences between spreadsheets and relational database tables prohibit the desired types of transposition. For example, the names of the columns or fields in a database table are not a part of the table itself; they do not form a record of the table in the way that column headings in a spreadsheet are a row of cells within the spreadsheet. Transposing a rectangle of cells in a spreadsheet thus cannot transform a column of cells into the names of columns when the spreadsheet rows return to the database program as records in a table.
In the Microsoft SQL Server.RTM. database management system from Microsoft Corp., the execution engine has a strictly internal operation for splitting each item of a table update having the form (row_identifier, old_values, new_values) within a stream of update items into a "delete item" and an "insert item" which interchanges certain row and column values, and a similar operation for collapsing a "delete item" and an "insert item" into an "update item". These operations are not available to users and cannot participate in user queries. That is, the query processor uses them internally only for facilitating the efficient execution of certain functions performed while updating databases.
Thus, the database art could be significantly expanded by providing a facility for fast, efficient rotation of perspectives, especially for relational databases. Moreover, there is a need for rotation or transposition operations whose semantics and syntax integrate well into query languages such as SQL as natural extensions, and which can be optimized and executed in conventionally organized database query processors and other search engines without adding complex or idiosyncratic facilities.