1. Field of the Invention
The present invention relates to databases.
2. Description of the Related Art
Relational database management systems (RDBMS) have become the repository of choice for data, owing to the ease with which data can be extracted in response to a particular user-generated query. The query language typically used in extracting data from RDBMS is SQL, the properties of which advantageously simplify the data retrieval.
SQL has been designed under the assumption that RDBMS will store data horizontally, that is, in tables in which each row represents an object, also referred to as a tuple, and each column represents an attribute of the tuple. To obtain data as desired from the database, an SQL query is generated, optimized automatically by the RDBMS, and then executed, typically by joining two or more tables using join predicates, with the sought-after data being returned to the user.
As understood by the present invention, while conventional RDBMS that use horizontal tables are effective for many of their intended purposes, in certain emerging applications, notably e-commerce, the horizontal scheme can be inadequate for several reasons. For instance, an e-marketplace for electronics might require a database storing data on parts from 1000 manufacturers and distributors, and might contain two million parts classified into 2000 categories. New suppliers might join the marketplace on a regular basis, meaning that the database must constantly evolve. Unfortunately, horizontal databases are typically limited in the number of columns that are allowed, e.g., to 1012 columns, but as recognized herein e-marketplaces such as the one postulated above might require 5000 or more attributes spread across different categories and, hence 5000 or more columns. Even if a horizontal database were expanded to allow many more columns, most of the database fields would have nulls in them (thus rendering a xe2x80x9csparsexe2x80x9d database), since most parts would have far fewer than the maximum number of possible attributes. As understood herein, in addition to increasing storage overhead, nulls increase the size of the database index and are preferentially highly sorted in many RDBMS tree indices, despite the fact that nulls convey no meaningful information. Such xe2x80x9csparsityxe2x80x9d also degrades performance when data records are very wide in terms of the number of columns but only a few columns are required for a query. Further, using a horizontal system in the above-postulated scenario, the present invention recognizes that the parts table would require frequent and expensive altering to accommodate new parts and categories.
Accordingly, the present invention has understood that a different data representation is indicated, and more specifically, that a ternary (3-column) xe2x80x9cverticalxe2x80x9d representation is better suited to applications such as the e-commerce application discussed above. In a ternary vertical table, only a few columns, e.g., three, are used, with the first column representing an object identification and two more columns representing attribute names and attribute values, respectively, for the objects. Thus, the vertical table contains no nulls. Schema evolution becomes trivial, since as new objects (e.g., electronic parts) are added to the marketplace, new tuples are added to the vertical tables.
As recognized herein, however, storing data in vertical format raises complications. First, recall that SQL is designed for horizontal storage formats, and as a consequence writing SQL queries against vertical tables requires a level of expertise in SQL not possessed by most users. Even in the hands of an expert, tailoring an SQL query for a vertical table is cumbersome and error-prone. Additionally, current application development tools are designed for horizontal data formats; they simply will not function when used in connection with vertical tables.
The present invention has recognized the above-noted problems and provides solutions to one or more of them as disclosed below.
To address one or more of the above-noted problems, a logical horizontal view is provided on top of an underlying vertical database, with the user, e.g., a person or an application development tool, writing conventional SQL queries against the logical horizontal view and with the queries then being translated transparently to the user for execution against the underlying vertical tables.
A general purpose computer is programmed according to the inventive steps herein. The invention can also be embodied as an article of manufacturexe2x80x94a machine componentxe2x80x94that is used by a digital processing apparatus and which tangibly embodies a program of instructions that are executable by the digital processing apparatus to execute the present logic. This invention is realized in a critical machine component that causes a digital processing apparatus to perform the inventive method steps herein.
Accordingly, a general purpose computer is coupled to a relational database system that is characterized by at least one vertical table. The computer logic defines a logical horizontal view over the vertical table, and receives at least one SQL query against the horizontal view. The query is transformed to render a transformed query, with the transformed query being executed against the vertical table to generate an output.
In a preferred embodiment, the query is transformed using at least one operator that receives the vertical table and outputs the logical horizontal table having column labels equal to the attribute names in the vertical table. As set forth further below, the operator is a v2h operator. More particularly, the preferred operator executes a left outer join of a projection of the object identifications of the vertical table with a summation left outer join of a projection of attribute values from the vertical table.
The query transformation can include any standard relational operator such as executing a projection based on the vertical table, or a selection from the vertical table, or a table join using the vertical table, or a cross product, union, or intersection. Also, the transforming act can include executing the operator on the vertical table to render a result and then undertaking a desired set operation on the result. As intimated above, a horizontal to vertical operator can be executed against the output to transform the output to a vertical format.
In another aspect, a computer program device includes a computer program storage device that is readable by a digital processing apparatus. A program is on the program storage device. The program includes instructions that can be executed by the digital processing apparatus for querying a vertical table in a database system. The program includes computer readable code means for transforming a horizontal-based SQL query into a transformed query that has a format for execution against the vertical table.
In yet another aspect, a method for extracting data from a vertical table in a database includes defining an enablement layer which includes a horizontal view representative of the vertical table. The method also includes extracting data from the database based on an SQL query using the enablement layer, without requiring a user to tailor the query to a vertical format.
The details of the present invention, both as to its structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which: