1. Field of the Invention
The present invention relates to databases.
2. Description of the Related Art
Relational database management systems (RDBMS) have become the repository of choice for data, owing to the ease with which data can be extracted in response to a particular user-generated query. The query language typically used in extracting data from RDBMS is SQL, the properties of which advantageously simplify the data retrieval.
SQL has been designed under the assumption that RDBMS will store data horizontally, that is, in tables in which each row represents an object, also referred to as a tuple, and each column represents an attribute of the tuple. To obtain data as desired from the database, an SQL query is generated, optimized automatically by the RDBMS, and then executed, typically by joining two or more tables using join predicates, with the sought-after data being returned to the user.
As understood by the present invention, while conventional RDBMS that use horizontal tables are effective for many of their intended purposes, in certain emerging applications, notably e-commerce, the horizontal scheme can be inadequate for several reasons. For instance, an e-marketplace for electronics might require a database storing data on parts from 1000 manufacturers and distributors, and might contain two million parts classified into 2000 categories. New suppliers might join the marketplace on a regular basis, meaning that the database must constantly evolve. Unfortunately, horizontal databases are typically limited in the number of columns that are allowed, e.g., to 1012 columns, but as recognized herein e-marketplaces such as the one postulated above might require 5000 or more attributes spread across different categories and, hence 5000 or more columns. Even if a horizontal database were expanded to allow many more columns, most of the database fields would have nulls in them (thus rendering a “sparse” database), since most parts would have far fewer than the maximum number of possible attributes. As understood herein, in addition to increasing storage overhead, nulls increase the size of the database index and are preferentially highly sorted in many RDBMS tree indices, despite the fact that nulls convey no meaningful information. Such “sparsity” also degrades performance when data records are very wide in terms of the number of columns but only a few columns are required for a query. Further, using a horizontal system in the above-postulated scenario, the present invention recognizes that the parts table would require frequent and expensive altering to accommodate new parts and categories.
Accordingly, the present invention has understood that a different data representation is indicated, and more specifically, that a ternary (3-column) “vertical” representation is better suited to applications such as the e-commerce application discussed above. In a ternary vertical table, only a few columns, e.g., three, are used, with the first column representing an object identification and two more columns representing attribute names and attribute values, respectively, for the objects. Thus, the vertical table contains no nulls. Schema evolution becomes trivial, since as new objects (e.g., electronic parts) are added to the marketplace, new tuples are added to the vertical tables.
As recognized herein, however, storing data in vertical format raises complications. First, recall that SQL is designed for horizontal storage formats, and as a consequence writing SQL queries against vertical tables requires a level of expertise in SQL not possessed by most users. Even in the hands of an expert, tailoring an SQL query for a vertical table is cumbersome and error-prone. Additionally, current application development tools are designed for horizontal data formats; they simply will not function when used in connection with vertical tables.
The present invention has recognized the above-noted problems and provides solutions to one or more of them as disclosed below.