1. Field of the Invention
The present invention relates generally to information processing systems, and, more specifically, to computer-implemented databases and database management systems (DBMS) that facilitate the storage of data in and the retrieval of data from a database.
2. Description of Related Art
Early simple database structures comprised a single file or table of multiple rows, or tuples, and columns. This format was particularly helpful when each row was unique; that is, no data was redundant. However, databases quickly became more complex. They began to include information in which each row was not unique. For example, consider a database containing information about authors and the books they write. If an author had written multiple books, each row in a simple database schema would include the author's name and one book's identification. So if an author wrote “n” books, the table would include “n” rows; and the author's name would appear in each of the “n” rows.
This repetition of the same value, such as the author's name, in multiple rows is characterized as “redundancy.” Redundancy introduces certain problems. For example, storing redundant data consumes memory unnecessarily, and at the time memory was limited and expensive. That issue has become minimized over time. However, a problem that has not been minimized involves updates that change data and the need to maintain consistent data. If a database includes the authors and their addresses and one author's address changes, it is necessary to change that address in the record for each book that author wrote. If the update process is interrupted for any reason, some of that author's addresses might be changed and others might not. The resulting data would be inconsistent. Maintaining data consistency is a necessity with current databases.
Relational database management systems (RDBMS or “relational models”) then were developed. These systems still serve as the basis for state of the art database management systems (DBMS). Introduced in the 1970s, the relational model pioneered data independence, supported by a formal algebra and an associated declarative query language known as Structured Query Language (“SQL”). In this language an “SQL query” or “query” is the primary tool for interacting with the data in the various tables.
Generally an RDBMS system stores data according to a relational storage schema in tables. Each table is stored as a sequential set of rows in data storage facilities, such as disks, main-memory and other memory. Many systems also implement indexes as additional data structures to enable fast random access to a specific row or rows. An index encodes a column or several columns (the key of the index) in a way that facilitates navigation to rows based on a key value. Each index entails an added cost of constructing and maintaining the index data structure if the data changes. The user, however, is presented with a non-redundant view of the data as tables and is generally unaware of this and other underlying redundancy.
Two other concepts also emerged during this development, namely: “normalization” and “relationships”. “Normalization” minimizes redundancy by dividing the data into multiple tables. In the foregoing example, normalizing the single redundant author-book table produces individual author and book tables. The author table includes the information about each author one time; the book table, information about each book one time.
FIG. 1 discloses, in logical form, a simple normalized database 30 that records information about each author and book and information about each of a plurality of customers, their orders and the books associated with each order. The organization of this information will include analysis of the data information to define a “normalized” database with individual tables. One or more individual tables may be related. For example and for purposes of explanation assume, a database architect analyzes this data in FIG. 1 and arbitrarily defines three groups of related tables, such as an Author group 31, a Customer group 32 and a State group 33. The author group 31 contains all the information about authors and their books and includes an Author table 34 and a Book table 35. The customer group 32 relates to customers and their orders and includes a Customer table 36, an Order table 37 and an Item table 40. The State group 33 relates to state information and includes a State table 41. As will be apparent, the State group could comprise multiple tables, and the Item table 40 could be a member of the Author group 31.
Although FIG. 1 depicts relationships among these individual tables, the relationships are not explicitly defined. Rather, primary and foreign keys provide the information from which relationships can be defined. There are many field naming conventions, particularly with respect to naming primary and foreign keys. This discussion assumes that the prefix “fk” is added to the primary key name in one table to form the foreign key name in another related table. In this specific example, the Author table 34 includes an AUTHORID primary key field; the Book table 35, an fkAUTHORID foreign key field. Such a relationship is generally described as a one-to-many relationship because the same foreign key in multiple rows of the Book table 35 are related to each author. That is, a single, or “one,” author is linked to one or more, e.g., “many”, books the author has written. A link 42 in FIG. 1 represents at a logical level the relationship that the AUTHORID and fkAUTHORID fields define.
The Customer group 32 in FIG. 1 includes similar one-to-many relationships defined by link 43 and link 46. Two links 44 and 45 define relationships between tables in different groups. The link 44 links a table in the Author group 31 to a table the Customer group 32. Specifically, the Item table 40 has fkORDERID and fkBOOKID foreign keys. The fkORDERID foreign key links one item to one order. The fkBOOKID foreign key links one item to one book. The link 45 defines the relationship between the Customer table 36 and the State table 41 with the fkSTATEID field in the Customer table 36 linking to the STATEID field in the State table 41.
While FIG. 1 depicts each table with some representative fields, also called “attributes,” FIG. 2 depicts the tables in a datasheet view with rows, columns and representative data. Each table has a primary key; some have a foreign key. More specifically, the Author table 34 contains the AUTHORID primary key plus attributes for an author's last name, first name, birth date and optional contact information. The Book table 35 the BOOKID primary key, the fkAUTHORID foreign key and attributes for the title, list price (LIST), publication date (PUBLISHED) and description. It is assumed for purposes of this discussion that each book is written by only one author. The organization of the remaining tables and attributes will be apparent to those of ordinary skill in the art.
An SQL query requests information from a database and creates a “resultset” of the requested information. A “resultset” is an object that holds a copy of data in memory and is disconnected from the data source. The most common SQL query or command retrieves data and is performed with a declarative SELECT keyword and several optional keywords and clauses, including FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses. The response to any query must retrieve each identified table individually from different file locations and then match the corresponding rows from the different tables according to the relationship.
The contents of the WHERE clause explicitly define each relationship that is relevant to the request and provides “de-normalization” information that serves as the basis for a “join” operation. The WHERE clause specifically defines the tables to be joined and the fields or attributes that provide the relationship. For example, a WHERE clause that is to define the relationship between the Author and Book tables 34 and 35 will identify the Author table 34 and the AUTHORID primary key and the Book table 35 and the fkAUTHORID foreign key. That is, the execution of that WHERE clause joins the two tables using the AUTHORID primary key and the fkAUTHORID foreign key to reproduce the original data where each row would include the author's information and one book identification. A query response must process one join operation for each relationship that needs to be resolved.
It is difficult to execute join operations efficiently. Such inefficiencies are tolerable in relatively small databases, such as shown in FIGS. 1 and 2. However, real databases are becoming increasingly complex. A customer entity may include many levels of information including orders, items, addresses, phones, interaction history and links to other family members. Normalization places each level of information in a dedicated table. The relationships that exist when a database has many tables with a complex relationship increase the number of joins. The cumulative effect of join operation inefficiencies in databases of a size and complexity that are orders of magnitude greater than databases such as shown in FIGS. 1 and 2 can not be tolerated.
Normally a database is stored and processed in data storage facilities comprising disks, main-memory, cache and other memory individually or in combination. Each such memory has latency characteristics. Generally “latency” is the time that elapses between a request for data and the beginning of the actual data transfer. Disk delays, for example have three components, namely: seek time, rotational latency, and transfer rate. Seek time is a measure of the time required for the disk heads to move to the disk cylinder being accessed. Rotational latency is a measure of the time required for a specific disk block to reach under the disk head, and transfer rate (“bandwidth”) is a measure of the rate at which date passes under the disk head. For sequential accesses, the determination of seek and rotation times need only occur for the first data block.
Over time disk and memory bandwidth improvements have enabled very efficient sequential accesses, for both read and write operations. However, disk and memory latency times have not improved correspondingly, making random access very inefficient. The dramatic improvement in bandwidth compared with latencies means that sequential access is becoming much “cheaper” than random accesses. Moreover, the performance advantage of sequential over random access is increasing exponentially over time. Access to individual tables may require extensive random disk and memory accesses, particularly during join operations. Accessing all parts of a customer entity, each residing in a separate row in disparate tables, requires a large number of random accesses. This further increases the difficulty in efficiently executing join operations, as well as other query operations.
SQL queries are used in two groups of database interaction applications, namely: OLTP (OnLine Transaction Processing) and OLAP (OnLine Analytical Processing) originally called DSS (Decision Support System) applications. OLTP applications process online transactions. Information associated with such transactions can be efficiently added to or retrieved from single table, such as any one of those shown in FIG. 1 using conventional storage schema. However, as entities span more and more tables the added cost of join operations and random accesses makes such queries increasingly inefficient.
For small databases with simple relationships, OLAP application requests for information can be processed with reasonable efficiency. In complex databases, OLAP applications search, retrieve and aggregate data from only a few columns within tables containing many rows. Each table must be fully scanned for any dimension that is not indexed or any aggregate that is not pre-calculated. Consequently it is unlikely that any analytical query to a relatively complex relational database will produce a result dataset in any reasonable time.
For example, FIG. 3A depicts an OLTP query 50 that seeks to list the title and sale price for each book ordered by a given customer. A SELECT clause 51 defines a final result dataset for the first and last names of the selected customer from the Customer table 36, the book title from the Book table 35 and the sale price from the Item table 40 with one row for each item.
A FROM clause 52 identifies each table to be accessed during the execution of the SQL query 50. In this specific example, the tables are the Customer, Order, Item and Book tables 36, 37, 40 and 35, respectively.
A WHERE clause 53 identifies the rows to be retrieved from each table and establishes the joins. In this example, a first condition 54 requires the CUSTOMERID value in the Customer table 36 to be equal to the value in the fkCUSTOMERID field in the Order table 37. Conditions 55 and 56 establish similar relationships and joins between the ORDERID field in the Order table 37 and the value in the fkORDERID field in the Item table 40 and between the BOOKID value in the Book table 35 and the fkBOOKID field in the Item table 40. A final condition 57 establishes the criteria that the CUSTOMERID field in the Customer table 36 be equal to “20”.
As known and shown logically in FIG. 4, a data processing system 58 responds to the receipt of an SQL query 50 to produce a final resultset by parsing, optimizing and executing the SQL query. A query parser 59 uses the information in a data dictionary 60 to convert each SQL query into a series of SQL classes. A query optimizer 61 generates a query plan in response to the SQL query and information from the query parser 59 and the data dictionary 60. An execution unit 62 processes the query plan and information from the data dictionary 60 and a data store 63 to produce a resultset.
Generally speaking a data dictionary comprises meta data with definitions and representations of data elements. Within the context of a DBMS, a data dictionary is a set of tables and views and holds information concerning the definition of data elements, user names, rows and privileges, schema objects, integrity constraints, stored procedures and triggers, general database structure and space allocation information. In this embodiment, the data dictionary 60 in FIG. 4 includes a set 64 of table entries. Each table entry in the set 64 includes a set 65 of attribute or field definitions. Each definition in the set 65 contains a set 66 of properties for each attribute or field.
A change log 67 facilitates the durability of all data insert, delete and update operations. As known, a change log, such as the change log 67, records any changes in any class of non-volatile storage devices such as disk or optical drives.
A database system may operate in any processing system and use any number of computers, processes, threads and the like. Each application may be replicated or instantiated multiple times to satisfy performance or other considerations. Moreover, different systems may process SQL queries in different ways. FIG. 3B depicts one such query plan 70 for the SQL query 50 in FIG. 3A; and FIGS. 3C and 3D depict the interim and final result sets that will be produced when the execution unit 62 processes the query plan 70. Referring to FIGS. 3B and 3C together, initially step 71 will define a first interim result set 71A comprising the record in the Customer table 36 whose CUSTOMERID value is “20”. This first resultset, in this particular example, will contain only one record from the Customer table 36. As will be apparent, such a selection could be made in response to name information, in this example customer Adam Apple, to obtain the CUSTOMERID value.
Step 72 represents a first join operation wherein the Order table 37 will be scanned to identify any orders corresponding to the CUSTOMERID=20. Thus, step 72 produces a second interim resultset 72A that will contain each record from Order table 37 that relates to Adam Apple, along with the customer data.
A second join operation 73 uses the ORDERID primary key value in the interim resultset 72A and the fkORDERID value in Item table 40 to identify those items which Adam Apple has placed an order. A third resultset 73A will be produced. A third join operation 74 will combine those rows to produce a resultset in step 73 with the corresponding books from the Book table 35 through the fkBOOKID foreign key value in the resultset of step 73A and the BOOKID primary key value in the Book table 35 to produce a fourth resultset 74A. A projection operation 75 will convert this fourth resultset 74A into a final resultset 75A which includes the customer's first and last names and the selling price and title for each book Adam Apple purchased as defined by the SQL query 50.
Although the desired outcome for the SQL query 50 in FIG. 3A is relatively simple, the system must access four different tables and process three join operations to re-assemble the customer entity and extract corresponding book titles. Additionally each part of the customer entity resides in a different table such that every order and every item must be accessed individually by means of inefficient random access.
As an example of a more analytical query, FIG. 5A depicts an SQL query 80 for obtaining an average cash discount on the books by state assuming that the book price and the item price differ. The SQL query 80 includes a SELECT statement 81 that identifies the two elements, namely: the state code and a calculated cash discount value. The discount value is established as the cash difference between the list book price recorded by the LIST attribute of the Book table 35 of FIG. 2 and the actual sale price recorded in by SALE attribute in the Item table 40. In this particular SQL query a FROM statement 82 identifies the Customer table 36, the Order table 37, the Book table 35 and the State table 41. The WHERE statement 83 establishes four conditions thereby to establish the appropriate relationships in the same manner as described with respect to FIG. 3A. The GROUPBY statement 84 directs the process to multiple columns to be grouped. This enables aggregate functions to be performed on multiple columns with one command. The final resultset produces a two-column report. One column contains a state name; the other, an average cash discount for all sales to customers in that state.
FIG. 5B depicts a process analogous to that shown in FIG. 3B for processing the SQL query 80 in FIG. 5A. In this example, the WHERE clause defines four joins that are necessary to produce the desired result. More specifically, a query plan 90 initially will identify the Customer table 36. A first join operation 91 will respond to a first element of the WHERE clause 83 in FIG. 5A to join the Customer table 36 and the Order table 37 on the CustomerID value. A second join operation 92 will join the data set produced by the operation 91 with the Item table 40. A third join operation 93 then will produce a resultset based upon the output of the join operation 92 and the corresponding values from the Book table 35. A final join operation 94 will combine the resultset produced the join operation 93 and the data in the State table 41. A projection operation 95 will generate the values for the state code, the book price and the item price. An aggregation operation 96 will generate the average of the discount price on the basis of the state codes and produce a resultset at 97. As will be apparent, once the query plan 90 is constructed, the execution unit 62 will process the query plan 90 and generates the specified resultsets.
Although the desired outcome for the SQL query 80 in FIG. 5A operates on only three columns and seems relatively simple, the system must process four join operations because the required data resides in five individual tables. Even a relatively efficient implementation of join operations can require significant system resources. As the size of the tables increases, join implementation also suffers. When large numbers of join operations must be handled, processing delays increase, sometimes to unacceptable levels. Notwithstanding these issues, most databases continue to store data in normalized tables that closely follow the relational logical data model, as multiple related tables consisting of sequential rows and attribute columns. The logical relationships between the tables do not generally affect the way the data is actually stored in the tables. The database system must execute joins to establish a relationship for all the tables and require a large number of random accesses to match related parts.
Another approach for storing the data in various tables such as those shown in FIGS. 1 and 2 recognizes the foregoing problems and proposes to store database tables by vertically positioning all columns of the table. This optimizes the data for retrieval when processing OLAP applications. For example, U.S. Pat. No. 5,794,229 (1998) to French discloses such a storage schema. This system uses a conventional logical relational database management system table such as shown in FIG. 1. Rather, rather than store the data in a conventional row structure, as shown in FIG. 2 of this specification, the system only stores data in columns by attribute name. Each column comprises a plurality of cells (i.e., a column value for a particular record) which are arranged on a data page in a contiguous fashion. In the context of FIG. 1 of this disclosure, each of the tables 34 through 37, 40 and 41 would be stored in a column configuration.
In response to a query that system analyzes only those columns of data which are of interest and uses an optimizer to select the join order of the tables. The system does not retrieve row-based data pages consisting of information which is largely not of interest to the query. The retrieved column-based pages will contain mostly, if not completely, the information of interest to the query. This allows greater block I/O transfers and consequently faster execution of OLAP type queries. The execution of each join operation may be improved, as only the data in the primary key and foreign key columns needs to be retrieved and joined. However, all columns need to be joined. Thus, the number of join operations grows according to the number of attributes, rather then the number of tables. Column-based systems work well only for low projectivity and low selectivity queries. Put another way, these systems are adapted for queries where only a few attributes are projected, and where the data from a large percentage of the rows is used. Executing a query that accesses many attributes per table would be highly inefficient, as it would be necessary to execute many joins for all the attributes in same table. Therefore, processing of even the simplest OLTP query will be highly inefficient. For example retrieving just one row from the customer table 36 in FIGS. 1 and 2 requires fetching seven attribute values, each located in different column.
U.S. Pat. No. 7,024,414 (2006) to Sah et al. also discloses the storage of column data. This system parses table data into columns of values. Each column is formatted into a data stream for transfer to a storage device as a continuous strip of data. Both single-column and multiple-column storage structures are disclosed. Each column is stored as a data stream as a continuous strip of compressed data without regard to a page size for the storage device.
These approaches offer benefits by optimizing storage for query processing with respect to usage patterns, such as OLAP applications that access a few complete columns of many rows. However, they handle OLTP queries inefficiently. Moreover, in order to assemble a single complete entity, each attribute must be retrieved from a separate binary table by way of random access. So systems that use horizontal storage schema handle OLAP queries inefficiently while systems that use vertical storage schema handle OLTP queries inefficiently.
An article titled “Weaving Relations for Cache Performance” (Ailamaki et al., Proc. 27th VLDB Conf., Rome, 2001) describes still another alternative data storage schema. In a conventional N-ary Storage Model (NSM) rows of a data table are spread across pages in memory. The described Partition Attributes Across (PAX) system converts the data in each page into vertical partitions to improve cache utilization and performance. However, the row formatted data is lost; there is only one copy of the data. To reconstruct a row one needs to perform “mini-joins” among vertical partitions within the page containing the row. Mini-joins incur minimal costs because it does not have to look beyond one page. A query response requires the system to scan each page, and fetch, or retrieve, the vertical partition for each attribute the query defines. The rows or records that satisfy the conditions are identified and reconstructed. While this article describes performance enhancements due to this variation, any such system still must process all the joins that would be processed with a database stored according to the NSM schema. Additionally processing a single attribute still requires fetching and processing the complete set of pages in which rows of the data table exist.
One proposed approach to lower the cost of join operations is to answer queries using previously defined materialized views, rather than using the base tables. A materialized view represents a pre-computed query result that is stored as a concrete table, and may be updated from the original base tables from time to time. U.S. Pat. No. 6,510,422 (2003) to Galindo-Legaria et al., U.S. Pat. No. 6,850,933 (2005) to Larson et al., and U.S. Pat. No. 6,199,063 (2001) to Colby et al. disclose similar approaches to the use of materialized views for answering queries, also known as the “view-utilization” problem. That is, the approach is to reduce the number of required join operations to answer a given query by processing materialized views already stored in a joined format. U.S. Pat. No. 6,356,889 (2002) to Lohman et al., and U.S. Pat. No. 7,191,169 (2007) to Tao, present several approaches to the related “view-selection” problem where a set of views to be materialized is chosen such that the cost of evaluating a given set of queries is minimized and such that the views remain within a pre-defined storage space constraint. The materialized views approach can cut join operation costs for these joins that are materialized. However, this comes at a great cost of determining, maintaining and storing the set of views and limits its application to only the most resource intensive queries. Furthermore, because materialized views store data in a highly un-normalized form, they tend to consume a large amount of storage space making the problem even more acute.
Another article titled “A Case for Fractured Mirrors” (Ramamurthy et al., Proc. 28th VLDB Conf., Hong Kong, 2002) describes a database mirroring storage schema in which data is stored in accordance with both the NSM model and a Decomposition Storage Model (DSM). Two disks are mirrored. In one approach a first disk stores the primary copy of data in the NSM format; the second disk stores the primary copy of data according to the DSM format. In another approach fragments of the data are stored across the two physical mirrors to achieve better load balancing between the two copies. For example, if the system includes data stored as NSM and DSM models, a first mirrored disk might store a first fragment NSM0 from the NSM model and a second fragment DSM1 from the DSM model. Conversely, the second mirrored disk would store a first fragment DSM0 from the DSM model and a second fragment NSM1 from the NSM model. While this system provides a duplicate copy of each storage fragment, its primary goal is to incorporate a mirroring schema that replaces other RAID mirroring schemas, and where different schemas are used for different purposes (one for OLAP type loads and the other for OLTP loads). This system must also process all the joins that would be processed with a database stored according to the NSM schema.
What is needed is a database management system that produces high throughput for complex databases and that optimizes the response to both OLTP and OLAP queries and can handle increasingly complex requests for information, increasingly complex data models, and increasingly large quantities of data. What is also needed is a database management system that minimizes the processing of joins, maximizes sequential access and minimizes random access operations in memory, particularly disk memory, while maintaining two complete copies of the data.