A database is a collection of stored data that is logically related and that is accessible by one or more users. A popular type of database is the relational database management system (RDBMS), which includes relational tables made up of rows and columns. Each row represents an occurrence of an entity defined by a table, with an entity being a person, place, or thing about which the table contains information.
To extract data from, or to update, a relational table, queries according to a standard database-query language (e.g., Structured Query Language or SQL) are used. Examples of SQL statements include INSERT, SELECT, UPDATE, and DELETE. The SELECT statement is used to retrieve information from the database and to organize information for presentation to a user or to an application program. The SELECT statement can specify a join operation to join rows of multiple tables. A SELECT statement can also specify that a particular column (or attribute) of a table be aggregated by some specified function, e.g., SUM (to compute the total of a column), AVG (to compute the average value in a column), MIN (to find the smallest value in a column), MAX (to find the largest value in a column), COUNT (to count the number of values in a column), and so forth.
Typically, in response to a SELECT statement that specifies a join of multiple tables in addition to aggregation of one or more attributes of the tables, an optimizer generates a plan that performs the join of multiple tables first followed by the aggregation following the join. An optimizer selects a lowest cost execution or access plan (for a given query) from a plurality of possible plans. The cost is defined as the amount of time and resources needed to perform an execution of the plan.
In performing a join of multiple tables, the intermediate results are typically stored in a spool table. In some cases, the join of multiple tables (such as a product join) can generate a large amount of data. As a result, a spool space problem may be encountered if the spool table becomes too big. Consequently, database system performance may suffer.
In general, an improved method and apparatus of performing a join of multiple tables is described. For example, a method of performing a join of multiple tables comprises receiving a query containing an aggregate function and at least one of a where clause and a group-by clause to perform the join of the multiple tables. A number of working rows of at least one of the tables is reduced by performing a group by of the at least one table.
Other or alternative features will become apparent from the following description, from the drawings, and from the claims.