Databases are computerized information storage and retrieval systems. A relational database management system (RDBMS) is a computer database management system that uses relational techniques for storing and retrieving data. Relational databases store data using structures that include one or more tables of rows and columns, which may be interrelated. An RDBMS typically uses Structured Query Language (SQL) for data definition, data management, and data access and retrieval. A database schema specifies how data is stored in a collection of tables and how the tables are related to one another. Using a database query language, such as SQL, data stored in a computer database may be retrieved, updated, and deleted. Updates may include creating new tables or dropping old tables; inserting, modifying, or deleting rows in an existing table; and copying tables or rows within the database.
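By way of illustration only, the kinds of SQL operations described above may be sketched using Python's standard `sqlite3` module and an in-memory database. The table names (`customers`, `orders`, `orders_copy`), the columns, and the function name `run_demo` are hypothetical and chosen solely for this example; SQLite is a single-node database and stands in here for any RDBMS.

```python
import sqlite3


def run_demo():
    """Exercise data definition, modification, retrieval, copy, and drop."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Data definition: two interrelated tables (hypothetical schema).
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), "
        "total REAL)"
    )

    # Inserting rows.
    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
    )

    # Modifying and deleting rows in an existing table.
    cur.execute("UPDATE orders SET total = 30.0 WHERE id = 10")
    cur.execute("DELETE FROM orders WHERE id = 12")

    # Retrieval: a join combines records from the two tables.
    cur.execute(
        "SELECT c.name, o.total FROM customers c "
        "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
    )
    rows = cur.fetchall()

    # Copying a table within the database, then dropping the copy.
    cur.execute("CREATE TABLE orders_copy AS SELECT * FROM orders")
    copied = cur.execute("SELECT COUNT(*) FROM orders_copy").fetchone()[0]
    cur.execute("DROP TABLE orders_copy")

    conn.close()
    return rows, copied
```

Calling `run_demo()` returns the joined rows together with the number of rows that were copied before the copy was dropped.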
One of the goals of an RDBMS is to optimize the performance of queries that access and manipulate data stored in the database. Given a target environment, an optimizer estimates the cost (e.g., response time) of candidate query plans and selects the plan with the lowest cost as the optimal query plan. The response time is the amount of time it takes to complete the execution of a query on a given system.
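The selection of a lowest-cost plan may be sketched as follows. This is a minimal illustration, assuming that each candidate plan already carries a single numeric cost estimate; the `QueryPlan` class, its fields, and the `choose_plan` function are hypothetical names, and the sketch does not represent how any particular optimizer enumerates plans or derives its estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPlan:
    # Hypothetical plan record; a real optimizer tracks far more detail.
    name: str
    estimated_cost: float  # e.g., estimated response time in milliseconds


def choose_plan(plans):
    """Return the candidate plan with the lowest estimated cost."""
    if not plans:
        raise ValueError("at least one candidate plan is required")
    return min(plans, key=lambda plan: plan.estimated_cost)
```

For example, given candidates with estimated costs of 120.0, 8.5, and 42.0, `choose_plan` returns the candidate whose cost is 8.5.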
There are several types of database systems available, including parallel data processing systems. A parallel data processing system may include an RDBMS with enhancements that allow the data in the tables to be shared among the nodes (partitions) of a massively parallel processing (MPP) system. A node can be an independent processor on an MPP machine, or a separate machine belonging to a clustered hardware environment. As is understood, a join comprises a SQL operation that combines records from two or more tables. The RDBMS may perform join or subquery processing at the database partition in which the data is stored, which is referred to as a collocated join. Collocation can have significant performance advantages because the rows being joined need not be transferred between nodes. Conversely, in MPP systems, the processing costs for performing non-collocated joins, in which rows must first be moved between nodes, can become undesirably high. Efficient collocated joins are therefore critical to the performance of parallel data processing systems.
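The difference between collocated and non-collocated joins may be illustrated with a small simulation. The sketch below assumes integer join keys and a simple modulo rule for assigning rows to nodes; real systems generally use their own hashing and distribution schemes. The function names `partition`, `collocated_join`, and `rows_to_move` are hypothetical and introduced only for this example.

```python
def partition(rows, key_index, num_nodes):
    """Distribute rows across nodes by (integer key) modulo num_nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[key_index] % num_nodes].append(row)
    return nodes


def collocated_join(left_nodes, left_key, right_nodes, right_key):
    """Join node by node with no data movement.

    Only correct when both inputs are partitioned on the join key
    using the same rule, so matching rows share a node.
    """
    result = []
    for left_part, right_part in zip(left_nodes, right_nodes):
        # Build a local lookup of the right-hand rows on this node.
        lookup = {}
        for right_row in right_part:
            lookup.setdefault(right_row[right_key], []).append(right_row)
        # Probe the lookup with the left-hand rows on the same node.
        for left_row in left_part:
            for right_row in lookup.get(left_row[left_key], []):
                result.append(left_row + right_row)
    return result


def rows_to_move(nodes, key_index):
    """Count rows stored on a node other than the one their join key maps to."""
    num_nodes = len(nodes)
    return sum(
        1
        for node_id, part in enumerate(nodes)
        for row in part
        if row[key_index] % num_nodes != node_id
    )
```

When both tables are partitioned on the join key, `rows_to_move` reports zero and `collocated_join` can run entirely within each node. When one table is partitioned on a different column, `rows_to_move` gives a rough indication of how many rows would have to be redistributed before the join, which is the source of the additional cost of a non-collocated join.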