This invention relates generally to database queries, and more particularly to correlated and multi-row subqueries in parallel databases.
A correlated subquery (CSQ) is a query that is nested within an outer query and references a value from the outer query. If the CSQ is executed on a single database, all of the data needed for the CSQ resides on that database and is available to the CSQ, so execution is straightforward. However, in a distributed database, for example, a database having a massively parallel processing (MPP) or shared-nothing architecture, the data is distributed across multiple segments, and each segment holds different data. A similar situation exists with multi-row subqueries, where the subquery needs to combine the results from multiple rows of one or more tables that may be distributed across different segments. Thus, while in a conventional distributed database each segment may execute the same query plan, correlated and multi-row subqueries (together referred to herein as correlated subqueries or CSQs) generally cannot be used with distributed databases, because a given segment usually has neither the data necessary to execute the CSQ nor a mechanism to conveniently locate the missing data, which may be randomly distributed across multiple other segments. This has made certain CSQs on parallel distributed databases problematic and at times unworkable. This is particularly so with respect to MPP and shared-nothing databases.
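By way of illustration only, the difficulty described above may be sketched in Python using the standard sqlite3 module. The table `emp`, its columns, the sample rows, and the helper `run_csq` are hypothetical and are not taken from any particular system; each "segment" is merely simulated as a separate in-memory database holding a subset of the rows, with no communication between segments.

```python
import sqlite3

# Hypothetical sample data: (name, dept, salary).
ROWS = [
    ("alice", "eng", 100),
    ("bob", "eng", 200),
    ("carol", "ops", 50),
    ("dave", "ops", 150),
]

# A correlated subquery: the inner query references e.dept,
# a value supplied by the current row of the outer query.
CSQ = """
SELECT name FROM emp AS e
WHERE salary > (SELECT AVG(salary) FROM emp AS i WHERE i.dept = e.dept)
ORDER BY name
"""


def run_csq(rows):
    """Load rows into a fresh in-memory database and run the CSQ there."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    names = [row[0] for row in conn.execute(CSQ)]
    conn.close()
    return names


# Single database: every row needed by the inner query is available.
single = run_csq(ROWS)

# Simulated shared-nothing segments: an arbitrary split of the rows
# (an assumption for this sketch). Each segment sees only its own rows.
segments = [ROWS[:3], ROWS[3:]]
per_segment = sorted(name for seg in segments for name in run_csq(seg))

print("single database:", single)
print("union of per-segment results:", per_segment)
```

On this sample data the single-database run returns bob and dave, whereas the union of the per-segment results returns only bob: the segment holding dave cannot see carol's row, so the per-department average it computes locally is wrong.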
It is desirable to address the foregoing and other problems by providing distributed parallel databases with the ability to use correlated and multi-row subqueries in a manner similar to the way in which such CSQs can be used on a single database system. It is to these ends that the present invention is directed.