Many Database Managements Systems (DBMS) are multi-node systems. Multi-node systems comprise multiple computing nodes, where each node may contain multiple processors, each running multiple concurrent processes. To fully utilize the computing power of a multi-node system, a DBMS may divide the task of executing a query into smaller subtasks, which may then be distributed to multiple processes running on one or more computing nodes. Because the subtasks are being performed in parallel by multiple processes which may be on multiple nodes, the execution of the query can be completed much faster than if the processing were performed by a single process.
A task that is divided into smaller subtasks that are executed by multiple processes is referred to herein as a distributed operation; each subtask may be referred to herein as a work granule. A DBMS typically executes a query as a distributed operation.
To execute a query as a distributed operation, the database server generates an execution plan and then executes the execution plan. An execution plan defines operations to be performed to execute a query and an order for performing the operations. Such operations are referred to herein as plan operations.
The execution plan divides many kinds of plan operations into work granules. When an execution plan is executed, each work granule is assigned to a “slave process” operating under control of a DBMS; some if not many of the work granules are performed in parallel by multiple slave processes.
Not all plan operations are divided into multiple work granules. One such plan operation is the generation of rows from an external table.
External Tables and Table Spaces
Database data may be stored in one or more files that belong to a table space. A table space is set of one or more files, or one or more portions of a file, or combination thereof, that is used to store data for a database, such as data for a table or an index. Files in a table space are formatted in a format native to a DBMS. A database dictionary of a DBMS defines table spaces and which table space files or portions thereof store data for which database objects.
Data for a database table may also be stored in external data sources, such as an external file. Data in an external data source is not formatted in a format native to a DBMS. A database table having data stored in an external data source, such as an external file, is referred to an external table.
A DBMS may define an external table in response to receiving a DDL (Data Definition Language) statement that describes the external table. The DDL statement specifies one or more columns for the external table (“DBMS table columns”) and a data source of the external table, such as a file. Defining an external table creates external table metadata, which is stored in a database dictionary but does not define a portion of a table space for storing data for the external table.
Parallizing Table Scans
Execution plans may include work granules that retrieve data for a table by accessing a table space file, and in the case of an external table, an external data source, such as an external file. To access data for a table in a table space file, a DBMS may form an execution plan that may include multiple work granules that each access a different portion of a table space file. For an external data source, an execution plan does not include a work granule for generating rows from the external data source that can be executed in parallel with another work granule for generating rows for the external data source. As a result, the degree of parallelism that can be achieved for a query that references or otherwise requires access to an external table is hampered.
Described herein are techniques for forming execution plans with multiple work granules that access an external data source for an external table.