A database is a structured collection of data and metadata that is stored on one or more storage devices, such as a set of hard disks. The data within a database may be logically organized according to a variety of data models, depending on the implementation. For example, in the relational model, data is typically organized into a set of tables, where each table comprises a set of rows and columns. In most cases, each row represents a distinct object and each column represents a distinct attribute. However, other data models may also be used to organize the data.
A database management system (DBMS) is software that controls access to data in a database. The DBMS is configured to receive and process a variety of database commands, often referred to as queries. In many implementations, the DBMS supports queries that conform to a structured query language (SQL). SQL is a standardized query language for managing data in a relational DBMS (RDBMS).
SQL includes two distinct sets of commands: Data Definition Language (DDL) for managing and indexing data structures in the database; and Data Manipulation Language (DML) for accessing and manipulating data residing within the data structures. DDL is typically used to create, alter, and delete database objects, such as tables, indexes, views, and constraints, whereas DML is typically used to add, query, update, and delete data in existing database objects.
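By way of illustration only, the distinction between DDL and DML may be sketched using Python's standard `sqlite3` module with an in-memory database. The `employees` table, its columns, the index name, and the function name `run_ddl_and_dml` are hypothetical and are chosen solely for this example. SQLite is used as a convenient stand-in and is not necessarily the DBMS contemplated herein.

```python
import sqlite3


def run_ddl_and_dml():
    """Run a few DDL statements, then a few DML statements, and return the rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database

    # DDL: create database objects (a table and an index on it).
    conn.execute(
        "CREATE TABLE employees ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)"
    )
    conn.execute("CREATE INDEX idx_employees_dept ON employees (dept)")

    # DML: add, update, delete, and query data in the existing table.
    conn.executemany(
        "INSERT INTO employees (id, name, dept) VALUES (?, ?, ?)",
        [(1, "Ada", "eng"), (2, "Grace", "eng"), (3, "Edgar", "ops")],
    )
    conn.execute("UPDATE employees SET dept = ? WHERE id = ?", ("research", 2))
    conn.execute("DELETE FROM employees WHERE id = ?", (3,))
    rows = conn.execute(
        "SELECT id, name, dept FROM employees ORDER BY id"
    ).fetchall()

    conn.close()
    return rows


remaining_rows = run_ddl_and_dml()
```

In this sketch, the `CREATE` statements change which objects exist in the database, whereas the `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements operate only on data within an object that already exists.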
When the DBMS receives a query, such as a SQL expression, the DBMS evaluates the query to obtain a query result. Query evaluation includes two stages: query compilation and query execution. During query compilation, the DBMS parses the SQL expression and generates a query execution plan. The query execution plan specifies an ordered set of steps, frequently represented as a tree of query operators, used to execute the query. The query operators at each step are associated with one or more expressions that represent computations or other actions that will be performed upon query execution. Example query operators include, without limitation, table scans, joins, table queues, group-by operations, and Bloom filters. Once the query execution plan is generated, the query is executed according to the plan.
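The two stages of query evaluation may be observed, as a rough sketch, using SQLite's `EXPLAIN QUERY PLAN` statement through Python's standard `sqlite3` module. The `orders` table, its contents, and the function name `compile_then_execute` are hypothetical. The plan text that SQLite reports is specific to SQLite and varies between versions, so the example collects the plan steps without assuming their wording.

```python
import sqlite3


def compile_then_execute(sql, params=()):
    """Return (plan_steps, result_rows) for a query against a small sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "acme", 10.0), (2, "acme", 5.0), (3, "globex", 7.5)],
    )

    # Compilation stage: ask SQLite to describe the plan it would use.
    # The last column of each returned row is a human-readable step description.
    plan_steps = [
        row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    ]

    # Execution stage: run the same query to obtain the query result.
    result_rows = conn.execute(sql, params).fetchall()

    conn.close()
    return plan_steps, result_rows


plan_steps, rows = compile_then_execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
)
```

Here `plan_steps` holds the descriptions of the operators chosen during compilation, such as a scan of `orders`, and `rows` holds the result produced by executing the query.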
In order to evaluate a SQL expression, the DBMS relies upon the internal structures and organization of the data within the database. For example, the SQL expression may identify which data to access from the database based on where it resides in a particular table. Data that resides outside the database (herein referred to as “foreign data”) is typically not constrained by the structure and organization defined by the internal database metadata. For instance, the foreign data may not be organized into a tabular format by the external data source and may instead be organized according to a different structure and format defined by the external data source. Therefore, SQL queries typically cannot be used to access and modify data external to the database.
One approach to enable SQL queries to analyze data residing outside the database involves loading the foreign data into the database. A database loader is an application that extracts data from the external source, transforms the extracted data into a format suitable for loading into a target database table, and populates the database table with the transformed data. Thus, data from the external source is internalized by the database and converted to a format that conforms to the database's internal organization. This approach allows the DBMS to execute queries on data from external data sources after the data has been loaded into the database. However, the loading process incurs high processing and storage overhead, especially where the external source comprises large quantities of data that are continuously updated.
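The extract, transform, and populate steps performed by a database loader may be sketched as follows, assuming for illustration that the external source provides comma-separated text. The sample data, the `readings` table, and the function name `load_foreign_data` are hypothetical. A production loader would also handle malformed records, incremental updates, and much larger volumes, none of which is shown here.

```python
import csv
import io
import sqlite3

# Hypothetical foreign data: mixed units and stray whitespace.
EXTERNAL_CSV = "sensor,reading,unit\nalpha, 21.5 ,C\nbeta,68.0,F\n"


def load_foreign_data(conn, csv_text):
    """Extract CSV records, normalize them, and populate the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor TEXT, celsius REAL)"
    )

    rows = []
    # Extract: read each record from the external representation.
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Transform: parse the number and convert Fahrenheit to Celsius.
        value = float(record["reading"])
        if record["unit"].strip().upper() == "F":
            value = (value - 32.0) * 5.0 / 9.0
        rows.append((record["sensor"].strip(), round(value, 2)))

    # Populate: copy the transformed rows into the database table.
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded_count = load_foreign_data(conn, EXTERNAL_CSV)
loaded_rows = conn.execute(
    "SELECT sensor, celsius FROM readings ORDER BY sensor"
).fetchall()
```

Every record is read, converted, and stored a second time inside the database, which illustrates where the processing and storage overhead of this approach arises.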
Another approach for evaluating foreign data is to offload query evaluation to an external source of the foreign data. For example, when a database query directed to foreign data is received, the DBMS may send the database query to the external source for evaluation. The external source may then evaluate the query and return the query result to the DBMS. This approach increases the processing overhead on the external data source and requires the external data source to be able to process queries received from the DBMS.
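The offloading approach may be sketched, under stated assumptions, with two hypothetical classes: `ExternalSource`, which stands in for an external system capable of evaluating SQL on its own data, and `Dbms`, which merely forwards the query and relays the result. Both class names, the `event_store` source name, and the `events` table are invented for this example, and an in-memory SQLite database is used only to simulate the external system.

```python
import sqlite3


class ExternalSource:
    """Hypothetical external system that evaluates queries on its own data."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE events (kind TEXT, n INTEGER)")
        self._conn.executemany(
            "INSERT INTO events VALUES (?, ?)",
            [("click", 3), ("view", 10), ("click", 4)],
        )
        self.queries_evaluated = 0  # counts work done on the external side

    def evaluate(self, sql, params=()):
        # All query processing happens here, not in the DBMS.
        self.queries_evaluated += 1
        return self._conn.execute(sql, params).fetchall()


class Dbms:
    """Hypothetical DBMS front end that offloads queries on foreign data."""

    def __init__(self, foreign_sources):
        self._foreign_sources = foreign_sources

    def query_foreign(self, source_name, sql, params=()):
        # Forward the query unchanged and relay the result to the caller.
        return self._foreign_sources[source_name].evaluate(sql, params)


source = ExternalSource()
dbms = Dbms({"event_store": source})
click_total = dbms.query_foreign(
    "event_store", "SELECT SUM(n) FROM events WHERE kind = ?", ("click",)
)
```

In this sketch the DBMS stores none of the foreign data, but the approach functions only because `ExternalSource` is itself able to evaluate the forwarded query, and each forwarded query adds to the processing load on the external side.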
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.