Present invention embodiments relate to query processing in data processing systems and, more specifically, to the execution of queries on data retrieved from a network source using application domain logic co-residing with the network interface.
Modern databases have data distributed among multiple computers. For example, one database engine of an on-line data store may run on a database server machine, while the on-line data store holds, or needs access to, data stored in multiple locations, some of which may be remote and/or offline. As databases grow larger, loading the necessary data from the remote and offline locations becomes an increasingly important part of instantiating a data set. Relatedly, growing data sets make near-line and/or offline data stores necessary in large databases. In current technology, metadata about the remote data is used to determine whether the remote data is relevant before that data is loaded into an on-line data store. Both the metadata analysis and the loading are performed by the database engine.
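By way of illustration only, the metadata-based relevance check described above may be sketched as follows. The sketch assumes that the metadata takes the form of a per-segment minimum and maximum value for one column; the source does not specify the form of the metadata, and the names `SegmentMetadata`, `relevant_segments`, and the segment locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentMetadata:
    """Hypothetical metadata kept about one remote or offline data segment."""
    location: str
    min_value: int  # assumed: smallest value of the filtered column in the segment
    max_value: int  # assumed: largest value of the filtered column in the segment


def relevant_segments(catalog, low, high):
    """Return the segments whose [min, max] range overlaps the query range [low, high]."""
    return [s for s in catalog if s.max_value >= low and s.min_value <= high]


# Illustrative catalog; the values are made up for the example.
catalog = [
    SegmentMetadata("remote-a", 0, 99),
    SegmentMetadata("remote-b", 100, 199),
    SegmentMetadata("offline-c", 200, 299),
]

# Only segments that can contain values between 150 and 220 would be loaded.
to_load = relevant_segments(catalog, 150, 220)
print([s.location for s in to_load])  # prints ['remote-b', 'offline-c']
```

In this sketch the check runs entirely in the database engine, which corresponds to the arrangement described above in which the engine performs both the metadata analysis and the loading.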
Culling and loading data from remote locations, however, carries significant overhead for the database engine. Conventional database systems typically retrieve data from the remote locations, store the retrieved data in local storage (such as local hard drives), and then query the locally stored data to generate the result set. Modern database systems have used multiple techniques to accelerate this process. One technique uses a specialized network interface device to move the received remote data to the local storage (e.g., hard drives and/or a flash PCI device), and another technique uses a co-processor to pre-process results to reduce the data passed back to the computer processor. In those techniques, the computer processor (e.g., the CPU) plays the central role in setting up and monitoring the data retrieval process, acknowledging data receipt, setting up the co-processor, passing data to the co-processor, and using the result data from the co-processor to generate a result set to return to database applications. The whole process is inefficient and consumes substantial computing resources, especially when the remote data has to be ingested into the local database engine (e.g., loaded into the local database for the first time).
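The conventional retrieve, store, and query sequence described above may be sketched, for illustration only, as follows. The sketch assumes an in-memory SQLite database as a stand-in for the local storage and a hard-coded list as a stand-in for the network source; the function names, table layout, and row values are hypothetical and are not taken from any particular database system.

```python
import sqlite3


def fetch_remote_rows():
    """Stand-in for a network read; the rows are made up for the example."""
    return [(1, "east", 120), (2, "west", 80), (3, "east", 45)]


def conventional_query(min_amount):
    """Run the conventional flow and return (rows ingested, matching ids)."""
    # Step 1: the processor drives retrieval of the complete remote data set.
    rows = fetch_remote_rows()

    # Step 2: every retrieved row is stored locally, whether or not it is relevant.
    local = sqlite3.connect(":memory:")
    local.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount INTEGER)")
    local.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

    # Step 3: only after ingestion is the query evaluated against the local copy.
    result = local.execute(
        "SELECT id FROM orders WHERE amount >= ? ORDER BY id", (min_amount,)
    ).fetchall()
    local.close()
    return len(rows), result


print(conventional_query(100))  # prints (3, [(1,)])
```

In this sketch three rows are ingested in order to return a single matching row, which illustrates, at small scale, the overhead attributed above to ingesting remote data before it is queried.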