Databases are used to store and retrieve data. Data is retrieved through a data request called a query. The retrieved data can be selected, sorted, and organized based on the query. Sometimes further computations or analytics, beyond the query functions of the database language, are applied to the retrieved data. These computations are often performed on a processor external to the database, such as a user device, calling device, or client device.
Running data-intensive analytic computations outside the database causes significant overhead in data access and transfer, which can be a major performance bottleneck in business intelligence applications. Pushing data-intensive analytics down to the database layer, for fast data access and reduced data transfer, presents some challenges. While a query processing engine is technically sophisticated, it is primarily used for relational query evaluation. More general applications rely on User Defined Functions (UDFs). However, existing UDF technology suffers from some limitations. First, tuple-wise pipelined UDF execution restricts the capability or efficiency in dealing with complex applications, and a tuple-set input is not supported. Second, UDFs are coded in a non-SQL language such as C, which either involves hard-to-follow Database Management System (DBMS) internal system calls for interacting with the query executor or sacrifices performance by converting DBMS-defined relation objects to strings when passing arguments.
Existing database systems support only scalar, aggregate, and table UDFs, where a scalar or aggregate function cannot return a set. An existing database table UDF is limited to a single-tuple argument. Further, existing UDFs are typically executed during query execution in the tuple-wise pipeline of query processing, which may prohibit in-function batch and parallel processing.
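The difference between tuple-wise invocation and a tuple-set input can be illustrated with a minimal Python sketch. It is not taken from any particular DBMS; the `Row` schema and the names `tuple_wise_pipeline`, `scale_by_constant`, and `center_on_mean` are hypothetical and chosen only for illustration. The per-tuple function sees one row at a time, while the set-based function needs every row before it can produce any output.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical relation schema for illustration: (id, value).
Row = Tuple[int, float]


def tuple_wise_pipeline(rows: Iterable[Row],
                        udf: Callable[[Row], Row]) -> Iterator[Row]:
    """Invoke the UDF once per tuple, as a pipelined executor would."""
    for row in rows:
        yield udf(row)


def scale_by_constant(row: Row) -> Row:
    """A per-tuple function: it sees one row and nothing else."""
    row_id, value = row
    return (row_id, value * 2.0)


def center_on_mean(rows: List[Row]) -> List[Row]:
    """A tuple-set function: the mean depends on all rows, so the
    whole input must be available before any output row is emitted."""
    mean = sum(value for _, value in rows) / len(rows)
    return [(row_id, value - mean) for row_id, value in rows]


rows = [(1, 1.0), (2, 2.0), (3, 6.0)]
scaled = list(tuple_wise_pipeline(rows, scale_by_constant))
centered = center_on_mean(rows)
```

A computation such as `center_on_mean` cannot be expressed as a single per-tuple call, because no individual invocation has access to the other rows in the set.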
An existing UDF is run in the query processing environment with a number of interactions with the query executor for parsing parameters, converting data, and switching memory contexts. Efficiently executed UDFs may be coded using database engine internal data structures and system calls, but analytics users may then have to deal with hard-to-follow system details. Coding efficiently executed UDFs may be too difficult and cumbersome for a database analytics user due to the complexity of the database system data structures and system development language. Converting UDF input data from system internal formats to strings can cause significant overhead in converting data and parsing parameters.
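The string-conversion overhead described above can be sketched in Python under stated assumptions. The text encoding used here (rows joined by `;`, fields joined by `,`) and all function names are hypothetical, not the format of any real DBMS. The point is only that the string path adds a serialization step and a parsing step around the same computation.

```python
from typing import List, Tuple

# Hypothetical relation schema for illustration: (id, value).
Row = Tuple[int, float]


def rows_to_string(rows: List[Row]) -> str:
    """Serialize rows to text; repr() keeps floats round-trippable."""
    return ";".join(f"{row_id},{value!r}" for row_id, value in rows)


def string_to_rows(text: str) -> List[Row]:
    """Parse the text form back into typed rows."""
    if not text:
        return []
    parsed = []
    for item in text.split(";"):
        row_id, value = item.split(",")
        parsed.append((int(row_id), float(value)))
    return parsed


def total_native(rows: List[Row]) -> float:
    """Operate directly on the in-memory rows."""
    return sum(value for _, value in rows)


def total_via_string(rows: List[Row]) -> float:
    """Same result, but every value is formatted and then re-parsed."""
    text = rows_to_string(rows)    # extra pass: convert to string
    parsed = string_to_rows(text)  # extra pass: parse parameters
    return sum(value for _, value in parsed)


sample = [(1, 1.5), (2, 2.25)]
native_total = total_native(sample)
string_total = total_via_string(sample)
```

Both paths return the same total, but the string path touches every value two additional times, which is the kind of cost that grows with the size of the input relation.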