Commonly, software implements logic for functions that are executed serially, such that the output of a first function is used as input to a second function. The technique of using the output of a first function as the input to a second function is called pipelining.
A common example of pipelining is illustrated using Unix shell commands. For example, a user may wish to decompress a file, search within the file for a particular string, generate some statistics based on the lines containing that string, and return the results. From the command line, the user could execute one program at a time, storing the intermediate results of each command to non-volatile storage for the next program to use as input. Alternatively, the user could use shell pipe operators to pipeline the intermediate results from a first program to a second program, such that only the final result is stored to non-volatile storage, e.g.:
tar xfzO foo.tar.gz | grep "I am a happy bee" | wc > bar.txt
In the example above, tar decompresses and extracts data from a file. The results are piped through standard output to grep, which scans for all lines that contain the phrase “I am a happy bee.” The lines containing the phrase are piped to wc, which generates statistics on those lines (by default, line, word, and byte counts). The statistics are then stored to non-volatile storage in a file named bar.txt. Using Unix shell pipeline operators, non-volatile storage may be accessed by tar alone, and only the final results are stored to non-volatile storage by the Unix shell. The intermediate programs, grep and wc, never need to read from or write to non-volatile storage. Even when pipelining is used as shown above, however, the output of each program is still held in memory for the next program to access.
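The data flow of the grep and wc stages can be sketched in-process in Python. This is a minimal illustration under stated assumptions, not the utilities themselves: the decompressed text is assumed to be already available as a string, and `grep_lines` and `wc_counts` are hypothetical helpers that mimic grep's fixed-phrase line match and the three counts wc prints by default. The list returned by `grep_lines` stands in for the in-memory buffer through which grep's output reaches wc.

```python
from typing import List, Tuple

PHRASE = "I am a happy bee"

def grep_lines(lines: List[str], phrase: str) -> List[str]:
    """Hypothetical grep stand-in: keep every line (with its newline) containing phrase.

    The returned list plays the role of the in-memory buffer that carries
    grep's output to wc.
    """
    return [line for line in lines if phrase in line]

def wc_counts(lines: List[str]) -> Tuple[int, int, int]:
    """Hypothetical wc stand-in: return (newlines, words, bytes) for the given lines."""
    newlines = sum(line.count("\n") for line in lines)
    words = sum(len(line.split()) for line in lines)
    nbytes = sum(len(line.encode("utf-8")) for line in lines)
    return newlines, words, nbytes

if __name__ == "__main__":
    # Hypothetical decompressed file contents (what `tar xfzO` would write to stdout).
    text = "hello\nI am a happy bee today\nno bees here\nI am a happy bee\n"
    matches = grep_lines(text.splitlines(keepends=True), PHRASE)
    print(wc_counts(matches))  # (2, 11, 40)
```

The phrase contains no regular-expression metacharacters, so a plain substring test matches the same lines that grep would select for it.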
Pipelining can also be performed within a software program. For example, the output of a software-implemented function F1 may be fed as input into a second software-implemented function F2, both of which may be implemented in the same software program P. When serially executing functions in an application, the result of each function in the pipeline is typically saved in memory for the next function to access. That is, the output of F1 is stored to storage locations in volatile memory, and read from those locations in volatile memory when provided as input to F2.
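A minimal Python sketch of the F1-to-F2 pipeline inside a single program P follows. The bodies of `f1` and `f2` (squaring and summing) are hypothetical placeholders chosen only to make the example runnable; the point is that F1's entire output is materialized in a list in volatile memory before F2 reads it back.

```python
from typing import List

def f1(values: List[int]) -> List[int]:
    """Hypothetical function F1: square each input value."""
    return [v * v for v in values]

def f2(values: List[int]) -> int:
    """Hypothetical function F2: sum the values it is given."""
    return sum(values)

def program_p(values: List[int]) -> int:
    """Hypothetical program P that executes F1 and then F2 serially."""
    # F1's complete output is stored in a list in volatile memory...
    intermediate = f1(values)
    # ...and F2 later reads that stored list back as its input.
    return f2(intermediate)

if __name__ == "__main__":
    print(program_p([1, 2, 3]))  # 14
```

Because `intermediate` holds one element per input value, the memory it occupies grows with the size of the input.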
Similarly, database engines pipeline functions according to a query plan. Specifically, in response to receiving a query, a “planner” within a database engine may generate a plan to accomplish the operations specified in the query. Such plans often involve feeding the results produced by one function into another function. When executing the query plan, each function may be executed serially, and the intermediate results generated by each function are saved to memory. Subsequent functions in the query plan can access the saved intermediate results from preceding functions, generate new results, and save the new results in memory for further subsequent functions to access. Saving and accessing intermediate results incurs a heavy performance penalty, requires more power, consumes memory bandwidth, and increases the memory footprint.
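The serial execution of a plan with saved intermediate results can be sketched as a toy executor. This assumes a purely linear plan, given as a list of named operator functions; real planners typically produce operator trees, and the names `execute_plan`, `filter`, `project`, and `sort` are hypothetical. Every operator's output is saved in a dictionary before the next operator reads it, so all intermediate results remain resident in memory at once.

```python
from typing import Any, Callable, Dict, List, Tuple

# An operator consumes the preceding operator's result and produces a new one.
Operator = Callable[[Any], Any]

def execute_plan(plan: List[Tuple[str, Operator]], source: Any) -> Tuple[Any, Dict[str, Any]]:
    """Execute a linear plan serially, saving every intermediate result in memory."""
    saved: Dict[str, Any] = {}
    current = source
    for name, operator in plan:
        current = operator(current)  # read the preceding operator's saved result
        saved[name] = current        # save this operator's output for the next one
    return current, saved

if __name__ == "__main__":
    rows = [{"name": "a", "price": 5}, {"name": "b", "price": 12}, {"name": "c", "price": 20}]
    plan = [
        ("filter", lambda rs: [r for r in rs if r["price"] > 10]),
        ("project", lambda rs: [r["name"] for r in rs]),
        ("sort", lambda names: sorted(names, reverse=True)),
    ]
    result, saved = execute_plan(plan, rows)
    print(result)       # ['c', 'b']
    print(list(saved))  # ['filter', 'project', 'sort']
```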
For example, in response to a query, a planner may determine that data from a particular table must be decompressed, the decompressed data must be scanned to identify data that matches criteria specified in the query, and the matching data thus identified must be transformed to produce the results required by the query. The transformed results are then to be returned to the requestor.
To execute such a plan, a first function accesses the compressed data in memory, decompresses the data, and stores the decompressed data back into memory. A second function accesses the decompressed data stored in memory, scans the decompressed data for specific data matching the query parameters, and stores the matching data back into memory. A third function accesses the matching data in memory, transforms the matching data, and stores the transformed matching data back into memory. Finally, the transformed matching data is returned to the user or application that issued the query. In this example, the intermediate results (the decompressed data and the matching but not yet transformed data) were written to and read from memory, which incurred a heavy performance penalty.
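The three-step plan above can be sketched in Python under several assumptions that are not from the original: the table is stored as zlib-compressed, newline-delimited text; "matching the query parameters" is modeled as a substring test; and "transform" is modeled as upper-casing each row. The helper names `decompress`, `scan`, `transform`, and `run_query` are hypothetical. Each intermediate result is bound to a variable, that is, written to memory, before the next function reads it.

```python
import zlib
from typing import List

def decompress(compressed: bytes) -> bytes:
    """First function: decompress the stored table data."""
    return zlib.decompress(compressed)

def scan(data: bytes, needle: bytes) -> List[bytes]:
    """Second function: keep rows matching the (assumed substring) predicate."""
    return [row for row in data.split(b"\n") if needle in row]

def transform(rows: List[bytes]) -> List[str]:
    """Third function: a hypothetical transformation (decode and upper-case)."""
    return [row.decode("utf-8").upper() for row in rows]

def run_query(compressed: bytes, needle: bytes) -> List[str]:
    decompressed = decompress(compressed)  # intermediate result 1, stored in memory
    matching = scan(decompressed, needle)  # intermediate result 2, stored in memory
    return transform(matching)             # final result returned to the requestor

if __name__ == "__main__":
    table = b"apple,3\nbanana,5\napricot,7\n"
    print(run_query(zlib.compress(table), b"ap"))  # ['APPLE,3', 'APRICOT,7']
```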
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.