In many streaming transaction processing systems, it is necessary to execute queries that are driven by values in the streaming data against a conventional relational database. For example, in a typical streaming environment, a stream of records or objects that contain various data fields arrives continuously at the transaction processing system. In order to process a typical transaction, it is necessary to execute a conventional SQL query for each record in the stream where inputs to the SQL query are taken from selected field values in that record. Also, in many cases, the output values from the SQL query will be placed back into fields of that record so that the record can be further processed downstream of the SQL query lookup. For example, a billing processing system many receive a constant stream of records that are the result of metering equipment operating in the field. It is often necessary to lookup a rate or other value for each record based on the values in some of the fields in the record, such as type and quality of service, service duration, etc. Once the rate is determined it is written into a rate field in the record so that the record can be further processed as a unit.
One simple mechanism for executing such a query is to execute a parameterized SQL statement against the database as each new record is received. For example, assume that the transaction processing system receives a stream (stream) of records wherein each record (record) contains at least two fields, named field A and field B, each field containing values that are to be used in the query. Then, as discussed above, for each such record, it is necessary to retrieve a value from a field C that is stored in a relational database in a table where the retrieval is based on the values in fields A and B in the record. Such a query can be written as a join of two tables (tables t1 and t2) that are joined on the value of a third field D where the second table holds the parameter value for the field B. Such a query might be written in standard SQL language as:
SELECT t1.C FROM table1 t1 INNER JOIN table2 t2
                ON t1.D=t2.D        WHERE t1.A=@A AND t2.B=@B        
This SELECT statement can then be used in a simple program that processes records in a stream by reading the values of fields A and B from each record, performing the query for that record, reading a field (C) from the query result and subsequently writing the contents of field C to the original record. The program can be described by the following “pseudo code”:
FOR EACH record IN stream
                BEGIN                    READ A, B FROM record            EXECUTE query WITH @A=record.A AND @B=record.B            READ C FROM query            WRITE C TO record                        END        
The biggest problem with this approach is that it is not efficient, particularly if the overhead cost of executing the query is small relative to the overhead cost of the query itself (including the RPC overhead, in particular). Furthermore, any work done by the database in the process of executing the query for one record is not leveraged when processing the same query for subsequent records (important examples of such work include sorting that occurs in merge joins, hash tables that are built to perform hash joins and index traversals to read a page in the case of nested loop joins).