Log joining systems may attempt to join one source of secondary events with another source of primary events. In one example, primary events may include search queries, while secondary events may include user clicks on advertisements. Every primary event is uniquely identified by a key, and every secondary event carries the key of its corresponding primary event. The purpose of log joining is to locate, for every secondary event, the corresponding primary event by means of that key.
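The keyed relationship described above can be illustrated with a minimal sketch. The class names, field names, and the `join_events` helper are hypothetical and chosen only for illustration; the sketch also assumes both event sets fit in memory, which a real log joining system would not.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class PrimaryEvent:
    key: str    # unique identifier of the primary event (hypothetical field)
    query: str  # e.g. the text of a search query


@dataclass(frozen=True)
class SecondaryEvent:
    primary_key: str  # key of the primary event this event refers to
    ad_id: str        # e.g. the advertisement that was clicked


def join_events(
    primaries: Iterable[PrimaryEvent],
    secondaries: Iterable[SecondaryEvent],
) -> List[Tuple[SecondaryEvent, Optional[PrimaryEvent]]]:
    """Pair every secondary event with its primary event, or None if absent."""
    # Index primary events by their unique key for constant-time lookup.
    index = {p.key: p for p in primaries}
    return [(s, index.get(s.primary_key)) for s in secondaries]
```

In this sketch a secondary event whose primary event is missing is paired with `None` rather than dropped, so the caller can decide how to treat unjoined events.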
In a non-continuous system, the joining may be performed in sequential batches. This can be achieved with a series of large MapReduce jobs, each having a mapper component and a reducer component. Both secondary and primary events may be fed into the mapper component, and the mapper may emit each event keyed on the common primary event ID, so that a secondary event and its primary event arrive at the same reducer. The reducer, in turn, can make the joining decision and output the joined logs. However, such a configuration may not scale well in a larger system having a continuous stream of events.
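A single batch of the map-reduce style join described above might look like the following in-memory sketch. It is not a real MapReduce job: the `shuffle` step stands in for the framework's grouping by key, the dictionary field names (`kind`, `id`, `primary_id`, `query`, `ad`) are hypothetical, and the sketch assumes the mapper keys both kinds of event on the primary event ID so that the reducer sees them together.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, str]  # hypothetical event record with a "kind" field


def mapper(event: Event) -> Iterator[Tuple[str, Event]]:
    """Emit the event keyed on the primary event ID it belongs to."""
    if event["kind"] == "primary":
        yield event["id"], event
    else:
        yield event["primary_id"], event


def shuffle(pairs: Iterable[Tuple[str, Event]]) -> Dict[str, List[Event]]:
    """Group mapper output by key, as a MapReduce framework would."""
    groups: Dict[str, List[Event]] = defaultdict(list)
    for key, event in pairs:
        groups[key].append(event)
    return groups


def reducer(key: str, events: List[Event]) -> Iterator[Event]:
    """Make the joining decision for all events sharing one primary ID."""
    primary = next((e for e in events if e["kind"] == "primary"), None)
    if primary is None:
        return  # no primary event in this batch, so nothing is joined
    for e in events:
        if e["kind"] == "secondary":
            yield {"primary_id": key, "query": primary["query"], "ad": e["ad"]}


def run_batch(events: Iterable[Event]) -> List[Event]:
    """Run map, shuffle, and reduce over one batch of mixed events."""
    pairs = (kv for event in events for kv in mapper(event))
    joined: List[Event] = []
    for key, group in shuffle(pairs).items():
        joined.extend(reducer(key, group))
    return joined
```

Note that a secondary event whose primary event falls in a different batch is left unjoined here, which hints at why purely batch-oriented joining can be awkward for a continuous stream.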