In a high-volume online service environment, there is a need to identify, track, and process events that occur in the course of providing various services or features. Often, these events are generated in response to user interaction with a service. Because millions of such events are typically generated each day, they are generated, collected, and processed automatically by computer systems. The events are typically processed periodically to generate reports that end users can use to assess the status of, and to manage, the provided online services.
Data associated with these events is divided into two categories: dimension data and event data. An example of dimension data is an advertiser record, in which multiple items (or “dimensions”) of data are stored, such as an advertiser identifier, name, address, and contact information. This example of dimension data comprises information about an advertiser that initiates an event. Another example is a webpage record, which stores items such as URL, last modification date, and identification of the party that last modified the webpage. This example of dimension data comprises information about a resource, such as a webpage, that is utilized as part of a transaction that generates an event. Although a request or transaction from a user may trigger an event that generates event data, the event and its event data are separate from the request or transaction that triggers the event. An example of event data is a record, often in a log file, which indicates (1) a time at which (2) a particular advertiser visited (3) a particular webpage. A piece of dimension data is commonly associated with multiple events; for example, multiple events may correspond to multiple visits by different parties to a single webpage. Additionally, an event may be associated with multiple pieces of dimension data, as in the above example of event data, which may be associated with one dimension data record relating to the advertiser and another dimension data record relating to the visited webpage. To reduce the amount of storage required, event data is usually normalized, meaning that information unique to an individual event is stored separately from the corresponding dimension data. A normalized event data record refers to a corresponding dimension data record, typically by use of a unique identifier assigned to the dimension data record (e.g., an advertiser identifier or a webpage identifier).
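The relationship between normalized event data and dimension data described above can be illustrated with a minimal Python sketch. All class names, field names, and sample values here are hypothetical and chosen only for illustration; the sketch assumes in-memory dictionaries keyed by identifier rather than any particular storage system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Advertiser:
    """Dimension record describing a party that initiates events."""
    advertiser_id: int
    name: str
    address: str
    contact: str


@dataclass(frozen=True)
class Webpage:
    """Dimension record describing a resource used in a transaction."""
    webpage_id: int
    url: str
    last_modified: datetime
    last_modified_by: str


@dataclass(frozen=True)
class VisitEvent:
    """Normalized event record: event-specific data plus identifiers only."""
    timestamp: datetime
    advertiser_id: int
    webpage_id: int


def denormalize(event, advertisers, webpages):
    """Join one event with the dimension records it references."""
    advertiser = advertisers[event.advertiser_id]
    webpage = webpages[event.webpage_id]
    return {
        "time": event.timestamp.isoformat(),
        "advertiser_name": advertiser.name,
        "url": webpage.url,
    }


# Hypothetical sample data: one advertiser, one webpage, two visits.
advertisers = {7: Advertiser(7, "Example Co", "1 Main St", "ops@example.com")}
webpages = {
    42: Webpage(42, "https://example.com/pricing", datetime(2024, 1, 5), "editor-1")
}
events = [
    VisitEvent(datetime(2024, 1, 6, 9, 30), advertiser_id=7, webpage_id=42),
    VisitEvent(datetime(2024, 1, 6, 14, 0), advertiser_id=7, webpage_id=42),
]

rows = [denormalize(e, advertisers, webpages) for e in events]
```

In this sketch the advertiser's name and address are stored once, while each event carries only a timestamp and two identifiers, which is the storage saving that normalization is meant to provide.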
In a large customer service operation, there are many discrete sources of event and dimension data. Each of these sources is typically developed separately, and each has a different internal format for data corresponding to events. For example, a web server may generate a log file regarding information it has served. Conventionally, a system for aggregating events occurring on these sources would periodically retrieve event data from the sources, and use particularized software both for retrieving event data and for transforming the retrieved event data into reports. Conventionally, a dedicated pipeline has been created for each source of event and dimension data; the pipeline collects and forwards the event and dimension data, translates it, and then generates reports based on it. Collecting such data typically demands highly customized code for interaction with the applications and/or products which provide such data. Translation and report generation based on the collected data are typically customized based on the format of the collected data and the reports to be generated. Thus, creating a pipeline for each source of event and dimension data demands specialized, and accordingly limited, developer resources, which limits both the rate at which new sources can be incorporated into a reporting infrastructure and the rate at which new reports which make use of data provided by these sources can be produced.
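The per-source customization described above can be illustrated with a short Python sketch. The two source formats, the column names, and the function names are assumptions invented for this example and do not correspond to any real product; the point is only that each source needs its own collection and translation code before a shared report can be produced.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def parse_web_log_line(line):
    """Hypothetical source A: 'ISO-timestamp advertiser_id url', space-separated."""
    timestamp, advertiser_id, url = line.split()
    return {
        "time": datetime.fromisoformat(timestamp),
        "advertiser_id": int(advertiser_id),
        "url": url,
    }


def parse_ad_export(text):
    """Hypothetical source B: CSV with epoch seconds and its own column names."""
    parsed = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed.append({
            "time": datetime.fromtimestamp(int(row["ts"]), tz=timezone.utc),
            "advertiser_id": int(row["adv"]),
            "url": row["page"],
        })
    return parsed


def visits_per_url(events):
    """Report step: count events per URL once formats have been unified."""
    return Counter(event["url"] for event in events)


# The same visit, as each hypothetical source would record it.
log_line = "2024-01-06T09:30:00+00:00 7 https://example.com/pricing"
export_text = "ts,adv,page\n1704533400,7,https://example.com/pricing\n"

events = [parse_web_log_line(log_line)] + parse_ad_export(export_text)
report = visits_per_url(events)
```

Adding a third source in this arrangement would require a third hand-written parser, which is the scaling limitation the paragraph above describes.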