Analytical reporting systems are often used to measure and assess the effectiveness of a content distribution campaign (e.g., a campaign in which content items are distributed to a user device via a computer network) or other types of electronic interactions (e.g., website visits, electronic commerce, etc.). For example, in a content distribution campaign, an analytical reporting system can be used to generate reports on metrics (e.g., impressions, hits, clicks, conversions, revenue, etc.) for evaluating the effectiveness of distributed content items. The reports may be generated by aggregating pre-defined keys from a set of raw data. In this context, a “report” is defined as a combination of keys and values. Typically, a data element needs to include all of the pre-defined keys and values of a particular report to be included in the report.
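By way of illustration only, the aggregation described above can be sketched as follows. The function name `build_report`, the key names (`campaign`, `country`, `clicks`), and the sample data are hypothetical and are not drawn from any particular reporting system; the sketch assumes that raw data elements are simple key/value dictionaries and that the report sums a single value per key combination.

```python
from collections import defaultdict


def build_report(elements, report_keys, value_key):
    """Sum value_key over the elements that contain every key of the report."""
    totals = defaultdict(int)
    for element in elements:
        # An element missing any required key or the value is left out.
        has_all_keys = all(k in element for k in report_keys)
        if not has_all_keys or value_key not in element:
            continue
        group = tuple(element[k] for k in report_keys)
        totals[group] += element[value_key]
    return dict(totals)


# Hypothetical raw data: the third element lacks the "country" key.
raw_data = [
    {"campaign": "spring", "country": "US", "clicks": 3},
    {"campaign": "spring", "country": "US", "clicks": 2},
    {"campaign": "fall", "clicks": 7},
]
report = build_report(raw_data, ["campaign", "country"], "clicks")
```

In this sketch, the third data element is excluded from the report because it does not include the `country` key, which reflects the requirement that a data element include all of the pre-defined keys of a report.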
Dimension widening is a process by which keys and/or values are created or edited based on the values of other keys and a predefined mapping schema. For example, a customer may pre-define mappings from certain keys (i.e., “condition keys”) to other keys (i.e., “action keys”) and other values (i.e., “action values”). Dimension widening can be used to automatically generate keys and/or values for a data element such that the data element may be included in a report.
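As a minimal sketch of the mapping described above, and not a definitive implementation, a dimension widening rule can be modeled as a set of condition keys with expected values together with the action keys and action values to write when the conditions match. The rule layout (`conditions`, `actions`), the function name `widen`, and the sample keys are assumptions made for illustration.

```python
def widen(element, rules):
    """Return a copy of element with action keys/values from matching rules."""
    widened = dict(element)
    for rule in rules:
        # A rule matches when every condition key is present in the
        # original element with the expected value.
        matches = all(
            key in element and element[key] == expected
            for key, expected in rule["conditions"].items()
        )
        if matches:
            # Action keys are created, or overwritten if they already exist.
            widened.update(rule["actions"])
    return widened


# Hypothetical rule: elements of the "fall" campaign are assigned country "CA".
rules = [
    {"conditions": {"campaign": "fall"}, "actions": {"country": "CA"}},
]
element = {"campaign": "fall", "clicks": 7}
widened = widen(element, rules)
```

Here the widened data element gains a `country` key that it previously lacked, so it could be included in a report that requires both `campaign` and `country`. The original element is left unmodified in this sketch; other designs may edit the element in place.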
One challenge with dimension widening is the scale of the process. With a large number of dimension widening rules (e.g., mapping rules and conditions) and data elements to process, it is non-trivial to join such large data sets without introducing significant latency in the reporting process.
Further, it is challenging to apply dimension widening retroactively. Data is often preprocessed and/or preaggregated to reduce processing latency at the time of query (i.e., at the time a customer requests a report). However, if data has been preprocessed, applying dimension widening retroactively may cause a modification in the underlying raw data, thereby causing the preprocessed data to become outdated. Traditional solutions have included performing a massive reaggregation and preprocessing of the updated (e.g., dimension-widened) data or performing the dimension widening at query time. However, these solutions often require significant processing power and computing resources (e.g., CPU resources, disk IO resources, etc.), and can introduce significant query latency. Further, applying dimension widening to preprocessed data may result in a loss of information associated with the original raw data.
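The staleness problem described above can be illustrated with a small sketch. All names (`preaggregate`, `cached`, `fresh`) and the sample data are hypothetical; the sketch assumes a preaggregation that sums one value per key and a retroactive rule that edits the raw data in place.

```python
from collections import Counter


def preaggregate(elements, key, value_key):
    """Sum value_key per distinct value of key, skipping elements without key."""
    totals = Counter()
    for element in elements:
        if key in element:
            totals[element[key]] += element[value_key]
    return dict(totals)


raw_data = [
    {"campaign": "spring", "country": "US", "clicks": 5},
    {"campaign": "fall", "clicks": 7},
]

# Totals computed ahead of query time, before any widening rule exists.
cached = preaggregate(raw_data, "country", "clicks")

# A rule applied retroactively modifies the underlying raw data in place.
for element in raw_data:
    if element.get("campaign") == "fall":
        element["country"] = "CA"

# The cached totals no longer agree with a fresh aggregation.
fresh = preaggregate(raw_data, "country", "clicks")
```

In this sketch, `cached` omits the clicks now attributed to `CA`, so it must be recomputed, and the modified raw data no longer records that the second element originally had no `country` key.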