This specification relates to distributed processing of visualization data.
Consumer behavior analysis provides insight into the behavior and interests of consumers. One way of conducting such analysis is through the analysis of large amounts of low-level data in an online analytics system. For example, television systems facilitate the collection of high volumes of such low-level data. Television advertising systems can have viewership and consumer demographic data, subscriber data for digital satellite and cable providers, viewership rates for content and time slots, anonymized raw event data such as channel tune events, etc. Further adding to the possible amount of data is the granularity (e.g., individual interaction events) of many types of television viewership data, also referred to in this specification as “reporting data.” Television reporting data can span multiple areas of consumer behavior and be relevant in a whole range of applications, ranging from determining advertising effectiveness to interpreting social media behaviors and driving other Internet applications.
Advertisers want to be able to view visual representations of television reporting data. Viewing the television reporting data and various representations of the television reporting data can permit identification and prediction of trends, detection of changes in consumer habits and demographics, etc. Such information can prove useful to advertisers. For example, such information can enable an advertiser to identify changes in viewer habits and alter the scheduling of its television advertisements accordingly.
However, generating a visualization of the television reporting data requires the television reporting data to be processed. Many visualizations of television reporting data require processing much, if not most, of the television reporting data. For example, an extensive portion of the television reporting data would have to be processed to generate a visualization of consumer behavior for a specific demographic characteristic.
Generating such visualizations is made difficult by the sheer scale of the underlying data that must be processed. If the data is held in a data store (for example, a relational database), it must first be transferred to another machine on which it can be visualized. Where the visualization is generated by a single processing device, such as a single computer, the time to generate the visualization may be unacceptably long. Transferring large amounts of data in a timely fashion requires substantial bandwidth. Furthermore, even if the data can be transferred in a timely fashion, few machines other than special-purpose supercomputers have enough memory to process such vast amounts of data efficiently.