1. Field of the Invention
The present invention relates to determining advertising (“ad”) and content visibility and other indications of attention to or engagement with advertising or content, both within servers and on network connections. In particular, the present invention relates to computer systems and methods for measuring user behavior on web-connected devices and dynamically controlling sample rates and data flow in a networked measurement system by dynamic determination of statistical significance or statistical characteristics (e.g., a threshold). Consumer and media behaviors are sampled to gather information, which is transmitted to a downstream analytics system.
2. Description of the Related Art
The Internet and other types of on-line communication have become increasingly popular, to the point where they now compete with traditional media such as print media and broadcast media for the attention of users. Due to the very large number of web pages available for users to view worldwide, online content creation and publication have become a huge business.
Yet, the data flows created by millions of browsers and display advertisements being sampled simultaneously are significant, costly, and push the capacity of current technology to its limits.
It is therefore advantageous to have a way to reduce or regulate data flows in parts of the systems of the current technology. On the surface, one solution may be to reduce the sample rates for samples of user and media behaviors, thereby reducing the amount of data flowing to the analytic servers and within the analytic servers. However, reducing sample rates also results in statistical inaccuracy, thereby compromising the overall integrity of the systems and methods involved with the data flow.
It would therefore be advantageous to limit aggregate data flows from distributed browsers and within servers by limiting sample rates in a way that maintains sufficient statistical significance, thereby not impacting the integrity of the systems and methods. Yet, this goal has been difficult to accomplish because raw data is gathered from thousands, if not millions, of different locations concurrently, whereas the results of testing for significance are normally based on the aggregate number of samples for a particular element that is to be sampled.
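By way of illustration only, the trade-off between sample rate and statistical accuracy described above may be sketched as follows. The sketch assumes that the sampled quantity is a simple proportion (for example, the fraction of ad impressions that were visible) and that a normal-approximation confidence interval is an acceptable test. The function names, the 95% confidence multiplier, and the 0.02 error threshold are hypothetical and are not taken from the present disclosure.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval
    for a proportion p_hat estimated from n samples (assumed model)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def is_sufficient(p_hat: float, n: int, max_error: float, z: float = 1.96) -> bool:
    """True when n samples keep the margin of error within max_error."""
    return margin_of_error(p_hat, n, z) <= max_error


# Hypothetical scenario: 1000 browsers each report 4 samples for one ad.
per_location_counts = [4] * 1000
aggregate_n = sum(per_location_counts)  # significance depends on this total

# A single location alone cannot judge significance from its own samples.
local_error = margin_of_error(0.5, per_location_counts[0])

# The aggregate meets a hypothetical 0.02 threshold; half the rate does not.
full_rate_ok = is_sufficient(0.5, aggregate_n, max_error=0.02)
half_rate_ok = is_sufficient(0.5, aggregate_n // 2, max_error=0.02)
```

In this sketch, no individual location holds enough samples to evaluate significance, while the aggregate count does. Halving the sample rate widens the margin of error past the assumed threshold, which illustrates why a sample rate cannot be lowered safely without reference to the aggregate number of samples.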