When we advertise, we want to know the size of the market we are advertising to. We want to know how many potential customers our advertisement will reach, and we use this number to estimate sales and control the cost of advertising. Since the price charged for advertising by content providers (such as newspapers, TV networks, radio stations, Internet sites, etc.) usually depends on the reach, knowing the reachable audience size is extremely important for determining the cost-effectiveness of advertising and for estimating the return on investment (ROI), defined here as the ratio of the projected sales revenue to the cost of advertising.
While methods for estimating the reach are well developed for traditional-media advertising (i.e. TV, radio, print, etc.), the same methods cannot be applied to new-media advertising (such as advertising on the Internet). Traditional-media advertising uses the number of subscribers as a fair approximation of the reach; radio advertising relies on manual call-out marketing to estimate the audience size. Internet advertising, in general, is not delivered to subscribers, and expensive and tedious call-out marketing has been almost universally replaced by computerized unique visitor estimation techniques based on the analysis of site access logs.
The two most popular unique visitor estimation techniques for Internet advertising are the count of unique network addresses (such as IP addresses) mined from site access logs [1, 2, 3] and the count of unique “cookies” [4], also mined from site access logs [5].
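Both counting techniques can be illustrated with a minimal sketch. The log format used here ("<ip> <cookie_id> <path>", with "-" meaning no cookie was sent) and the names `count_unique` and `SAMPLE_LOG` are assumptions made for illustration only; real access logs differ in layout and would need a proper parser.

```python
from typing import Iterable, Tuple


def count_unique(log_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (unique address count, unique cookie count) for a simplified log.

    Assumed (hypothetical) line format: "<ip> <cookie_id> <path>".
    A cookie_id of "-" means the request carried no cookie.
    """
    addresses = set()
    cookies = set()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        address, cookie = parts[0], parts[1]
        addresses.add(address)
        if cookie != "-":
            cookies.add(cookie)
    return len(addresses), len(cookies)


# Made-up sample records, not real log data.
SAMPLE_LOG = [
    "203.0.113.5 c1 /index.html",
    "203.0.113.9 c1 /news.html",    # same cookie, new address
    "198.51.100.7 c2 /index.html",
    "198.51.100.7 c3 /index.html",  # same address, new cookie
    "198.51.100.7 - /about.html",   # no cookie sent
]

unique_addresses, unique_cookies = count_unique(SAMPLE_LOG)
print(unique_addresses, unique_cookies)
```

Neither count says how many people produced these five requests; each is only a tally of distinct identifiers seen in the log.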
The problem with the first method is that network addresses change over time; therefore the same visitor may be assigned a different network address upon a return visit and thus be misidentified as a new visitor. Furthermore, network addresses are also reused; therefore two distinct visitors may share the same network address on subsequent visits and thus be misidentified as one. No formal research in this area had been conducted before the author's study [6], whose results sharply contradict the currently accepted notion in the field that the ratio of unique network addresses to unique visitors is constant and on the order of 1. The research conducted by the author [6] has revealed that the ratio of unique network addresses to unique visitors is not constant and grows linearly with sampling time and with visitation frequency. In other words, if an Internet site reports 1,000,000 unique visitors per month based on the count of unique network addresses, the actual number of unique visitors may be 30 times smaller (e.g. ~30,000) if the majority of users (the core audience) visit the site twice daily.
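The arithmetic behind this example can be sketched as follows. The linear model below is a toy assumption introduced only for illustration; it is not the formula from [6]. The parameter `new_id_prob` (the chance that a visit arrives under a previously unseen address) is hypothetical, and its value of 0.5 is chosen simply so that twice-daily visits over 30 days reproduce the factor of 30 quoted above.

```python
def inflation_ratio(visits_per_day: float, days: float, new_id_prob: float) -> float:
    """Toy linear model (an assumption, not the formula from [6]).

    Each visit is assumed to arrive under a previously unseen identifier
    with probability new_id_prob, so identifiers per visitor grow roughly
    linearly with the number of visits. The ratio is floored at 1.
    """
    return max(1.0, new_id_prob * visits_per_day * days)


def estimated_visitors(unique_ids: int, ratio: float) -> int:
    """Scale a raw identifier count down by the inflation ratio."""
    return round(unique_ids / ratio)


ratio = inflation_ratio(visits_per_day=2, days=30, new_id_prob=0.5)
visitors = estimated_visitors(1_000_000, ratio)
print(ratio, visitors)  # 30.0 33333
```

Under these assumptions the reported 1,000,000 shrinks to about 33,000, the same order of magnitude as the figure given in the text.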
The potential inaccuracy of network address counts as a measure of unique visitors has been recognized before, and a method of unique visitor identification based on “cookies” has been developed [5]. A “cookie” is a persistent and unique token of information that is submitted (typically by a Web browser) to an Internet site in order to identify a user on a return visit. When a new user arrives, a new unique cookie value is generated to identify that user on subsequent visits. Currently, cookie-tracking methods are considered the most reliable and amount to an industry standard in unique visitor identification. Google Analytics, Yahoo, SpyLog and other online content rating providers rely on this method for calculating unique visitor numbers. Potential problems that negatively impact the accuracy of the cookie-tracking method include cookie clearing by users (both periodic and sporadic, including deletion of cookies by software such as antivirus or disk cleaning programs) and the explosive proliferation of Internet access points and devices such as smart phones, PDAs, pocket PCs, game consoles, notebook PCs, etc. Since cookies are specific to each device, a person who uses 10 such devices will appear as 10 unique visitors to a cookie-tracking system. Currently, the impact of cookie clearing and of the proliferation of Internet access devices is largely neglected, and unique cookie counts are nevertheless used as a direct measure of unique visitors. The research conducted by the author [6] revealed that cookies are subject to the same “explosion” mechanism as network addresses: the ratio of unique cookie counts to unique visitors is not constant and grows linearly with sampling time, and the growth factor increases with visitation frequency. The author's findings on the impact of cookie clearing (which is only one of the factors contributing to the inaccuracy) corroborate similar data recently reported by comScore [7].
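The combined effect of multiple devices and cookie clearing can be illustrated with a small calculation. This is a simplified sketch under stated assumptions, not the model from [6]: every device is assumed to be used throughout the sampling period and to have its cookies wiped on a fixed schedule. The function name `cookies_per_person` and its parameters are hypothetical.

```python
import math


def cookies_per_person(devices: int, period_days: int, clear_every_days: int) -> int:
    """Distinct cookies one person generates over the sampling period.

    Assumptions (illustrative only): each device is used throughout the
    period, and each device's cookies are cleared every clear_every_days
    days, so a device yields one new cookie per clearing interval.
    """
    if devices < 0 or period_days <= 0 or clear_every_days <= 0:
        raise ValueError("devices must be >= 0; day counts must be positive")
    cookies_per_device = math.ceil(period_days / clear_every_days)
    return devices * cookies_per_device


# One person, 10 devices, no clearing within a 30-day month.
print(cookies_per_person(devices=10, period_days=30, clear_every_days=30))  # 10

# One person, one device, cookies cleared weekly over the same month.
print(cookies_per_person(devices=1, period_days=30, clear_every_days=7))  # 5
```

In both cases a cookie-tracking system that equates cookies with people would report several unique visitors where there is only one.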
Thus cookies are about as inaccurate as unique network addresses for estimating unique visitors. This new and previously unrecognized fact has a direct impact on Internet advertising, because currently reported unique visitor and core audience numbers tend to overestimate the true audience size by a large factor (7 to 30, depending on the visitation frequency and the sampling period). In addition, cookies are not supported by all Internet access hardware and software, and generally cannot be used with Internet audio/video streams, which further limits the applicability of cookie tracking.
To remedy the problem, the author has invented a novel and non-obvious method for estimating unique visitors, discussed below.