Demographics play an important role in web advertising, web search, and, more generally, the personalization of web applications. Applications such as web search engines might adjust the ranking of search results based on demographic attributes of a user such as age, gender, and occupation. Another domain where demographics matter is online advertising. With the growth of web usage, online advertising has grown rapidly in recent years. In particular, contextual advertising is becoming popular. Behavioral targeting using demographic attributes helps advertisers to target specific users with demographically relevant advertisements.
One approach to obtaining the demographics of a website is through panel studies similar to those used for TV program ratings. In this approach, panelists with known demographic information are recruited and their browsing histories are recorded. The browsing histories of panelists with different demographic attributes are then used to compute the demographics of websites. However, this approach requires impractically large panels to guarantee any reasonable coverage of websites. Additionally, if a site is not visited by any of the panelists, then the demographics of that website cannot be estimated.
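The panel-based computation described above can be illustrated with a minimal sketch. The function name `site_demographics`, the data layout, and the example site names are hypothetical and chosen only for illustration; they are not taken from any particular panel study. The sketch assumes each panelist is counted once per site and estimates a site's demographics as the share of its visiting panelists holding each attribute value.

```python
from collections import Counter, defaultdict

def site_demographics(panelists, visits, attribute):
    """Estimate, per site, the share of visiting panelists for each value of `attribute`.

    panelists: dict mapping panelist id -> dict of known demographic attributes
    visits:    iterable of (panelist id, site) pairs from recorded browsing histories
    """
    counts = defaultdict(Counter)
    seen = set()
    for panelist_id, site in visits:
        # Assumption: count each panelist once per site, regardless of visit frequency.
        if (panelist_id, site) in seen:
            continue
        seen.add((panelist_id, site))
        counts[site][panelists[panelist_id][attribute]] += 1
    return {
        site: {value: n / sum(c.values()) for value, n in c.items()}
        for site, c in counts.items()
    }

# Hypothetical panel and browsing log.
panelists = {
    "p1": {"gender": "female", "age": "18-24"},
    "p2": {"gender": "male", "age": "25-34"},
    "p3": {"gender": "female", "age": "25-34"},
}
visits = [
    ("p1", "news.example"),
    ("p2", "news.example"),
    ("p3", "news.example"),
    ("p1", "news.example"),  # repeat visit, ignored
    ("p2", "shop.example"),
]

estimates = site_demographics(panelists, visits, "gender")
```

In this sketch, a site that no panelist visited simply has no entry in `estimates`, which mirrors the coverage limitation noted above, and a site visited by a single panelist yields an estimate based on one observation.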
Another approach to obtaining the demographics of a particular website is to use information provided by that website's registered visitors or to ask some of its visitors to participate in online surveys. These techniques capture information only about the limited subset of visitors who have chosen to register and/or participate in the surveys. In addition, since not all segments of a website's visitors are equally likely to participate in these activities, the resulting information is subject to sampling bias. Furthermore, since each individual can potentially register and/or take the surveys multiple times, the demographics obtained via this approach may not be accurate. Additionally, since the information provided by visitors during registration or during their participation in surveys can potentially be used to describe and/or identify them, its use for any purpose other than the one intended represents a potential intrusion upon a user's expectation of privacy.
Another approach is to build a computational or statistical model to predict a website's demographic information. The existing approaches for building such models use data obtained by tracking users' browsing behavior across different websites, information about the content of the web pages that the users visit, and information associated with the users' profiles. The profile of a user (or a group of users) is often constructed by integrating various elements across different websites and contains information such as data provided during registration, web pages viewed, products purchased, and advertisements clicked. With the growing concern regarding privacy on the Internet, people are reluctant to share their personal data, and therefore the applicability of existing approaches that rely on such personal data can be limited.
Due to the combination of the above factors, among others, the methods in use today for characterizing the audiences of websites are limited in their accuracy and in their ability to cover a large number of websites with substantial audience traffic, and they fail to protect a user's right to information privacy.