1. Field of the Invention
The present invention generally relates to data analysis. More specifically, the present invention relates to determining demographics based on user interaction.
2. Description of the Related Art
Many modern websites and media outlets have a social or interactive aspect incorporated in their design. Around the world, billions of people consume video and news and interact with games, and each of those platforms allows users to interact with the content based on what is displayed on the screen. For example, more than a half billion users generate large amounts of data each day on U.S. social networks such as Twitter® and Facebook®. Other types of media websites (e.g., news sites) also include a social or interactive aspect where readers can comment, respond, or otherwise interact with the content on the site. Such content may include not only the originally published article, photo, video, etc., but also content posted by other users related to the original publication. For example, a news site may publish an article. In response, a user may provide feedback or questions in the comment section of that article. Responsive to the first user, other users may post their own feedback, answers to the question, or additional content to supplement the discussion.
While such user-generated data adds value to the content, the demographic breakdowns of the users interacting with the content are not well understood. Presently available ways of determining demographic information may involve, for example, determining a user's IP address. A user's IP address may allow for geo-location of the user at a particular longitude and latitude. Based on the location, census data can be accessed, allowing for deduction as to the user's likely demographics. Such a process may be error-prone, however, because the location of an IP address is determined based on registration information, which may not necessarily reflect the location of the user. In addition, census information may be years out of date, since census polling does not occur every year.
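By way of illustration only, the IP-based approach described above may be sketched as follows. This is a minimal sketch, not an actual implementation: the names `IP_REGISTRY`, `CENSUS_BY_CELL`, `geolocate`, and `likely_demographics` are hypothetical, the address blocks are reserved documentation ranges, and the coordinates and census values are placeholders rather than real data.

```python
import ipaddress

# Hypothetical registry mapping an address block to the (latitude, longitude)
# on file for the registrant. This may differ from where the user actually is.
IP_REGISTRY = {
    ipaddress.ip_network("203.0.113.0/24"): (40.71, -74.01),
    ipaddress.ip_network("198.51.100.0/24"): (34.12, -118.23),
}

# Hypothetical census table keyed by coordinates rounded to one decimal place.
# All values are placeholders; "survey_year" shows how stale a record may be.
CENSUS_BY_CELL = {
    (40.7, -74.0): {"age_band": "placeholder-A", "survey_year": 2010},
    (34.1, -118.2): {"age_band": "placeholder-B", "survey_year": 2010},
}


def geolocate(ip):
    """Return the registered coordinates for an IP address, or None."""
    addr = ipaddress.ip_address(ip)
    for network, coords in IP_REGISTRY.items():
        if addr in network:
            return coords
    return None


def likely_demographics(ip):
    """Look up the census record for the cell containing the registered location."""
    coords = geolocate(ip)
    if coords is None:
        return None
    cell = (round(coords[0], 1), round(coords[1], 1))
    return CENSUS_BY_CELL.get(cell)
```

In this sketch, both sources of error noted above are visible: the lookup returns the registrant's location rather than the user's, and the census record carries whatever survey year was last recorded.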
Another method of determining demographics involves a user's email address. An email address may be used as a marker to retrieve the user's social graph. A user may have an account on a social network, for example, and have provided demographic information (e.g., in a profile). Reliance on email is also highly error-prone, as users may not wish to provide email addresses and may therefore fail to provide one. In some instances, users may use a fake or back-up email address. Further, in some cases, having an email address may not be sufficient to access the user's profile (e.g., due to the user electing certain privacy options).
Cookies are often employed as a way to determine user demographics. A cookie may be downloaded to a user's computer, for example. If the user visits another site and provides demographic data, the demographic profile of that particular visitor may be aggregated. For example, if user A logs into site X and then goes to site Y, logs in, and enters their age, income, and educational background, a subsequent visit to site X could provide the owner of site X a demographic picture of that visitor that was not previously available. While accurate data may be gleaned over time, this approach relies on the user to voluntarily provide relevant information. There may also be difficulties running the cookies on certain websites due to privacy and security concerns.
Some entities use registration and profiles to track information on their users. A website may require a user to register and fill out a profile in order to access and view content. Alternatively, a website may encourage users to register and fill out profiles by offering free access to desired content or some other incentive. Either way, demographic information may be determined based on the profiles provided by the users who log into the system to access the content. For example, a website can account for demographics based on the profiles of logged-in users who access a video posted on the website. This approach is limited, however, because not all websites require users to register and provide profile information. In some cases, users may be turned off by the extra steps required to register and fill out even a basic profile. Even already-registered users may not want to take the steps of logging in. For example, a user may not access content on a site often and may consequently forget their log-in name and password.
There is therefore a need for a robust method for determining accurate and timely demographic information.