1. Field of Invention
The present invention relates generally to the field of content and/or data collection and use over one or more networks. More particularly, the present invention is related in one exemplary aspect to apparatus and methods for collecting data related to content usage, and validating the collected data.
2. Description of Related Technology
“Nielsen® Ratings” are a well-known system of evaluating the viewing habits of cross sections of the population. When collecting Nielsen® audience data, companies use statistical techniques to develop a sample population which is a cross section of a larger national population. Theoretically, the viewing habits of the sample population will mirror those of the larger population. The companies then measure the sample population's viewing habits to identify, among other things, what programs the population is watching as well as the time and frequency at which those programs are watched. This information is then extrapolated to gain insight into the viewing habits of the larger population. Historically, the Nielsen® audience measurement system has been the primary source of audience measurement information in the television industry. The Nielsen® audience measurement system, therefore, affects various aspects of television including, inter alia, advertising rates, schedules, viability of particular shows, etc., and has also recently been expanded from measuring an audience of program content to measuring an audience of advertising (i.e., Nielsen® audience ratings may be provided for advertisements themselves).
The Nielsen® audience measurement system collects data regarding audiences either (i) by asking viewers of various demographics to keep a written record of the television shows they watch throughout the day and evening, or (ii) by using “set meters,” which are small devices connected to televisions in selected homes which electronically gather the viewing habits of the home and transmit the information nightly to the Nielsen® audience measurement system or a proxy entity over a connected phone line or other connection.
There are several disadvantages to the approach used by the Nielsen® audience measurement system. First, the sample of viewers selected may not be fairly representative of the population of viewers (or the subset of cable viewers) as a whole. For example, in a content distribution (e.g., cable or HFCu or satellite) network comprising four million cable viewers, a sample of any 100,000 viewers may exhibit different average viewing habits than the averages associated with the other 3,900,000 viewers who are not in the sample.
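By way of illustration only, the sampling concern described above can be sketched with a toy calculation. The following Python fragment is a minimal sketch and forms no part of any measurement system described herein. The function name `sample_vs_rest`, the 40-viewer population, and the weekly viewing-hour figures are all hypothetical values assumed for the example.

```python
from statistics import mean


def sample_vs_rest(viewing_hours, sample_ids):
    """Return (sample_mean, rest_mean) for a list of per-viewer viewing hours.

    viewing_hours: hours watched per viewer (hypothetical data).
    sample_ids: indices of the viewers chosen for the sample.
    """
    sample_ids = set(sample_ids)
    sample = [h for i, h in enumerate(viewing_hours) if i in sample_ids]
    rest = [h for i, h in enumerate(viewing_hours) if i not in sample_ids]
    return mean(sample), mean(rest)


# Hypothetical toy population: 10 heavy viewers (30 h/week), 30 light viewers (10 h/week).
population = [30] * 10 + [10] * 30

# A sample drawn only from heavy viewers misstates the habits of everyone else.
biased = sample_vs_rest(population, range(0, 4))

# A sample with the same 1:3 heavy-to-light mix as the population tracks it closely.
balanced = sample_vs_rest(population, [0, 10, 11, 12])

print("biased sample vs. rest:  ", biased)
print("balanced sample vs. rest:", balanced)
```

In this assumed data set, the sample drawn only from heavy viewers averages far more viewing than the unsampled viewers, whereas the proportionate sample matches them. This is the kind of divergence that may arise when a sample is not fairly representative.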
Second, static delivery makes it difficult to precisely target an audience that is known to be in the market. For example, suppose that the ideal target for a sports car advertisement is the set of all consumers who like and would be interested in buying sports cars. If all that is known from Nielsen® audience measurement data is that 10% of the sample group has watched the auto racing channel for over three hours in the last month, this may not perfectly correlate with the set of consumers who like sports cars. This may be the case, for example, if there are some consumers who are in the market for sports cars but who never watch the auto racing channel, or if there are some viewers of the auto racing channel who have no interest in buying or owning sports cars. As such, patterns based on viewership data often imprecisely identify the desired audience.
Furthermore, the Nielsen® audience measurement system is disadvantageously program-specific. Program-specific audience data collection is problematic from the standpoint that this program-coupled approach is only as good as the underlying demographic correlation model. For example, assuming a demographic of 18- to 30-year-old females typically tunes in to American Idol each broadcast (e.g., Monday at 8:00 pm), this same demographic may not have any interest in watching the program immediately preceding or following American Idol, and hence may tune away (or delay tuning to that channel until the start of American Idol).
Another drawback of the Nielsen® audience measurement system approach is that it tends to aggregate data or results for given premises (e.g., households) as opposed to providing data for specific users of those premises. For example, the switching activity associated with a given set top box for a family of five represents switching activity for each member of that family (including perhaps viewing of cartoons for a child, teen-related programs for a teenager, and adult-related content for one or more adults). Hence, the data obtained using Nielsen® audience measurement system techniques may be somewhat of an amalgam of the data for individual users, and various combinations thereof. Certain so-called “people meters” may, however, be utilized for the more precise identification of a viewer, such as by age, sex, etc.
Moreover, although various user- or household-specific data collection mechanisms are known in the art, there is currently no way of guaranteeing a level of confidence in the integrity of the underlying data set, unless the collected data is validated manually by a network operator. However, the volume of data collected using these prior art approaches is simply too large for manual (e.g., human) validation of every tuning event across all platforms. Other methods for collecting a smaller subset of data are also in use in the art; however, this market-by-market approach does not collect data in real time and, because the sample size is so small, leaves most viewing unmeasured.
Therefore, there is a salient need for improved methods and apparatus which are capable of collecting and validating audience measurement or usage data without restricting the pool size or population. Such improved methods and apparatus would ideally be adapted to gather audience information in real time or near-real time, together with the associated viewership actions of actual viewers. Further, the data collection and validation methods and apparatus would advantageously be configured to collect and validate data relating to all types of content (including for example VOD consumption, interactive consumption, broadcast consumption, DVR usage, EPG interaction, telephone usage, internet usage, etc.).
These features would also be provided using substantially extant network infrastructure and components, and would be compatible with a number of different client device and delivery systems including both wired and wireless technologies.