Advertisers tend to group prospective customers into broad demographic and geographic categories, possibly because currently available market research methods are limited in their ability to determine the effect of advertisements. In addition, advertisers use information gleaned from data mining to mass-market products to groups of prospective buyers. Unfortunately, the data searched during this data mining often contains low-validity information derived from small sample populations.
Due to these inherent data validity problems, statistics generated by such data mining may not accurately reflect a given market. That is, the statistics do not mean that all persons in a group will buy a product; rather, they imply that a person in one group may have a higher probability of buying the product than a person in another group. For example, the data mining may show that more scuba equipment could be sold to 20–40 year-olds in Miami than to 50–80 year-olds in Kansas City.
Based on this data, advertisers carefully select the television shows, magazines, billboards, or other media on or in which their advertisements run. In the case of television, advertisers traditionally gravitate toward programs that garner higher ratings for desired audiences and then select advertising slots within those shows. Advertisers purchase ratings data from market research organizations, which collect and analyze data on the viewing habits of individuals and then publish the results.
Examples of such research organizations include A. C. Nielsen and Arbitron. Such companies typically monitor the television-viewing habits of a relatively small number of viewers through telephone polls, specialized set-top monitoring "Nielsen" boxes, or viewer diaries. The results of these surveys are then extrapolated to the population at large.
As can be expected, extrapolation of small-population data to the population at large is prone to many different limitations, with accuracy perhaps the most notable. For example, if there were only 200 persons over 65 years of age in a sample, their compiled viewing behaviors may be purported to be representative of the viewing behaviors of the 35 million people in the U.S. over 65 years of age.
Obviously, larger, more random sample populations are preferred over smaller ones, because a larger sample population tends to reduce the impact of suspect behaviors. Such suspect behavior might include distorted or inaccurate information provided in written television viewing logs, or intentionally leaving the television "on" and tuned to a certain channel to ensure higher ratings for a desired show even if the individual being sampled is not watching that show. If the behavior of even one of the 200 persons in the previous example were suspect, it could translate into prediction errors for approximately 175,000 people; if the sample population were increased to 50,000 people, one individual whose behavior was suspect would translate into prediction errors for only approximately 700 people. As long as advertisers base their decisions on small-sample data, they will continue to question whether their advertisements are reaching intended audiences.
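The figures above follow from dividing the represented population by the sample size. The sketch below reproduces that arithmetic, assuming simple uniform extrapolation in which every sampled person carries equal weight (actual research organizations may apply more elaborate weighting schemes). The function name `people_per_panelist` is hypothetical.

```python
def people_per_panelist(population: int, sample_size: int) -> int:
    """Number of people each sampled person stands in for, assuming
    uniform (unweighted) extrapolation from the sample to the population."""
    return population // sample_size


US_OVER_65 = 35_000_000  # population figure used in the text

# One suspect panelist distorts predictions for this many people:
print(people_per_panelist(US_OVER_65, 200))     # 175000
print(people_per_panelist(US_OVER_65, 50_000))  # 700
```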
While accuracy is certainly a major problem in the prior art, it is not the only one. Another limitation is the specificity with which behaviors may be inferred for particular demographic groups. For example, if only one of 200 sampled senior citizens is a single Asian with no dependents and an annual income over $100,000, any inference about this more specific group is likely to be highly inaccurate; in many cases the behaviors of an entire demographic sub-group are attributed to the sampled behavior of only one person.
Another factor contributing to the inaccuracy of the prior art is reliability. Invasive sampling methods such as those described above raise many problems, including determining how much of the data can be trusted. Sampled individuals may not be willing to disclose, for example, that they watch adult (e.g., X-rated) programming or other controversial programming. Without such information, the data generated becomes unreliable.
Still another problem is that even if the sample data can be trusted, a sampled individual's memory, or ability to adhere to documented guidelines, may be inaccurate or incomplete. If a given individual is asked what they watched last week, the response is unlikely to be both correct and specific. Gaps caused by low response rates or missing journal entries are often filled by extrapolating from previously collected data and from rules derived from that data. However, this extrapolation is built on data generated through the inherently faulty means described above.
The invasive sampling techniques used in the prior art also suffer from an inherent flaw. Because these methods are invasive and participation is optional, such techniques may not account for differences between the types of persons who are willing to be sampled and those who are not.
While the effects of some of the problems in the prior art can be limited by increasing the sample size, doing so is typically cost-prohibitive. The increased costs result from several factors, including equipment purchase, installation, and repair; data collection and validation; and participant compensation.
However, even when equipment and other costs are taken out of consideration and larger samples are collected, an increased sample size does not solve all of the problems in the prior art. The prior art also faces a problem with data resolution: most major media research organizations treat data in an all-or-nothing fashion. For example, if a set of channels was watched during some sampling interval, only the channel that was watched the most, or the one being watched at the time of the sample, would be counted, and it would be recorded as having been watched for the entire sampling interval (typically anywhere from 30 seconds to several hours). Although some in the prior art have attempted to mitigate this effect by sampling more frequently, changes occurring between samples may always be missed. Thus, the data collection methods employed in the prior art tend to produce misleading or inaccurate viewing data.
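To make the resolution problem concrete, the sketch below compares the two all-or-nothing crediting rules described above ("most watched" and "watched at the time of the sample") against exact per-channel viewing time. The viewing log, the 30-minute interval, and all function names are hypothetical and chosen only for illustration; they do not represent any research organization's actual procedure.

```python
from collections import Counter

# Hypothetical viewing log for one sampling interval: consecutive
# (channel, seconds) segments beginning at t = 0.
Log = list[tuple[str, int]]


def exact_totals(segments: Log) -> dict[str, int]:
    """Actual seconds spent on each channel during the interval."""
    totals = Counter()
    for channel, seconds in segments:
        totals[channel] += seconds
    return dict(totals)


def most_watched_credit(segments: Log, interval: int) -> dict[str, int]:
    """All-or-nothing rule: credit the whole interval to the channel
    with the most viewing time; all other channels get nothing."""
    top, _ = Counter(exact_totals(segments)).most_common(1)[0]
    return {top: interval}


def point_sample_credit(segments: Log, sample_time: int, interval: int) -> dict[str, int]:
    """All-or-nothing rule: credit the whole interval to whichever
    channel happened to be on at the instant the sample was taken."""
    elapsed = 0
    for channel, seconds in segments:
        if elapsed <= sample_time < elapsed + seconds:
            return {channel: interval}
        elapsed += seconds
    return {}  # sample fell outside the logged viewing


log = [("A", 700), ("B", 600), ("C", 500)]  # 1800 s = 30 minutes
print(exact_totals(log))                   # {'A': 700, 'B': 600, 'C': 500}
print(most_watched_credit(log, 1800))      # {'A': 1800}
print(point_sample_credit(log, 900, 1800)) # {'B': 1800}
```

In this example the two rules disagree about which channel was watched, and each credits a single channel with 1,800 seconds even though no channel was on for more than 700 seconds.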
Data collected by media research organizations, and the inferences drawn from it, face still another problem: one of substance. Because overlapping data is collected across different medium types (digital, written, verbal, etc.), common denominators are difficult to determine, which renders objective statistical mining impossible. Inferences drawn from such data may only be lateral in nature and cannot be readily mined for trends. For example, while the data collected may support the conclusion that one show is more popular than another, the particular reason why one is more popular than the other cannot be extracted from this data. Such methods may be barely capable of supporting the most general popularity-type conclusions; any further analysis of the relationships among those conclusions is likely to be questionable at best, and accuracy may be lost each time more complicated, or deeper, inferences are drawn.
Unfortunately, there are many other problems with existing market research methodologies, such as the use of "Sweeps" or ratings periods, but most of these problems are at least partially statistically correctable. However, the five major issues discussed above (accuracy, group specificity, reliability, resolution, and data substance) are inherent to actively monitoring data within small samples and cannot be overcome by the prior art.