1. Field of Invention
The present invention relates generally to the field of content and/or data delivery over a content distribution network. More particularly, the present invention relates, in one exemplary aspect, to apparatus and methods for assuring the privacy of collected data related to usage of content delivered to various devices in a content distribution network.
2. Description of Related Technology
Content delivery and distribution networks may have a large number of disparate users. In many situations, it is desirable that the preferences and behaviors of these disparate users be known to the operators of the network (as well as to the content sources which generate the content for distribution over the network). Moreover, in cases where the users are subscribers or customers of the delivery network (e.g., as in a cable television, satellite, Hybrid Fiber Copper (HFCu), or similar network), revenue, profit, and subscriber retention/addition are also critical concerns, since these factors effectively keep the network operator (and to some degree content producers) commercially viable. Accordingly, methods and apparatus have been established to generate data records of a subscriber's interaction with content, including the subscriber's preferences, behaviors, etc. Further, billing systems and other support systems may be utilized within such networks in order to take into account subscription level, access privileges, account status (e.g., payments, delinquency), requests for changes in service, and other related functions associated with the collected records.
The data relating to behaviors, preferences, etc. of users in a network may be used, for example, to generate ratings of particular programs (or portions thereof) and statistics across a subsection of subscribers, geographic areas, programs, etc. “Nielsen Ratings” are a well-known system of evaluating the viewing habits of cross-sections of the population. When collecting Nielsen ratings, companies use statistical techniques to select a sample of the population that is representative of the national population. Theoretically, the viewing habits of the sample population will substantially mirror those of the larger population. The companies then measure the sample population's viewing habits to identify, among other things, what programs the sample population is watching, as well as the time and frequency at which those programs are watched. This information is then extrapolated to gain insight into the viewing habits, and hence the media consumption, of the larger population. Historically, the Nielsen system has been the primary source of audience measurement information in the television industry. The Nielsen system therefore affects various aspects of television including, inter alia, advertising rates, schedules, viability of particular shows, etc. Other implementations for the collection of data relating to user interaction with content, however, have also been developed.
The Cable Privacy Act of 1984 and, more generally, privacy and consumer advocacy groups require (either through specific mandate or threatened action) that, without an explicit subscriber opt-in, data that could conceivably be traced back to subscriber personally identifiable information be strictly protected, and shared only in such a way as to mitigate the chance that such data could be used to derive subscriber personally identifiable information.
It is appreciated that the collection of such data may not be secure, regardless of any steps taken (such as at the data collection entity) to ensure the anonymity of the subscriber. Problems arise, for example, when the sample of data is so small that the particular subscriber(s) to whom the data relates can be determined; i.e., where a party may determine the identity of subscribers via “derivative association”. For example, suppose a company has only one customer within a particular zip code. If that company shares “anonymous” information which includes the zip code with a third party, the only additional piece of information that the third party would need is who in that zip code is a customer of the company. This information may readily be obtained, such as by buying data from data aggregators such as Experian™. With only these two pieces of information, the third party may now uniquely identify the household referred to in the “anonymous” data that the company provided.
The aforementioned logic is also readily extendable to other situations where, although a greater amount of seemingly “anonymous” data is provided to third parties, those parties, through joining the “anonymous” data with information obtained from additional data sources, can derive personally identifiable data.
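By way of illustration only, the zip code example above can be sketched in a few lines of Python. The function name reidentify, the record fields, and all of the sample data are hypothetical and are not drawn from any actual system; the sketch merely shows how joining “anonymous” records with a separately obtained customer list may single out a household.

```python
from collections import defaultdict


def reidentify(anonymous_records, customer_list):
    """Link "anonymous" records to named customers via a shared zip code.

    A record is linked only when exactly one known customer is located
    in the record's zip code (the "derivative association" case).
    """
    customers_by_zip = defaultdict(list)
    for name, zip_code in customer_list:
        customers_by_zip[zip_code].append(name)

    linked = []
    for record in anonymous_records:
        candidates = customers_by_zip.get(record["zip"], [])
        if len(candidates) == 1:
            linked.append((candidates[0], record))
    return linked


# Hypothetical "anonymous" usage data shared with a third party.
anonymous_records = [
    {"zip": "10001", "program": "Evening News"},
    {"zip": "10001", "program": "Drama Hour"},
    {"zip": "20002", "program": "Late Movie"},
]

# Hypothetical customer list obtained from a data aggregator.
customer_list = [
    ("Household A", "10001"),
    ("Household B", "10001"),
    ("Household C", "20002"),
]

linked = reidentify(anonymous_records, customer_list)
```

In this hypothetical data, only the record from zip code 20002 is linked, since that zip code contains a single customer; the records from zip code 10001 remain ambiguous between two households.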
Hence, what are needed are methods and apparatus for automatically “collapsing” or otherwise adjusting anonymous data sets generated for third parties in such a way as to minimize the probability that, through correlation to other data sources, a third party could associate the provided data to personally identifiable information in order to determine a unique identity of the user to which the data relates.
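As a minimal sketch of what such “collapsing” might look like, and not a statement of the method of the present invention, the following Python example coarsens five-digit zip codes one digit at a time until every zip code group contains at least k records. The function name collapse_zip, the threshold k, and the sample data are assumptions introduced solely for illustration.

```python
from collections import Counter


def collapse_zip(records, k):
    """Coarsen zip codes until every zip group holds at least k records.

    Trailing digits are masked with "*" one at a time. If even the fully
    masked data set has fewer than k records, an empty list is returned
    (i.e., nothing is released).
    """
    for digits in range(5, -1, -1):
        collapsed = []
        for record in records:
            masked = record["zip"][:digits] + "*" * (5 - digits)
            collapsed.append({**record, "zip": masked})
        counts = Counter(r["zip"] for r in collapsed)
        if counts and all(n >= k for n in counts.values()):
            return collapsed
    return []


# Hypothetical data: one zip code contains only a single record.
records = [
    {"zip": "10001", "program": "Evening News"},
    {"zip": "10001", "program": "Drama Hour"},
    {"zip": "10001", "program": "Late Movie"},
    {"zip": "10002", "program": "Drama Hour"},
]

collapsed = collapse_zip(records, k=2)
```

With k equal to 2, masking a single digit suffices for this hypothetical data: all four records fall into the group 1000*, so the lone record from zip code 10002 can no longer be singled out by zip code alone.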