The present invention is relevant to delivery of information in any kind of information infrastructure. The invention is illustrated herein using a communications network type of information infrastructure which can deliver video programming.
In a typical network in which advertisements or other video programming are delivered, such as a conventional cable television network, the advertisements are delivered to many customers indiscriminately. This is disadvantageous for the customers because some customers are subjected to advertisements in which they have no interest. It is also disadvantageous to the advertisers because the advertisers must pay to deliver the advertisement to a large audience of customers, including both the customers they desire to reach and the customers who have no interest in the advertisement.
In a preferred advertisement strategy, the advertisers target a selected group of the customers who are more likely to be interested in the advertisements and deliver the advertisements to only the selected group of customers. Until recently, such targeted advertising was not possible in broadcast communications because the communications network in which the advertisements were delivered did not permit delivery of advertisements to only specified customers. However, recent advances in communications networks have made such selective delivery of broadcast advertisements possible.
FIG. 1 depicts one such illustrative improved prior art communications network 10. Illustratively, the communications network 10 may be any kind of network such as a telephone network, a computer network, a local area network (LAN), a wide area network (WAN), a cable television network, etc. As shown, the network 10 interconnects sources 21 and 22, such as advertisers, to destinations 31, 32, 33 and 34, such as customers. The communications network 10 can transport video, audio and other data from a source, e.g., the source 21, to only specific ones of the destinations 31-34, e.g., the destinations 31 and 33. For example, the video, audio and data may be transmitted as a bitstream which is organized into packets. Each packet contains a header portion which includes at least one identifier, for a destination 31, 32, 33 and/or 34, that is unique over the network 10 (e.g., the identifiers for the destinations 31 and 33). These identifiers are referred to as network addresses. The packet is routed by the communications network 10 only to those destinations 31 and 33 as specified by the network addresses contained in the header of the packet.
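The address-based delivery described above can be sketched as follows. This is only an illustrative model, not the network's actual protocol: the `Packet` class, its field names and the `route` function are hypothetical, and the destination numerals 31-34 are reused as stand-in network addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Header portion: network addresses of the intended destinations (assumed layout).
    addresses: frozenset
    # Body: the video, audio or other data being carried.
    payload: bytes


def route(packet, destinations):
    """Return only those destinations whose address appears in the packet header."""
    return sorted(d for d in destinations if d in packet.addresses)


destinations = [31, 32, 33, 34]
packet = Packet(addresses=frozenset({31, 33}), payload=b"advertisement")
delivered = route(packet, destinations)
print(delivered)
```

In this sketch the destinations 32 and 34 never receive the packet, which is the property that makes selective delivery possible.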
In order to implement the targeted advertising strategy, the advertisers must be able to determine the customers to which the advertisements are targeted. Advantageously, demographic data regarding the customers is compiled into a database. A database is defined as a collection of data items, organized according to a data model, and accessed via queries. The invention herein is illustrated using a relational database model. A relational database or relation may be organized into a two dimensional table containing rows and columns of information. Each column of the relation corresponds to a particular attribute and has a domain which comprises the data values of that attribute. Each row of a relation, which includes one value from each attribute, is known as a record or tuple.
FIG. 2 shows an exemplary relational database (prior art) Y. The relation Y of FIG. 2 contains data pertaining to a population group. The relation Y has six attributes or columns 2-1, 2-2, 2-3, 2-4, 2-5 and 2-6, for storing, respectively, name, age, weight, height, social security number and telephone extension data values of the population. The database also has twelve records or tuples 3-1, 3-2, 3-3, . . . ,3-12. Each tuple 3-1, 3-2, 3-3, . . . , 3-12 has one data value from each attribute. For instance, the tuple 3-10 has the name attribute value "lee", the age attribute value 40, the weight attribute value 171, the height attribute value 180, the social security number attribute value 999-98-7654 and the telephone extension attribute value 0123.
To identify the targeted customers for an advertisement, a profile containing queries is executed against the database. A query is used to identify tuples which meet criteria of interest from the database. A query usually includes a predicate which specifies the criteria of interest. For instance, the following query executed against the relation Y:
Select from Y where Y.Age<15 OR Y.Age>50
includes the predicate "where Y.Age<15 OR Y.Age>50" which specifies that only those tuples having an Age attribute value less than 15 or greater than 50 are to be identified. The advertiser can thus construct a profile for execution against the relational database to identify the targeted audience of customers.
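As a rough illustration of how such a predicate selects tuples, the following sketch models a relation as a list of records. Only the "lee" record reproduces tuple 3-10 of FIG. 2; the other two records, the field names and the `predicate` helper are hypothetical and exist only to give the predicate something to match.

```python
# A relation modeled as a list of tuples (records), one dict per row.
relation_y = [
    # Tuple 3-10 as described for FIG. 2.
    {"name": "lee", "age": 40, "weight": 171, "height": 180,
     "ssn": "999-98-7654", "ext": "0123"},
    # Hypothetical records, not taken from FIG. 2.
    {"name": "kim", "age": 12, "weight": 95, "height": 150,
     "ssn": "000-00-0001", "ext": "0456"},
    {"name": "pat", "age": 63, "weight": 160, "height": 170,
     "ssn": "000-00-0002", "ext": "0789"},
]


def predicate(record):
    """The criteria of interest: Age less than 15 OR Age greater than 50."""
    return record["age"] < 15 or record["age"] > 50


# Executing the query identifies the tuples that satisfy the predicate.
matches = [record["name"] for record in relation_y if predicate(record)]
print(matches)
```

The tuple for "lee" (age 40) satisfies neither half of the predicate and is therefore not identified.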
The problem with implementing such a targeted advertising scheme is that customers may be reluctant to disclose wholesale the demographic data necessary for constructing the relational database. In particular, customers may be concerned about:
(1) direct release of raw information about an individual customer,
(2) deduction of non-released information of an individual customer from information regarding the identity of the customers who match a given profile, and
(3) deduction of non-released information of a specific individual customer from knowledge of a series of profiles, together with the number of individual customers that received or would receive the advertisements corresponding to those profiles.
The first two threats to privacy can be overcome by modifying the communications network in a fashion similar to that used to protect the anonymity of customers who retrieve video in Hardt-Kornacki & Yacobi, Securing End-User Privacy During Information Filtering, PROC. OF THE CONF. ON HIGH PERF. INFO. FILTERING, 1991. Such a modified network is shown in FIG. 3. As shown, the communications network 50 interconnects sources (advertisers) 61, 62 and destinations (customers) 71, 72, 73 and 74 similar to the network 10 of FIG. 1. However, a filter station 80 and a name translator station 90 are also provided which are connected to the communications network 50. Illustratively, the filter station 80 has a memory 82 for maintaining the database of customer demographic data. Furthermore, the filter station 80 has a processor 84 which can execute queries against the demographics database stored in the memory 82. Each source, such as the source 62, has a server 64 and a memory 66. The server 64 of the source 62 transmits one or more profiles (containing queries for identifying particular target audiences) to the processor 84 of the filter station 80. The processor 84 executes each profile query against the relational database stored in the memory 82 to retrieve the aliases assigned to each customer identified by each query. The processor 84 then transmits the corresponding aliases for each profile back to the server 64 of the source 62, which may store them in the memory 66 for later use.
When the advertiser-source 62 desires to transmit the advertisement to the targeted customer destinations, e.g., the destinations 72 and 74, the server 64 transmits the advertisement and the aliases into the network 50. The network 50 delivers the advertisement and aliases to the processor 92 of the name translator station 90. The processor 92 then translates the aliases to their corresponding network addresses, for example, using information stored in a memory 94. The processor 92 of the name translator station 90 then transmits the advertisement to the customer destinations 72, 74 using the network addresses.
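The division of knowledge between the filter station and the name translator station can be sketched as below. The class names, method names, alias strings and demographic values are all hypothetical; the sketch only illustrates that the advertiser handles aliases while the alias-to-address mapping stays inside the name translator.

```python
class FilterStation:
    """Holds the demographics database and answers profiles with aliases only."""

    def __init__(self, database):
        # Assumed layout: list of (alias, demographic record) pairs.
        self._database = database

    def execute_profile(self, predicate):
        return [alias for alias, record in self._database if predicate(record)]


class NameTranslator:
    """Holds only the translation of aliases to network addresses."""

    def __init__(self, alias_to_address):
        self._alias_to_address = alias_to_address

    def deliver(self, advertisement, aliases):
        # Translate each alias and hand the advertisement to that address.
        return {self._alias_to_address[alias]: advertisement for alias in aliases}


# Hypothetical data for the customer destinations 71-74.
filter_station = FilterStation([
    ("alias-A", {"age": 30}),
    ("alias-B", {"age": 61}),
    ("alias-C", {"age": 45}),
    ("alias-D", {"age": 70}),
])
name_translator = NameTranslator(
    {"alias-A": 71, "alias-B": 72, "alias-C": 73, "alias-D": 74}
)

# The advertiser submits a profile and receives aliases, never addresses.
aliases = filter_station.execute_profile(lambda record: record["age"] > 50)
deliveries = name_translator.deliver("advertisement", aliases)
print(aliases, sorted(deliveries))
```

With this hypothetical data the advertisement reaches the destinations 72 and 74, matching the example in the text, while the advertiser sees only "alias-B" and "alias-D".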
In the modified communications system, the customer-destination, e.g., the destination 72, knows its own demographic information. The advertiser-source, e.g., the source 62, knows its advertisement, its profiles and how many customers will receive the advertisement. The advertiser only receives aliases for the individual customers 71-74. Thus, the advertiser does not possess the raw demographic information and is not given information for identifying the customers 71-74 (such as the network addresses). The filter station 80 contains information regarding the entire demographics database and receives the profiles submitted by the advertisers. The name translator station 90 contains only the translations of aliases to network addresses and receives the aliases and advertisements. The network 50 only receives the advertisement and network addresses of the destinations.
Despite such protections, the advertiser still obtains some results of the execution of the queries of the profiles against the demographics database, such as the number of customers who match each profile. This may be sufficient information to deduce personal information of the customer. For example, suppose the advertiser knows the identities of 100 customers in the zip code 07090 who collect stamps. Furthermore, suppose the advertiser submits a profile for targeting all customers in zip code 07090 who collect stamps and who have an annual income of $50,000-$100,000. If 100 aliases are returned to the advertiser, then the advertiser successfully deduces the salary range of all 100 stamp collectors.
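The stamp collector example can be reproduced in a few lines. The sketch below rests on two assumptions that are not spelled out in the text: that the 100 known customers are the only stamp collectors in the zip code, and that the hidden database happens to hold the hypothetical income values shown. The customer names and the `count_matching` helper are likewise illustrative.

```python
# Identities the advertiser already knows: 100 stamp collectors in zip code 07090.
known_collectors = {f"customer-{i}" for i in range(100)}

# Hidden demographics database (hypothetical values); the advertiser never reads it.
database = {
    name: {"zip": "07090", "collects_stamps": True, "income": 75_000}
    for name in known_collectors
}
database["customer-x"] = {"zip": "07090", "collects_stamps": False, "income": 30_000}


def count_matching(predicate):
    """What the advertiser learns from a profile: only the number of matches."""
    return sum(1 for record in database.values() if predicate(record))


count = count_matching(
    lambda r: r["zip"] == "07090"
    and r["collects_stamps"]
    and 50_000 <= r["income"] <= 100_000
)

# If the count equals the number of known collectors, every one of them
# must fall in the queried income range.
inference_succeeds = count == len(known_collectors)
print(count, inference_succeeds)
```

No raw record is ever released here; the count alone is enough for the deduction.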
The above threat, wherein query results can lead to deducing private information, is referred to as a "tracker attack." Stated more generally, a "tracker" is a special case of a linear system which involves solving the equation:

$$HX = Q \qquad (1)$$

where:
$H$ is a matrix which represents tuples that satisfy corresponding queries, where each column $j$ represents a different tuple, each row $i$ represents a different query and where each matrix element $h_{ij} = 1$ if the $j$-th tuple satisfies the predicate $C_i$ of the $i$-th query and 0 otherwise,
$C$ is a vector representing the predicates used in each $i$-th query,
$X$ is a vector representing the (unknown) tuples which satisfy the predicates $C$ (to be solved by equation (1)), and
$Q$ is a vector of counts or other results returned by each $i$-th query containing elements $q_i$ where each $q_i$ is the sum (or other result returned from the $i$-th query) over an attribute of the tuples retrieved by the $i$-th query.
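A small worked instance may make equation (1) concrete. The sketch below takes one possible reading of the definitions, in which each unknown in X is a hidden attribute value of one tuple and each element of Q is the sum returned by one query. The three-tuple matrix, the hidden values and the `solve` helper (plain Gauss-Jordan elimination) are hypothetical choices for illustration.

```python
from fractions import Fraction


def solve(h, q):
    """Solve the square system H X = Q by Gauss-Jordan elimination."""
    n = len(h)
    # Build the augmented matrix [H | Q] with exact rational arithmetic.
    rows = [[Fraction(v) for v in h[i]] + [Fraction(q[i])] for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot entry becomes 1.
        rows[col] = [v / rows[col][col] for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


# h[i][j] = 1 if tuple j satisfies the predicate of query i, else 0.
H = [
    [1, 1, 0],  # query 1 retrieves tuples 1 and 2
    [0, 1, 1],  # query 2 retrieves tuples 2 and 3
    [1, 0, 1],  # query 3 retrieves tuples 1 and 3
]
hidden_values = [40, 55, 70]  # never released directly (hypothetical)

# Each q_i is the sum of the hidden attribute over the tuples retrieved by query i.
Q = [sum(h * x for h, x in zip(row, hidden_values)) for row in H]

recovered = solve(H, Q)
print(Q, [int(v) for v in recovered])
```

Each query returns only an aggregate over two tuples, yet the three aggregates together determine every individual value, which is the essence of the attack.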
The prior art has proposed some solutions for protecting statistical relational databases from tracker attacks. Dobkin, Jones & Lipton, Secure Databases: Protection Against User Inference, ACM TRANS. ON DATABASE SYS., vol. 4, no. 1, March, 1979, p.97-106 proposes to restrict query set overlap, i.e., to prevent submission of multiple similar query sets, to prevent this kind of attack. However, such a control is difficult to implement because a history of all previously submitted query sets must be maintained and compared against the most recently submitted query. A "cell-suppression" technique has also been proposed wherein statistics, or other query execution results, that may reveal sensitive information are never released. However, cell-suppression techniques are best used for queries which produce two and three dimensional tables but not for arbitrary queries which are of concern in implementing targeted advertising.
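The bookkeeping burden of a query set overlap control can be seen in a minimal sketch such as the following. It is not the method of the cited reference; the class name, the `max_overlap` threshold and the sample records are hypothetical, and the sketch simply refuses any query whose set of matching tuples shares too many tuples with an earlier one.

```python
class OverlapRestrictedDatabase:
    """Answers count queries unless the query set overlaps an earlier one too much."""

    def __init__(self, records, max_overlap):
        self._records = records
        self._max_overlap = max_overlap  # hypothetical threshold
        # Every previously answered query set must be retained for comparison.
        self._history = []

    def count(self, predicate):
        query_set = frozenset(
            index for index, record in enumerate(self._records) if predicate(record)
        )
        # Compare the new query set against the entire history.
        for previous in self._history:
            if len(query_set & previous) > self._max_overlap:
                return None  # refused: too similar to an earlier query
        self._history.append(query_set)
        return len(query_set)


records = [{"age": age} for age in (10, 20, 30, 40, 50, 60)]
db = OverlapRestrictedDatabase(records, max_overlap=1)

first = db.count(lambda r: r["age"] < 35)   # answered
second = db.count(lambda r: r["age"] < 45)  # shares three tuples with the first
third = db.count(lambda r: r["age"] > 35)   # disjoint from the first
print(first, second, third)
```

The history grows with every answered query and each new query is checked against all of it, which illustrates why such a control is costly to operate.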
Random noise techniques have been proposed wherein a random number is subtracted from the results returned by a query. This solution is not satisfactory for implementing targeted advertising because the result presented to the advertiser would then be inherently inaccurate. In an alternative scheme proposed in Warner, Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias, 60 J. OF THE AM. STAT. ASSOC. p.63-69 (1965), individuals may enter erroneous values into the relational database a certain percentage of the time. The problem with this strategy is that the advertisers would then target advertisements to the wrong audience a certain percentage of the time. Denning, Secure Statistical Databases Under Random Sample Queries, ACM TRANS. ON DATABASE SYS., vol. 5, no. 3, September, 1980, p.291-315 discloses a noise technique wherein the queries are applied to only random subsets of the tuples rather than all of the tuples in the relational database. In addition to the specific disadvantages mentioned above, one or more of the above-described noise addition techniques may be subverted by a variety of noise removal methods.
Yu & Chin, A Study on the Protection of Statistical Databases, PROC. ACM SIGMOD INT'L CONF. ON THE MGMT. OF DATA, p.169-181 (1977) and Chin & Ozsoyoglu, Security in Partitioned Dynamic Statistical Databases, PROC. IEEE COMPSAC CONF., p. 594-601 (1979) disclose methods for partitioning the relational database into disjoint partitions.
All of the above methods were developed primarily for statistical databases and do not have properties which enable the implementation of targeted advertising. In particular, the above methods do not provide precise identification of tuples which satisfy queries or do not provide an accurate count (or other returned query result) of such retrieved tuples. However, both of these properties are important in targeted advertising. First, it is important to accurately target all customers whose demographic data matches a submitted profile. Second, it is vital to obtain an accurate count of the identified customers for purposes of billing the advertiser and for purposes of deciding whether or not the profile identified a desirable number of customers for receiving the advertisement.
It is therefore an object of the present invention to overcome the disadvantages of the prior art. It is another object of the present invention to provide a targeted advertising method which preserves the privacy of confidential information of the customer. In particular, it is an object of the present invention to reduce the advertisers' ability to deduce confidential information about the customers from the results of one or more profile queries executed against a demographics relational database.