Publish-subscribe protocols have been employed for the distribution of streaming data. A common publish-subscribe protocol is an RSS (Rich Site Summary) feed. RSS is a family of web feed formats used to publish frequently updated works in a standardized format. The data transmitted in an RSS feed may include blog entries, news headlines, audio, and video. RSS feeds or documents include full or summarized text, plus metadata such as publishing dates and authorship. RSS feeds can be read using software called an “RSS reader”, “feed reader”, or “aggregator”, which can be web-based, desktop-based, or mobile-device-based. The user subscribes to a feed by entering the feed's URI into the reader or by clicking a feed icon in a web browser, which initiates the subscription process. The RSS reader regularly checks the user's subscribed feeds for new work, downloads any updates that it finds, and provides a user interface for monitoring and reading the feeds.
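The polling behavior of an RSS reader can be illustrated with a short sketch. It assumes RSS 2.0 element names (channel, item, title, link, guid, pubDate) and uses Python's standard xml.etree.ElementTree parser; the FeedReader class, its methods, and the example URI are hypothetical, and network fetching is left out so that each poll simply receives the feed text that a real reader would have downloaded.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return one dict per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Fall back to the link when an item carries no (or an empty) guid.
            "guid": item.findtext("guid") or item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


class FeedReader:
    """Toy aggregator that remembers which items it has already delivered."""

    def __init__(self):
        self.seen = {}  # feed URI -> set of guids already delivered

    def subscribe(self, uri):
        self.seen.setdefault(uri, set())

    def poll(self, uri, xml_text):
        """Process one fetched copy of a subscribed feed; return only new items."""
        if uri not in self.seen:
            raise KeyError(f"not subscribed to {uri}")
        new_items = [it for it in parse_rss_items(xml_text)
                     if it["guid"] not in self.seen[uri]]
        self.seen[uri].update(it["guid"] for it in new_items)
        return new_items


FEED_V1 = """<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><guid>a1</guid></item>
</channel></rss>"""
FEED_V2 = """<rss version="2.0"><channel><title>Example</title>
<item><title>Second</title><guid>b2</guid></item>
<item><title>First</title><guid>a1</guid></item>
</channel></rss>"""

uri = "https://example.com/feed.xml"  # hypothetical feed URI
reader = FeedReader()
reader.subscribe(uri)
print([i["title"] for i in reader.poll(uri, FEED_V1)])  # ['First']
print([i["title"] for i in reader.poll(uri, FEED_V2)])  # ['Second']
```

Deduplicating on guid is what lets the reader show only updates on each periodic check, even though the publisher keeps older items in the feed.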
A user can subscribe to a topic, such as finance, and receive daily, weekly, or monthly email messages related only to finance. The user receives RSS feeds only in the areas (associated with the topic) to which they subscribe; the user does not receive all documents that are published by one particular publisher.
An RSS feed is a specific instance of the more general class of publish-subscribe protocols which employ a publish-subscribe architectural pattern. Publish-subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers. Instead, published messages are characterized into classes, without knowledge of what, if any, subscribers there may be. Similarly, subscribers express interest in one or more classes, and only receive messages that are of interest, without knowledge of what, if any, publishers there are.
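The decoupling described above can be sketched as a minimal topic-based broker. The Broker class and its method names are hypothetical; the point is that a publisher names only a class (topic) and never a recipient, and a subscriber registers interest in a class without knowing who publishes to it.

```python
from collections import defaultdict


class Broker:
    """Toy topic-based broker: publishers and subscribers only share topic names."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every callback registered for topic; return the count."""
        callbacks = self._subscribers.get(topic, [])  # .get avoids creating empty entries
        for cb in callbacks:
            cb(message)
        return len(callbacks)


broker = Broker()
inbox = []
broker.subscribe("finance", inbox.append)
broker.publish("finance", "Rates unchanged")  # delivered to the finance subscriber
broker.publish("sports", "Final score 2-1")   # no subscribers; dropped
print(inbox)  # ['Rates unchanged']
```

Note that the broker in this sketch sees every subscription in the clear, which is exactly the kind of exposure of subscriber interests that motivates the privacy discussion that follows.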
More particularly, in a publish-subscribe architecture, a subscriber can specify interests, e.g., cats, dogs, the stock market, finance, education, etc. A publisher may periodically publish items (e.g., documents) that may include attached tags, known as topics. These topics are included in a dictionary of topics, which is shared with subscribers so that they may find their interests in it. A dictionary is a collection of all topics that each item may or may not relate to, and is known to all participants (e.g., subscribers and publishers). Interests are elements from the dictionary associated with a subscriber. Topics are elements from the dictionary associated with an item. Items may be digital documents in any format. If the publisher determines that one of the subscriber's interests is equal to one of the topics of the next item to be published by the server, then the subscriber receives the item once it is published by the publisher. If none of the subscriber's interests matches any topic of that item, then the subscriber does not receive the item.
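The matching rule can be written down directly: an item is delivered to a subscriber if and only if the subscriber's interests and the item's topics, both drawn from the shared dictionary, have a non-empty intersection. In this sketch the DICTIONARY contents are taken from the examples in the text, while the function names are hypothetical.

```python
DICTIONARY = frozenset({"cats", "dogs", "stock market", "finance", "education"})


def validate(terms, dictionary=DICTIONARY):
    """Interests and topics must both be elements of the shared dictionary."""
    unknown = set(terms) - dictionary
    if unknown:
        raise ValueError(f"terms not in dictionary: {sorted(unknown)}")
    return frozenset(terms)


def should_deliver(interests, topics):
    """Deliver iff some interest equals some topic of the item."""
    return not validate(interests).isdisjoint(validate(topics))


def recipients(subscribers, item_topics):
    """Names of the subscribers who receive an item carrying item_topics."""
    return sorted(name for name, interests in subscribers.items()
                  if should_deliver(interests, item_topics))


subscribers = {"alice": {"finance"}, "bob": {"cats", "dogs"}}
print(recipients(subscribers, {"finance", "stock market"}))  # ['alice']
print(recipients(subscribers, {"education"}))                # []
```

Whoever evaluates should_deliver here sees both the interests and the topics in plaintext; the privacy problems discussed below arise precisely from this.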
A problem often encountered in circumstances where publish-subscribe protocols are employed is privacy violations, e.g., violations of the privacy of transmitted data and/or of the interests and/or identity of the subscribers. In the examples below, the clients are not malicious or colluding but are considered honest-but-curious.
For example, in a typical RSS feed, a subscriber reveals their interests, e.g., finance, and the publisher may view those interests; thus, the publisher may learn something about the subscriber's personal choices, and the subscriber's privacy may be violated. Other privacy violations are more sensitive. For example, in a government setting there may be databases containing sensitive material and topics, and an agency may publish documents from them. One agency may be interested in a certain document, and another agency in a different document. Without privacy protections in place, an intruder in one agency may determine the interests of another agency. In another example, one or more subscribers are interested in Facebook stock; a publisher or an external intruder may learn that a number of subscribers have suddenly become interested in Facebook stock. Thus, privacy is an important issue in the transmission of documents using publish-subscribe protocols.
Currently deployed publish-subscribe methods and systems target a very limited set of security or privacy requirements, if any. For example, centralized architectures generally employ a trusted server that further protects against outsiders and client misbehavior through authentication and transport layer security (e.g., SSL/TLS; see Tim Dierks, Eric Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.2,” Internet Engineering Task Force, Request for Comments 5246, August 2008). Similarly, distributed implementations commonly operate in the “fortress model,” in which participants are trusted and outsiders are not (see Yair Amir, Cristina Nita-Rotaru, Jonathan Stanton, Gene Tsudik, “Secure Spread: An Integrated Architecture for Secure Group Communication,” IEEE Transactions on Dependable and Secure Computing (TDSC), 2(3): 248-261 (2005)).
The work of Castro and Liskov (see Miguel Castro and Barbara Liskov, “Practical Byzantine Fault Tolerance and Proactive Recovery,” ACM Trans. Comput. Syst., 20(4): 398-461 (2002)), even as extended to perform well under attack as described in Yair Amir, Brian Coan, Jonathan Kirsch, John Lane, “Byzantine Replication Under Attack,” In Proc. of the 38th IEEE International Conference on Dependable Systems and Networks (DSN08), 2008: 197-206, and in Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin, Mirco Marchetti, “Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults,” In Proc. of the 6th USENIX Symposium on Networked Systems Design and Implementation, 2009: 153-168, provides functionality in the presence of compromised components but does not attempt to provide client privacy. A well-studied area in cryptography research, known as Secure Multi-Party Computation or Secure Function Evaluation (see Andrew Chi-Chih Yao, “Theory and Applications of Trapdoor Functions (Extended Abstract),” In Proc. of IEEE FOCS 1982: 80-91, and Oded Goldreich, Silvio Micali, Avi Wigderson, “How to Play any Mental Game or A Completeness Theorem for Protocols with Honest Majority,” In Proc. of ACM STOC 1987: 218-229), addresses the general problem of two or more parties, each with its own input, jointly and privately computing a function over those inputs. This general approach provides more capability than is needed to implement private publish-subscribe and is thus too expensive. Basic and well-studied problems in cryptography research that address secure computation of specific functions include: Private Information Retrieval, where a client obtains one of a server's many strings without revealing which one (see Benny Chor, Eyal Kushilevitz, Oded Goldreich, Madhu Sudan, “Private Information Retrieval,” J. ACM 45(6): 965-981 (1998), and Eyal Kushilevitz, Rafail Ostrovsky, “Replication is NOT Needed: SINGLE Database, Computationally-Private Information Retrieval,” In Proc. of IEEE FOCS 1997: 364-373); Oblivious Transfer, where the server transfers the client's desired string without knowing which one and without revealing the other strings (see Michael O. Rabin, “How to Exchange Secrets with Oblivious Transfer,” Technical Report TR-81, Aiken Computation Lab, Harvard University, 1981); Private Set Intersection, where two parties each hold a set of values and at the end of the protocol one of them can compute the intersection of the two sets (see, e.g., Michael J. Freedman, Kobbi Nissim, Benny Pinkas, “Efficient Private Matching and Set Intersection,” In Proc. of EUROCRYPT 2004: 1-19); and Conditional Oblivious Transfer, a variant of oblivious transfer in which a message is sent from a sender to a receiver if and only if a predicate over the two parties' inputs is true, and the sender does not learn the predicate value (see Giovanni Di Crescenzo, Rafail Ostrovsky, Sivaramakrishnan Rajagopalan, “Conditional Oblivious Transfer and Timed-Release Encryption,” In Proc. of EUROCRYPT 1999: 74-89).
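To make the Private Information Retrieval idea concrete, the following is a sketch of the simplest two-server, information-theoretic scheme from that line of work, under the assumption that two non-colluding servers each hold an identical replica of a database of equal-length records. It is not one of the optimized, sublinear-communication constructions, and the function names are hypothetical. The client sends a uniformly random index subset S to one server and S with index i toggled to the other; each server returns the XOR of the records it was asked for, and XORing the two answers cancels every record except record i.

```python
import secrets


def server_answer(database, subset):
    """XOR together the records whose indices are in subset."""
    acc = bytes(len(database[0]))  # all-zero bytes of record length
    for j in subset:
        acc = bytes(a ^ b for a, b in zip(acc, database[j]))
    return acc


def client_queries(n, i):
    """Random subset S of range(n), and S with membership of index i toggled."""
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}  # symmetric difference flips whether i is included
    return s1, s2


def retrieve(replica1, replica2, i):
    """Fetch record i; each server alone sees only a uniformly random subset."""
    q1, q2 = client_queries(len(replica1), i)
    a1 = server_answer(replica1, q1)
    a2 = server_answer(replica2, q2)
    # Indices other than i appear in both queries or in neither, so they cancel.
    return bytes(x ^ y for x, y in zip(a1, a2))


records = [b"doc-A", b"doc-B", b"doc-C", b"doc-D"]
print(retrieve(records, records, 2))  # b'doc-C'
```

Each query on its own is a uniformly random subset regardless of i, which is why a single server learns nothing about which record was requested; the price is that privacy relies on the two servers not colluding.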
Other security and cryptography research has directly considered the problem of designing secure and/or private publish-subscribe protocols. This research falls short in that it has a different participant model (e.g., it typically considers publishers to be active participants, or considers entirely distributed models with no servers or third parties), a different set of capabilities and functionalities (e.g., it typically ignores protocol dynamics such as subscription updates, or targets only sophisticated filtering rules for content publication), or a different set of security and/or privacy requirements (e.g., it often requires privacy against intermediate routing nodes or privacy only against the server, or it targets more demanding requirements that ultimately result in inefficient protocols).
The work described in Costin Raiciu, David S. Rosenblum, “Enabling Confidentiality in Content-Based Publish/Subscribe Infrastructures,” In Proc. of SecureComm 2006: 1-11 (based on ideas on searchable encryption from Dawn Song, David Wagner, and Adrian Perrig, “Practical Techniques for Searches on Encrypted Data,” In Proc. of the IEEE Symposium on Security and Privacy, 2000), provides a very efficient publish-subscribe protocol in a restricted participant model (a 1-server, 1-client model), but it supports privacy only against the server and not against clients, and it does not support subscription updates by clients or the related privacy requirements.
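The general flavor of matching over protected keywords can be sketched with deterministic keyword tokens. This is a deliberately simplified technique, not the Song-Wagner-Perrig construction or the Raiciu-Rosenblum protocol: it assumes a hypothetical key shared by the publisher and subscribers but not the server, computes each token as HMAC-SHA256 of a dictionary word using Python's standard hmac module, and lets the server route items by comparing tokens for equality. The MatchingServer class and the key value are hypothetical.

```python
import hashlib
import hmac


def token(key, word):
    """Deterministic keyword token: HMAC-SHA256 of the word under a shared key."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


class MatchingServer:
    """Stores and compares only tokens, never plaintext interests or topics."""

    def __init__(self):
        self.subscriptions = {}  # subscriber id -> set of interest tokens

    def register(self, sub_id, interest_tokens):
        self.subscriptions[sub_id] = set(interest_tokens)

    def route(self, topic_tokens):
        """Subscribers whose interest tokens intersect the item's topic tokens."""
        topic_tokens = set(topic_tokens)
        return sorted(sid for sid, toks in self.subscriptions.items()
                      if toks & topic_tokens)


key = b"hypothetical key shared by publisher and subscribers"
server = MatchingServer()
server.register("alice", [token(key, "finance")])
server.register("bob", [token(key, "cats")])
print(server.route([token(key, "finance"), token(key, "education")]))  # ['alice']
```

Because every client holds the same key, any client can compute the token for any dictionary word and recognize other subscribers' interests, so a scheme of this shape protects only against the server and not against clients. Deterministic tokens also reveal to the server which subscriptions and items share a keyword.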
Accordingly, what would be desirable, but has not yet been provided, is a system and method for providing security and privacy guarantees in a publish-subscribe protocol in the presence of honest-but-curious participants.