Location-based services (LBSs) have gained tremendous popularity in recent years, in particular since the emergence of mobile social networking services (mSNSs). Social networking giants such as Facebook and Twitter are extending their services to mobile platforms, alongside specialized vendors such as Foursquare, Gowalla and Loopt. In addition, major mobile carriers strive to provide more value-added services to their subscribers, among which the most compelling applications are LBSs such as location-aware advertisement (“check-in deals”) and nearby-friend reminders.
A typical LBS business model consists of a location registry (typically a social network or a mobile carrier that accepts user location updates or “check-ins”), a service provider (SP, typically a third-party application developed on the social network) that offers LBS applications based on user locations, and a client (typically a mobile user) who requests the service. In this model, the third-party application is authorized to access user locations, but it cannot be trusted with regard to the service it returns to the client. For example, in FIG. 1, an SP offers location-based restaurant browsing which tells the client not only the nearby restaurants, but also the numbers of diners as an indication of their popularity. Each of these numbers can be retrieved by the SP through a spatial range query on a user location dataset specified by the client. However, the client may not trust these numbers, as the SP has a motive to manipulate them in favor of “sponsored restaurants”. As another example in public services, the government may outsource an online traffic monitoring service to third-party vendors. To increase profits, however, these vendors may prioritize the service by sending updated and accurate congestion reports to paying users while sending delayed or inaccurate ones to free users. These trustworthiness issues are extremely important as more day-to-day businesses and public services are turning mobile and location-based. It will soon be indispensable for service providers to deliver their services in an authenticable manner, in which the correctness of service results, that is, whether each result is genuine (soundness) and whether any result is missing (completeness), can be verified by the client.
The literature contains a substantial body of work on the authentication of query results, such as reported in F. Li, G. Kollios, and L. Reyzin. Dynamic authenticated index structures for outsourced databases. In Proc. SIGMOD, pages 121-132, 2006, H. Pang, A. Jain, K. Ramamritham, and K.-L. Tan. Verifying completeness of relational query results in data publishing. In SIGMOD, pages 407-418, 2005, H. Pang and K.-L. Tan. Authenticating query results in edge computing. In Proc. ICDE, 2004, Y. Yang, S. Papadopoulos, D. Papadias, and G. Kollios. Spatial outsourcing for location-based services. In Proc. ICDE, pages 1082-1091, 2008 and Y. Yang, S. Papadopoulos, D. Papadias, and G. Kollios. Authenticated indexing for outsourced spatial databases. The VLDB Journal, 18(3):631-648, 2009. In these works, the data owner (i.e., the location registry) publishes to the third-party SP not only the data (i.e., user locations), but also endorsements of the data being published. These endorsements are signed by the data owner to guard against tampering by the SP. Given a query, the SP returns both the query results and a proof, called a verification object (VO), which the client can use to reconstruct the endorsements and thus verify the correctness of the results. As a location-based service usually concerns a spatial query, the authentication of such services can adopt the same paradigm as in query authentication. As FIG. 1 illustrates, after receiving a request, the SP evaluates the query based on the user locations obtained from the location registry, and delivers the result to the client. A VO, which includes endorsed values derived from user locations and ids, is also sent to the client to verify the correctness of the result.
However, while prior works address the query authentication issue, they fail to preserve the privacy of the data. In fact, they assume that during the verification process, the client can always be trusted and is entitled to receive data values on the querying attribute(s). This assumption no longer holds in LBSs, where the locations of mobile users are sensitive and should be protected against the clients. Therefore, the challenge of this work is to design privacy-preserving query authentication schemes that disclose no user location information to the client.
Unfortunately, the hiding of user locations from the client compounds the difficulty of authentication, and in fact, it brings out a new aspect of authentication. Traditional authentication verifies the soundness of a query by only checking whether the returned results are genuine because the compliance of the results, i.e., whether they comply with the query statement and are thus true results, is already implied by their returned values. However, without knowing these values, verifying the compliance is no longer trivial, which is indeed the challenge of privacy-preserving query authentication.
There is a large body of research on query authentication for indexed data. These works originate from either digital signature chaining or the Merkle hash tree. A digital signature is a mathematical scheme for demonstrating the authenticity of a digital message. It is based on asymmetric cryptography. Given a message, the signer produces a signature with its private key. The verifier then checks the authenticity of the message using the message itself, the signer's public key and the signature. Based on this scheme, early works on query authentication impose a signature for every data value. The VB-tree reported in H. Pang and K.-L. Tan. Authenticating query results in edge computing. In Proc. ICDE, 2004 augments a conventional B+-tree with a signature in each leaf entry. By verifying the signatures of all returned values, the client can confirm the soundness of these results. To further reduce the number of signatures returned to the client, they can be aggregated into one signature of the same size as each individual signature, as reported in D. Boneh, C. Gentry, H. Shacham, and B. Lynn. Aggregate and verifiably encrypted signatures from bilinear maps. In EUROCRYPT, pages 416-432, 2003. However, the simple signature-based approach cannot guarantee completeness, as the server can deliberately omit some results without being noticed. Therefore, Pang et al. proposed signature chaining in H. Pang, A. Jain, K. Ramamritham, and K.-L. Tan. Verifying completeness of relational query results in data publishing. In SIGMOD, pages 407-418, 2005, which connects a signature with adjacent data values to guarantee that no result can be left out. FIG. 2(a) illustrates signature chaining for four sorted values d1, d2, d3, d4. The signature of each value depends not only on its own value but also on the immediate left and right values. For the first and the last values d1 and d4, two special objects d0=−∞ and d5=+∞ are appended.
If the server returns d2 and d3 to the client, it will also send a verification object (VO) that contains: (1) the signatures of d2 and d3, and (2) the boundary values d1 and d4. Given the VO, the client can verify the results through the facts that: (1) the two boundary values fall outside the query range, and (2) all signatures are valid. The first condition ensures that no results are missing and the second guarantees no values are tampered with. Signature aggregation and chaining were adapted to multi-dimensional indexes by Cheng and Tan in W. Cheng and K. Tan. Query assurance verification for outsourced multi-dimensional databases. Journal of Computer Security, 2009.
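By way of illustration only, the client-side check described above may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the scheme of Pang et al.: a keyed hash (HMAC) stands in for the data owner's asymmetric signature, so the verifier here holds the same key as the signer, which would not be the case with a real public-key signature. The key, the function names publish, sign and verify, and the numeric values are hypothetical.

```python
import hashlib
import hmac

# ASSUMPTION: an HMAC key stands in for the owner's private/public key pair.
OWNER_KEY = b"demo-owner-key"


def sign(prev, cur, nxt) -> bytes:
    """Endorse a value together with its immediate left and right neighbours."""
    msg = f"{prev!r}|{cur!r}|{nxt!r}".encode()
    return hmac.new(OWNER_KEY, msg, hashlib.sha256).digest()


def publish(values):
    """Owner side: chain the sorted values between the sentinels -inf and +inf."""
    chain = [float("-inf")] + sorted(values) + [float("inf")]
    return {
        chain[i]: sign(chain[i - 1], chain[i], chain[i + 1])
        for i in range(1, len(chain) - 1)
    }


def verify(lo, hi, results, sigs, left, right) -> bool:
    """Client side: check the boundary values, then every chained signature."""
    # Empty result sets need an extra proof that this sketch does not cover.
    if not results or len(results) != len(sigs):
        return False
    # (1) Both boundary values must fall outside the query range [lo, hi].
    if not (left < lo and right > hi):
        return False
    # Every returned value must itself lie inside the query range.
    if any(not (lo <= r <= hi) for r in results):
        return False
    # (2) Each signature must match the value and its two neighbours.
    chain = [left] + list(results) + [right]
    return all(
        hmac.compare_digest(sigs[i - 1], sign(chain[i - 1], chain[i], chain[i + 1]))
        for i in range(1, len(chain) - 1)
    )
```

With four hypothetical values 10, 20, 30 and 40 standing in for d1 to d4 and a query range of [15, 35], the results 20 and 30 with boundary values 10 and 40 pass the check. If the server omits 30, the signature of 20 no longer matches its claimed right neighbour, so the omission is detected.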
The Merkle hash tree (MHT) was introduced to authenticate a large set of data values as reported in R. C. Merkle. A certified digital signature. In Proc. Crypto, pages 218-238, 1989. FIG. 2(b) shows an MHT for the four data values in FIG. 2(a). It is a binary tree. Each leaf node with data value di is assigned a digest h(di), where h( ) is a one-way hash function. Each internal node Ni is assigned a digest which is derived from its child nodes, e.g., N1=h(h(d1)|h(d2)), where “|” denotes concatenation. In an MHT, only the digest value of the root is signed by the data owner, and therefore it is more efficient than signature chaining schemes. An MHT can be used to authenticate any subset of data values. For example, in FIG. 2(b), suppose the server sends d1 and d2 to the client. To prove their authenticity, the server also sends a VO to the client, which includes the digest of N2 and the signed root digest N. The client computes h(d1) and h(d2), then N1=h(h(d1)|h(d2)), and finally N=h(N1|N2). This computed root digest is then compared with the signed root digest in the VO. If they are the same, the client can conclude that d1 and d2 have not been tampered with by the server.
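The root reconstruction described above may be sketched in Python as follows. This is a minimal illustration under stated assumptions: SHA-256 is chosen as the one-way hash h( ), the four data values and their byte encoding are hypothetical, and the verification of the owner's signature on the root digest is assumed to happen separately and is not shown.

```python
import hashlib


def h(data: bytes) -> bytes:
    """One-way hash function h( ); SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(data).digest()


def leaf_digest(value: int) -> bytes:
    """Digest h(di) of a leaf holding data value di (hypothetical encoding)."""
    return h(str(value).encode())


def node_digest(left: bytes, right: bytes) -> bytes:
    """Digest of an internal node: hash of its children's digests, concatenated."""
    return h(left + right)


# Owner side: build the tree over four hypothetical values d1..d4.
values = [10, 20, 30, 40]
leaves = [leaf_digest(v) for v in values]
n1 = node_digest(leaves[0], leaves[1])
n2 = node_digest(leaves[2], leaves[3])
root = node_digest(n1, n2)  # the only digest the owner needs to sign


def verify_left_pair(d1: int, d2: int, n2_digest: bytes, signed_root: bytes) -> bool:
    """Client side: rebuild N1 and N from d1, d2 and the VO, then compare roots."""
    rebuilt_n1 = node_digest(leaf_digest(d1), leaf_digest(d2))
    return node_digest(rebuilt_n1, n2_digest) == signed_root
```

Here verify_left_pair(10, 20, n2, root) succeeds, whereas changing either returned value yields a different root digest and the check fails. The VO carries a single sibling digest (N2) rather than one signature per value.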
The notion of the MHT has been generalized to an f-way tree and widely adapted to various index structures. Typical examples include the Merkle B-tree and its variant, the Embedded Merkle B-tree (EMB-tree), as reported in F. Li, G. Kollios, and L. Reyzin. Dynamic authenticated index structures for outsourced databases. In Proc. SIGMOD, pages 121-132, 2006. The latter reduces the VO size by embedding a tiny EMB-tree in each node. For multi-dimensional datasets and queries, similar techniques were proposed by Yang et al., who integrated an R-tree with the MHT (called the Merkle R-tree or MR-tree) for authenticating multi-dimensional range queries, as reported in Y. Yang, S. Papadopoulos, D. Papadias, and G. Kollios. Spatial outsourcing for location-based services. In Proc. ICDE, pages 1082-1091, 2008 and Y. Yang, S. Papadopoulos, D. Papadias, and G. Kollios. Authenticated indexing for outsourced spatial databases. The VLDB Journal, 18(3):631-648, 2009.
Besides selection and range queries, recent studies focus on the authentication of more complex query types, including kNN queries such as those reported in W. Cheng and K. Tan. Authenticating knn query results in data publishing. In SDM, 2007 and M. L. Yiu, E. Lo, and D. Yung. Authentication of moving knn queries. In Proc. ICDE, pages 565-576, 2011, join queries such as reported in Y. Yang, S. Papadopoulos, D. Papadias, and G. Kollios. Authenticated indexing for outsourced spatial databases. The VLDB Journal, 18(3):631-648, 2009, and aggregation queries as reported in F. Li, M. Hadjieleftheriou, G. Kollios, and L. Reyzin. Authenticated index structures for aggregation queries. ACM TISSEC, 13(32):1-35, 2010. Besides relational and spatial datasets, authentication of semi-structured and non-structured datasets was studied for streaming data in F. Li, K. Yi, M. Hadjieleftheriou, and G. Kollios. Proof-infused streams: Enabling authentication of sliding window queries on streams. In VLDB, 2007 and S. Papadopoulos, Y. Yang, and D. Papadias. Continuous authentication on relational streams. Very Large Data Bases Journal (VLDBJ), 19:161-180, 2010, and for text data as reported in H. Pang and K. Mouratidis. Authenticating the query results of text search engines. In VLDB, 2008.
Our invention differs from all these works by being the first work on privacy-preserving query authentication, which also addresses privacy-preserving kNN authentication for location-based services. Withholding the querying attribute values from the client makes the authentication problem significantly harder. This calls for a new design of the authentication data structures and procedures, together with optimization techniques and cryptographic constructs, without which the authentication would be less practical.
As for location privacy, the literature of mobile computing and spatial databases extensively investigates this problem in various research domains, including query processing such as those reported in B. Bamba, L. Liu, P. Pesti, and T. Wang. Supporting anonymous location queries in mobile environments with privacy grid. In Proc. WWW, 2008, C. Chow, M. Mokbel, and W. Aref. Casper*: Query processing for location services without compromising privacy. ACM TODS, 2009, G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K. Tan. Private queries in location based services: Anonymizers are not necessary. In SIGMOD, 2008, H. Hu, J. Xu, C. Ren, and B. Choi. Processing private queries over untrusted data cloud through privacy homomorphism. In Proc. of ICDE, 2011, P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias. Preventing location-based identity inference in anonymous spatial queries. TKDE, 19(12):1719-1733, 2007, S. Papadopoulos, S. Bakiras, and D. Papadias. Nearest neighbor search with strong location privacy. In VLDB, 2010 and W. Wong, W. Cheung, B. Kao, and N. Mamoulis. Secure knn computation on encrypted databases. In Proc. SIGMOD, 2009, message communication as reported in B. Gedik and L. Liu. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE TMC, 7(1):1-18, 2008 and T. Xu and Y. Cai. Location cloaking for safety protection of ad hoc networks. In IEEE Infocom, 2009, and location data publishing as reported in H. Hu, J. Xu, S. T. On, J. Du, and K. Ng. Privacy-aware location data publishing. TODS, 35(3), 2010 and T. Xu and Y. Cai. Exploring historical location data for anonymity preservation in location-based services. In IEEE Infocom, Phoenix, Ariz., 2008. In most works, location cloaking has been the predominant technique of privacy protection. However, it only protects privacy conditionally against certain privacy metrics, such as k-anonymity. Except for very few works such as G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K. Tan. Private queries in location based services: Anonymizers are not necessary. In SIGMOD, 2008, H. Hu, J. Xu, C. Ren, and B. Choi. Processing private queries over untrusted data cloud through privacy homomorphism. In Proc. of ICDE, 2011, S. Papadopoulos, S. Bakiras, and D. Papadias. Nearest neighbor search with strong location privacy. In VLDB, 2010 and W. Wong, W. Cheung, B. Kao, and N. Mamoulis. Secure knn computation on encrypted databases. In Proc. SIGMOD, 2009, unconditionally protecting user locations by disclosing nothing about them is an unprecedented task. Our invention is the first of this kind on query authentication and the first that addresses privacy-preserving kNN query authentication for location-based services.
Other patent prior art exists on querying and authentication, but our invention is novel in view of this prior art for the following reasons. U.S. Pat. Nos. 7,343,623 and 7,748,029 disclose inventions that integrate the confidences of query results from different data sources and present them to the user as an overall composite result. Our invention does not involve any form of confidence or probability, nor does it involve multiple data sources.
U.S. Pat. No. 8,087,073 discloses an authentication architecture in which the subject identifies itself to the web server so that the latter can verify that the request for a Uniform Resource Locator (URL) is from the genuine subject. Our invention, by contrast, uses “authentication” in the sense of being able to verify that the results returned from the server are genuine.
U.S. Pat. No. 7,979,711 discloses an invention that preserves privacy in query verification by not disclosing the values of non-result objects. Our invention preserves “full” privacy by not disclosing any values, whether they belong to result or non-result objects. Furthermore, our invention can handle both range and k-nearest neighbor queries, while this prior art can only handle range queries.
U.S. Pat. No. 7,610,265 discloses a data query invention that verifies whether two result tables are the same using aggregation. This is different from our invention, which verifies whether a returned query result is genuine and complete.
United States Patent Application Publication No. 2009/0254975 discloses a location-based authentication system that uses a conventional identity authentication approach, which proves that the identity behind a mobile device is someone who can be trusted. This is different from our invention, which is based on privacy-preserving query authentication.
The present inventors have endeavored to develop a novel privacy-preserving query authentication invention that provides a comprehensive solution preserving unconditional location privacy when authenticating both range and k-nearest neighbor queries.
Citation or identification of any reference in this section or any other section of this application shall not be construed as an admission that such reference is available as prior art for the present application.