A search log contains valuable information about the searches that users perform and the actions they take as they interact with a search engine. For example, a web search log records the queries that users issue to an Internet search engine and the results they click. More generally, a search log may contain the queries issued by users and the actions they performed on the displayed results (e.g., logs for enterprise search, mobile search, database search, or product catalog searches and transactions).
A search log can also be very useful for improving the services offered to customers. For example, analyzing a search log can help a company improve existing products and services (e.g., keyword advertising) and build new ones.
Moreover, search logs are very valuable data sources that are currently not available to the research community. In many instances, for example, an Internet search log is more useful than a web crawl or a document repository, because the search log can be used to understand the behavior of users posing queries and to develop algorithms for problems such as computing related searches, correcting spelling, expanding acronyms, determining query distributions, classifying queries, and tracking changes in query popularity over time. Advertisers can use such a log to better understand how users navigate to their web pages, learn more about their competitors, and improve their keyword advertising campaigns.
However, a search log contains a considerable amount of private information about individuals, so a search company cannot simply release such data. Indeed, user searches provide an electronic trail of confidential thoughts and identifiable information. For example, users may enter their own names, the names of their friends, their home addresses, and details of their own medical histories or those of family members and friends. In web search logs, users may even enter a credit card number or social security number as a query, simply to find out what information about it is present on the web.
In sum, releasing a search log is beneficial for various data-mining tasks; however, doing so risks compromising user privacy. Previous attempts to release search logs while maintaining privacy have failed. For example, one attempt replaced usernames with random identifiers, but the searches were still easy to link to an identifiable individual based on the rest of the data. Other ad-hoc techniques, such as tokenizing each search query and securely hashing each token into an identifier, have been explored in the literature and have been shown not to protect privacy.
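The two ad-hoc approaches described above can be sketched as follows to illustrate why they leak information. This is a minimal sketch, not any published release scheme: it assumes whitespace tokenization and SHA-256 as the "secure hash," and the toy log and helper names (`pseudonymize_users`, `hash_tokens`, `dictionary_attack`) are hypothetical. Replacing usernames with random identifiers still groups every query of a user under one identifier, and deterministic hashing maps equal tokens to equal digests, so token frequencies survive and common tokens can be recovered by hashing candidate words.

```python
import hashlib
import secrets
from collections import Counter

# Hypothetical toy log of (username, query) pairs; made-up data for illustration.
LOG = [
    ("alice", "jane doe 12 elm street"),
    ("alice", "knee surgery recovery"),
    ("bob", "weather seattle"),
    ("alice", "jane doe pharmacy"),
    ("bob", "weather today"),
]


def pseudonymize_users(log):
    """Replace each username with a random identifier (one identifier per user)."""
    ids = {}
    return [(ids.setdefault(user, secrets.token_hex(8)), query) for user, query in log]


def sha256_hex(token):
    """Hex SHA-256 digest of a token (assumed stand-in for 'secure hashing')."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def hash_tokens(log):
    """Split each query on whitespace and replace every token by its digest."""
    return [(user, [sha256_hex(t) for t in query.split()]) for user, query in log]


def dictionary_attack(digest, vocabulary):
    """Recover a token by hashing candidate words; works because hashing is deterministic."""
    table = {sha256_hex(word): word for word in vocabulary}
    return table.get(digest)


released = hash_tokens(pseudonymize_users(LOG))

# Weakness 1: a user's queries still share one identifier, so each
# (pseudonymous) user's full search history can be reassembled.
queries_per_user = Counter(user for user, _ in released)

# Weakness 2: equal tokens map to equal digests, so token frequencies survive
# and common tokens can be reversed with a list of candidate words.
token_counts = Counter(d for _, digests in released for d in digests)
first_token = released[0][1][0]

print(sorted(queries_per_user.values()))                       # [2, 3]
print(token_counts[sha256_hex("jane")])                        # 2
print(dictionary_attack(first_token, ["doe", "jane", "elm"]))  # jane
```

A keyed hash or salt would block the plain dictionary lookup but would still preserve equality between tokens, so the frequency and linkage weaknesses shown here would remain.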