Content Management (CM) is a set of processes and technologies that support the handling of digital information, often referred to as digital content. Currently, people who manage content have very few tools that can tell them, a priori, whether users will be able to locate that content.
“Findability” is the term used to refer to the quality of being locatable or the ability to be found. Findability has become highly relevant with the expansion of the World Wide Web. However, findability is not limited to the web and can equally be applied to other environments. The structure, language and writing style used for content description all have a huge effect on the “findability” of content by users searching for information encapsulated in that content.
This document focuses on textual content, for example a set of textual documents such as web pages belonging to a specific web site or intranet site. Content in this case refers to the text of these pages and to the anchor text of hyperlinks pointing to them. Textual content may also be retrieved in the form of a single document or related documents from a database or other repository.
Content may be difficult to find because it is poorly written, poorly structured, or indistinguishable from other content. Search engines are programs designed to help find information. A user asks a search engine to locate content relevant to an information need. This need is specified by the user's “query” submitted to the search engine. A query might be a free text expression, or any Boolean expression complying with the query syntax supported by the search engine. The search engine retrieves a ranked list of documents that match the user's query. Ranking is determined according to the expected relevance of the documents to the user's information need.
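The query-and-ranking behavior described above can be illustrated with a minimal sketch. It assumes a toy engine that treats a free text query as a Boolean AND of its terms and ranks matching documents by raw term frequency. The names `tokenize` and `search` and the sample documents are hypothetical, and real search engines use far more elaborate relevance models.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase the text, split on whitespace, and strip basic punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def search(query, documents):
    """Return (doc_id, score) pairs for documents containing every query term.

    The score is the total number of occurrences of the query terms,
    a simple stand-in (assumption) for 'expected relevance'.
    """
    terms = tokenize(query)
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        if terms and all(counts[t] > 0 for t in terms):
            results.append((doc_id, sum(counts[t] for t in terms)))
    # Highest score first; ties broken by document id for stable output.
    return sorted(results, key=lambda r: (-r[1], r[0]))

# Hypothetical document collection.
docs = {
    "a": "Printer setup guide. Printer drivers and printer cables.",
    "b": "Setup guide for the office printer.",
    "c": "Holiday schedule for the office.",
}
ranked = search("printer setup", docs)
print(ranked)
```

In this sketch document "c" is never retrieved for the query, however relevant its owner believes it to be, which is the kind of outcome a content manager currently has little means of predicting.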
In some known cases it is possible to estimate the findability of content in retrospect, by observing the queries that successfully brought users to that specific content. However, in such cases it is impossible to know which queries users typed that failed to bring them to the content.
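As a rough illustration of this retrospective estimate, the sketch below counts the queries that led to a given document in a click log. The log format (pairs of query and clicked document identifier), the function name `successful_queries`, and the sample entries are assumptions made for the example.

```python
from collections import Counter

def successful_queries(click_log, doc_id):
    """Count the queries whose recorded click landed on doc_id."""
    return Counter(query for query, clicked in click_log if clicked == doc_id)

# Hypothetical click log: (query, id of the document the user opened).
click_log = [
    ("printer setup", "doc-17"),
    ("install printer", "doc-17"),
    ("printer setup", "doc-17"),
    ("holiday schedule", "doc-42"),
]
hits = successful_queries(click_log, "doc-17")
print(hits.most_common())
```

A log of this kind contains only successes. A user who typed a query and never reached "doc-17" leaves no entry that ties the query to the document, so the failed queries cannot be recovered from it.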
Also, there is a wide field known as Search Engine Optimization (SEO) which attempts to modify the content of web pages so as to bring them to the top ranking of a search engine. SEO is usually based on examining search logs for query terms entered by users. SEO provides tips for restructuring of web pages. For example, SEO is used to provide the following information to a web site owner:
Percentage of traffic generated by site search and external search engines.
Which query terms are currently driving traffic to the site.
Which terms are the most popular on external engines and how to optimize for those words.
Details on current ranking/positioning in the major external engines.
Detailed page-level audits with recommendations for improvement.
A related application is the selection of words that trigger the display of advertisements on the web. An advertising contractor typically auctions keywords to advertisers. The advertisers need to choose and price the optimal words and phrases to be associated with their site. Existing tools for term selection are based solely on co-occurrence frequency analysis of historical search logs, and require the advertiser to first provide at least one search term. In other words, they only refine phrases given to them as input, and the only guidance they give is based on historical frequencies, where higher frequency is preferred.
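The kind of co-occurrence analysis described above might look like the following sketch. It assumes a search log stored as a list of query strings. The function `suggest_terms` and the sample log are hypothetical and do not represent any particular vendor's tool.

```python
from collections import Counter

def suggest_terms(search_log, seed, top_n=3):
    """Rank terms by how many logged queries contain them together with seed."""
    seed = seed.lower()
    counts = Counter()
    for query in search_log:
        words = query.lower().split()
        if seed in words:
            # Count each co-occurring term once per query.
            counts.update(w for w in set(words) if w != seed)
    # Higher historical frequency is preferred; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical historical search log.
search_log = [
    "cheap laptop",
    "laptop bag",
    "cheap laptop deals",
    "gaming laptop",
    "cheap flights",
]
suggestions = suggest_terms(search_log, "laptop")
print(suggestions)
```

The sketch shows both limitations noted above: it returns nothing until the advertiser supplies a seed term, and its only ranking signal is how often a term appeared in past queries.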