Incremental search, also known as typeahead search, find-as-you-type search, word wheeling, or autocomplete, has become a feature of search engines and major web browsers, as well as part of mainstream document editing and software development tools, operating systems and many other text-aware applications. Incremental search combines two features: generating autocomplete suggestions for a partially typed search term (often referred to as a prefix) and highlighting that prefix within a document or a series of documents. Occurrence statistics may be displayed concurrently with a search session.
The highlighting feature of an incremental search is straightforward and relies on fast text indexing and a user-friendly interface, such as multi-color highlighting of several adjacent partially typed terms. For example, when a user types the incremental search term “hea ins” while searching for “health insurance”, an incremental search system may apply different highlight colors to every occurrence of the two incomplete terms “hea” and “ins”. In contrast with the highlighting feature, user satisfaction with the generated incremental search suggestions rests on many factors, and generating the suggestions uses different techniques that may be sensitive to the type of content, content volume, ownership specifics, privacy considerations and other aspects of search.
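The multi-term highlighting described above can be illustrated with a minimal Python sketch. It is not taken from any particular implementation: the function names `find_prefix_matches` and `highlight` are hypothetical, a numbered tag stands in for a highlight color, and a linear scan over the text replaces the fast text index that a real system would likely use.

```python
import re

def find_prefix_matches(text, query):
    """Return (start, end, term_index) for every word that begins with a query term.

    term_index identifies which partially typed term matched, so a caller
    can assign a distinct highlight color to each term.
    """
    terms = query.lower().split()
    matches = []
    for word in re.finditer(r"\w+", text):
        lowered = word.group().lower()
        for index, term in enumerate(terms):
            if lowered.startswith(term):
                # Highlight only the typed prefix, not the whole word.
                matches.append((word.start(), word.start() + len(term), index))
                break
    return matches

def highlight(text, query):
    """Wrap each matched prefix in a numbered tag that stands in for a color."""
    pieces = []
    position = 0
    for start, end, index in find_prefix_matches(text, query):
        pieces.append(text[position:start])
        pieces.append(f"<{index}>{text[start:end]}</{index}>")
        position = end
    pieces.append(text[position:])
    return "".join(pieces)

print(highlight("Health insurance and healthy habits", "hea ins"))
# <0>Hea</0>lth <1>ins</1>urance and <0>hea</0>lthy habits
```

Here both “Health” and “healthy” receive the tag of the first term, while “insurance” receives the tag of the second, mirroring the two-color example in the text.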
Popular applications of incremental search functionality include built-in search features of web browsers (Chrome, Internet Explorer 8, Firefox, etc.), as well as the search fields of search engines (Google, Bing, Yahoo, Ask.com and many other implementations). Incremental search may also be provided in customized versions of search engines on social networking, professional, e-commerce and other sites, including Facebook, LinkedIn, Amazon.com and Monster.com.
In spite of noticeable differences in functioning, underlying content base and user interface, the different implementations share similar approaches to building and displaying lists of incremental search suggestions. The suggestions may be derived from a history of public searches in back-end databases of the respective sites or engines, rather than directly from the site content; the popularity (frequency) of previous searches for any particular term may play an important role in scoring the term, which ultimately determines inclusion of the term in the suggestion list displayed to a user.
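As a rough illustration of history-driven suggestions, the following Python sketch ranks past queries that start with the typed prefix by how often they were searched. The function name `suggest_from_history`, the `limit` parameter and the in-memory list of queries are assumptions made for the example; production systems would draw on large back-end query logs and richer scoring.

```python
from collections import Counter

def suggest_from_history(prefix, history, limit=5):
    """Suggest past queries starting with `prefix`, most frequent first."""
    # Popularity of a query is the number of times it appears in the history.
    counts = Counter(query.strip().lower() for query in history)
    prefix = prefix.lower()
    candidates = [(query, n) for query, n in counts.items() if query.startswith(prefix)]
    # Sort by descending popularity; break ties alphabetically for stable output.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in candidates[:limit]]

history = [
    "health insurance", "heat pump", "health insurance", "healthy recipes",
    "health insurance", "heat pump", "hello world",
]
print(suggest_from_history("hea", history, limit=2))
# ['health insurance', 'heat pump']
```

Note that the suggestions come entirely from what was searched before, not from the content being searched, which is the property the passage above highlights.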
Another significant common feature of existing methods is the treatment of a document frequency (df) parameter; because of the very large volumes of the content databases used in the above examples, the uniqueness of search terms is highly encouraged and rewarded. Many widespread scoring formulas for incremental search suggestions include an “idf” (inverse document frequency) ratio as a multiplier, sometimes with logarithmic or other non-linear scaling (consistent with concepts of information theory). Irrespective of the fine details of the scaling functions used in those implementations, they share the same reverse monotonic behavior, where the suggestion score and the chances of inclusion of a term in the suggestion list monotonically decrease as the document frequency value increases.
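The reverse monotonic behavior can be made concrete with a small Python sketch. The text does not give a specific formula, so the example assumes one common textbook form, idf = log(N / df), multiplied by a popularity count; the names `idf` and `suggestion_score` and the sample numbers are hypothetical.

```python
import math

def idf(df, n_docs):
    """Inverse document frequency with logarithmic scaling: log(N / df)."""
    if not 1 <= df <= n_docs:
        raise ValueError("df must be between 1 and n_docs")
    return math.log(n_docs / df)

def suggestion_score(popularity, df, n_docs):
    """Assumed score: search popularity multiplied by the idf ratio."""
    return popularity * idf(df, n_docs)

# With popularity held fixed, the score falls as document frequency rises.
n_docs = 1000
for df in (1, 10, 100, 1000):
    print(df, round(suggestion_score(50, df, n_docs), 2))
```

Under this assumed formula, a term that appears in every document scores zero regardless of its popularity, while a term that appears in a single document scores highest.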
The common characteristics of many prominent implementations of incremental searching result in the success of the incremental searching feature on public websites and search engines. However, the same characteristics may not be immediately applicable to personal content databases, such as user notebooks in the Evernote service and software, designed by the Evernote Corporation of Redwood City, Calif. The amount of content stored in personal content databases may be significantly smaller: for example, the amount of content may be measured in thousands rather than billions of documents. Additionally, only a very limited personal search history may be available, and the personal databases may contain no content relevant to the billions of public searches. Therefore, the search history and monotonic inverse document frequency approaches to generating suggestions for incremental searches may not be relevant to searching personal content databases.
Accordingly, it is desirable to develop adequate systems and methods for generating and ranking incremental search suggestions that address the major differences between public and personal content databases.