According to research data, knowledge workers spend 38% of their time searching for information. A 2012 global survey of information workers and IT professionals by IDC found that, on average, a knowledge worker spends about five hours per week searching for documents, and almost half of that time is spent searching for documents that are never found. With the proliferation of personal content management systems, such as the Evernote Service and software developed by the Evernote Corporation of Redwood City, Calif. or Microsoft Corporation's OneNote software, search in personal, shared and corporate-wide content collections is becoming a central feature of contemporary content management systems, and search efficiency is considered by many to be a key factor defining user productivity.
Numerous flavors and directions of search techniques are quickly progressing and finding demand among individual and corporate users. These include search over diverse types of data, such as text, images, video, audio, and directly rendered and attached media; various search workflows, ranging from incremental keyword search to semantic search with natural language queries; and question answering in general search engines and in vertical search engines developed for specific knowledge areas.
The natural language user interface is evolving as an important aspect of search engines for many audiences. Search and question answering systems, where queries may be entered as natural language phrases, have been implemented in a variety of desktop and mobile applications, such as Wolfram Alpha, generic Google Search, Facebook Graph Search, Apple Siri, Google Search for Android, etc. For example, Google search for an incomplete phrase “how many US presidents” instantly offers several natural language continuations, including suggestions such as “ . . . were left handed”, “ . . . have we had”, “ . . . have there been”, etc. Personal desktop applications, such as Microsoft's Windows 7 File Explorer or Microsoft Outlook 2010 (part of the Microsoft Office 2010 software suite), also provide basic features of a natural language interface. Thus, searching the “Documents” folder on a Windows 7 personal computer with the natural language search option enabled allows a user to instantly find files satisfying natural language queries such as “images last week” or “large pdf”, which refer to the type of personal content, size, creation or update time of items, and other content parameters.
Notwithstanding recent developments, applications of natural language interfaces to search through personal and shared content collections face significant challenges. In addition to traditional issues with natural language interfaces, such as interpretation of modifications, conjunctions and disjunctions, anaphora resolution, contextual use of synonyms and associated problems of semantic search, there are specific difficulties related to the limited data volume in personal and enterprise content collections, generalization of terms, and choosing between keyword search and natural language search.
Systems that rely strictly on a natural language interface for question answering, search and assistance may cause user dissatisfaction, which has been reported for many practical applications. A recent broad survey of mobile assistants indicated that correct interpretations of natural language queries were achieved at a 70% level at best. A similar situation exists with natural language search in limited content collections. In the previously discussed example of a natural language search query for “images last week”, even a slight modification of the query to read “image files last week” misguides the search. Another natural language query, a search for “office documents”, does not work correctly at all: instead of retrieving files in standard formats of various office suites, such as Microsoft Office, as a majority of users would expect, the system looks for the term “office” as part of file names and does not attempt to resolve the query as a natural language search predicate. In other words, the natural language search component cannot distinguish between the meta-term “office document”, which calls for a search in a specific content category, and the keyword “office”, which calls for a search in file names and document content.
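The distinction between a meta-term and a keyword may be illustrated by the following minimal Python sketch. It is not the behavior of any system named above; the table `META_TERMS`, the function `interpret` and the listed file extensions are hypothetical and chosen only for illustration. The sketch treats a recognized phrase such as “office documents” as a content category filter and leaves all remaining words, including time expressions such as “last week”, as plain keywords.

```python
import re

# Hypothetical table mapping meta-terms to file extensions (illustration only).
META_TERMS = {
    "office document": {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"},
    "image file": {".png", ".jpg", ".jpeg", ".gif"},
    "image": {".png", ".jpg", ".jpeg", ".gif"},
    "pdf": {".pdf"},
}


def interpret(query):
    """Split a query into category filters (meta-terms) and leftover keywords."""
    text = " ".join(query.lower().split())
    extensions = set()
    # Try longer meta-terms first so that "image file" wins over "image".
    for term in sorted(META_TERMS, key=len, reverse=True):
        # Allow an optional plural "s" after the meta-term.
        pattern = r"\b" + re.escape(term) + r"s?\b"
        if re.search(pattern, text):
            extensions |= META_TERMS[term]
            text = re.sub(pattern, " ", text)
    # Whatever remains is treated as ordinary keywords.
    return extensions, text.split()


if __name__ == "__main__":
    print(interpret("office documents"))
    print(interpret("office party notes"))
    print(interpret("image files last week"))
```

Under these assumptions, “office documents” yields a category filter and no keywords, whereas “office party notes” yields no filter and three keywords, so the word “office” is searched in file names and content only when it is not part of a recognized meta-term.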
Accordingly, it is desirable to develop robust natural language interfaces for search queries in personal, shared and corporate data collections that combine advantages of incremental keyword search and natural language input.