The disclosed technology relates to implementing natural language search with semantic mapping and classification; that is, discerning the intent of a user's search query and returning relevant search results.
Search engines are designed to search for information on the World Wide Web, with search results, which may include web pages, images and other types of files, presented on search engine results pages. Some search engines also mine data available in databases or open directories. Search engines maintain real-time information by running an automated web crawler that follows the links on each site it visits. The search engine then analyzes the contents of each page to determine how it should be indexed (for example, words can be extracted from the titles, page content, headings, or special fields called meta tags).
Data about web pages is stored in an index database for use in later queries. The index helps search engines find information relating to the query as quickly as possible. Some search engines store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others store every word of every page they find. A cached page holds the text that was actually indexed, so it can be very useful when the content of the live page has since been updated and the search terms no longer appear in it.
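For illustration only, the index and cache described above can be sketched as an in-memory inverted index that maps each word to the set of pages containing it, alongside a copy of each page's text as it was when indexed. The class `SimpleIndex`, its methods, and the example URLs are hypothetical; production indexes use compressed, disk-backed posting lists and far richer tokenization.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SimpleIndex:
    """Hypothetical toy inverted index with a page cache."""

    def __init__(self) -> None:
        # word -> set of URLs whose indexed text contains that word
        self.postings: dict[str, set[str]] = defaultdict(set)
        # URL -> page text as it was at indexing time (the "cache")
        self.cache: dict[str, str] = {}

    def add_page(self, url: str, title: str, body: str) -> None:
        text = f"{title} {body}"
        self.cache[url] = text  # keep the version that was actually indexed
        for word in tokenize(text):
            self.postings[word].add(url)

    def lookup(self, word: str) -> set[str]:
        # .get() avoids inserting empty entries into the defaultdict
        return set(self.postings.get(word.lower(), set()))


index = SimpleIndex()
index.add_page("https://example.com/tax", "State income tax", "Rates vary by state.")
index.add_page("https://example.com/travel", "Travel guide", "Visit every state.")
print(sorted(index.lookup("state")))
# ['https://example.com/tax', 'https://example.com/travel']
```

A query term is answered by reading one posting set rather than rescanning every page, which is why the index lets a search engine respond quickly; the cached text stays available even if the live page later changes.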
When a user enters a query into a search engine (typically one or more keywords), the engine examines its index and provides a listing of the best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. Most search engines support the Boolean operators AND, OR and NOT, and some provide an advanced feature called proximity search, which allows users to define the maximum distance between keywords. There is also concept-based searching, which applies statistical analysis to pages containing the words or phrases the user searches for. Finally, natural language queries allow the user to enter a question in the form one would use when asking a human.
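The Boolean and proximity operators mentioned above can be sketched as simple predicates over a document's word sequence. This is a minimal illustration under stated assumptions: the helper names `boolean_match` and `near` are hypothetical, distance is counted in word positions, and real engines evaluate these operators against positional posting lists rather than raw text.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_match(text: str, must=(), any_of=(), must_not=()) -> bool:
    """AND over `must`, OR over `any_of` (if given), NOT over `must_not`."""
    words = set(tokenize(text))
    has = lambda w: w.lower() in words
    return (all(has(w) for w in must)
            and (not any_of or any(has(w) for w in any_of))
            and not any(has(w) for w in must_not))


def near(text: str, a: str, b: str, max_distance: int) -> bool:
    """True if words a and b occur within max_distance word positions."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


doc = "The state income tax rate is high, but the sales tax is low."
print(boolean_match(doc, must=("state", "tax"), must_not=("property",)))  # True
print(near(doc, "state", "tax", 2))    # True  ("state" at 1, "tax" at 3)
print(near(doc, "sales", "income", 3))  # False (positions 9 and 2)
```

Note that neither operator looks at what the question means; both only test where words occur.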
A natural language search engine would, in theory, find targeted answers to user questions (as opposed to keyword search). For example, when confronted with a question of the form ‘which U.S. state has the highest income tax?’, conventional search engines ignore the question and instead search on the keywords ‘state’, ‘income’ and ‘tax’. Natural language search, on the other hand, attempts to use natural language processing to understand the nature and context of the question, more specifically the underlying intent of the user's question, and then to search and return a subset of the web that contains the answer. If this works, the results would have higher relevance than those returned by a keyword search engine.
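The contrast drawn above between keyword search and intent-aware search can be illustrated with a toy example. Everything in it is hypothetical: the stop-word list is deliberately small, and the single regular-expression template stands in for natural language processing only to show what kind of structured intent ("maximize attribute over entity type") a bag of keywords throws away. It is not the semantic mapping and classification of the disclosed technology.

```python
import re
from typing import Optional

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"which", "what", "who", "has", "have", "the", "a", "an", "is", "of"}

# Hypothetical single template: "which <entity> has the highest|lowest <attribute>?"
SUPERLATIVE = re.compile(
    r"^which (?P<entity>.+?) has the (?P<op>highest|lowest) (?P<attr>.+?)\??$",
    re.IGNORECASE,
)


def keyword_terms(question: str) -> list[str]:
    """What a pure keyword engine keeps: content words, no structure."""
    tokens = [t.strip("?!,;:") for t in question.lower().split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def toy_intent(question: str) -> Optional[dict]:
    """Map a question matching the one template to a structured intent."""
    m = SUPERLATIVE.match(question.strip())
    if m is None:
        return None
    return {
        "entity_type": m.group("entity"),
        "attribute": m.group("attr"),
        "rank": "max" if m.group("op").lower() == "highest" else "min",
    }


q = "which U.S. state has the highest income tax?"
print(keyword_terms(q))  # ['u.s.', 'state', 'highest', 'income', 'tax']
print(toy_intent(q))     # {'entity_type': 'U.S. state', 'attribute': 'income tax', 'rank': 'max'}
```

The keyword list contains the right words but not the relation between them; the structured form states what an answer must satisfy, which is the gap natural language search aims to close.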
The usefulness of a search engine depends on the relevance of the result set it returns. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines therefore rank the results so that the best ones are shown first. How a search engine decides which pages are the best matches, and in what order to show them, varies widely from one engine to another. Many search engines rely on title match, category lookup, and keyword frequency within user reviews, signals that are insufficient for all but the simplest queries.
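To make the limitation above concrete, the following sketch scores pages using only two of the signals named, title match and keyword frequency. The `Page` type, the scoring formula, and the title weight of 3.0 are hypothetical choices for illustration; category lookup and user reviews are not modeled.

```python
import re
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Page:
    url: str
    title: str
    body: str


def keyword_score(page: Page, query: str, title_weight: float = 3.0) -> float:
    """Hypothetical score: weighted title hits plus raw body term frequency."""
    terms = set(tokenize(query))
    title_tokens = set(tokenize(page.title))
    body_tokens = tokenize(page.body)
    title_hits = sum(1 for t in terms if t in title_tokens)
    body_freq = sum(body_tokens.count(t) for t in terms)
    return title_weight * title_hits + body_freq


def rank(pages: list[Page], query: str) -> list[Page]:
    """Order pages by descending keyword score."""
    return sorted(pages, key=lambda p: keyword_score(p, query), reverse=True)


rates = Page("https://example.com/rates", "Highest income tax rates",
             "Compares top marginal rates across all fifty states.")
forms = Page("https://example.com/forms", "State tax forms",
             "Download state income tax forms for every state.")
print([p.url for p in rank([rates, forms], "state income tax")])
# ['https://example.com/forms', 'https://example.com/rates']
```

The forms page scores 10.0 against 6.0 for the rate comparison simply because it repeats the query words, even though the comparison page is the one that could answer a question about which state has the highest income tax.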
An opportunity arises to develop better systems and methods for implementing natural language search with semantic mapping and classification.