Web search has become a very popular method for seeking information. Users may have a variety of intents while performing a search of the Web. For example, some users may already have in mind the site they want to visit when they enter a query. However, these users may not know the URL of the site or may not want to type in the full URL, and so may rely on the search engine to present a link to the site they know they want to visit. By contrast, other users may have no idea of what sites to visit before seeing the search results, and the information these users are seeking typically exists on more than one page. According to research, approximately 18% of queries in Web search are navigational queries, i.e., queries reflecting the situation in which the user already has an intended site in mind. Therefore, correctly identifying navigational queries has great potential to improve search performance. However, navigational query identification is not trivial, because Web queries are normally quite brief and therefore carry little information.
Recently, query classification has been drawing significant attention. Many machine learning approaches used in general classification frameworks, including naive Bayes classifiers, maximum entropy models, support vector machines, and gradient boosting trees, each have their own advantages that suit certain problems. Due to the characteristics of navigational query identification, it is not apparent which approach performs best for identifying navigational queries. Further, machine learning models often suffer from high feature dimensionality, in which using a large number of features produces incorrect results or no results at all. Consequently, most prior work in query identification is based on a small number of features obtained from limited resources.
In view of the foregoing, there is a need for techniques for accurately identifying navigational queries in real time.
Any approaches that may be described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.