In addition to the many popular general search engines, many commercial and non-commercial web sites offer their own search engines for searching specialized content. By way of example, some web sites offer specific search engines that search for movie-related topics, others for health-related questions, and still others for items to purchase. Even though many general search engines are able to index a vast number of web pages and have increasingly sophisticated ranking algorithms, searching directly with a web site's own search engine often has distinct advantages.
First, general search engines may have limited or no access to proprietary content or databases. Second, more domain-specific information can be used for ranking the results. For example, given a search query “pinot noir”, a wine web site might return the wines ranked by taste score, price, or both. Third, the search results are often presented in a more appropriate format or user interface. Fourth, since each web site usually focuses on a specific domain or product, the user has to do less filtering of totally unrelated results.
Research has shown that click-through rates (CTR) differ between web site search engines and general search engines. The CTR is a measure of the interest that a user shows in the search results presented to that user. In particular, for the same set of queries, a user is more likely to click on results when issuing the queries on domain-specific web sites than on a general search engine. Thus, web site search engines are more likely to return search results in which a user is interested.
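As an illustration only, the CTR comparison can be sketched in a few lines of Python. The sketch assumes the common definition of CTR as clicks divided by the number of results shown, which the text above does not state explicitly. The function name and the counts are hypothetical placeholders, not measured data.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return clicks divided by impressions (assumed CTR definition).

    Returns 0.0 when no results were shown, to avoid dividing by zero.
    """
    if impressions <= 0:
        return 0.0
    return clicks / impressions


# Hypothetical counts for the same set of queries issued on two engines.
site_ctr = click_through_rate(clicks=30, impressions=200)
general_ctr = click_through_rate(clicks=12, impressions=200)

# With these made-up counts, the domain-specific site has the higher CTR.
site_has_higher_ctr = site_ctr > general_ctr
```

Under this definition, a higher CTR for the same queries is read as a sign that the returned results better match what the user was looking for.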
The problem is getting the user to a specific web site. Users often have trouble finding domain-specific sites to search. Recently, attention has been given to the problem of user intent classification in information retrieval. This work focuses on categorizing a given search query into a general intent (such as news, images, automobiles, and so forth), which can be described as macro vertical/intent selection. However, many of these techniques pick only high-level categories because they use supervised learning methods that require manual labeling of training data. In other words, these approaches require search queries to be labeled by human judges. These approaches are therefore quite expensive and unable to respond promptly to topic evolution over time.