With the continuing development of the Internet, more and more information is stored online. To obtain information in any field of knowledge, a user typically uses a search engine. Given the large volume of information on the Internet, common query methods frequently return inaccurate search results. Consequently, vertical search methods have been developed. Vertical search refers to field-specific or business-specific search performed by a vertical search engine, which is a subdivision and extension of a general search engine. A vertical search engine integrates information about the targeted field collected from a webpage warehouse. After segmenting that information to obtain the data needed for processing, the vertical search engine returns the results to the user in a certain form. Because common search engines deal with large volumes of information and produce search results that are not sufficiently accurate or thorough, vertical search is being promoted as a new model of search engine service. By providing specific information targeted at a field, a group of people, or a need, vertical search can provide valuable services. Words such as “specialized,” “precise,” and “thorough” may be appropriate descriptions of vertical search, in addition to its business-oriented nature. In contrast to common search engines, which produce immense amounts of unordered information, vertical search engines tend to offer better precision, specificity, and thoroughness.
There are many applications for vertical search engines. For example, corporate warehouse searches, supply and demand information searches, shopping searches, real estate searches, talent searches, map searches, MP3 searches, and image searches may all benefit from vertical search. In practice, a search for any kind of information can be further refined by a corresponding type of vertical search engine.
When a vertical search engine is used for a shopping search, the user inputs a commodity search keyword into a business-to-consumer (B2C) or consumer-to-consumer (C2C) shopping website. As shown in FIG. 1A and FIG. 1B, the search result often comprises multiple parts: (1) navigation information for the commodity, such as the commodity category; (2) the attribute categories corresponding to the commodity category; and (3) the commodities under the commodity category. For navigation, the commodity classification names are organized in a tree-like structure, which makes it convenient for a user to follow the tree from the top down and obtain more accurate search results according to commodity classification names. The attribute category refers to one or more commodity attributes most commonly sought by users, based on historical hit data for the commodity category.
The commodity category tree structure is stored in a data table in a database. Data entry and maintenance are done manually. Whether on a B2C or a C2C website, each displayed commodity must be classified under one or more nodes of the commodity category tree.
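As an illustration only, a category tree held in a data table might be represented as rows that each point to a parent row. The following minimal sketch assumes such a layout; the row format, the category names, and the helper functions build_children and path_to_root are hypothetical and are not taken from any particular website or database schema.

```python
from collections import defaultdict

# Hypothetical rows of a category table: (category_id, parent_id, name).
# parent_id is None for the root node. All names are illustrative only.
CATEGORY_ROWS = [
    (1, None, "All categories"),
    (2, 1, "Electronics"),
    (3, 1, "Clothing"),
    (4, 2, "Mobile phones"),
    (5, 2, "Cameras"),
]


def build_children(rows):
    """Map each parent category id to the ids of its direct children."""
    children = defaultdict(list)
    for category_id, parent_id, _name in rows:
        if parent_id is not None:
            children[parent_id].append(category_id)
    return dict(children)


def path_to_root(rows, category_id):
    """Return the category names from the root down to the given node."""
    by_id = {cid: (parent, name) for cid, parent, name in rows}
    path = []
    current = category_id
    while current is not None:
        parent, name = by_id[current]
        path.append(name)
        current = parent
    return list(reversed(path))


children = build_children(CATEGORY_ROWS)
navigation = path_to_root(CATEGORY_ROWS, 4)
```

Under these assumptions, the path returned for a node corresponds to the top-down navigation a user would follow through the commodity classification names.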
Current e-commerce websites tend to carry a tremendous number of commodities, resulting in excessive classification of commodities. In the case of multi-billion product lines, the commodity category tree may have up to tens of thousands of nodes, with the number of nodes at each category level often running to several dozen. When a user initiates a search, the amount of commodity classification information presented is excessive, making it difficult to indicate to the user which of the presented classifications are more relevant to the user's interest. To address this problem, an existing approach counts the number of returned results under each category and ranks the categories in descending order of those counts. A threshold is set so that any category whose result count is lower than the threshold is hidden from the user. This is intended to reduce the number of classifications presented.
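The count-and-threshold approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation drawn from any actual website: the function name rank_categories, the sample category names, and the threshold value are all hypothetical, and ties are broken alphabetically only to make the ordering deterministic.

```python
from collections import Counter


def rank_categories(result_categories, threshold):
    """Count results per category, rank the categories by count in
    descending order, and hide any category whose count is lower than
    the threshold."""
    counts = Counter(result_categories)
    # Sort by descending count; break ties by name (an assumption made
    # here only so that the output order is deterministic).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(category, count) for category, count in ranked if count >= threshold]


# Hypothetical category labels of the results returned for one query.
results = ["Phone cases"] * 5 + ["Mobile phones"] * 2 + ["Chargers"] * 3
visible = rank_categories(results, threshold=3)
```

In this sketch the visibility of a category depends only on its result count, so a category with few matching results is hidden regardless of how well it matches the user's intent.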
The following problems associated with the current technology have been observed:
(1) the categories presented have very low relevance to the user's query;
(2) there is no mechanism to indicate the importance of one category relative to another; and
(3) one or more categories of high importance may be inadvertently hidden from the user because their result counts fall below the threshold.