With the development of e-commerce, online stores and online transactions are becoming increasingly common among online users. An online trading system provides an online trading platform on which the products in the online stores are managed by category. Each broad category may be divided into smaller sub-categories, thereby forming a category tree. As the number of online products increases, the category tree also grows. As a result, sellers operating the online stores may unintentionally or intentionally place the products they are selling under incorrect categories, which is known as category misplacement.
Category misplacement may lead to inaccurate search results, waste system storage and computing resources, and result in an unpleasant user experience. For example, a user clicks the category tree to view products of category A, but products of category B are presented. Category misplacement may also cause losses to the sellers. For example, products that are placed in a wrong category may be overlooked. If products that are placed in the wrong category can be identified and corrected, then the negative effects mentioned above can be eliminated, thereby increasing utilization of the system storage and the computing resources and providing better services to both buyers and sellers.
One method for identifying misplaced products is based on a click dictionary. The click dictionary is composed of multiple records, where each record indicates a probability that a user clicks a specific category within the search results of a query made by the user. By recording users' queries and click behaviors, a category distribution of the products that users have clicked for a specific query can be obtained. To determine whether a product has been placed in the wrong category, the title of the product is segmented into words. Each word resulting from the segmentation is treated as one query, and the category distribution of that query is looked up in the click dictionary. If a category matching the category of the product is found, then no category misplacement exists; otherwise, category misplacement exists.
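The click-dictionary check described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular system: the function names `build_click_dictionary` and `is_misplaced`, the sample click log, the whitespace-based title segmentation, and the `min_prob` threshold for deciding that a category "matches" are all assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


def build_click_dictionary(click_log):
    """Build {query: {category: click probability}} from (query, category) pairs."""
    counts = defaultdict(Counter)
    for query, category in click_log:
        counts[query][category] += 1
    return {
        query: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for query, cats in counts.items()
    }


def is_misplaced(title, category, click_dict, min_prob=0.1):
    """Return True if no title word maps to `category` in the click dictionary.

    Assumptions: titles are segmented on whitespace, and a category counts
    as matched when its click probability is at least `min_prob`.
    """
    for word in title.lower().split():
        # Each segmented word is treated as one query.
        if click_dict.get(word, {}).get(category, 0.0) >= min_prob:
            return False  # a matching category was found
    # No match found; this also covers titles whose words are all absent
    # from the dictionary, which the stated rule does not distinguish.
    return True


# Hypothetical click log: (query, category clicked in the search results).
click_log = [
    ("phone", "Electronics"),
    ("phone", "Electronics"),
    ("phone", "Accessories"),
    ("case", "Accessories"),
    ("dress", "Clothing"),
]
click_dict = build_click_dictionary(click_log)

print(is_misplaced("Phone case", "Accessories", click_dict))
print(is_misplaced("Summer dress", "Electronics", click_dict))
```

In this sketch, a product titled "Phone case" listed under "Accessories" is not flagged, because the query "phone" has a recorded click probability for that category, whereas "Summer dress" listed under "Electronics" is flagged, because none of its words maps to that category.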
When the above method is applied to a massive amount of data (e.g., tens of millions or billions of product records), there is a high chance that category misplacements will be missed. Such a method may identify and recall only tens of thousands of misplaced products. One reason is that the click dictionary contains a huge amount of data whose distribution is sparse, so that a majority of the products with category misplacements are not covered by the queries in the click dictionary. Another reason is that the method requires intensive computation and a complex process, which leads to high system resource requirements and long calculation times. Therefore, the above method cannot satisfy the requirements of the internet industry.