A major role of a publication site is to provide a bridge between content recipients and content generators so that content recipients can efficiently locate content or items that have been listed by content generators. Proper categorization of the listed content is important in helping the publication site provide recommendations in response to a user's query. When a content generator uploads an inventory of content to be listed on a publication site, the content titles and content categories provided by the content generator are used by the publication site to map those listings to its own taxonomy or category tree, such that the publication site can make relevant recommendations in response to a user's query.
Differences in granularity and in the terms that parties use to describe the same items create numerous disparities between taxonomies created by different parties. Furthermore, parties are constantly updating their taxonomies, such that a prior mapping may quickly become out of date. It is often challenging for a publication company to keep its mappings accurate through so many updates, especially when using existing methods that rely on manual human involvement, and especially when a goal of the publication company is to continue engaging with large numbers of content generators to onboard their content as listings on the publication site. For example, an existing method to address this mapping task is primarily based on lexical-level matching with manually crafted mapping files and large numbers of regular-expression-based mapping rules. Such an approach cannot be shared across different content generators due to variations in the terminology used in the taxonomies of the content generators. Furthermore, as new content is introduced, the legacy rules may no longer be relevant and may degrade the performance of the mapping process.
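By way of illustration only, the lexical, rule-based approach described above might resemble the following minimal sketch. The rule list, the category names, and the function `map_category` are hypothetical and are not taken from any particular publication site; they are assumed here solely to show how such rules depend on the exact wording of one content generator's taxonomy.

```python
import re

# Hypothetical hand-written rules tuned to one content generator's wording.
# Each rule pairs a regular expression with an assumed site category path.
RULES = [
    (re.compile(r"\bcell\s?phones?\b", re.IGNORECASE),
     "Electronics > Mobile Phones"),
    (re.compile(r"\blaptops?\b", re.IGNORECASE),
     "Electronics > Computers > Laptops"),
]


def map_category(source_category):
    """Return the site category of the first matching rule, or None."""
    for pattern, site_category in RULES:
        if pattern.search(source_category):
            return site_category
    return None


# The rule matches the wording it was written for.
print(map_category("Electronics / Cell Phones"))
# Another generator's term for the same items matches no rule.
print(map_category("Electronics / Smartphones"))
```

In this sketch, the second call returns no mapping because "Smartphones" does not contain the literal wording the rule expects, so a new rule would have to be written by hand for each such variation in terminology.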
The headings provided herein are merely for convenience and do not necessarily affect the scope or meaning of the terms used.