1. Field of the Invention
The present invention relates to merging hierarchies of products.
2. Description of the Related Art
The explosive progress in computer networking, data storage, and processor speed has enabled large amounts of transactions, including Web-based transactions. To support transactional applications, hierarchies are used to arrange product information in a way that promotes fast and efficient processing of transactions. As but one non-limiting example, a Web merchant might maintain a hierarchy of electronics products, with camera information being stored in a "camera" node, printer information being stored at a "printer" node, and so on. Each node can have sub-nodes, e.g., the "camera" node can have a "digital camera" sub-node and a "non-digital camera" sub-node. When a person indicates a desire to purchase a digital camera, the Web server accesses the camera node to present the data therein to the user.
It is conventionally the case that the hierarchy of a Web site is manually constructed, with a person placing the appropriate information in the appropriate place in the hierarchy. When a new manufacturer joins the marketplace that is defined by the Web site, the products of the new manufacturer must be placed in the appropriate nodes of the main hierarchy. If the new manufacturer happens to use exactly the same hierarchical scheme as the main hierarchy, this is trivial, since the new products map exactly from their node to a corresponding node in the main hierarchy. However, if the two hierarchical schemes are not the same, which is typically the case, merging the hierarchies becomes non-trivial.
Heretofore, merging two hierarchies, although non-trivial, has not been a programming problem, since the new hierarchy is simply merged manually with the main (market) hierarchy by a person adding the products from the new hierarchy into the appropriate nodes in the main hierarchy. Nonetheless, as can be recognized from the above discussion, constructing the main hierarchy and merging products from new hierarchies into the main hierarchy is slow and labor-intensive. The present invention accordingly has recognized a critical need to automatically merge products from one hierarchy into another, differently-constructed hierarchy.
The invention is a general purpose computer programmed according to the inventive steps herein to merge products from two or more hierarchies into a single hierarchy. The invention can also be embodied as an article of manufacture (a machine component) that is used by a digital processing apparatus and which tangibly embodies a program of instructions that are executable by the digital processing apparatus to undertake the present invention. This invention is realized in a critical machine component that causes a digital processing apparatus to perform the inventive method steps herein. The invention is also a computer-implemented method for undertaking the acts disclosed below.
Accordingly, a computer-implemented method is disclosed for merging product information in a first hierarchy having a first structure into a second hierarchy having a second structure different than the first structure. The method includes generating a classifier, preferably a Naive-Bayes classifier, using text and attributes associated with product information in the second hierarchy. The method also includes using the classifier to associate product information in the first hierarchy with nodes in the second hierarchy. More specifically, product information on a product in the first hierarchy is associated with at least one node in the second hierarchy corresponding to a highest classification probability for that product.
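As a non-limiting illustration of the classification act just described, the following minimal sketch trains a multinomial Naive-Bayes model on product text from the second (main) hierarchy and assigns a product from the first hierarchy to the node with the highest classification probability. The function names, the whitespace tokenization, the Laplace smoothing, and the sample data are all assumptions made for illustration; attributes are omitted here, and nothing in this sketch is taken from the disclosed embodiment.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_products):
    """Build word counts per node from (node_name, text) pairs.

    Hypothetical helper: the pairs stand in for product text already
    placed in the main hierarchy.
    """
    word_counts = defaultdict(Counter)  # node -> word -> count
    product_counts = Counter()          # node -> number of products
    vocabulary = set()
    for node, text in labeled_products:
        words = text.lower().split()
        word_counts[node].update(words)
        product_counts[node] += 1
        vocabulary.update(words)
    return word_counts, product_counts, vocabulary


def classify(model, text):
    """Return (best_node, posteriors) for one product description."""
    word_counts, product_counts, vocabulary = model
    total_products = sum(product_counts.values())
    log_scores = {}
    for node in product_counts:
        total_words = sum(word_counts[node].values())
        # Log prior: fraction of training products in this node.
        score = math.log(product_counts[node] / total_products)
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # assumption: words never seen in training are ignored
            # Laplace (add-one) smoothed word likelihood.
            score += math.log(
                (word_counts[node][word] + 1) / (total_words + len(vocabulary))
            )
        log_scores[node] = score
    # Convert log scores to normalized posteriors in a numerically stable way.
    top = max(log_scores.values())
    unnormalized = {n: math.exp(s - top) for n, s in log_scores.items()}
    norm = sum(unnormalized.values())
    posteriors = {n: v / norm for n, v in unnormalized.items()}
    best_node = max(posteriors, key=posteriors.get)
    return best_node, posteriors


# Illustrative main-hierarchy training data (invented for this sketch).
main_hierarchy = [
    ("camera", "digital camera zoom lens"),
    ("camera", "film camera lens"),
    ("printer", "laser printer toner"),
    ("printer", "inkjet printer ink"),
]
model = train_naive_bayes(main_hierarchy)
best_node, posteriors = classify(model, "compact digital camera with zoom")
print(best_node)
```

The posteriors dictionary is returned alongside the winning node so that a caller can apply its own acceptance rule to the probabilities rather than relying on the single best node.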
In a preferred embodiment, the generating act includes multiplying a probability based on product information text by a probability based on product information attributes. If desired, a product in the first hierarchy can be associated with at least two high score nodes in the second hierarchy when each high score node corresponds to a classification probability exceeding a threshold. Conversely, product information on a low score product in the first hierarchy is not associated with a node in the second hierarchy when no node in the second hierarchy is associated with a classification probability exceeding a threshold. A low score node in the first hierarchy is designated as a node in the second hierarchy when the low score node contains at least a threshold number of low score products.
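The scoring and thresholding behavior described above can be sketched as follows, by way of example only. The per-node text and attribute probabilities are taken as given inputs, and the renormalization step, the function names, the threshold values, and the sample data are assumptions of this sketch rather than details of the disclosed embodiment.

```python
def combine_probabilities(text_probs, attr_probs):
    """Multiply the text-based and attribute-based probability per node.

    Renormalizing so the products sum to one is an assumption of this sketch.
    """
    combined = {node: text_probs[node] * attr_probs[node] for node in text_probs}
    total = sum(combined.values())
    if total == 0:
        return combined
    return {node: value / total for node, value in combined.items()}


def high_score_nodes(probabilities, threshold):
    """Nodes whose probability exceeds the threshold; empty for a low score product."""
    return sorted(node for node, p in probabilities.items() if p > threshold)


def nodes_to_designate(low_score_products_by_node, min_count):
    """First-hierarchy nodes holding at least min_count low score products.

    Such nodes are candidates to be designated as nodes in the second hierarchy.
    """
    return sorted(
        node
        for node, products in low_score_products_by_node.items()
        if len(products) >= min_count
    )


# Illustrative inputs (invented): a product that fits two nodes about equally.
text_probs = {"camera": 0.5, "camcorder": 0.4, "printer": 0.1}
attr_probs = {"camera": 0.4, "camcorder": 0.5, "printer": 0.1}
combined = combine_probabilities(text_probs, attr_probs)
assigned = high_score_nodes(combined, threshold=0.4)

# A product with no clear home: no node exceeds the threshold.
uniform = {"camera": 1 / 3, "camcorder": 1 / 3, "printer": 1 / 3}
unassigned = high_score_nodes(combine_probabilities(uniform, uniform), threshold=0.4)

# Low score products grouped by their node in the first hierarchy.
low_score = {"drones": ["d1", "d2", "d3"], "misc": ["m1"]}
designated = nodes_to_designate(low_score, min_count=3)
print(assigned, unassigned, designated)
```

In this sketch a product may be returned with zero, one, or several nodes, which mirrors the three cases described above; the choice of threshold values is left to the implementer.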
As set forth in greater detail below, the present invention recognizes the intuition that if two products were grouped together in the first hierarchy they have a higher likelihood of being grouped together in the second hierarchy as well. Accordingly, product information on a product from a first node in the first hierarchy can be associated with a second node in the second hierarchy based on how many products in the first node have been associated with the second node.
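One way this intuition might be applied is sketched below, purely as an illustration. A first pass assigns each product to its most probable node; the share of products from the same first-hierarchy node that landed in each second-hierarchy node is then used to reweight the classifier probabilities. The specific reweighting formula, the `weight` and `smoothing` parameters, the function name, and the sample data are hypothetical and are not taken from the disclosed embodiment.

```python
from collections import Counter


def rescore_with_siblings(posteriors_by_product, source_node_of,
                          weight=1.0, smoothing=1.0):
    """Reassign products using how their source-node siblings were classified.

    posteriors_by_product: product id -> {target node: probability}
    source_node_of: product id -> node in the first (new) hierarchy
    """
    # First pass: each product goes to its most probable target node.
    first_pass = {
        product: max(post, key=post.get)
        for product, post in posteriors_by_product.items()
    }

    # Count, per source node, how many products landed in each target node.
    # Each product's own first-pass vote is included in these counts.
    counts = {}
    for product, target in first_pass.items():
        counts.setdefault(source_node_of[product], Counter())[target] += 1

    # Second pass: scale each probability by the smoothed sibling share.
    final = {}
    for product, post in posteriors_by_product.items():
        sibling_counts = counts[source_node_of[product]]
        total = sum(sibling_counts.values())
        adjusted = {
            target: prob * (
                (sibling_counts[target] + smoothing)
                / (total + smoothing * len(post))
            ) ** weight
            for target, prob in post.items()
        }
        final[product] = max(adjusted, key=adjusted.get)
    return final


# Illustrative data (invented): three products from one source node "photo".
posteriors_by_product = {
    "p1": {"camera": 0.90, "printer": 0.10},
    "p2": {"camera": 0.80, "printer": 0.20},
    "p3": {"camera": 0.45, "printer": 0.55},  # borderline on its own
}
source_node_of = {"p1": "photo", "p2": "photo", "p3": "photo"}
final_assignment = rescore_with_siblings(posteriors_by_product, source_node_of)
print(final_assignment)
```

Here the borderline product p3 would go to the printer node on classifier probability alone, but because two of the three products from its source node were associated with the camera node, the reweighted score moves it to the camera node. Setting `weight` to zero in this sketch disables the sibling influence entirely.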
In another aspect, a computer system includes a program of instructions that in turn includes structure to undertake system acts. These system acts include receiving a main hierarchy having nodes representing product classes, and receiving a new hierarchy having nodes representing product classes. A Naive-Bayes classifier is generated using the main hierarchy, and then products in the new hierarchy are associated with nodes in the main hierarchy using the classifier.
In yet another aspect, a system storage device includes system readable code that can be read by a computer for associating products in a new hierarchy with product classification nodes in a main hierarchy. The storage device includes computer readable code means for generating a Naive-Bayes classifier based on hierarchy training data containing both text and numerical attributes. Also, computer readable code means are provided for using the classifier and a determination of how many products in at least a first node in the new hierarchy are associated with at least a second node in the main hierarchy to associate products in the new hierarchy with product classification nodes in the main hierarchy.
The details of the present invention, both as to its structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which: