Many attempts have been made to automatically classify documents or otherwise identify the subject matter of a document. In particular, search engines seek to identify documents that are relevant to the terms of a search query based on determinations of the subject matter of those documents. Classification is also important for product-related documents, such as product descriptions, product reviews, and other product-related content. The number of products available for sale constantly increases, and the number of documents relating to a particular product is further augmented by social media posts and other content relating to products.
Often, a document describing a product includes unstructured data, e.g., free-form text written by a manufacturer, retailer, expert, enthusiast, or the like. However, such text cannot readily be used to compare products. For example, a customer wishing to comparison shop is burdened with extracting relevant information from this unstructured data in order to make an informed decision. In other instances, a product record may include structured data labeled according to a schema that differs from the schema used by product records from another source.
In view of the foregoing, it would be an advancement in the art to provide methods for relating structured data from different schemas, as well as for generating structured representations of unstructured data, particularly product-related documents.