1. Field of the Invention
This invention relates to the automation of product and vendor data entry where the product and vendor data is provided by one or more product suppliers and may be provided in many different formats. In particular, this invention relates to methods and systems that automatically import, analyze, and categorize data from different sources and in many different formats, and that output the processed data to on-line business-to-business service providers or to any other recipient with an interest in the cleansed data.
2. Description of Related Art
Computer networks such as the Internet have facilitated the transfer of information among computer users. Business-to-business (“B2B”) service providers, for example on-line shopping service providers, have taken advantage of networking technologies to conduct their business transactions more efficiently and economically. The use of computers to transfer data, however, does not put an end to human intervention in the data transfer process.
Current on-line shopping web sites that offer a variety of products for sale, for example, face the formidable task of having to input and keep an inventory of the data related to the products they sell. Products are supplied by different sources, which may also provide the information for the product being supplied.
Although the product data may be provided in electronic form, the on-line shopping service provider may have to enter the product information into its own databases manually. The reason for this is that no current data entry system converts product data formatted in an arbitrary manner to a standard format in which the data may be kept as part of the inventory database.
The data format problem is twofold. The first problem concerns the syntax of the data, which may differ according to the data supplier providing the data. A data supplier may, for example, use data transformation or conversion software such as Data Junction or InfoPump, both commercially available, to produce data with a given syntax or format.
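The syntax problem can be illustrated with a minimal sketch, assuming two hypothetical suppliers that send the same product record with different delimiters and column names. The supplier names, column names, and sample values below are assumptions made for illustration only and do not come from any particular supplier or tool.

```python
import csv
import io

# Hypothetical per-supplier syntax descriptions: the field delimiter and a
# mapping from the supplier's column names to common field names.
SUPPLIER_FORMATS = {
    "supplier_a": {
        "delimiter": ",",
        "columns": {"sku": "sku", "name": "description", "price": "price"},
    },
    "supplier_b": {
        "delimiter": "|",
        "columns": {"PartNo": "sku", "Desc": "description", "UnitPrice": "price"},
    },
}


def parse_feed(supplier, text):
    """Parse one supplier's delimited text into records with common field names."""
    fmt = SUPPLIER_FORMATS[supplier]
    reader = csv.DictReader(io.StringIO(text), delimiter=fmt["delimiter"])
    return [
        {common: row[source].strip() for source, common in fmt["columns"].items()}
        for row in reader
    ]


# Illustrative sample data: the same product in two different syntaxes.
feed_a = "sku,name,price\n1001,Laptop computer,999.00\n"
feed_b = "PartNo|Desc|UnitPrice\n1001|Laptop computer|999.00\n"

records_a = parse_feed("supplier_a", feed_a)
records_b = parse_feed("supplier_b", feed_b)
```

Once a per-supplier description of the syntax exists, both feeds reduce to identical records. The sketch does nothing about differences in terminology.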
The second problem, which is harder to solve than the first, concerns the use of different terminology (semantics) by different product data suppliers to describe the same product. For example, one product supplier may use the term “IBM” while another may use “International Business Machines” as part of the description of the same product. That is, the descriptions for the same product may vary widely. Like the data syntax problem, this problem is associated with data formatting.
Consequently, there is a need in the art for a system that automates the data entry operation for products supplied by different sources, where the data may be found in many different formats. Further, there is a need in the art for a system that maps the different representations of a product into a common set of product information while preserving the original data sent by the different suppliers for use as a reference.
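The second need, mapping differing descriptions to a common form while keeping the supplier's original text, can be sketched as follows. This is a minimal illustration under stated assumptions and not a description of any particular system: the synonym table, the `ProductRecord` type, and the sample descriptions are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical synonym table: lowercase supplier terminology mapped to the
# common term used in the inventory database.
CANONICAL_TERMS = {
    "international business machines": "IBM",
    "ibm": "IBM",
}

# Try longer phrases first so a multi-word term is matched as a whole.
_TERM_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(CANONICAL_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class ProductRecord:
    supplier: str
    original: str    # description exactly as the supplier sent it
    normalized: str  # description rewritten with the common terms


def normalize(supplier, description):
    """Return a record holding both the original and the normalized description."""
    normalized = _TERM_PATTERN.sub(
        lambda m: CANONICAL_TERMS[m.group(1).lower()], description
    )
    return ProductRecord(supplier, description, normalized)


# Illustrative sample data: two suppliers describing the same product.
rec_a = normalize("supplier_a", "IBM desktop computer")
rec_b = normalize("supplier_b", "International Business Machines desktop computer")
```

Both records end up with the same normalized description, while the `original` field retains each supplier's wording for reference. A fixed lookup table handles only known synonyms, which is one reason the semantic problem is harder than the syntactic one.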