In a variety of contexts, it is desirable to convert semantic information from a first or input form to a second or target form. Such conversions may involve differences in, for example, linguistics, syntax and formats. In this regard, linguistic differences may be due to the use of different languages or, within a single language, due to terminology, proprietary names, abbreviations, idiosyncratic phrasings or structures and other matter that is specific to a location, region, business entity or unit, trade, organization or the like. For present purposes, linguistic differences also include different currencies, different units of weights and measures and other systematic differences. Syntax relates to the phrasing, ordering and organization of terms, as well as the grammatical and other rules relating thereto. Differences in format may relate to data structures or conventions associated with a database or other application and its associated tools.
One or more of these differences in form may need to be addressed in connection with a conversion process. In particular, at least linguistics or syntax generally needs to be addressed in the context of semantic conversions. Some examples of conversion environments include: importing data from one or more legacy systems into a target system; correlating or interpreting an external input (such as a search query) in relation to one or more defined collections of information; correlating or interpreting an external input in relation to one or more external documents, files or other sources of data; facilitating exchanges of information between systems; and translating words, phrases or documents. In all of these cases, a machine-based tool attempts to address differences in linguistics, syntax and/or formats between the input and target environments. It will be appreciated in this regard that the designations "input" and "target" are largely a matter of convenience and are process-specific. That is, for example, in the context of facilitating exchanges of information between systems, which environment is the input environment and which is the target depends on the direction in which a particular conversion is oriented and can therefore change.
One difficulty associated with machine-based conversion tools relates to properly handling context-dependent conversions. In such cases, properly converting an item under consideration depends on understanding something about the context in which the item is used. For example, in the context of product descriptions, an attribute value of "one inch" might denote one inch in length, one inch in radius or some other dimension, depending on the product under consideration. In the context of translation, the term "walking" functions differently in the phrase "walking shoe" than in "walking to work." Thus, in these examples and many others, understanding something about the context of an item under consideration may facilitate its conversion. Although the value of context in disambiguating or otherwise properly converting information is well recognized, only limited success has been achieved in applying this notion to machine-based tools.
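The "one inch" example can be illustrated with a minimal sketch, assuming a hand-built rule table that maps a product category to the dimension a bare measurement denotes. The category names, the mapping, and the function `interpret_measurement` are hypothetical and are not drawn from any particular product; the point is only that the same input string converts to different target attributes depending on context, and is left unresolved when the context is unknown.

```python
from typing import Optional

# Hypothetical vocabularies for parsing a simple "<number word> <unit word>" value.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}
UNIT_WORDS = {"inch": "in", "inches": "in"}

# Hypothetical context rules: which dimension a bare measurement denotes
# for a given product category (illustrative assumptions only).
DIMENSION_BY_CATEGORY = {
    "rod": "length",
    "disc": "radius",
}


def interpret_measurement(value_text: str, product_category: str) -> Optional[dict]:
    """Convert a value such as 'one inch' into a target attribute chosen by context.

    Returns None when the value cannot be parsed or the context gives no rule,
    so that the item can be flagged for review rather than converted wrongly.
    """
    tokens = value_text.lower().split()
    if len(tokens) != 2:
        return None
    number, unit = tokens
    if number not in NUMBER_WORDS or unit not in UNIT_WORDS:
        return None
    dimension = DIMENSION_BY_CATEGORY.get(product_category)
    if dimension is None:
        return None  # context unknown: the conversion is ambiguous
    return {"attribute": dimension, "value": NUMBER_WORDS[number], "unit": UNIT_WORDS[unit]}


print(interpret_measurement("one inch", "rod"))   # {'attribute': 'length', 'value': 1, 'unit': 'in'}
print(interpret_measurement("one inch", "disc"))  # {'attribute': 'radius', 'value': 1, 'unit': 'in'}
```

A fixed lookup table like this captures context only in the crudest way; it also shows why such tools depend on a knowledge base that someone must first populate.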
Recently, products have become available to automate certain aspects of the conversion process. One such product is the DataLens™ System of Silver Creek Systems (Superior, Colo.). That product allows for normalization of unstructured or otherwise incomplete, incompatible or problematic semantic information (e.g., product descriptors, search strings or other semantic content) to facilitate conversion processes as described above. That product can apply significant intelligence to resolve ambiguities based on context and can identify potential conversion errors based on rules related to valid attributes and attribute values. This has proved to be a significant advance in reducing the labor required for such processes and in improving accuracy. However, even that product requires a knowledge base to perform efficiently and accurately. While much of this knowledge can be reused in subsequent conversion contexts, establishing such a knowledge base has generally required a time investment by a subject matter expert or other operator. It would be highly desirable to improve automation in this regard and to reduce the required time investment.