In a variety of applications, it may be desirable to convert data from a first (or input) form to a second (or target) form. Such conversions may involve, for example, changes of data with respect to linguistics, syntax and/or formats. In this regard, linguistic differences may be due to the use of different languages or, within a single language, due to different uses of terminology, proprietary names, abbreviations, idiosyncratic phrasings or structures and other matter that is specific to a location, region, business entity or unit, trade, organization or the like. Also within the purview of linguistic differences for present purposes are different currencies, different units of weights and measures, and other systematic differences. Syntax may relate to the phrasing, ordering and organization of terms as well as grammatical and other rules relating thereto. Differences in format may relate to data structures or conventions associated with a database or other application and associated tools.
One or more of these differences in form may be advantageously addressed in connection with a conversion process. Some examples of conversion environments include: importing data from one or more legacy systems into a target system; correlating or interpreting an external input (such as a search query) in relation to one or more defined collections of information; correlating or interpreting an external input in relation to one or more external documents, files or other sources of data; facilitating exchanges of information between systems; and translating words, phrases or documents. In all of these cases, a machine-based (e.g., a computer-based) tool may be used to attempt to address differences in linguistics, syntax and/or formats between the input and target environments.
One difficulty associated with machine-based conversion tools relates to properly handling context-dependent conversions. In such cases, properly converting a contextually dependent item under consideration may depend on understanding something about the context in which the item is used. For example, in the context of product descriptions, an attribute value of “one inch” might denote one inch in length, one inch in radius or some other dimension depending on the product under consideration. In this regard, the context in which “one inch” is used may dictate the permissible and/or proper conversion of the data “one inch” (e.g., into a length, radius, or other dimension). In another example, in the context of translation, the term “walking” functions differently in the phrase “walking shoe” than in “walking to work.” Thus, in these examples and many others, understanding something about the context of an item under consideration may facilitate conversion.
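By way of illustration only, the dependence of the “one inch” conversion on context may be sketched as follows. The product names, the mapping table and the function name are hypothetical and are assumed solely for this example; they do not describe any particular conversion tool.

```python
# Hypothetical mapping from a product context to the dimension that a bare
# measurement such as "one inch" denotes. The entries are assumptions made
# for illustration only.
DIMENSION_BY_CONTEXT = {
    "ruler": "length",
    "pipe": "radius",
}


def convert_attribute(value, context=None):
    """Pair an attribute value with the dimension implied by its context.

    The dimension is None when the context is missing or unrecognized,
    in which case the value cannot be properly disambiguated.
    """
    dimension = DIMENSION_BY_CONTEXT.get(context)
    return (dimension, value)


if __name__ == "__main__":
    for ctx in ("ruler", "pipe", None):
        print(ctx, convert_attribute("one inch", ctx))
```

In this sketch, the same input value yields a different conversion for each known context, and no dimension can be assigned when the context is unavailable.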
Although the value of context in disambiguating or otherwise properly converting information is well recognized, limited success has been achieved in applying this notion to machine-based tools, especially in unsupervised machine learning. For example, certain data to be converted may be highly unstructured and/or otherwise contextually indeterminate. As such, the data may not have indicators (e.g., either internal to the data or available from external sources) that indicate the context in which source data is used. That is, the data may be contextually indeterminate such that the context of the data is not readily discernible. Furthermore, source data may include a plurality of different contexts, such that different subsets of the source data are associated with different contexts. In this regard, contextually dependent conversions may not be possible using traditional approaches that rely on identifying the context for use in converting the contextually dependent data.