In a number of contexts, communication can be hampered because the source and target data systems for a given communication operate in different semantic environments. Such semantic environments may differ with respect to linguistics and/or syntax. In this regard, linguistic differences may be due to the use of different languages or, within a single language, due to terminology, proprietary names, abbreviations, idiosyncratic phrasings or structures and other matters specific to a location, region, business entity or unit, trade, organization or the like (collectively “locale”). Also within the purview of linguistic differences for present purposes are different currencies, different units of weights and measures and other systematic differences. Syntax relates to the phrasing, ordering and organization of terms as well as grammatical and other rules relating thereto. It will be appreciated that difficulties relating to different semantic environments may be experienced in international communications, interregional communications, interdisciplinary communications, or even in communications between companies within the same field and country or between units of a single enterprise. Increased globalization has heightened the need for machine-based tools to assist in transformation of information, i.e., manipulation of information with respect to linguistics, syntax and other semantic variations.
Today, such transformation is largely a service industry. A number of firms specialize in helping businesses operate in the global marketplace. Among other things, these firms employ translators and other consultants to develop forms, catalogs, product listings, invoices and other business information (collectively, “business content”) for specific languages, and to assist in handling incoming business content from different source languages or countries. Such services have been indispensable for some businesses, but are labor-intensive and expensive. Moreover, the associated processes may entail significant delays in information processing or, as a practical matter, have limited capacity for handling information, both of which can be unacceptable in certain business environments. In short, manual transformation does not scale well. In addition, such transformation has had limited applicability to more open-ended problems such as electronic information searches across semantic boundaries outside of the business-to-business context.
A number of machine translation tools have been developed to assist in language translation. The simplest of these tools attempt to literally translate a given input from a source language into a target language on a word-by-word basis. Specifically, content is input into such a system, the language pair (source-target) is defined, and the literally translated content is output. Such literal translation is rarely accurate. For example, the term “butterfly valve” is unlikely to be understood when literally translated from English to a desired target language.
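The word-by-word approach described above can be illustrated with a minimal sketch. The glossary, the choice of Spanish as the target language, and the function name translate_literal are assumptions made purely for illustration; they are not drawn from any particular translation tool.

```python
# Hypothetical two-entry English-to-Spanish glossary, assumed for illustration only.
WORD_GLOSSARY = {
    "butterfly": "mariposa",
    "valve": "válvula",
}


def translate_literal(text, glossary):
    """Translate word by word, preserving source word order.

    Words missing from the glossary pass through unchanged. No context,
    reordering, or multi-word term handling is attempted, which is the
    limitation the passage describes.
    """
    return " ".join(glossary.get(word.lower(), word) for word in text.split())


literal = translate_literal("butterfly valve", WORD_GLOSSARY)
print(literal)  # prints "mariposa válvula"
```

Each word is mapped independently, so the output simply juxtaposes the words for the insect and the valve in English order. The customary Spanish trade term is, to the best of our understanding, “válvula de mariposa”, which the literal mapping cannot produce because it neither reorders words nor treats the pair as a single technical term.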
More sophisticated machine translation tools attempt to translate word strings or sentences so that certain ambiguities can be resolved based on context. These tools are sometimes used as a starting point for human or manual translation or are used for “gisting”, which is simply getting the gist of the content. However, they tend to be highly inaccurate even when applied to their primary purpose, which is to translate standard text written in common language and in complete sentences conforming to standard rules of syntax.
Such tools are especially inadequate for use in transforming business content. Such content is often loaded with industry-specific technical terms and jargon, standard and ad hoc abbreviations and misspellings, and often has little or no structure or syntax in its native form. Moreover, such business content often consists of short item descriptions. Linguistically, such descriptions are “noun phrases”. A noun phrase has one overriding characteristic: it has no verb. The tendency of machine translation systems to try to create sentences produces unintended results when applied to noun phrases. For example, the term “walking shoe” may be translated as a shoe that walks. Thus, machine translation tools, though helpful for certain tasks, are generally inadequate for a variety of transformation applications including many practical business content applications as well as information searches outside of business content applications.
To summarize, from a practical viewpoint relative to certain applications, it is fair to state that conventional machine translation does not work and manual translation does not scale. The result is that the free flow of information between locales or semantic environments is significantly impeded and the potential benefits of globalization are far from fully realized.