The invention relates generally to data transformation and, more specifically, to a system and method for accomplishing various data transformations in an automated manner using plural vocabularies and established relationships therebetween.
Electronic commerce, sometimes known as “e-commerce”, is well known generally. The objective of e-commerce is to eliminate manual trading processes, and thus minimize the need for manual information exchange in traditional commerce, by allowing internal applications of different entities, known as “trading partners,” to exchange information directly. Many large companies have effected electronic commerce using a data interchange format known as “Electronic Data Interchange” (EDI). EDI has proven itself to be very effective.
The Internet and extensible markup language (XML) have created forms of data interchange that are less expensive and thus have lowered the barriers to entry for accomplishing data interchange generally and e-commerce in particular. Many newer e-commerce systems currently are based on XML. Similar to EDI systems, these newer systems allow the internal applications of different companies to share information directly and thus eliminate the need for manual communication relating to transactions. Data is placed between descriptive XML tags as metadata. XML messages are thus rich in metadata making them easy to read and debug. Further, the simplicity of XML permits persons with limited training to develop and maintain XML-based applications, in turn making XML applications less expensive to implement.
Notwithstanding the characterization of EDI as a “standard,” there are many approaches to EDI. First, EDI is defined by two distinct standards, ASC X12 and EDIFACT, both of which are hereby incorporated herein by reference. ASC X12 is the standard for EDI in the United States and has evolved over the years. EDIFACT is the international standard, endorsed by the United Nations and designed from the ground up beginning in 1985. Further, X12 and EDIFACT each have several version releases of their message formats. Compatibility between versions is not always straightforward. In addition, there are other groups, such as Open Buying on the Internet (OBI), proposing standards for implementing EDI messages over hypertext transfer protocol (HTTP).
XML-based e-commerce is even more diversified. As of August 2000, nearly one hundred XML-only standards were under development. Microsoft™, Ariba™, IBM™ and almost 30 other technology companies have combined to create UDDI (Universal Description, Discovery and Integration), which will allow companies to publish information about the Web services they offer in a Universal Business Registry that will be accessible by anyone. RosettaNet™ is developing XML standards for product catalogs. Commerce One™ has created the common business library (CBL). Ariba™ has developed commerce XML (cXML), a proposed standard for catalogs and purchase orders.
Accordingly, businesses wishing to conduct electronic commerce must deal with a variety of documents in a variety of formats. Further, even within a single enterprise, various business processes can be controlled by plural systems using a variety of standard and/or proprietary record and message formats. Consider the following example, as illustrated in FIG. 1. Buyer 120 sends a Purchase Order as an EDI document to Seller 110. In order to process this request, Seller 110 in a typical scenario may need to perform several translations and transformations on this document to communicate with various systems.
The original EDI Purchase Order document may need to be converted to an equivalent XML document, with the same schema (or structure) and the same semantics (or meaning). This is a format translation between two different, but semantically equivalent, document formats. The XML version may also need to be converted from the English language to the French language, so that French partner 150 can receive a copy. This is a vocabulary translation that changes the names of fields, but not necessarily their meaning.
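By way of illustration only, a vocabulary translation of the kind described above can be sketched as a renaming of element names that leaves the document structure and the data values untouched. The sketch below is a minimal example under stated assumptions: the English and French element names, the `EN_TO_FR` table, and the `translate_vocabulary` function are hypothetical and are not drawn from any standard or product.

```python
import xml.etree.ElementTree as ET

# Hypothetical English-to-French element-name vocabulary (illustrative only).
EN_TO_FR = {
    "PurchaseOrder": "BonDeCommande",
    "Buyer": "Acheteur",
    "Quantity": "Quantite",
}

def translate_vocabulary(xml_text, vocabulary):
    """Rename element tags using `vocabulary`; structure and values are unchanged."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Tags without an entry in the vocabulary are kept as they are.
        element.tag = vocabulary.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

order_en = (
    "<PurchaseOrder><Buyer>Buyer 120</Buyer>"
    "<Quantity>10</Quantity></PurchaseOrder>"
)
order_fr = translate_vocabulary(order_en, EN_TO_FR)
print(order_fr)
```

Only the names of the fields change in this sketch; the nesting of the elements and the values they carry are the same before and after, which is what distinguishes a vocabulary translation from a semantic transformation.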
Also, the XML version of the Purchase Order may need to be translated into Wireless Markup Language (WML) to communicate specifics of the Purchase Order to wireless device 160, such as a PDA or a cell phone. This is both a vocabulary translation and a subsetting of the document, without a change in the semantics of the data. The XML version of the Purchase Order may also need to be transformed into a different kind of XML Purchase Order, one that is acceptable to Seller's Customer Relationship Management (CRM) system 130 for example. This could be a semantic transformation of the information in the Purchase Order in which both semantics and structure of the data may be changed.
The Purchase Order may further need to be transformed into the proprietary format of an Enterprise Resource Planning (ERP) system 140. This could involve both a semantic transformation and a format translation. In some cases, the Purchase Order may need to be converted from one version of EDI to another for party 170, EDI Version 2 into EDI Version 3 for example. This involves a simple semantic transformation between two versions that are related and mostly the same. Information in the Purchase Order also needs to be copied into the Shipment documents that will be returned to the Buyer 120, as an EDI document in this example. This kind of transformation would tend to preserve the structure and semantics of the copied field. However, the overall structure of the document might not be preserved. The roles of some fields may change; for example, the “Ship-to Address” in the Purchase Order may become the “Final Destination Address” in the shipping documents, with additional intermediate shipping addresses added.
The above example illustrates some of the different translation and transformation problems facing a typical enterprise that must integrate multiple applications and interact with multiple partners. These translation and transformation problems can be automated today using so-called transformation tools. The use of such tools involves a “design” or specification phase, where the transformation to be performed is specified, and then an “execution” (also called a “runtime”) phase, where the transformations are automated based on the specification generated during the design phase. The design phase is the key, since only if the design phase is performed carefully and completed, resulting in a correct and complete transformation specification, will the transformation tool be able to perform the transformations correctly during the execution phase.
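The separation between the two phases described above can be sketched, purely as an illustration, by treating the transformation specification as data that is written once during the design phase and then applied mechanically during the execution phase. The field names, the `SPEC` structure, and the `execute` function below are hypothetical assumptions introduced for this sketch and do not describe any particular transformation tool.

```python
# Design phase: the specification is authored as data. Each rule names a
# hypothetical source field, the target field, and a conversion function.
SPEC = [
    {"source": "po_number", "target": "OrderId", "convert": str},
    {"source": "qty", "target": "Quantity", "convert": int},
    {"source": "ship_to", "target": "ShipToAddress", "convert": str.strip},
]

def execute(spec, source_doc):
    """Execution phase: apply every rule of the specification in order."""
    target_doc = {}
    for rule in spec:
        if rule["source"] not in source_doc:
            raise KeyError(f"specification refers to missing field {rule['source']!r}")
        target_doc[rule["target"]] = rule["convert"](source_doc[rule["source"]])
    return target_doc

source = {"po_number": 4711, "qty": "10", "ship_to": " 1 Main St ", "memo": "rush"}
result = execute(SPEC, source)
print(result)
```

In this sketch the `memo` field is silently dropped because no rule was written for it, which illustrates why the execution phase can only be as correct and complete as the specification produced during the design phase.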
Conventionally, the design phase is largely a manual operation and can be very time-consuming, especially given the large size of the documents. It is not uncommon for an EDI document to contain thousands of fields. Moreover, a large corporation may typically use thousands to tens of thousands of different document types, with each document requiring multiple transformations. Note that EDI specifies more than 4000 different document types, and this is only one of many sources for documents. In fact, a large corporation may have a thousand or more applications, each being a source of documents. Given the large number of document types, the sheer size of many of the different types, and the number of different transformations required, it is not surprising that a key goal of modern transformation tools is to reduce the amount of manual work required during the design phase.
Current transformation tools normally do not distinguish among the different kinds of translations and transformations that are illustrated in the example above. Current tools tend to treat all transformation problems as a generalized “semantic translation” problem, and tend to ignore the relationships that may exist between the source and target documents. Since semantic transformation between unrelated documents is the hardest kind of transformation to automate, such transformation tools require a lot of human effort in order to specify the transformation.
For example, suppose a Change Order is submitted that revises some of the information in the original Purchase Order. The Change Order, generally speaking, will need to undergo the same translations and transformations that the original Purchase Order underwent. The schema and semantics of the information in the Change Order might be very similar to those of the Purchase Order, and many of the fields in the two documents are the same. However, without a formal way to leverage the relationships between documents, the specification phase is still complex. The example above is a simple example illustrating how complex the specification process for various transactions can become. In practical circumstances there are a myriad of data formats, vocabularies, and languages that must all be reconciled to achieve true collaboration.
Metadata, data that describes data, has been leveraged in an attempt to lend semantic context to digital messages. For example, the concept of the “Semantic Web” and related technologies, such as Resource Description Framework (RDF) and XML Topic Maps have recently been developed. The Semantic Web concept is directed to making information available over the World Wide Web in a form that indicates the underlying meaning of the data. These technologies provide standard syntaxes for describing metadata using a well-defined XML syntax. This tells a program how to parse the metadata, but not yet what it means. Meaning is introduced to the syntax by what can be described as “meta-metadata.” Examples are Containers in RDF Schema, Superclasses in XML Topic Maps, and Collections in DAML+OIL. This layer enables the definition of Ontologies and Vocabularies.
XML Topic Maps, RDF/RDF Schema, and DAML+OIL are general approaches for identifying subjects and resources and defining the relationships between them. One may categorize them as efforts for defining the nature and specification languages for Ontologies. All of this work is focused on defining and enabling the Semantic Web. However, the Semantic Web does not address transformations and thus conventional technologies have failed to harness metadata to the extent necessary to facilitate the creation of various transformations with reduced human intervention.
XEDI™, owned by Vitria Technology, Inc., is an example of a Vocabulary and Ontology that is specific to EDI. The XEDI™ specification is in XML DTDs with embedded concept identifiers, thus conforming to the component-based specification form. Contivo™ has taken an approach to transformation that appears to be based on synonym matching and a rules engine for building transformations. An example of a rule given in the Contivo™ literature is to truncate data if the target field is shorter than the source. Tibco Designer 5.0™ is described as having exposed metadata APIs to allow applications to access the metadata associated with content. Tibco™ literature describes the benefit from the metadata as allowing two applications to share the meaning of a specific content item.
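A rule of the general kind mentioned above, in which data is truncated when the target field is shorter than the source data, can be sketched as follows. This is an illustrative sketch only and is not the implementation of any named product; the field names, the `TARGET_LENGTHS` table, and the `apply_truncation_rule` function are hypothetical assumptions.

```python
# Hypothetical maximum lengths, in characters, of the target fields.
TARGET_LENGTHS = {"city": 10, "postal_code": 5}

def apply_truncation_rule(source_doc, target_lengths):
    """Copy each field, truncating values that exceed the target field length."""
    result = {}
    for name, value in source_doc.items():
        text = str(value)
        limit = target_lengths.get(name)
        # Fields with no declared target length are copied unchanged.
        if limit is not None and len(text) > limit:
            text = text[:limit]
        result[name] = text
    return result

fitted = apply_truncation_rule(
    {"city": "San Francisco", "postal_code": "94105-1234", "note": "rush"},
    TARGET_LENGTHS,
)
print(fitted)
```

A rule of this kind operates on field lengths alone and does not, by itself, say anything about whether the source and target fields mean the same thing.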
The transformation of documents and data is both a common problem and an onerous problem. As the example in FIG. 1 illustrates, it comes in many guises and may occur many times in the course of processing a straightforward data task, such as submitting a Purchase Order. The sheer frequency of occurrence, together with the size of the vocabularies and documents involved, makes it a costly problem to solve. It has been estimated that over 30% of the cumulative Information Technology (IT) budget for large enterprises goes to integration of disparate applications and systems, and that 70% of these integration costs are applied to solving transformation problems. Given that it is estimated that more than a trillion dollars is spent on IT annually, this makes transformation a problem whose costs are measured in hundreds of billions of dollars (roughly 0.30 × 0.70 × $1 trillion, or about $210 billion).
The huge cost associated with transformation can be attributed in large part to the sheer size of the vocabularies and documents involved. For example, the EDI X12 vocabulary contains hundreds of different document types, of which a Purchase Order is one type of document. A typical document, such as the Purchase Order, may contain more than a thousand defined elements. Hence, an extensive vocabulary, such as EDI X12, can easily define more than a hundred thousand elements. A large enterprise typically has more than a thousand different applications, each defining its own vocabulary. While not all document types or vocabularies need to be transformed to one another, it is nonetheless easy to see why transformation quickly becomes a large problem. Conventional technologies fail to leverage the power of metadata and established relationships between terms in documents. Accordingly, complex transformations require a great deal of manual operations in the design phase.
In the example given above with respect to FIG. 1, it would be highly advantageous if the transformation tools could make use of the fact that the Purchase Order and the Change Order are semantically and schematically similar. Ideally, if transformation specifications have already been defined for the Purchase Order, then the transformation tool should be intelligent enough to re-use those same specifications in the context of the Change Order. Hence, the transformation tool should only require transformation specifications for the new or changed fields in the Change Order. Today's transformation tools are generally unable to recognize that two different documents may be related and share semantically equivalent fields. Moreover, today's tools generally cannot identify and re-use transformation specifications among related documents.
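The desired re-use of specifications between related documents can be sketched, by way of illustration only, as follows. The sketch assumes that fields sharing a name in the two documents are semantically equivalent, which is a simplifying assumption rather than a general fact; the field names, the `PURCHASE_ORDER_SPEC` table, and the `derive_spec` function are all hypothetical.

```python
# Field-level rules already specified for the Purchase Order, mapping
# hypothetical source field names to hypothetical target field names.
PURCHASE_ORDER_SPEC = {
    "order_id": "OrderId",
    "buyer": "BuyerName",
    "ship_to": "ShipToAddress",
}

def derive_spec(related_spec, new_doc_fields):
    """Re-use rules for shared fields; list the fields that still need a rule."""
    reused = {f: related_spec[f] for f in new_doc_fields if f in related_spec}
    unspecified = [f for f in new_doc_fields if f not in related_spec]
    return reused, unspecified

# The Change Order shares three fields with the Purchase Order and adds one.
change_order_fields = ["order_id", "buyer", "ship_to", "change_reason"]
reused, unspecified = derive_spec(PURCHASE_ORDER_SPEC, change_order_fields)
print(reused)
print(unspecified)
```

Under these assumptions, only the `change_reason` field would require manual attention during the design phase of the Change Order, while the rules for the three shared fields are carried over from the Purchase Order.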