This invention relates to computer systems and, more particularly, to information processing systems that transform data from one form to another.
Computer applications often need to use data prepared by different computer applications. For example, when a company pays a bill using a first accounts payable application, some of the information associated with that bill needs to be forwarded to a second general ledger application, to be recorded for the company's financial statements. It is possible that the first and second applications were provided by two different companies and that the exact format and content of the data representing the bill differ. For example, the first application may represent an account number as two data items: a department code of three characters and an account of up to four digits; while the second application represents an account number as a single data item of up to ten digits. To allow the account data from the first application to be used by the second application, the department code and account data items from the first application can be concatenated into the single ten-digit account number data item when the data associated with the bill is forwarded to the second application.
However, if the data from the first application is transferred to a third application requiring yet another format for an account number, this solution may not work. To obviate this problem, instead of modifying the first application, the billing data from the first application may be intercepted and modified by an "interceptor" computer program not associated with either the first or second application. In this way, neither application need be modified; and, if either the first or the second application changes, only the interceptor program must change.
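The concatenation described above might be sketched as follows. The function name is hypothetical, and two details are assumptions not stated in the text: that the department code is numeric, and that the account is left-padded with zeros to four digits so that the combined number is unambiguous.

```python
def to_ledger_account(department_code: str, account: str) -> str:
    """Join a 3-character department code and an account of up to 4 digits.

    Assumptions (not from the source): the department code is numeric, and
    the account is zero-padded to four digits before concatenation.
    """
    if len(department_code) != 3 or not department_code.isdigit():
        raise ValueError("department code must be three digits")
    if not (1 <= len(account) <= 4) or not account.isdigit():
        raise ValueError("account must be one to four digits")
    return department_code + account.zfill(4)


print(to_ledger_account("120", "45"))  # prints 1200045
```

The result fits within the ten-digit account number data item expected by the second application.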
Large companies can have thousands of computer applications that must communicate data with each other. Because an interceptor program must be written for every pair of communicating applications, this could result in an unmanageable number of interceptor programs. To help alleviate this situation, general purpose software programs have been devised to transform data from one form to another. To further facilitate matters, standards have been created for various aspects of the representation of data.
Data that is exchanged between applications is broken into messages, each containing data items. In the above example, the data items associated with a bill (the amount, account number, date paid, vendor, and the like) collectively form a single message.
One can separate the content of a message into encoding and semantics. Encoding deals with the many issues associated with representing the data on the particular medium of exchange (which could be, for example, a punched card, a magnetic disk, or a telephone wire). It also includes number and character representation, separation of the different data items in a single message, the mechanism for naming the data items, and other issues necessary to convey the semantics of the message.
Nearly all aspects of message encoding are specified by means of standards. One of the most popular standards in this area is the Extensible Markup Language (XML) specification. This specification provides a method of encoding where data items can be named and structured into named groups.
Although data transformation software must deal with encoding issues, they are not important for this description and will not be discussed further. Only the semantic aspects of data transformation are considered.
The semantics are the meaning of the message. Semantics include the name of each of the data items, the order and structure in which the data items appear in the message, and the meaning of the values of each of the data items. The remainder of this description refers to a data item containing a single value as a "field." Fields in a message can be arranged in named groups called "containers." For example, a person's name can be represented as follows:
PersonName (container)
    FirstName (field)
    MiddleInitial (field)
    LastName (field)
In this example, a reference to the data item "PersonName" is a reference to the entire name. When there are hundreds and maybe thousands of data items associated with a message, grouping is necessary to easily examine and manipulate parts of the message. Containers can hold fields and other containers. Fields or containers may be repeated either a fixed or variable number of times. Consider the case of a message representing a customer that includes a history of all of the orders that customer has placed. Typically, this is represented by a container that holds the information associated with each order; and that container repeats for a variable number of occurrences. When a field or container can repeat, this is called a "sequence."
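One way to picture fields, containers, and sequences is the small data model below. It is only an illustrative sketch: the class names `Field` and `Container`, the helper `occurrences`, and the sample values are assumptions, and a sequence is modeled simply as repeated children that share a name.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Field:
    """A data item holding a single value."""
    name: str
    value: str


@dataclass
class Container:
    """A named group that may hold fields and other containers."""
    name: str
    children: List[Union[Field, "Container"]] = field(default_factory=list)


def occurrences(container: Container, name: str) -> list:
    """Return every child with the given name (the occurrences of a sequence)."""
    return [child for child in container.children if child.name == name]


person_name = Container("PersonName", [
    Field("FirstName", "Pat"),
    Field("MiddleInitial", "Q"),
    Field("LastName", "Lee"),
])

# The Order container repeats a variable number of times, forming a sequence.
customer = Container("Customer", [
    person_name,
    Container("Order", [Field("PartNumber", "A-100")]),
    Container("Order", [Field("PartNumber", "B-200")]),
])

print(len(occurrences(customer, "Order")))  # prints 2
```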
The definition of the content of a message is called a "schema" and is represented using a "tree." A tree is a method of relating multiple entities that are called "nodes." Each node other than the root has a single parent, and each node may have zero or more children. The topmost node, that is, the one node without a parent, is called the "root." If a node has both a parent and one or more children, it is called an "intermediate." Finally, a node with no children is a "leaf." An example of a tree is illustrated in FIG. 1. The root and intermediate nodes are containers (and possibly sequences), and the leaf nodes are fields.
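The node terminology can be made concrete with a short sketch. The `Node` class, its `add` and `kind` methods, and the node name "Person" are hypothetical; the classification rules follow the definitions above, with the added assumption that a parentless node is always reported as the root even if it has no children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    # The parent link is excluded from repr and comparison to avoid cycles.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child to this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child

    def kind(self) -> str:
        """Classify the node as root, intermediate, or leaf."""
        if self.parent is None:
            return "root"
        return "intermediate" if self.children else "leaf"


root = Node("Person")
person_name = root.add(Node("PersonName"))
first_name = person_name.add(Node("FirstName"))

print(root.kind(), person_name.kind(), first_name.kind())
# prints: root intermediate leaf
```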
For the purpose of a transformation definition, there is an input schema, which defines the input message and an output schema defining the output message. Here is an example of a simple schema:
Customer (container)
    Name (field)
    Address (field)
    Orders (container and sequence)
        PartNumber (field)
        Quantity (field)
        Price (field)
When referring to a field in the schema, the field name is normally qualified with its enclosing containers. In the above example, the PartNumber field would be referred to as Customer.Orders.PartNumber.
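As a sketch of this naming convention, the function below walks the example schema and produces the qualified name of every field. The nested-tuple representation and the function name are assumptions made for illustration; a node with no children is treated as a field.

```python
# Each schema node is (name, children); a node with no children is a field.
SCHEMA = ("Customer", [
    ("Name", []),
    ("Address", []),
    ("Orders", [
        ("PartNumber", []),
        ("Quantity", []),
        ("Price", []),
    ]),
])


def qualified_field_names(node, prefix=""):
    """Return the dot-qualified name of every field beneath the given node."""
    name, children = node
    path = f"{prefix}.{name}" if prefix else name
    if not children:
        return [path]
    names = []
    for child in children:
        names.extend(qualified_field_names(child, path))
    return names


for qualified in qualified_field_names(SCHEMA):
    print(qualified)
```

The third name printed is Customer.Orders.PartNumber, matching the reference given in the text.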
A transformation consists of the input and output schema and all of the rules that describe how to build the output message from the input message. Typically, a transformation processes a single input message producing a single output message. Transformations are capable, however, of processing one or more input messages producing one or more output messages.
There are several different types of transformations. One type is used for copying data from an input field with one name to an output field with a different name. A second type of transformation performs a function on an input field that alters it in some way and sets an output field to the result of that function. An example is converting a date from one format to another, for instance from a numeric form such as 06/23/1998 to the written form Jun. 23, 1998. A third type of transformation fabricates an output field either from multiple input fields or from a portion of a single input field. Fabricating may also include performing a function as mentioned above.
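The first three types can be illustrated with a minimal sketch in which messages are plain dictionaries. All field names, the date formats, and the sample values are hypothetical and chosen only to show one instance of each type.

```python
from datetime import datetime


def transform(message: dict) -> dict:
    """Build an output message from an input message (illustrative only)."""
    output = {}
    # Type 1: copy an input field to an output field with a different name.
    output["VendorName"] = message["Vendor"]
    # Type 2: apply a function to an input field (here, reformat a date).
    paid = datetime.strptime(message["DatePaid"], "%m/%d/%Y")
    output["PaidOn"] = paid.strftime("%Y-%m-%d")
    # Type 3: fabricate an output field from multiple input fields ...
    output["AccountNumber"] = message["DepartmentCode"] + message["Account"]
    # ... or from a portion of a single input field.
    output["AreaCode"] = message["Phone"][:3]
    return output


bill = {
    "Vendor": "Example Supply Co.",
    "DatePaid": "06/23/1998",
    "DepartmentCode": "120",
    "Account": "0045",
    "Phone": "5551234567",
}
print(transform(bill))
```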
A fourth type of transformation creates a sequence of output fields based on a sequence of input fields. For example, in the message above containing a customer and the associated orders, the output message may contain a sequence of orders each of which corresponds to an order in the input message. Further, performing a filtering operation to exclude some of the orders from the output might be necessary. A filtering condition specifies the criteria for selection of input data before it is considered for further processing. An exemplary filtering operation might select all orders whose total amount was over one hundred dollars.
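A sequence-to-sequence transformation with a filtering condition might look like the sketch below. It assumes, as an interpretation not stated in the text, that an order's total amount is Quantity multiplied by Price; the function name, the output field "Total", and the sample data are hypothetical.

```python
from decimal import Decimal


def select_orders(customer: dict, minimum: Decimal = Decimal("100")) -> list:
    """Create one output order per input order whose total exceeds minimum."""
    selected = []
    for order in customer["Orders"]:
        # Assumed definition of "total amount": quantity times unit price.
        total = order["Quantity"] * Decimal(order["Price"])
        if total > minimum:  # the filtering condition
            selected.append({"PartNumber": order["PartNumber"],
                             "Total": str(total)})
    return selected


customer = {
    "Name": "Pat Lee",
    "Orders": [
        {"PartNumber": "A-100", "Quantity": 3, "Price": "19.99"},
        {"PartNumber": "B-200", "Quantity": 2, "Price": "75.00"},
    ],
}
print(select_orders(customer))
# prints [{'PartNumber': 'B-200', 'Total': '150.00'}]
```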
A fifth type of transformation alters the structure of the data. In the customer example above, suppose it was desired to have multiple messages produced from the single customer message, where there is one output message per order. Such an output message would have fields related to both the customer and the order. More complicated structural transformations which are sometimes required will be discussed below.
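The one-message-per-order case can be sketched as follows, again with messages as dictionaries. The function name and sample data are hypothetical; each output message combines the customer-level fields with the fields of a single order.

```python
def split_by_order(customer: dict) -> list:
    """Produce one output message per order in the customer message."""
    messages = []
    for order in customer["Orders"]:
        message = {"Name": customer["Name"], "Address": customer["Address"]}
        message.update(order)  # add the fields of this one order
        messages.append(message)
    return messages


customer = {
    "Name": "Pat Lee",
    "Address": "1 Example Street",
    "Orders": [
        {"PartNumber": "A-100", "Quantity": 3, "Price": "19.99"},
        {"PartNumber": "B-200", "Quantity": 2, "Price": "75.00"},
    ],
}
for message in split_by_order(customer):
    print(message)
```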
Typically, the details of the semantic structure of the messages associated with an application are best understood by those who use those applications on a daily basis. In general, such persons are not programmers. Current transformation software allows non-programmers to use a graphical user interface (GUI) to handle some of the simpler cases. For example, a single field may be copied from the input to the output using "drag and drop." Drag and drop is a standard GUI technique where a mouse is used to select a field, visually carry (drag) it to another field, and then release (drop) it there.
However, programming (rather than GUI) techniques have been required when more complex transformations must be implemented, such as the fourth and fifth types of transformations specified above. For example, generating such transformations requires a method of specifying what is to occur for each unique transformation. A transformation must indicate the characteristics of the input data, the characteristics of the output data, filtering conditions for selecting input data, and the steps necessary to create the output data based on the input data. For each field of output, a sequence of steps must be defined to create that field. Both the steps to create the output data and the filtering conditions on the input data are often specified using "expressions." Expressions are defined in some sort of text-based programming language (usually proprietary) that requires knowledge of programming to use.
A "general expression tree" is a tree that connects one or more expressions. Each expression returns a single value and takes zero or more parameters. The general expression tree consists of a root expression whose returned value is the result of executing the tree. Each of the children of the root is another expression, whose return value is a parameter for the root. This process is repeated for as many expressions as desired.
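A general expression tree of this kind could be evaluated recursively, as in the sketch below. The `Expression` class and the helpers `field_ref`, `constant`, and `call` are hypothetical names; passing the input message to every expression is a design assumption that lets zero-parameter expressions read input fields.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Expression:
    # Receives the input message followed by the evaluated parameter values.
    function: Callable[..., Any]
    parameters: List["Expression"] = field(default_factory=list)

    def evaluate(self, message: dict) -> Any:
        """Evaluate the child expressions, then this expression."""
        values = [p.evaluate(message) for p in self.parameters]
        return self.function(message, *values)


def field_ref(name: str) -> Expression:
    """Zero-parameter expression returning the value of an input field."""
    return Expression(lambda message: message[name])


def constant(value: Any) -> Expression:
    """Zero-parameter expression returning a fixed value."""
    return Expression(lambda message: value)


def call(fn: Callable[..., Any], *parameters: Expression) -> Expression:
    """Expression applying fn to the values of its child expressions."""
    return Expression(lambda message, *values: fn(*values), list(parameters))


# Root: is Quantity * Price greater than 100?
tree = call(operator.gt,
            call(operator.mul, field_ref("Quantity"), field_ref("Price")),
            constant(100))

print(tree.evaluate({"Quantity": 3, "Price": 40}))  # prints True
```

The value returned by the root expression is the result of executing the tree, and each child supplies one parameter to its parent.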
A "tree view" is a method for displaying a set of related objects that can be represented as a tree. This view is standard in modern graphical user interfaces for presenting things like files which are enclosed in folders (which may be in other folders, and so on). Generally, a tree view has mechanisms to move nodes from one tree to another tree (or another part of the same tree). Examples of the tree view are the Microsoft Windows Explorer and modern email programs that provide a way to contain mail messages in folders. Graphical presentations of tree views have methods to "drag and drop" a node from one place to another. Although graphical presentations are the most prevalent at the present state of the art, textual forms may be used to describe a tree view; and manipulation of those forms may be accomplished by cut, copy, and paste (or similar) techniques that provide ease of use and simplicity for the non-programmer.
As will be understood, the concepts for creating many transformations are quite complex. The creation of transformations is, consequently, a process that would be much easier if it could be handled entirely through a graphical interface or some other easily understandable interface.
FIG. 2 illustrates the same tree as FIG. 1 in a form more suitable for use by a graphical user interface.
Object Query Language (OQL) is a specification that supports query (that is, finding data) and transformation of the results of the query. The transformation capabilities of OQL are quite powerful, allowing all of the types of transformations listed above. A complete transformation tool should be capable of any type of transformation that is expressible with OQL.
Extensible Stylesheet Language Transformations (XSLT) is another relevant standard. This standard was originally developed as part of the XSL standard for creating HTML documents from XML documents. The transformation portions of the standard were generalized and separated into XSLT. XSLT is also powerful, although the level and amount of specification required to do more complex operations is more than that of OQL.
Because these standards are all text-based and require knowledge of programming techniques, it has been either impossible or cumbersome for the prior art to adapt them to a graphical interface usable by non-programmers. There are no transformation software tools that provide all of the desired capabilities completely within a graphical user interface (or some other interface that is easily understood and manipulated) such that a non-programmer can specify the transformations without having to write computer code of some sort. Because of the increasing need for data transformation, it is important that the people who have the best understanding of the semantics of the data, but lack special training in the art of programming, be able to perform complex transformations.
It is also important to provide a simplified programming environment by which transformations can more easily be created by those who are skilled at programming.
Although, as has been mentioned, the various processes described in this specification may be implemented in any of a number of different programming languages, it is quite desirable that some of the more modern languages be utilized in order to obtain the full benefit of their advantages.
It is, therefore, an object of the present invention to provide a method for executing a transformation utilizing modern object oriented programming languages.
This and other objects are accomplished in the present invention by a computer-implemented process for transforming an input message to an output message including reading transformation configuration information, reading an input message, creating output message data corresponding to each output schema node, calculating a number of reasons to continue for each schema node, performing the steps until the number of reasons to continue becomes zero, and writing the output message.