1. Field of the Invention
This invention relates in general to transforming documents, and in particular, to a system for specifying and executing transformation rules for transforming Extensible Markup Language (XML) documents into other XML documents, wherein the rule language used is XML itself.
2. Description of Related Art
Extensible Markup Language (XML) is poised to be the next big revolution for the World Wide Web (WWW). With the realization that the Web is not about just browsing any more, XML has emerged as an enabling technology to carry the Web to the next generation of electronic commerce, Web-based workflow, and integration of databases with Web applications.
XML describes a class of data objects called XML documents and partially describes the behavior of computer programs that process them. XML is a restricted form of SGML, the Standard Generalized Markup Language, defined in ISO 8879. The specification for XML can be found at the URL: http://www.w3.org/TR/REC-xml.
XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.
An XML schema specifies constraints on the structures and types of elements in an XML document. The basic schema for XML is the DTD (Document Type Definition). Other XML schema definitions are also being developed, such as DCD (Document Content Definition), XSchema, etc. Information concerning DTD and DCD can be found at the URL: http://www.w3.org/.
The main difference between DTD and DCD is that DTD uses a different syntax from XML, while DCD specifies an XML schema language in XML itself. (XSchema is similar to DCD in this respect). In spite of the differences in the syntax, the goals and constraint semantics for all these XML schema languages are the same. Their commonality is that they all describe XML Schema. This means that they assume the common XML structure, and provide a description language to say how these elements are laid out and are related to each other.
There are about five basic constraints that the XML schema languages describe:
1. The attributes that an element should/may contain:
a. the types of the attribute values (mainly string types), and
b. the mandatory or optional nature of occurrences of these attributes.
2. The type and the order in which elements can be contained inside another element (the content model of the element):
a. the sub-element should be of a certain name or type, or a sub-element could be of any type, and
b. a regular expression system to express how these elements occur, wherein this regular expression system can be expressed by the following operators:
i. |: A | B (either element of type A or of type B can occur),
ii. ,: A, B (element of type B follows one of type A),
iii. *: A* (zero or more occurrences of element of type A),
iv. +: A+ (one or more occurrences of element of type A),
v. ?: A? (zero or one occurrence of element of type A), and
vi. ( ): ( . . . ) (grouping of expressions in this system).
An EBNF (Extended Backus-Naur Form) for this regular expression system can be expressed as below:
re => A
| re | re
| re, re
| re*
| re+
| re?
| (re)
| ANY
Note that this system includes some convenience operators. For example, A+ is the same as A, A*.
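As an illustration only, the regular expression system above can be sketched in Python by translating a content model into an ordinary regular expression over a sequence of child element names. The function names `content_model_to_regex` and `matches`, and the encoding of each child as `name;`, are assumptions made for this sketch and are not part of any XML schema language. ANY is treated here as a single element of any type.

```python
import re

def content_model_to_regex(model):
    """Translate a content model such as "(A | B)*, C?" into a compiled regex.

    Assumption of this sketch: each child element is encoded as "name;",
    so the child sequence A, B becomes the string "A;B;".
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[|,*+?()]", model):
        if token == ",":
            continue                       # sequence is plain concatenation
        elif token == "ANY":
            parts.append(r"(?:[^;]+;)")    # one element of any type
        elif token in ("|", "*", "+", "?"):
            parts.append(token)            # same meaning as in a regex
        elif token == "(":
            parts.append("(?:")
        elif token == ")":
            parts.append(")")
        else:
            parts.append("(?:" + re.escape(token) + ";)")
    return re.compile("".join(parts))

def matches(model, children):
    """Return True if the list of child element names satisfies the model."""
    encoded = "".join(name + ";" for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None

# A+ and "A, A*" accept exactly the same child sequences.
for children in ([], ["A"], ["A", "A", "A"], ["A", "B"]):
    assert matches("A+", children) == matches("A, A*", children)
```

The final loop checks, on a few sample sequences, the equivalence of A+ and A, A* noted above.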
A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. The XML specification located at the URL noted above describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.
In a typical application that uses XML for a particular specification, there would be a DTD that specifies the XML schema and one or more XML documents that satisfy that schema. The application would typically convert the XML document into an object. The application programmer would typically write several lines of code to read in the XML document based upon the schema, to get and set elements and properties based upon the schema, and to notify other parts of the application when an element in the document changes.
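The kind of hand-written code described above might look like the following minimal sketch, which uses Python's standard `xml.etree.ElementTree` module. The `Order` class, the `<order>` document layout, and the listener mechanism are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that would satisfy some order schema.
ORDER_XML = "<order id='42'><item>widget</item><price>9.50</price></order>"

class Order:
    """Hand-written object wrapper around an <order> document."""

    def __init__(self, root):
        self._root = root
        self._listeners = []

    @classmethod
    def from_xml(cls, text):
        # Read in the XML document and convert it into an object.
        return cls(ET.fromstring(text))

    def add_listener(self, callback):
        # callback(name, value) is invoked when an element changes.
        self._listeners.append(callback)

    @property
    def price(self):
        # Getter written against the schema: <price> holds a decimal number.
        return float(self._root.findtext("price"))

    @price.setter
    def price(self, value):
        self._root.find("price").text = f"{value:.2f}"
        for callback in self._listeners:
            callback("price", value)

    def to_xml(self):
        return ET.tostring(self._root, encoding="unicode")
```

Every accessor in such a wrapper has to be written and maintained by hand whenever the schema changes.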
Because XML is a generalized extensible markup language, it has incredible potential to be the ultimate format for data description, transport, and exchange. As structured and semi-structured data flows through the various nodes of the Internet and is exchanged, the data may be filled in, transformed, pruned, or composed at every stage before it is delivered, browsed, or stored. Partially filled documents may be incrementally completed as they pass through the various sites of a workflow or a routing system.
The same document may have different views based on its locale (e.g., one view of the date (dd/mm/yy) in Europe and another in the US (mm/dd/yy)). In electronic commerce, prices of commodities have to be displayed in different currencies, have to be computed differently for different consumers (e.g., educational consumers vs. commercial consumers of a software product), etc.
If a common language like XML is used for all these processes, sufficient dynamism is required so that partial or whole XML documents can be automatically transformed, with the transformed document replacing the original portion of the document in place.
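To make the idea of in-place transformation concrete, the following sketch rewrites a European-style date element into a US-style one inside a larger document, using Python's standard `xml.etree.ElementTree` module. The `<invoice>` document, the `format` attribute, and the helper names `to_us_date` and `transform_in_place` are assumptions made for this example; it does not represent the rule language of the present invention.

```python
import xml.etree.ElementTree as ET

def to_us_date(elem):
    """Build a replacement <date> element in mm/dd/yy form from dd/mm/yy."""
    day, month, year = elem.text.split("/")
    new = ET.Element("date", {"format": "mm/dd/yy"})
    new.text = f"{month}/{day}/{year}"
    new.tail = elem.tail               # preserve text following the element
    return new

def transform_in_place(root, tag, rule):
    """Replace every descendant element named `tag` with rule(element).

    The root element itself is never replaced in this sketch.
    """
    # Snapshot the parents first so the tree is not mutated mid-traversal.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == tag:
                parent[index] = rule(child)

doc = ET.fromstring("<invoice><date>31/12/99</date><total>10</total></invoice>")
transform_in_place(doc, "date", to_us_date)
print(ET.tostring(doc, encoding="unicode"))
```

Only the matched portion of the document is replaced; the surrounding elements are left untouched.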
In notification systems (e.g., push technologies), a user specifies an interest profile and registers herself with the content service provider. Whenever there is content that matches the user's interest, the content provider pushes the content (possibly filtered based on the user interest specifics) to the user.
As content summaries and user profiles are specified in XML, a pattern matching/transform system will be of great use. Of course, pattern matching/transformation systems are known in the art.
For example, in U.S. Pat. No. 4,447,875, Bolton, Hagenmeier, Logsdon, and Miner describe a reduction processor for the evaluation of one or more functions which are stored in memory in a tree-like graph where nodes implement a variable-free applicative language. This is a reduction processor and not a template-based pattern match or pattern match replacement system that takes advantage of the schema structure.
In U.S. Pat. No. 5,321,606, Kuruma and Yamano describe another transformation system based upon context-free grammar. In transformation from a symbol string to a term, transformation rules received describe structures of input symbol strings in the form of a context-free grammar, and include structures of output terms as arguments of terminal symbols and non-terminal symbols. An inputted symbol string is analyzed by reduction processing based on the structures of input symbol strings described in the transformation rules, and an intermediate tree is formed. A term for output is produced in accordance with the structures of output terms shown in the arguments of the terminal symbols and the non-terminal symbols corresponding to the structure of the inputted symbol string. Transformation of structured data is performed in like manner using transformation rules which describe structures of input data in terms of relations between classes of partial structures, and includes structures of output data as arguments of class identifiers.
In another example, in U.S. Pat. No. 5,530,863, Hino describes a programming language processing system for a computer language processing system, wherein a program described in a high level programming language is translated into another program written in lower level programming language. In one embodiment of the invention, a specification of a programming language incorporates a concept of handling various basic words classified by parts-of-speech including nouns, adjectives, conjunctions, and various logic words. The program described by the programming language is converted into an internal expression form based on a sentence structure which can be converted to a binary tree. In accordance with a logic synthesis rule for term-rewriting based on a pattern collation, a logic expressed by the internal expression form is subject to conversion to a lower level program description wherein the parts-of-speech are deleted.
In yet another example, in U.S. Pat. No. 4,599,691, Sakaki and Hashimoto describe a tree transformation in a machine translation system.
XSL (XML Style-sheet Language) is an XML-based language specification for rendering XML documents. It has a core tree transformation language. This language is based upon a search for elements that qualify, rather than upon template-based patterns as in our system, and it does not take advantage of the schema structure in the syntax. Even though XSL has syntax to embed scripts for actions on pattern match, it does not integrate a programming language like Java for evaluating conditions of pattern match, conditions of variable evaluation in patterns, and conditions for replacement. There are other query languages for XML being proposed, XML-based or otherwise, that query XML structures and return the parts of those structures that qualify. These are not template-based, however, and are not very powerful.
In still another example, in a publication by R. Ramesh and I. V. Ramakrishnan, entitled "Non-linear Pattern Matching in Trees", Journal of the Association for Computing Machinery, Vol. 39, No. 2, April 1992, 295-316, the authors describe a tree pattern match algorithm for tree structures where variables occur only at the leaf level.
Thus, there is a need in the art for techniques to provide sufficient dynamism where partial or whole XML documents can be automatically transformed. Moreover, there is a need in the art for techniques that couple such dynamism with pattern matching.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a system for specifying transformation rules of XML documents into other XML documents, wherein the rule language used is XML itself. The transformation rule specifications identify one or more transformations of the document to be performed when a pattern match occurs between the document and a source pattern. The specifications are used to define class specifications for objects that perform the transformations.