1. Field of the Invention.
This invention relates in general to transforming documents, and in particular, to a facility for adding dynamism to Extensible Markup Language (XML) documents.
2. Description of Related Art.
Since the early days of the Internet, and specifically, the World Wide Web (WWW), HyperText Markup Language (HTML) has been used by Web servers to create marked up documents or pages for display by browsers. Subsequently, a number of different techniques have been introduced to add dynamism to HTML.
For example, when a browser requests an HTML page that has one or more embedded APPLET tags, a Java program identified by the APPLET tag is downloaded from the server and executed by the browser. Similarly, when a browser requests an HTML page that has one or more embedded SERVLET tags, a Java object identified by the SERVLET tag is executed by the server to perform one or more functions before the HTML page is downloaded to the browser. These functions typically return tagged structures that are then embedded in the HTML page at the position of the SERVLET tags.
In another example, Java Server Pages (JSPs) and Active Server Pages (ASPs) are HTML pages that include embedded programming or scripts. The embedded programming or script is invoked by the server, and the results are embedded in the HTML page before the page is downloaded to the browser. In addition, Cascading Style Sheets (CSS) and JavaScript provide additional dynamism in the browser.
Notwithstanding the success of HTML, Extensible Markup Language (XML) is poised to be the next big revolution for the World Wide Web (WWW). With the realization that the Web is not about just browsing any more, XML has emerged as an enabling technology to carry the Web to the next generation of electronic commerce, Web-based workflow, and integration of databases with Web applications.
The XML language describes a class of data objects called XML documents and partially describes the behavior of computer programs that process them. XML is a restricted form of SGML, the Standard Generalized Markup Language, defined in ISO 8879. The specification for XML can be found at the URL: http://www.w3.org/TR/REC-xml.
Unlike HTML, XML is a balanced tag language, wherein every open tag has a corresponding closing tag and no semantics are attached to the tags. The interpretation of the tags is left to the target environment using the document. Thus, XML separates a document description from its interpretation.
This is a significant departure from HTML, where the set of tags is fixed. Thus, XML enables applications to communicate among themselves using specialized tags. Moreover, XML documents may change between applications without ever being rendered.
An XML document has two parts: (1) the marked up document; and (2) the document schema. XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure.
XML schemas specify constraints on the structures and types of elements in an XML document. The basic schema for XML is the DTD (Document Type Definition). Other XML schema definitions are also being developed, such as DCD (Document Content Definition), XSchema, etc. Information concerning DTD and DCD can be found at the URL: http://www.w3.org/.
The main difference between DTD and DCD is that DTD uses a different syntax from XML, while DCD specifies an XML schema language in XML itself. (XSchema is similar to DCD in this respect). In spite of the differences in the syntax, the goals and constraint semantics for all these XML schema languages are the same. Their commonality is that they all describe XML Schema. This means that they assume the common XML structure, and provide a description language to say how these elements are laid out and are related to each other.
There are about five basic constraints that an XML schema describes:
1. The attributes that an element should/may contain:
a. the types of the attribute values (mainly string types), and
b. the mandatory or optional nature of occurrences of these attributes.
2. The type and the order in which elements can be contained inside another element (the content model of the element):
a. the sub-element should be of a certain name or type, or the sub-element could be of any type, and
b. a regular expression system to express how these elements occur, wherein this regular expression system can be expressed by the following operators:
i. |: A|B (either element of type A or of type B can occur),
ii. ,: A, B (element of type B follows one of type A),
iii. *: A* (zero or more occurrences of element of type A),
iv. +: A+ (one or more occurrences of element of type A),
v. ?: A? (zero or one occurrence of element of type A), and
vi. ( ): ( . . . ) (grouping of expressions in this system).
Note that this system includes some convenience operators. For example, A+ is the same as A, A*.
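The operators above map closely onto ordinary regular expressions. The following Python sketch is illustrative only and is not part of the XML specification or of the invention: the function names content_model_to_regex and children_match are hypothetical, and the approach of encoding each child element name as a token followed by a space is an assumption made to keep the example short.

```python
import re
from typing import Sequence


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a DTD-style content model into a regex over child names.

    Hypothetical helper: each element name becomes the token "name ",
    so a list of children is matched as the string "name1 name2 ... ".
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|,*+?]", model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token == "(":
            parts.append("(?:")  # grouping without capturing
        elif token in ")|*+?":
            parts.append(token)  # same meaning as in the content model
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(model: str, children: Sequence[str]) -> bool:
    """Return True if the ordered child names satisfy the content model."""
    encoded = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None


# A+ and "A, A*" accept exactly the same child sequences.
for kids in ([], ["A"], ["A", "A", "A"]):
    assert children_match("A+", kids) == children_match("A, A*", kids)
```

Under these assumptions, children_match("A|B", ["A", "B"]) is False because the model allows only one of the two elements, while children_match("A, B", ["A", "B"]) is True.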
A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. The XML specification located at the URL noted above describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.
In a typical application that uses XML for a particular specification, there would be a DTD that specifies the XML schema and one or more XML documents that satisfy that schema. Consider the following XML document example:
<?xml version="1.0"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>
  <level1>
    <nest1 id="2"/>
    <nest2 id="3"/>
  </level1>
</foo>
The corresponding DTD schema for the above example XML document would be the following:
<!ELEMENT foo (level1)>
<!ELEMENT level1 (nest1, nest2)>
<!ELEMENT nest1 EMPTY>
<!ATTLIST nest1
  id CDATA #REQUIRED>
<!ELEMENT nest2 EMPTY>
<!ATTLIST nest2
  id CDATA #REQUIRED>
The DTD schema indicates that the XML document "foo" has one tag "level1." Within level1, there are two tags "nest1" and "nest2," wherein both nest1 and nest2 have an attribute "id," whose value is character data, and whose specification is mandatory.
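To illustrate the division of labor between an XML processor and the application, the following Python sketch uses the standard library's xml.etree.ElementTree as the processor and a small function as the application. It is only a sketch: the function name read_ids is hypothetical, and the DOCTYPE line is left out of the sample text because this particular parser does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# A well-formed rendering of the example document (DOCTYPE omitted,
# since ElementTree is a non-validating processor).
DOCUMENT = """<?xml version="1.0"?>
<foo>
  <level1>
    <nest1 id="2"/>
    <nest2 id="3"/>
  </level1>
</foo>
"""


def read_ids(xml_text):
    """Application code: collect the id attribute of each child of level1."""
    root = ET.fromstring(xml_text)   # the processor parses the text
    level1 = root.find("level1")     # the application navigates the structure
    return {child.tag: child.get("id") for child in level1}


ids = read_ids(DOCUMENT)
print(ids)
```

The application never touches the raw markup. It sees only element names, attributes, and nesting as exposed by the processor.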
As an XML document flows through the various nodes of the Internet, the XML document may be fully or partially filled in, transformed, pruned, or composed at every node. This gives rise to the notion of valid versus well-formed documents. A valid XML document is one that satisfies the restrictions of its associated DTD schema. A well-formed XML document may not have or satisfy an associated DTD schema, but it has to satisfy the syntax requirements of XML (e.g., every open tag has a matching closing tag, etc.). Often, however, it is expensive to verify the XML document against the DTD schema.
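The distinction can be made concrete with a short Python sketch. It assumes the standard library's non-validating parser for the well-formedness test. Because that parser cannot check a DTD, the second function hand-codes a single rule from the example DTD (the mandatory id attribute on nest1 and nest2) as a stand-in for full validation. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text satisfies XML syntax (tags balanced, etc.)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def missing_required_ids(xml_text):
    """Return the tags of nest1/nest2 elements that lack the required id.

    This checks one constraint only; it is not a general DTD validator.
    """
    root = ET.fromstring(xml_text)
    return [
        el.tag
        for el in root.iter()
        if el.tag in ("nest1", "nest2") and el.get("id") is None
    ]


good = '<foo><level1><nest1 id="2"/><nest2 id="3"/></level1></foo>'
partial = '<foo><level1><nest1/><nest2 id="3"/></level1></foo>'
broken = '<foo><level1><nest1 id="2"/></foo>'

assert is_well_formed(good) and missing_required_ids(good) == []
assert is_well_formed(partial) and missing_required_ids(partial) == ["nest1"]
assert not is_well_formed(broken)
```

Here the partial document is well-formed but would not be valid, while the broken document fails even the syntax check because level1 is never closed.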
A common language like XML needs enough dynamism that XML documents can be transformed automatically, with the transformed content replacing the original portion of the document in place. Such transformations may be applied to one or several parts of the document based on one or more conditions; they may split one or more parts of the document into multiple parts; they may invoke an arbitrary function to replace a part of the document with another; and so on.
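The kind of in-place transformation described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not the mechanism of the invention: the names transform_in_place and split_by_id are hypothetical, and the comma-separated id value is an invented input chosen to show one element being split into several.

```python
import xml.etree.ElementTree as ET


def transform_in_place(parent, condition, transform):
    """Replace each matching descendant with the elements transform() returns.

    Replacement elements take the position of the original element.
    Replaced parts are not scanned again.
    """
    new_children = []
    for child in list(parent):
        if condition(child):
            new_children.extend(transform(child))
        else:
            transform_in_place(child, condition, transform)
            new_children.append(child)
    parent[:] = new_children  # swap the child list in place


def split_by_id(element):
    """Hypothetical transformation: one element per comma-separated id."""
    return [
        ET.Element(element.tag, {"id": value.strip()})
        for value in element.get("id", "").split(",")
    ]


root = ET.fromstring(
    '<foo><level1><nest1 id="2,4"/><nest2 id="3"/></level1></foo>'
)
transform_in_place(root, lambda el: "," in el.get("id", ""), split_by_id)
children = [(el.tag, el.get("id")) for el in root.find("level1")]
print(children)
```

The condition selects which parts of the document are affected, and the transformation is an arbitrary function, so the same driver covers conditional replacement, splitting, and function-based substitution.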
However, XML itself does not provide any mechanisms to perform such transformations, since XML documents are by themselves static in nature. There is no way to embed logic, or specify filtering, or specify transformation structures that are based upon the context in which the document is processed. Moreover, the concepts of applets and servlets as used in HTML are limited in that the APPLET and SERVLET tags occur at the leaf level of the HTML page, which means that there is no way to embed anything within an APPLET or SERVLET tag. What is needed, then, is a mechanism that seamlessly mixes programming language constructs with the constructs of XML so that one can reference and enable the other. Moreover, such mechanisms should exploit the fact that XML does not attach any specific semantics to its tags.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method for annotating XML documents with dynamic functionality. The dynamic functionality comprises invocations of Java objects. These annotations belong to a different name space, and thus a Dynamic XML-Java (DXMLJ) processor recognizes elements within the XML document that are tagged with DXMLJ prefix tags, processes each of these tags, and transforms the XML document accordingly.