The invention relates to a method, a system and apparatus for the direct execution of XML documents.
XML Basics
Extensible markup language (XML) is a subset of the standard generalized markup language (SGML) format. In short, XML allows a developer to define the format of documents structured by means of so-called tags. While XML defines both a physical and a logical structure of documents, one embodiment of the present invention focuses on the logical structure of an XML document. Abstracting from the physical structure, XML allows for the creation of an XML document and, optionally, an associated document type definition (DTD). A DTD spells out structural rules for an XML document. An XML document is “valid” with respect to the DTD if the XML document obeys the rules of that specific DTD.
The logical aspects of a DTD define elements and attributes. Each document of such a type contains only instances of the defined elements, and the attributes and the composition of the element instances comply with the DTD's specification. As an example, a DTD T defining an element A, with a composition of subordinated elements X, Y, and Z and attributes b and c of element A, might be noted in XML as follows:
<!ELEMENT A (X, Y, Z)>
<!ATTLIST A b #PCDATA c #PCDATA>
where #PCDATA defines the kind of data admissible for attributes b and c. An XML document that is valid with respect to the DTD T described above may contain instances of element A. An instance of the element A consists of the start-tag <A . . . > and an end-tag </A>. Within the start-tag, the values of the attributes are given, and between the start-tag and the end-tag, instances of the composing elements are given. A document of type T may thus contain instances of A as follows:
<A b=“string 1” c=“string 2”>
<X . . . > . . . </X>
<Y . . . > . . . </Y>
<Z . . . > . . . </Z>
</A>
where the composition of X, Y, and Z instances is given by corresponding element definitions in the DTD T.
The power of XML stems partly from the fact that an element's composition can be recursive. For illustration, some composition possibilities expand on the example above. The already used sequence (X, Y, Z) indicates that an X element is followed by a Y element, which in turn is followed by a Z element. Another important composition operator is the choice operator, |, e.g., (X|Y|Z) indicates a choice between an X, Y, or Z element. Other operators include the + and * operators, indicating repetitions of a component. However, they are not used in the present examples.
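The sequence and choice operators described above can be illustrated with a minimal Python sketch. It is not part of the described method. It assumes a simplified content model that contains only element names, parentheses, commas, and the |, +, * and ? operators, and it translates that model into a regular expression over the names of an element's children. The function names content_model_to_regex and children_match are hypothetical.

```python
import re


def content_model_to_regex(model):
    """Translate a simplified DTD content model such as "(X, Y, Z)"
    or "(X|Y|Z)" into a compiled regular expression.

    Assumption: the model holds only element names, parentheses,
    ',', '|', '+', '*' and '?'. Keywords such as EMPTY, ANY and
    #PCDATA are not handled.
    """
    # Remove whitespace so that the tokens built below are the only
    # places where a space occurs in the pattern.
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a non-capturing group matching
    # the name followed by one space.
    tokens = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A DTD sequence (',') is plain concatenation in a regex;
    # '|', '+', '*' and '?' already mean the same thing in both.
    return re.compile(tokens.replace(",", ""))


def children_match(model, children):
    """Return True if the list of child element names satisfies the model."""
    encoded = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None


if __name__ == "__main__":
    print(children_match("(X, Y, Z)", ["X", "Y", "Z"]))  # sequence in order
    print(children_match("(X, Y, Z)", ["X", "Z", "Y"]))  # wrong order
    print(children_match("(X|Y|Z)", ["Y"]))              # one alternative
```

The sketch checks only one level of composition. Checking a complete document would require applying the same test recursively to every element.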
Assuming that Y and Z have no composition, i.e., in XML notation, (<!ELEMENT Y>, <!ELEMENT Z>), the definition of X completes the DTD T for the element A. The definition of X uses the choice composition and recursion to element A: <!ELEMENT X (Y|A)>. Thus, X is either a Y or an A. Possible documents of type A are:
<A b=“. . . ” c=“. . . ”><Y></Y><Y></Y><Z></Z></A>
<A b=“. . . ” c=“. . . ”><Y></Y><Y></Y><Y></Y><Z></Z></A>
<A b=“. . . ” c=“. . . ”><Y></Y><Y></Y><Y></Y><Y></Y><Z></Z></A>
<A b=“. . . ” c=“. . . ”><Y></Y><Y></Y><Y></Y><Y></Y><Y></Y><Z></Z></A>
and further repetitions of <Y></Y>. For simplicity, the examples did not use special composition operators, such as + and *, for repetition. As a shortcut, <Y></Y> can be replaced by <Y/>.
XML is the outcome of a long standardization process. Developers widely use and accept XML. Many important XML implementations fall into one of two categories: data exchange and document publishing. For data exchange applications, XML can define data formats for exchanging complex data between two programs. Furthermore, XML can provide an exchange format for data residing in relational database systems. Examples of such formats are XML-Data from Microsoft, Meta Content Framework (MCF) from Netscape, and Resource Description Framework (RDF) from the World Wide Web Consortium (W3C).
For document publishing applications, XML is a data language for markup of all kinds of documents. XML markup indicates the logical content of a document. Associated generic layout languages, such as cascading style sheets (CSS) and Extensible Stylesheet Language (XSL), provide layout information generically for the DTDs. The output of the generated layout may be hypertext markup language (HTML), standard page description language (SPDL), or PostScript.
XML Processing
Existing software and applications use XML to represent data and documents, which are then processed by general-purpose programming languages. Existing proposals for the processing of XML documents typically involve event-driven or tree-manipulating techniques. The choice of processing technique is independent of whether the application lies in the data-exchange or document-publishing domain. In an event-driven approach, the document is processed in strict sequence. Each element in the data stream is considered an event trigger, which may precipitate some special action on the part of the application. The Simple API for XML (SAX) can implement an event-driven approach in an existing programming language.
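The event-driven approach can be illustrated with a short sketch that uses the SAX interface of the Python standard library (xml.sax). The sample document, the class name EventLogger and the function log_events are assumptions made for illustration only. The handler simply records each start-tag and end-tag event in the order in which the parser reports it.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Collects the element events reported by the SAX parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called once for every start-tag, in document order.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        # Called once for every end-tag, in document order.
        self.events.append(("end", name))


def log_events(document):
    """Parse the document (bytes) and return the recorded event list."""
    handler = EventLogger()
    xml.sax.parseString(document, handler)
    return handler.events


# Hypothetical sample instance of element A, used only for illustration.
SAMPLE = b'<A b="string 1" c="string 2"><X><Y/></X><Y/><Z/></A>'

if __name__ == "__main__":
    for event in log_events(SAMPLE):
        print(event)
```

The application never sees the document as a whole. Any state that spans several events, such as the current nesting depth, has to be kept explicitly by the handler.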
The tree approach provides access to the entire document by parsing the document into a structure tree. Basically, the elements of an XML document are the nodes of the tree, and the components of each element are the children of the corresponding node. The commonly used API to access such a tree is the document object model (DOM) programming interface specification. A DOM uses standard syntax to describe a document as a series of objects. Programming languages such as JavaScript, VBScript, C++, and Java can then access the DOM for a document, obtain a particular object, and manipulate the object.
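The tree approach can be sketched in the same way with the DOM implementation of the Python standard library (xml.dom.minidom). The sample document and the helper names load_root and child_element_names are hypothetical. The sketch parses the whole document into a tree, obtains the root element, reads an attribute, lists the child elements, and then changes an attribute on the node.

```python
from xml.dom import minidom

# Hypothetical sample instance of element A, used only for illustration.
SAMPLE = '<A b="string 1" c="string 2"><X><Y/></X><Y/><Z/></A>'


def load_root(document):
    """Parse the complete document into a DOM tree and return its root element."""
    return minidom.parseString(document).documentElement


def child_element_names(node):
    """Return the tag names of the element children of a DOM node."""
    return [
        child.tagName
        for child in node.childNodes
        if child.nodeType == child.ELEMENT_NODE
    ]


if __name__ == "__main__":
    root = load_root(SAMPLE)
    print(root.tagName, root.getAttribute("b"))
    print(child_element_names(root))
    # The tree is held in memory, so a node can be modified in place.
    root.setAttribute("b", "changed")
    print(root.toxml())
```

In contrast to the event-driven sketch, the program decides which nodes to visit and in which order, at the cost of holding the entire document in memory.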
However, a program that uses an API to access an XML document is typically a complex, general-purpose program. Such a program is typically complex because it takes into account system and platform dependencies, and reliability and security concerns. Several XML tools try to hide this complexity by generating special code for data exchange or document publishing applications. Although these tools may help to easily generate a program pattern in a traditional programming language, they do not abstract completely from using explicit programming to navigate through the document that has to be processed. Thus, there is a need for a system that reduces the complex, general-purpose programming that results from using APIs to access an XML document.
XML versus Traditional Programming
Traditional structured as well as object-oriented programming methods use unstructured data sources. These programming methods use unstructured data in the sense that the structure of the data is not given by means of production rules. In contrast, the element definitions in XML or Backus-Naur Form (BNF) rules define the structure of programs. Unstructured data sources are, in the case of traditional programming, used in connection with a structured, yet independent, program. On the other hand, object-oriented programming methods combine the structured, segmented code with data objects, but the object space (data) is still not structured in the above-mentioned sense.
It would, of course, be possible to use structured data models in connection with these methods. However, even if the data models have structure, the program structure is still independent of the data structure and therefore needs complex treatment as described above. In fact, the current situation in processing XML documents implies two independent structures, one of the XML document and one of the program processing it. This approach suffers from at least two disadvantages: (a) if a programmer intends to reliably process the complete structure of any instance of a DTD, such code may fail due to the variety of corresponding XML documents and the complexity of such tasks, and (b) the program processing the XML document is typically designed to process all documents of a certain DTD. In other words, the inherent generic nature of the code increases the complexity of the program.
Consequently, it is an object of the invention to create a method, system and apparatus to improve the processing of XML-documents by incorporating data and software in an integrated structure. Incorporating data and software in an integrated structure reduces programming complexity and reduces the need for fundamental programming knowledge to achieve data processing and to reuse the existing structure of the XML-document.
Another object of the invention is to provide a new and general method, system and apparatus to generate such integrated data and software structures for XML-documents. Yet another object of the invention is to provide a novel method, system and apparatus to directly execute standard document structures, i.e., XML-documents.