FIG. 1 shows an environment in which a data processing application 100 is executed so as to edit a structured document by processing documents containing structured data 102. The data processing application 100 is exemplary and can generally be described as processing structured data 102 expressed in a markup language so as to transform the structured data 102 using a solution module 104 to produce transformed information. During the process, the structured data can be presented as a rendering of a visual surface 106 (also referred to herein as a document view 106) on an output device. An editing user 108 interacts with the visual surface 106, as indicated by arrow 110, using, for instance, a keyboard 112, a mouse device 114, or some other input device. The visual surface 106 can constitute the presentation of an electronic form having data entry fields associated with the structured data 102. In this case, the editing user 108's interaction 110 can involve the editing user 108 filling information into existing data entry fields of the electronic form, inserting and filling in new fields (as in table rows), or deleting or substituting regions of the editing surface that represent data subtrees.
The structured data 102 is expressed in a markup language. By way of example, and not by way of limitation, the markup language can be Extensible Markup Language (XML). Accordingly, the structured data 102 is hereinafter referred to as an XML document 102. XML, which is documented as a W3C Standard set forth in Paoli et al., 1998, W3C recommendation, enables developers to create customized tags that describe the meaning of data, as opposed to the presentation of data.
The environment in which the data processing application 100 operates includes an Extensible Stylesheet Language Transformations (XSLT) processor that translates an XML document 102 into the visual surface 106. The visual surface 106 can also comprise another XML document, or a document expressed in a presentation-oriented markup language, such as Hypertext Markup Language (HTML). XML provides tags that represent the data contained in a document. In contrast, presentation-oriented languages, such as HTML, provide tags that convey the visual appearance of a document. Accordingly, these technologies complement each other; XML allows information to be efficiently transferred and processed, while HTML allows information to be presented for display.
XSLT itself uses an XML syntax. The XSLT processor performs its translation function by making reference to one or more XSLT stylesheets. The XSLT stylesheets contain a collection of rules for mapping elements in the XML document 102 to the visual surface 106 or document view 106. To perform this function, XSLT defines its operands through XPath. XPath is a general-purpose query language for addressing and filtering the elements and text of XML documents. XPath expressions can address parts of an XML document, and can manipulate strings, numbers, and booleans, etc. In the context of the XSLT processor, XPath expressions can be used to select a portion of the XML document 102 that matches a prescribed match pattern, and then perform some translation operation on that portion using a rule provided in the XSLT stylesheets. XML, XSLT, and XPath are described at length in their governing specifications provided by the World Wide Web Consortium (W3C).
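By way of illustration only, the rule-based mapping described above can be sketched in Python. The sketch is not an XSLT processor: it relies on the limited XPath subset supported by the standard xml.etree.ElementTree module, and the input document, the RULES table, and the apply_rules function are hypothetical names introduced solely for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical input document; the element names are illustrative only.
XML_SOURCE = """
<report>
  <title>Status</title>
  <author>Ada</author>
  <author>Grace</author>
</report>
"""

# Each rule pairs a path expression (playing the role of a match pattern)
# with a function that renders a matched element as presentation markup.
RULES = [
    ("./title", lambda e: f"<h1>{escape(e.text or '')}</h1>"),
    ("./author", lambda e: f"<li>{escape(e.text or '')}</li>"),
]

def apply_rules(root):
    """Select the elements matching each pattern and apply the paired rule."""
    output = []
    for pattern, render in RULES:
        for element in root.findall(pattern):
            output.append(render(element))
    return output

html_parts = apply_rules(ET.fromstring(XML_SOURCE))
print("\n".join(html_parts))
```

In this sketch each path expression selects a portion of the data, and the paired function supplies the translation applied to that portion, which loosely mirrors the relationship between XPath match patterns and stylesheet rules.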
The XML document 102 is composed of XML elements, each of which includes a start tag (such as <author>), an end tag (such as </author>), and information between the two tags (which is referred to as the content of the element). An element may include name-value pairs (referred to as attributes) related by an equal sign (such as MONTH=“May”). The elements in the XML document 102 have a hierarchical relationship to each other that can be represented as a data tree 116. The elements in the data tree 116 are also commonly referred to as “nodes.” All elements are nodes, but the converse is not true. As used herein, attributes, attribute values, and text content are all nodes. A so-called XML schema (not illustrated in FIG. 1) is a particular XML language that provides a syntactic description of an XML structure. If an XML structure is an instance of the schema to which it refers, it is said to be valid according to that schema.
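As a minimal sketch of the node terminology above, the following Python code walks a small document and lists its elements, attributes, and text content as nodes of a tree. The sample document and the list_nodes function are assumptions made for illustration and are not part of the described application.

```python
import xml.etree.ElementTree as ET

# Hypothetical document reusing the <author> and MONTH examples from the text.
doc = ET.fromstring(
    '<book><author MONTH="May">Paoli</author><title>XML</title></book>'
)

def list_nodes(element, depth=0):
    """Return (depth, kind, value) tuples for elements, attributes and text."""
    nodes = [(depth, "element", element.tag)]
    for name, value in element.attrib.items():
        nodes.append((depth + 1, "attribute", f"{name}={value}"))
    if element.text and element.text.strip():
        nodes.append((depth + 1, "text", element.text.strip()))
    for child in element:
        nodes.extend(list_nodes(child, depth + 1))
    return nodes

nodes = list_nodes(doc)
for depth, kind, value in nodes:
    print("  " * depth + f"{kind}: {value}")
```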
The solution module 104 includes a data-mapping module 118. The purpose of the data-mapping module 118 is to map the structured data 102 to the visual surface/document view 106. The data-mapping module 118 can perform this task using so-called stylesheets, such as stylesheets written using XSLT. XSLT maps the structured data 102 to a format appropriate for presentation, such as HTML, Extensible Hypertext Markup Language (XHTML), etc. In other words, documents expressed in XML include tags that are particularly tailored to convey the meaning of the data in the documents. The XSLT conversion converts the XML documents into another markup language in which the tags pertain to the visual presentation of the information contained in the documents. (To facilitate discussion, the following description assumes the use of HTML to render the documents; however, other presentation-oriented markup languages can be used to render the documents.) Because HTML is a markup language, it can be conceptualized as a view tree 120 that includes a hierarchical organization of nodes, as in the case of data tree 116. The reader is referred to the World Wide Web Consortium's specifications for background information regarding XML and XSLT. Arrow 126 represents mapping of information in the data tree 116 to information in the view tree 120.
A view-mapping module 122 enables nodes in the view tree 120 to be mapped to corresponding nodes in the data tree 116. The mapping of nodes in the view tree 120 to nodes in the data tree 116 allows the solution module 104 to correlate editing operations performed on the visual surface/document view 106 with corresponding nodes in the underlying structured data 102. This allows the solution module 104 to store information entered by the editing user 108 at appropriate locations within the structured data 102 during an editing session. Arrow 124 represents the mapping of information in the view tree 120 back to associated information in the data tree 116.
By way of broad overview, the view-mapping module 122 provides mapping between the visual surface/document view 106 and the XML document 102 by adding annotations to the view tree 120 used to render the visual surface/document view 106. These annotations serve as references which point back to specific locations in the data tree 116. FIG. 1 represents the annotation of the visual surface/document view 106 by showing an annotated HTML document 128 being output from the solution module 104.
The visual surface/document view 106 itself has an appearance that is determined both by the information contained in the XML document 102 and by the effects of the XSLT transformation provided by the data-mapping module 118. In the case of electronic forms, the visual surface/document view 106 typically includes a hierarchical structure which is related to the hierarchical structure in the XML document 102. For instance, an exemplary electronic form 130 includes multiple sections pertaining to different topics that reflect the topics in the XML document 102. (However, it is not necessary to have a one-to-one direct correspondence between the organization of the XML document 102 and the organization of the visual surface/document view 106; in other words, the transformation of the XML document 102 to the visual surface/document view 106 is generally considered non-isomorphic). Each section in the exemplary electronic form 130 can include one or more data entry fields for receiving input from the editing user 108, such as data entry field 132. The data entry fields are also referred to herein as “editing controls.” Different graphical components can be used to implement the editing controls, including text boxes, drop-down list boxes, list boxes, option buttons (also referred to as radio buttons), check boxes, and so on. FIG. 6, to be described, provides an example of the visual appearance of an electronic form as it is being used by an editing user to enter and/or edit data via the data entry fields thereon.
Path 134 generally represents the routing of information entered via the electronic form 130 back to the XML document 102. In other words, the data entry fields in the electronic form 130 (such as data entry field 132) are associated with respective nodes in the data tree 116. Entry of information via electronic form 130 will therefore prompt the solution module 104 to route such information to appropriate storage locations in the data tree 116. Again, the linking between the electronic form 130 and the XML document 102 is provided by the view-mapping module 122.
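The annotation of view nodes and the routing of edits back to the data tree can be sketched as follows. This is a simplified illustration under stated assumptions, not the described implementation: the data-ref attribute name, the sample order document, and the build_annotated_view and commit_edit functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data tree standing in for the XML document.
data_root = ET.fromstring(
    "<order><customer>Ann</customer><total>10</total></order>"
)

def build_annotated_view(root):
    """Create one <input> per data child, annotated with a path to its source."""
    view = ET.Element("form")
    for child in root:
        field = ET.SubElement(view, "input")
        field.set("value", child.text or "")
        # Hypothetical annotation: a path that points back into the data tree.
        field.set("data-ref", f"./{child.tag}")
    return view

def commit_edit(root, field, new_value):
    """Route an edit made on a view field back to the referenced data node."""
    target = root.find(field.get("data-ref"))
    target.text = new_value
    field.set("value", new_value)

view_root = build_annotated_view(data_root)
commit_edit(data_root, view_root[1], "25")  # the user edits the second field
print(ET.tostring(data_root, encoding="unicode"))
```

Because each view node carries its own reference, an edit can be stored at the appropriate location in the data tree without searching the data for a matching value.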
The functionality provided by the solution module 104 is defined, in part, by a solution file, such as exemplary solution file 136 stored in storage 138. The solution file 136 essentially constitutes an electronic form template, providing all of the semantic information required to transform the XML document 102 into the visual surface/document view 106. Different XML documents may have been created by, or otherwise refer to, different electronic form templates. Accordingly, different XML documents may have different solution files associated therewith. Various techniques can be used to retrieve a solution file that is associated with a particular XML document. For instance, an appropriate solution file can be retrieved based on URN (Uniform Resource Name) or URL (Uniform Resource Locator) information contained in the header of an input XML document. That header information links the input document to a corresponding solution file. A storage 140 represents an archive for storing one or more XML documents created by, or otherwise associated with, respective solution files.
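One possible way to resolve a solution file from header information is sketched below. The form of the header is not specified above, so the sketch assumes a hypothetical processing instruction named solution with an href value, and a hypothetical in-memory registry mapping URNs to file names; none of these names are drawn from the described application.

```python
import re

# Hypothetical input document whose header names its solution by URN.
DOCUMENT = (
    '<?xml version="1.0"?>\n'
    '<?solution href="urn:example:forms:expense"?>\n'
    "<expense/>"
)

# Hypothetical registry mapping URN or URL references to stored solution files.
SOLUTION_FILES = {"urn:example:forms:expense": "expense-form.solution"}

# Matches the assumed header form: <?solution href="..."?>
HEADER_PATTERN = re.compile(r'<\?solution\s+href="([^"]+)"\s*\?>')

def find_solution_file(text):
    """Return the solution file linked from the header, or None if absent."""
    match = HEADER_PATTERN.search(text)
    if match is None:
        return None
    return SOLUTION_FILES.get(match.group(1))

print(find_solution_file(DOCUMENT))
```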
The data processing application 100 supports editing structures such as repeating sections and optional sections that are editing controls bound to XML data. When data is entered or deleted using one of these editing controls, the underlying XML data is correspondingly inserted or deleted. It is non-trivial to identify which hierarchy of XML nodes needs to be deleted or inserted and where they need to be inserted or deleted. Moreover, it is cumbersome to provide exhaustive information in a storage space (e.g., the solution file 136) so that that information can be used to resolve which hierarchy of XML nodes needs to be deleted or inserted, as well as where the hierarchy of XML nodes is to be inserted or deleted. In order to do so, the information being stored must contain a representation of all of the possible fragments for the hierarchy of XML nodes that can be inserted or deleted. Depending upon the complexity of the XML in document 102, the fragment representation can cause the information being stored to be quite large. A large collection of such information can result in a correspondingly large performance problem when loading that information into the data processing application 100.
Seen from another perspective, suppose the XML document 102 includes XML nodes in a structure seen in Table A:
TABLE A
A
  B?
    C?
      D?
        E?
          F
          G
          H
where the above notation “?” indicates an optional node, and where E is a container for F, G, and H.
Suppose an optional section bound to the XML node E is to be inserted. In this case, depending on the presence of zero or more of the optional nodes B, C and D, the XML to insert could be one of the following four (4) fragments:
    One fragment rooted in E with parent D
    One fragment rooted in D with parent C
    One fragment rooted in C with parent B
    One fragment rooted in B with parent A
In general, as many separate XML fragments would be generated as the number of optional XML nodes that occur on the branch connecting the container node to the item XML node. Stated otherwise, a fragment will be generated from a corresponding item to a corresponding view side container, which may or may not be the same as the data side container, where the data side container is the XML node's parent in a corresponding XML tree. Generating all possible XML fragments, however, can be verbose if the corresponding schema for the XML document is large and/or has a high branching factor. This verbosity is due to the inability to factor the commonalities among the XML fragments and the need for a separate element for each entry. Again, the impact of this verbosity is that the user experience of editing an electronic form performs poorly in the presence of non-isomorphic electronic form views on complex schemas for the underlying XML document 102.
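The four fragments enumerated above can be illustrated with a short Python sketch. It assumes that the nodes nest in a single chain (A contains B, B contains C, and so on down to E, which holds F, G, and H), consistent with the parent relationships listed for the fragments. The BRANCH and LEAVES lists and the build_fragment and fragment_to_insert functions are hypothetical names used only for this example.

```python
import xml.etree.ElementTree as ET

# Assumed branch: A is required; B, C, D and E are optional; E holds F, G, H.
BRANCH = ["A", "B", "C", "D", "E"]
LEAVES = ["F", "G", "H"]

def build_fragment(start_index):
    """Build the nested fragment rooted at BRANCH[start_index], down to E."""
    root = ET.Element(BRANCH[start_index])
    node = root
    for name in BRANCH[start_index + 1:]:
        node = ET.SubElement(node, name)
    for leaf in LEAVES:
        ET.SubElement(node, leaf)
    return root

def fragment_to_insert(document_root):
    """Return (parent, fragment) for inserting E, or None if E already exists."""
    parent = document_root  # the required node A
    for index in range(1, len(BRANCH)):
        child = parent.find(BRANCH[index])
        if child is None:
            # The first missing optional node is the root of the fragment.
            return parent, build_fragment(index)
        parent = child
    return None

# The four possible fragments, rooted in B, C, D and E respectively.
all_fragments = [build_fragment(i) for i in range(1, len(BRANCH))]

# Example: B and C are present, so the fragment rooted in D is inserted under C.
doc = ET.fromstring("<A><B><C/></B></A>")
parent, fragment = fragment_to_insert(doc)
parent.append(fragment)
print(ET.tostring(doc, encoding="unicode"))
```

The sketch builds each fragment on demand from the shared branch description. Storing every fragment separately, as described above, would instead repeat the common tail of the branch in each stored entry.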
It would be an advantage in the art to remove the need to express all of the possible portions of a hierarchical markup language fragment that can be inserted or deleted when editing a structured document by processing documents containing structured data (e.g., data whose structure is described by a schema) that is expressed using the markup language. This reduced expression would in turn advantageously reduce the size of the semantic information required to transform the structured data into the rendered structured document, which would in turn advantageously improve the performance of the rendering.