The invention relates to processing of electronic documents, and more particularly to generation of a target document from a source document. In this specification, and in document-handling terminology generally, the term “element” means a node or tree of nodes within a document, or the full document.
The most common form of electronic document processing is the operation of a Web server to provide HTML documents to browsers via HTTP. However, various mark-up languages are also used for publication of documents, via the Web or otherwise. They generally have a hierarchical structure of elements. The structure is generally defined by tags (sequences of characters in the document).
In recent years electronic documents have been developed further. For example, Java Server Pages (JSP) contain both HTML markup content and Java programming code. Processing of such a document typically involves executing the Java code, often to generate text. The Java code is replaced by the text it generates, and the resulting HTML page is sent to the browser. In another example, the source document is a word processing template having fields for entry of data. The fields are in a fixed structure and data can only be entered at the fixed field locations.
Thus, to date the processing of source documents has been limited by the fixed locations for changing/adding content. Another limitation is that the processing is governed by the meaning of the information in the source document. For example, in JSPs the processing is governed by the Java code, and in the word processing template only dates can be inserted in date fields.
Therefore, it is an objective of the invention to provide more versatility in the manner in which documents are processed. Another objective is that the processing does not require knowledge of the meaning or structure of the information in the source document.
According to the invention, there is provided a document processing system comprising means for processing a source document to provide a target document, characterised in that the processing means comprises means for merging the source document with at least one other source document to provide the target document.
In one embodiment, the merge means comprises means for merging source document hierarchical structure node trees into a single target tree for the target document.
In another embodiment, the merge means comprises means for identifying matching source nodes in source trees, for inserting a single node in the target tree corresponding to the matching nodes, and for inserting other nodes in the target tree with reference to said single node.
In one embodiment the merge means comprises means for always treating root source nodes as matching nodes.
In another embodiment, the merge means comprises means for treating a source tree as having a fixed role and the other source tree as having a movable role, in which the structure of the source tree having the fixed role is preserved and the structure of the source tree having the movable role may be changed.
In one embodiment, the merge means comprises means for (a) placing only one of a pair of matching nodes in the target tree, or for (b) combining the matching nodes to generate a composite node, and the selection of (a) or (b) is according to a policy.
In another embodiment, (b) is a default policy.
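By way of illustration only, the choice between placing one of a pair of matching nodes and combining them into a composite node may be sketched as follows. The `Node` class, the policy names, and the rule used to build the composite (a union of attributes in which fixed-role values win, with the text concatenated) are assumptions made for this sketch; the specification does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    # Hypothetical minimal node: a name, attributes and text content.
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""


def resolve_match(fixed: Node, movable: Node, policy: str = "combine") -> Node:
    """Return the single target node that stands for a pair of matching nodes."""
    if policy == "keep_fixed":      # option (a): place only the fixed-role node
        return Node(fixed.name, dict(fixed.attrs), fixed.text)
    if policy == "keep_movable":    # option (a): place only the movable-role node
        return Node(movable.name, dict(movable.attrs), movable.text)
    if policy == "combine":         # option (b): composite node, used by default
        # Assumed composite rule: union of attributes, fixed-role values win.
        attrs = {**movable.attrs, **fixed.attrs}
        return Node(fixed.name, attrs, fixed.text + movable.text)
    raise ValueError(f"unknown policy: {policy!r}")


fixed = Node("para", {"id": "intro", "lang": "en"}, "Hello")
movable = Node("para", {"id": "intro", "style": "bold"}, " world")
print(resolve_match(fixed, movable))
print(resolve_match(fixed, movable, policy="keep_movable"))
```

Keeping the policy as a parameter with (b) as its default value mirrors the embodiment in which combination is the default and selection of a single node is the exception.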
In one embodiment, the merge means comprises means for adding a non-matching node of a movable role tree to the target tree as a child of the node that represents its parent from the movable role tree.
In one embodiment, the merge means comprises means for placing said non-matching node after child nodes of a matching node in the fixed tree if the parent of said non-matching node is a matching node.
In another embodiment, the merge means comprises means for handling a node having more than one ancestor matching node by placing it relative to the nearest ancestor matching node.
In a further embodiment, the merge means comprises means for preserving the order of non-matching nodes of the movable role source tree unless modified by the presence of a matching node.
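The placement rules set out in the foregoing embodiments may be sketched, under simplifying assumptions, as a recursive merge of a fixed role tree and a movable role tree. In this sketch nodes are assumed to match when they have the same name and sit under parents that already match; the specification does not define the matching test at this point, and the `Node` class and `outline` helper are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical minimal node: a name and an ordered list of children.
    name: str
    children: List["Node"] = field(default_factory=list)


def merge(fixed: Node, movable: Node) -> Node:
    """Merge two nodes already treated as matching (root nodes always are)."""
    target = Node(fixed.name)            # a single target node for the pair
    pending = list(movable.children)     # movable children not yet matched
    for f_child in fixed.children:       # fixed-role structure and order are kept
        idx = next((i for i, m in enumerate(pending) if m.name == f_child.name), None)
        if idx is None:
            target.children.append(copy.deepcopy(f_child))
        else:
            # Descendants are placed relative to this nearest matching pair.
            target.children.append(merge(f_child, pending.pop(idx)))
    # Non-matching movable children go after the fixed children, in their
    # original order, each bringing its own subtree with it.
    target.children.extend(copy.deepcopy(m) for m in pending)
    return target


def outline(node: Node) -> str:
    """Render a tree as name[child, child, ...] for inspection."""
    if not node.children:
        return node.name
    return f"{node.name}[{', '.join(outline(c) for c in node.children)}]"


fixed = Node("doc", [Node("head"), Node("body", [Node("p1")])])
movable = Node("doc", [Node("body", [Node("p2")]), Node("footer")])
print(outline(merge(fixed, movable)))  # doc[head, body[p1, p2], footer]
```

In the example, `body` is a matching node and so appears once, at its fixed-role position; `p2` is added after the fixed-role child `p1` of its matching parent, and `footer` follows the fixed-role children of the root.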
In one embodiment, the merge means comprises means for operating according to a lookup policy in which the movable role source tree is treated as a resource from which nodes are selectively chosen for merging.
In another embodiment, the merge means comprises means for recognising a placeholder node in the fixed role tree and for placing a set of nodes of the movable role source tree in the target tree in lieu of the placeholder node.
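A minimal sketch of placeholder handling combined with a lookup policy is given below. It assumes that a placeholder is a node named `placeholder` carrying a `select` attribute that names the nodes to be drawn from the movable role tree; both names are hypothetical and are not taken from the specification.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List

PLACEHOLDER = "placeholder"  # hypothetical marker name


@dataclass
class Node:
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def find_all(root: Node, name: str) -> List[Node]:
    """Collect, in document order, every node in the resource tree with this name."""
    found = [root] if root.name == name else []
    for child in root.children:
        found.extend(find_all(child, name))
    return found


def expand(fixed: Node, resource: Node) -> Node:
    """Copy the fixed role tree, substituting selected resource nodes for placeholders."""
    target = Node(fixed.name, dict(fixed.attrs))
    for child in fixed.children:
        if child.name == PLACEHOLDER:
            wanted = child.attrs.get("select", "")
            target.children.extend(copy.deepcopy(n) for n in find_all(resource, wanted))
        else:
            target.children.append(expand(child, resource))
    return target


fixed = Node("page", children=[
    Node("title"),
    Node(PLACEHOLDER, {"select": "item"}),
    Node("footer"),
])
resource = Node("data", children=[
    Node("item", {"id": "1"}),
    Node("group", children=[Node("item", {"id": "2"})]),
])
print([c.name for c in expand(fixed, resource).children])
```

Only the selected nodes are taken from the movable role tree; the remainder of that tree (here the `group` wrapper) does not appear in the target tree, which is the sense in which the tree is treated as a resource.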
In one embodiment the merge means comprises means for activating merging in response to an inheriting source document indicating that it should inherit content from an inherited source document.
In another embodiment, the merge means comprises means for determining that the inheriting source document requests inheritance by reading a flag indicating such.
In a further embodiment, the merge means comprises means for reading said flag from within the inheriting document.
In a still further embodiment, the merge means comprises means for recognising a flag indicating required inheritance from a plurality of inherited documents, and for merging the inheriting and the plural inherited documents.
In one embodiment, the merge means comprises means for successively merging pairs of documents in a nested manner until all source documents have been merged.
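The inheritance embodiments above may be sketched as a recursive resolution in which each inheriting document is merged pairwise with each document it inherits from, the inherited document itself being resolved first. The in-memory `store`, the `inherits` key standing for the flag read from within the inheriting document, and the stand-in `merge_pair` function (which merely appends unseen lines) are all assumptions made for this sketch; a tree merge would take the place of `merge_pair` in practice.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical document store: name -> {"inherits": [names], "body": [lines]}.
Store = Dict[str, Dict[str, List[str]]]
Body = List[str]


def merge_pair(inheriting: Body, inherited: Body) -> Body:
    """Stand-in for a tree merge: keep inheriting lines, append unseen inherited lines."""
    return inheriting + [line for line in inherited if line not in inheriting]


def resolve(name: str, store: Store,
            merge: Callable[[Body, Body], Body] = merge_pair,
            _stack: Tuple[str, ...] = ()) -> Body:
    """Merge a document with everything it inherits from, pair by pair."""
    if name in _stack:
        raise ValueError(f"circular inheritance at {name!r}")
    doc = store[name]
    result = list(doc["body"])
    for parent in doc.get("inherits", []):    # the flag may name several documents
        inherited = resolve(parent, store, merge, _stack + (name,))  # nested merge
        result = merge(result, inherited)
    return result


store: Store = {
    "base": {"inherits": [], "body": ["header", "footer"]},
    "brand": {"inherits": ["base"], "body": ["logo"]},
    "page": {"inherits": ["brand"], "body": ["article"]},
}
print(resolve("page", store))  # ['article', 'logo', 'header', 'footer']
```

The guard against circular inheritance is an addition of the sketch and is not mentioned in the specification.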
In another embodiment, the merge means comprises means for merging partial documents.
In one embodiment, the merge means comprises means for operating according to merge instructions dynamically generated from a rule using current parameter values.
In a further embodiment, the processing means comprises means for parsing a source document to generate the source tree.
In a still further embodiment, the processing means comprises means for rendering the target tree to provide the target document.
According to another aspect, the invention provides a document processing system comprising means for processing a source document to provide a target document, characterised in that the processing means comprises:
means for parsing a source document into a source tree comprising a hierarchical structure of nodes according to a block structure of the document;
means for merging source trees of at least two source documents to provide a target tree of a target document in which:-
matching nodes of different source trees are identified,
a single node corresponding to a pair of matching nodes is placed in the target tree,
other nodes are placed in the target tree with reference to said single node,
one source tree is treated as having a movable role and another as having a fixed role, and the order of non-matching nodes of the movable role source tree is preserved unless modified by the presence of a matching node,
the merge means comprises means for adding a non-matching node of a movable role tree to the target tree as a child of the node that represents its parent from the movable role tree,
the merge means comprises means for placing said non-matching node after child nodes of a matching node in the fixed tree if the parent of said non-matching node is a matching node, and
the merge means comprises means for handling a node having more than one ancestor matching node by placing it relative to the nearest ancestor matching node;
the processing means comprises means for rendering the target tree to provide the target document.
In another aspect, the invention provides a method of processing a source document to provide a target document, the method being carried out by a data processing system and the documents being in electronic form, characterised in that the source document is merged with at least one other source document to provide the target document.
In one embodiment the method merges the source documents by:
parsing the source documents to generate source trees comprising hierarchical structures of nodes;
merging the source trees to provide a target tree; and
rendering the target tree to provide the target document.
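The three steps above (parse, merge, render) may be sketched using the Python standard library module `xml.etree.ElementTree`. The sketch assumes that the source documents are well-formed XML, and its merge step is deliberately trivial: the root nodes are treated as matching and the children of the movable role root are appended after those of the fixed role root. The function names `merge_roots` and `process` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def merge_roots(fixed_root: ET.Element, movable_root: ET.Element) -> ET.Element:
    """Trivial merge: roots match, movable children follow the fixed children."""
    target = copy.deepcopy(fixed_root)
    target.extend([copy.deepcopy(child) for child in movable_root])
    return target


def process(fixed_source: str, movable_source: str) -> str:
    # 1. Parse each source document into a source tree.
    fixed_tree = ET.fromstring(fixed_source)
    movable_tree = ET.fromstring(movable_source)
    # 2. Merge the source trees into a single target tree.
    target_tree = merge_roots(fixed_tree, movable_tree)
    # 3. Render the target tree to provide the target document.
    return ET.tostring(target_tree, encoding="unicode")


print(process("<doc><title>T</title></doc>", "<doc><note>N</note></doc>"))
# <doc><title>T</title><note>N</note></doc>
```

Because the merge operates only on trees, the parse and render steps can be replaced for other block-structured formats without altering the merge step.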
In one embodiment, the merging step comprises:-
identifying matching nodes (X, Y, Z) in at least two source trees;
inserting a single node corresponding to the matching nodes in the target tree;
placing other nodes in the target tree with reference to said single node.
In one embodiment, the single node is either (a) one of the matching nodes or (b) a composite node of the matching nodes, and choice of (a) or (b) is according to a configurable policy.
In another embodiment, merging is initiated by a flag embedded in a source document indicating that it should inherit content from at least one other source document.
In one embodiment, a source document is treated as having a fixed role and another source document is treated as having a movable role, in which the structure of the fixed role source tree is preserved and the structure of the movable role source tree may be changed.
In another embodiment, each of said source documents comprises a separate strand of associated content, and the method is performed to combine said strands of content in a single target document.
In one embodiment, a source document is the output of a database query, and the method merges said query result with another source document.
In another embodiment, a node of said other source document is merged with multiple nodes of said query result source document.
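The merging of one node with multiple nodes of a query result may be sketched as follows, using an in-memory SQLite database and `xml.etree.ElementTree` from the Python standard library. The table, the `result` and `row` tag names, and the rule that the template node contributes its attributes to every row are assumptions made for illustration only.

```python
import copy
import sqlite3
import xml.etree.ElementTree as ET


def query_to_tree(conn: sqlite3.Connection, sql: str) -> ET.Element:
    """Turn a query result into a source tree: one <row> per record."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    root = ET.Element("result")
    for record in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, record):
            ET.SubElement(row, column).text = str(value)
    return root


def merge_row_template(result: ET.Element, template: ET.Element) -> ET.Element:
    """Merge the single template node with every result node of the same tag."""
    target = ET.Element(result.tag)
    for row in result:
        if row.tag == template.tag:
            # Assumed composite: template attributes plus the row's own children.
            merged = ET.SubElement(target, row.tag, dict(template.attrib))
            merged.extend([copy.deepcopy(child) for child in row])
        else:
            target.append(copy.deepcopy(row))
    return target


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "pen"), (2, "ink")])
template = ET.Element("row", {"class": "item"})
result = query_to_tree(conn, "SELECT id, name FROM items ORDER BY id")
print(ET.tostring(merge_row_template(result, template), encoding="unicode"))
```

Here a single `row` node of the other source document is matched against each `row` node of the query result, so that formatting stated once is applied to every record returned.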