The present invention relates generally to electronic document processing.
Numerous publishing systems have been developed to assist in the production of structured electronic documents. These publishing systems contain document authoring tools, such as text editors, which allow a publisher to add descriptive markup to an electronic document. The descriptive markup assigns meaning to various regions of an electronic document. For instance, some paragraphs may be marked as body paragraphs, while others are marked as headings. The structure of such electronic documents may or may not be hierarchical. For example, various marked regions may contain other regions, such as a section containing several sub-sections, each of which contains a heading and one or more paragraphs. These marked regions are referred to as elements, each of which has a particular type (e.g., paragraph). Because descriptive markup defines a document's structure as including a set of element types which, when taken together, typically form a tree or similar hierarchical object, the tree of element types is often referred to as the document's “structure”.
An example of a descriptive markup language for electronic documents is specified by the ISO Standard 8879: “Standard Generalized Markup Language”, or “SGML”. SGML is a markup language that uses tags to prepare structured documents. In a document prepared in accordance with SGML, an element consists of a begin tag, its content, and, when necessary, an end tag. For example, a document may use the embedded begin and end tags <para> and </para>, respectively, where “para” is the tag name corresponding to a paragraph element, to delimit paragraphs. The content may include text and other elements.
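By way of illustration only, the following minimal sketch shows how embedded begin and end tags can be read into a tree of elements. It assumes a simplified markup in which every element has an explicit end tag and no attributes, whereas SGML itself permits end tags to be omitted in some cases. The names TAG, parse and "#document" are hypothetical and are not part of SGML.

```python
import re

# Matches a begin tag <name> or an end tag </name>.
# Assumption: no attributes and no tag omission (a simplification of SGML).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def parse(markup):
    """Return a tree of (element_type, children) tuples.

    Each child is either a text string or another (element_type, children) tuple.
    """
    root = ("#document", [])  # hypothetical name for the top-level container
    stack = [root]
    pos = 0
    for match in TAG.finditer(markup):
        # Text between the previous tag and this one belongs to the open element.
        text = markup[pos:match.start()]
        if text.strip():
            stack[-1][1].append(text)
        pos = match.end()

        is_end_tag = match.group(1) == "/"
        name = match.group(2)
        if not is_end_tag:
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unexpected end tag </{name}>")
            stack.pop()

    if len(stack) != 1:
        raise ValueError(f"missing end tag for <{stack[-1][0]}>")
    tail = markup[pos:]
    if tail.strip():
        root[1].append(tail)
    return root


doc = parse(
    "<section><head>Intro</head><para>First.</para><para>Second.</para></section>"
)
```

In this sketch, the section element contains a head element followed by two para elements, each of which contains text.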
A structured document can be associated with a rule-base which defines the legal structures that the document can have. Such a rule-base is called a document type definition (DTD). For each element type, the DTD provides a general rule which governs the content of elements of that type. Also provided is an attribute definition rule which specifies an attribute name, type and optional default value for a given element. Thus, the DTD describes the characteristics and properties associated with each element type, and which sub-elements are valid within any given element.
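As a non-authoritative sketch, a rule-base of this kind can be pictured as a table keyed by element type, in which each entry holds a general rule and a list of attribute definitions. The class names AttributeDef and ElementType, the function effective_attributes, and the example element types and attributes below are assumptions made for illustration; they do not reproduce SGML's declaration syntax. The string "#PCDATA" is the SGML keyword for character data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttributeDef:
    """An attribute definition rule: name, type and optional default value."""
    name: str
    type: str
    default: Optional[str] = None


@dataclass
class ElementType:
    """One DTD entry: the general rule plus the attribute definitions."""
    name: str
    general_rule: str
    attributes: list = field(default_factory=list)


# Hypothetical rule-base for a small document type.
DTD = {
    "section": ElementType("section", "head, para+", [AttributeDef("id", "ID")]),
    "head": ElementType("head", "#PCDATA"),
    "para": ElementType("para", "#PCDATA", [AttributeDef("align", "CDATA", "left")]),
}


def effective_attributes(dtd, element_name, given):
    """Merge the attributes given on an element with the declared defaults."""
    declared = {a.name: a for a in dtd[element_name].attributes}
    unknown = set(given) - set(declared)
    if unknown:
        raise ValueError(f"undeclared attributes on <{element_name}>: {sorted(unknown)}")
    result = {}
    for name, definition in declared.items():
        if name in given:
            result[name] = given[name]
        elif definition.default is not None:
            result[name] = definition.default
    return result
```

For instance, under these assumptions a para element written without an align attribute would take the declared default value, while an attribute that is not declared for the element type would be rejected.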
A general rule can be unrestrictive. That is, there are no restrictions on what elements of that type can contain. An unrestricted general rule can be written as “ANY”. A general rule can also be restrictive, specifying order and occurrence within the content of an element type. The restrictive general rule is stated in an expression language for specifying allowed patterns of sub-structures. Using the expression language, a restrictive general rule can be written as an expression with grouping operators (parentheses), joining operators (commas for an ordered sequence and or-bars for a choice among alternatives), and occurrence operators (a question mark for zero or one, an asterisk for zero or more, and a plus sign for one or more). For instance, the restrictive general rule “head, para+” requires that the content be a head element followed by one or more para elements. As another example, “(para|figure)*” is interpreted to allow any number of paragraphs and/or figures in any order.
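The operators described above correspond closely to those of regular expressions, so one possible way to check content against a general rule is sketched below. This is an illustrative approach under stated assumptions, not a description of how any particular SGML system operates. It assumes that each group uses a single kind of joining operator, that only commas and or-bars are used as joining operators, and that only sub-element names (not text content) are checked. The function names rule_to_regex and is_valid are hypothetical.

```python
import re

# A token is an element name or one of the operators ( ) , | ? * +
TOKEN = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|[(),|?*+])")


def rule_to_regex(rule):
    """Translate a restrictive general rule into a compiled regular expression.

    Each element name becomes a group matching that name followed by a space,
    so a sequence of child elements can be matched as one space-terminated string.
    """
    rule = rule.strip()
    parts = []
    pos = 0
    while pos < len(rule):
        match = TOKEN.match(rule, pos)
        if not match:
            raise ValueError(f"cannot read rule at position {pos}: {rule!r}")
        token = match.group(1)
        pos = match.end()
        if token == ",":
            continue  # an ordered sequence is plain concatenation in a regex
        elif token == "(":
            parts.append("(?:")
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)  # same meaning in regex syntax
        else:
            parts.append("(?:" + token + " )")
    return re.compile("".join(parts))


def is_valid(rule, children):
    """Return True if the list of child element names satisfies the rule."""
    if rule.strip() == "ANY":
        return True  # unrestricted general rule
    sequence = "".join(name + " " for name in children)
    return rule_to_regex(rule).fullmatch(sequence) is not None
```

Under these assumptions, is_valid("head, para+", ["head", "para", "para"]) is true, while the same rule rejects content consisting of a head element alone, because at least one para element is required.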