The invention relates to correcting validation errors in structured electronic documents. A structured electronic document is an electronic document that includes content information, such as text characters, graphics, images, and the like, and structure information that defines how the content information is arranged and/or related in the document. The document structure can be specified according to a markup language format having one or more associated sets of rules that define what information is allowed in the document and in what order, so that the document can be understood by a computer. An electronic document that conforms to an associated set of rules is often said to be “well-formed” or “valid”.
A structured electronic document can include data, representing structure and content, and metadata. Metadata may identify the type of the structured document, refer to a set of rules stored separately from the document, or may itself include a list of rules. The metadata can include version data and audit trails, author data and annotations, language data, workflow routing information, asset management information, configuration information, data source information, or any other kind of data describing the document as a whole. FIG. 1 illustrates a structured electronic document with metadata 110 referring to rules stored separately from the document. FIG. 2 illustrates a structured electronic document with metadata 210 itself including a list of rules.
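By way of illustration only, the two arrangements described above can be sketched in XML, where a document type declaration may either name an external rule file or carry rule declarations inline. The sketch below is an assumption made for illustration and is not a description of FIG. 1 or FIG. 2. The sample documents, the file name memo.dtd, and the helper rule_source are hypothetical.

```python
from xml.dom import minidom

# Hypothetical document whose metadata refers to rules stored separately.
EXTERNAL = '<!DOCTYPE memo SYSTEM "memo.dtd"><memo>Hello</memo>'

# Hypothetical document whose metadata itself includes a list of rules.
INLINE = '<!DOCTYPE memo [<!ELEMENT memo (#PCDATA)>]><memo>Hello</memo>'


def rule_source(xml_text):
    """Report where a document says its rules live (illustrative helper)."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return "none"
    if doctype.internalSubset:
        # Rules are written inside the document's own declaration.
        return "inline"
    if doctype.systemId:
        # Rules are named by reference; the parser does not fetch them here.
        return "external: " + doctype.systemId
    return "none"


if __name__ == "__main__":
    for sample in (EXTERNAL, INLINE, "<memo>Hello</memo>"):
        print(rule_source(sample))
```

The helper only reports where the rules are declared. It does not load the external file or enforce any rule.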
Structured electronic documents are often arranged according to a hierarchical structure, in which child elements of structure and content are associated with parent elements. Labeling pieces of document content is useful to allow a human or computer to understand the significance of the content. Associating child elements of structure and content with parent elements is useful to allow a human or computer to understand the relation of the child elements to each other.
The logical structure of a structured electronic document can be explicitly indicated in the document by “markup”, which consists of tags, references, delimiters, and the like that are inserted into the content of the document and that specify the logical components of the document. Markup in a structured electronic document is typically specified according to a markup language. For example, the eXtensible Markup Language (XML) is a markup language that can be used to describe hierarchically structured electronic documents. See Extensible Markup Language (XML) 1.0 (Second Edition), T. Bray, et al. (eds.), Oct. 6, 2000, www.w3.org/TR/2000/REC-xml-20001006 (“the XML specification”). The XML specification itself defines a set of rules to which XML documents are expected to conform. Additional rules for a given document or set of documents can be specified in an SGML or XML document type definition (DTD) or an XML schema.
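As a minimal sketch of how markup exposes the parent and child relationships described above, the following Python example parses a small XML document and lists its elements with indentation reflecting their depth. The sample memo document and the helper outline are hypothetical and are offered only as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: tags label the content and nest to form a hierarchy.
MEMO = """<memo>
  <to>Alice</to>
  <body>
    <para>First paragraph.</para>
    <para>Second paragraph.</para>
  </body>
</memo>"""


def outline(element, depth=0):
    """Return one indented line per element, each parent before its children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(MEMO))))
```

Each level of indentation in the result corresponds to one level of nesting in the markup, so the two para elements appear as children of body, which in turn is a child of memo.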
The process of determining whether a document conforms to an associated set of rules is known as validation. For XML or SGML documents, this can include determining whether the document is “well-formed”, that is, whether it conforms to the rules set out in the appropriate specification (and is therefore a legal markup language document), and whether the document is “valid”, that is, whether it conforms to the rules specified by a document type definition or schema associated with the document. A document that fails to conform to a set of rules does so because at least one aspect of the document fails to conform to at least one rule. A computer program validating a document may inform a user of the non-conforming aspect and of the rule that is violated, so that the user has the information necessary to devise a correction for the document.
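The two checks described above can be sketched as follows. This is a simplified illustration under stated assumptions, not a DTD or schema validator: the rule table RULES, which lists the exact ordered child elements each element must contain, and the function validate are hypothetical. Well-formedness is checked by the XML parser in the Python standard library, and the second check compares each element against the hypothetical rule table.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: element name -> required child elements, in order.
RULES = {
    "memo": ["to", "body"],
    "to": [],
    "body": [],
}


def validate(xml_text, rules=RULES):
    """Return a list of problems; an empty list means both checks passed."""
    try:
        root = ET.fromstring(xml_text)  # fails if the text is not well-formed
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    stack = [root]
    while stack:
        element = stack.pop()
        if element.tag not in rules:
            problems.append(f"<{element.tag}> is not declared in the rules")
            continue
        expected = rules[element.tag]
        actual = [child.tag for child in element]
        if actual != expected:
            # Report both the non-conforming aspect and the rule it violates.
            problems.append(
                f"<{element.tag}> has children {actual}, "
                f"but the rule requires {expected}"
            )
        stack.extend(element)  # visit the children next
    return problems


if __name__ == "__main__":
    print(validate("<memo><to>A</to><body>Hi</body></memo>"))
    print(validate("<memo><body>Hi</body><to>A</to></memo>"))
    print(validate("<memo><to>A</to>"))
```

In this sketch, a document with an unclosed tag is reported as not well-formed, while a document whose elements are properly nested but appear in the wrong order is well-formed yet fails the rule check.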
Electronic data are often stored in structured electronic documents to make programmatic reading, writing, and searching of the data easier and more efficient, but users who edit structured electronic documents by hand must take care to follow the rules associated with the documents. It may be easy for a computer reading a structured electronic document to alert a user when an aspect of the document's structure fails to conform to a rule. But even if the computer identifies both the non-conforming aspect of the document and the rule that triggered the alert, the document or the rules may be too complicated for the user to understand and correct the error herself. Even if the user is able to understand and correct the error, doing so may be wastefully time-consuming.