The following relates to the document processing, storage, publication, and related arts.
Structured document formats are used for business documents, technical documents, and other types of documents. Structuring organizes a document with respect to layout and content, which facilitates machine reading and automated processing such as indexing, searching, clustering, classifying, and other document manipulations.
One structured document format is standard generalized markup language (SGML). The SGML format is generally considered to be a powerful but complex format. SGML is abstract, and requires a document type definition (DTD) to provide specific structuring information. DTDs have been developed at varying levels of standardization for specifying documents of a wide range of different types, for use in the automotive, aerospace, and other industries. Documents in SGML format are used for applications such as technical operator manuals for which the complexity and precision of SGML are advantageous.
Another structured document format is extensible markup language (XML). This format is generally considered to be more straightforward to use and more flexible than SGML. XML formatted documents are constructed in accordance with a schema specifying layout and content. Such a schema may be embodied as an associated DTD, or may be expressed in a standardized XML schema language such as RelaxNG, Schematron, Namespace-based Validation Dispatching Language (NVDL), and so forth. One aspect of XML flexibility is that multiple schemas can be used in the same document. For example, an XML document may be partly structured in accordance with a DTD and partly in accordance with another schema or plurality of schemas such as RelaxNG and Schematron.
The structural format, including the DTD or schema, specifies various constraints with which any “valid” document should comply. A structured document is constructed to comply with the selected structured format, and is then validated by a validation engine. The validation engine is a software module or the like which verifies that the document satisfies all constraints of the SGML or XML format, including related DTD or schema constraints. In some cases, the validation engine may report an error specifying a document portion or aspect that fails to meet a particular constraint. A given document, such as an aircraft maintenance manual, may include hundreds or thousands of pages of text, drawings or images, tables, footnotes, endnotes, drawing reference numbers, and other content relating to a wide range of subject matter (e.g., different aircraft components and systems, different maintenance processes, and so forth). Accordingly, the validation should provide a report that identifies both the location in the document at which a constraint is not met and which constraint is not met. A large document may contain numerous such errors, resulting in a lengthy error report.
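As a rough illustration of such a report, the following Python sketch walks an XML document with the standard-library SAX parser and records, for each violation, the line number and the name of the constraint that failed. The content model `ALLOWED_CHILDREN`, the `Finding` record, and the constraint names `declared-element` and `allowed-child` are hypothetical stand-ins for a real DTD or schema. The checker covers only parent/child rules and is not a full SGML or XML validation engine.

```python
import xml.sax
from dataclasses import dataclass
from typing import List

# Hypothetical content model standing in for a DTD:
# element name -> set of element names allowed as direct children.
ALLOWED_CHILDREN = {
    "manual": {"title", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": set(),
}


@dataclass
class Finding:
    line: int          # line on which the offending start tag begins
    constraint: str    # which (hypothetical) constraint was violated
    message: str       # human-readable detail


class ContentModelChecker(xml.sax.ContentHandler):
    """Collects findings instead of stopping at the first violation."""

    def __init__(self):
        super().__init__()
        self.findings: List[Finding] = []
        self._open: List[str] = []   # stack of currently open elements
        self.locator = None

    def setDocumentLocator(self, locator):
        # The parser hands over a locator that tracks the current position.
        self.locator = locator

    def startElement(self, name, attrs):
        line = self.locator.getLineNumber()
        parent = self._open[-1] if self._open else None
        if name not in ALLOWED_CHILDREN:
            self.findings.append(Finding(
                line, "declared-element",
                f"element <{name}> is not declared"))
        elif parent in ALLOWED_CHILDREN and name not in ALLOWED_CHILDREN[parent]:
            self.findings.append(Finding(
                line, "allowed-child",
                f"<{name}> is not allowed inside <{parent}>"))
        self._open.append(name)

    def endElement(self, name):
        self._open.pop()


def validate(xml_text: str) -> List[Finding]:
    """Parse xml_text and return all findings in document order."""
    handler = ContentModelChecker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.findings


DOC = "\n".join([
    "<manual>",                       # line 1
    "  <title>Maintenance</title>",   # line 2
    "  <section>",                    # line 3
    "    <para>Check pump.</para>",   # line 4
    "    <section/>",                 # line 5: nested section not allowed
    "  </section>",                   # line 6
    "  <figure/>",                    # line 7: element not declared
    "</manual>",                      # line 8
])

if __name__ == "__main__":
    for f in validate(DOC):
        print(f"line {f.line}: [{f.constraint}] {f.message}")
```

Collecting findings in a list, rather than raising on the first one, mirrors the point made above that a large document may yield a lengthy report with many independent errors.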
XML has become more widely used than SGML in applications such as web servers and corporate networks. XML validation engines also have more highly developed capabilities and user-friendly interfaces, and can be operated in cascade to validate an XML document against two or more different schemas.
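Cascaded validation can be pictured as a sequence of independent stages applied to the same parsed document, each contributing its own entries to a combined report. The Python sketch below assumes two hypothetical stages: `check_structure`, loosely in the spirit of a grammar-based schema, and `check_references`, loosely in the spirit of a rule-based schema. These functions, the `STAGES` list, and the sample element names are illustrative only and do not correspond to any particular validation engine.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# A stage takes the parsed root element and returns a list of error messages.
Validator = Callable[[ET.Element], List[str]]


def check_structure(root: ET.Element) -> List[str]:
    """Grammar-style stage: every <section> must contain a <title> child."""
    return [
        f"section #{i}: missing <title>"
        for i, section in enumerate(root.iter("section"), start=1)
        if section.find("title") is None
    ]


def check_references(root: ET.Element) -> List[str]:
    """Rule-style stage: every <ref target="..."> must point at an existing id."""
    known_ids = {e.get("id") for e in root.iter() if e.get("id")}
    return [
        f"ref to unknown id '{ref.get('target')}'"
        for ref in root.iter("ref")
        if ref.get("target") not in known_ids
    ]


def validate_cascade(
    xml_text: str, stages: List[Tuple[str, Validator]]
) -> List[Tuple[str, str]]:
    """Run each stage in order; tag every message with the stage name."""
    root = ET.fromstring(xml_text)
    report = []
    for stage_name, validator in stages:
        for message in validator(root):
            report.append((stage_name, message))
    return report


STAGES = [("structure", check_structure), ("references", check_references)]

DOC = """<manual>
  <section id="s1"><title>Hydraulics</title><para>See <ref target="s2"/>.</para></section>
  <section id="s2"><para>See <ref target="s9"/>.</para></section>
</manual>"""

if __name__ == "__main__":
    for stage_name, message in validate_cascade(DOC, STAGES):
        print(f"[{stage_name}] {message}")
```

Keeping each stage as a separate function means a further schema can be added to the cascade by appending one entry to the stage list, without touching the existing checks.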
However, SGML remains in substantial use in some areas such as the aerospace industry. In view of the foregoing and other considerations, there is interest in converting SGML documents to XML. For example, the SP SGML system includes an SGML-to-XML converter program called SX, which receives as input an SGML document and outputs an equivalent XML document. SX first validates the document in SGML using the DTD of the SGML document, and then converts the validated SGML document to an equivalent XML document. This approach entails use of an SGML validation engine. The output XML document is not validated at the XML level, although SX does detect and warn about certain SGML constructs which SX is unable to convert to XML.
In some environments, however, one may not have access to an SGML validation engine. Further, for some applications the objective is not to convert an SGML document to XML, but rather to validate the SGML document using available XML validation tools while continuing to rely on SGML authoring tools for authoring and maintenance. Existing tools such as SX are not suitable for such applications.