Encoding documents for digital printing is conventionally done in a document or image processing device that is typically separate from the printing device. The processing device may be a personal computer or another device for processing or generating documents or images. The processing device typically has a generic print driver application that encodes documents and sends them, through a communication channel or network, to a particular printer connected thereto for reproduction.
The generation of standard document types is a growing trend. Such standards have been greatly encouraged and facilitated by the use of the standard extensible markup language. However, reproducing documents in the extensible markup language is not an easy task because, conventionally, the user must first convert the extensible markup language into some type of format that is readily acceptable to a printing device.
Moreover, most conventional extensible markup language processing systems have been designed to handle specific processing with respect to specific extensible markup language vocabularies. Although a few conventional extensible markup language platforms have been created for the development of different processing sequences in support of different vocabularies and workflows, these conventional platforms are still fixed and static.
Representations such as extensible markup language allow the creation of vocabularies to express data and documents. These vocabularies provide a mechanism for expressing the semantics of the information along with its structure. However, to view the information, a stylesheet is needed that understands the semantics of the vocabulary and how the information should be presented.
A further problem arises when documents are composed of parts of other documents, because a compatible set of stylesheets that matches all of the vocabularies must be assembled.
Furthermore, extensible markup language allows the capture of a wide range of information, from full documents intended for people to the data of messages. Some extensible markup language vocabularies (such as scalable vector graphics) contain formatted document information. Moreover, some extensible markup language vocabularies (such as extensible stylesheet language formatting objects) contain formatting instructions. However, most extensible markup language vocabularies encode information without formatting.
In order to present the document for human consumption, formatting information must be introduced and applied. This is typically done through a stylesheet. However, a document may have to be viewed without a suitable stylesheet because a stylesheet does not exist, is unavailable, or is inappropriate for the display device. Default stylesheets are possible, but the default stylesheets typically do not provide very satisfactory renditions.
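As an illustration of why a default rendition tends to be unsatisfactory, the following minimal sketch renders a document with no knowledge of its vocabulary. The invoice vocabulary, the sample data, and the function name default_rendition are hypothetical and are not taken from any particular system; the sketch simply emits each piece of text on its own line, indented by nesting depth.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: it carries semantics and structure but no formatting.
SAMPLE = (
    "<invoice><customer>Acme</customer>"
    "<item><name>Widget</name><price>9.50</price></item></invoice>"
)


def default_rendition(xml_text: str) -> str:
    """Render text content line by line, indented by element depth.

    This stands in for a default stylesheet: it knows nothing about
    what the element names mean, only how deeply they are nested.
    """
    root = ET.fromstring(xml_text)
    lines = []

    def walk(elem, depth):
        text = (elem.text or "").strip()
        if text:
            lines.append("  " * depth + text)
        for child in elem:
            walk(child, depth + 1)
            # Text that follows a child belongs to the parent element.
            tail = (child.tail or "").strip()
            if tail:
                lines.append("  " * depth + tail)

    walk(root, 0)
    return "\n".join(lines)


print(default_rendition(SAMPLE))
```

The output preserves the text but loses the meaning: nothing indicates that "Acme" is a customer or that "9.50" is a price, which is the information a vocabulary-aware stylesheet would supply.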
Thus, it is desirable to provide a format for which generic stylesheets could be written, and into which arbitrary vocabularies could be translated. Moreover, it is desirable to convert a document to an intermediate format that represents the document's structure and for which stylesheets could be predefined. Furthermore, it is desirable to analyze a document to determine a mapping between a native vocabulary of the document and another vocabulary, thereby enabling an application of a generic document layout and style.
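The idea of translating a native vocabulary into an intermediate structural format, and then applying a predefined generic style, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the recipe vocabulary, the role names (document, heading, list-item, paragraph, block), the hand-written ROLE_MAP, and the plain-text GENERIC_STYLE are all hypothetical. In particular, the mapping is hard-coded here, whereas the text above contemplates determining it by analyzing the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from native element names to generic structural roles.
# Assumption: a real system would derive this by analyzing the document.
ROLE_MAP = {
    "recipe": "document",
    "title": "heading",
    "ingredient": "list-item",
    "step": "paragraph",
}

# Hypothetical generic style, written once against the intermediate roles.
GENERIC_STYLE = {"heading": "# {}", "list-item": "- {}", "paragraph": "{}"}


def to_intermediate(elem):
    """Copy an element tree, replacing native tags with generic roles."""
    role = ROLE_MAP.get(elem.tag, "block")  # unknown tags fall back to "block"
    out = ET.Element(role, {"native": elem.tag})  # keep the native name for reference
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(to_intermediate(child))
    return out


def render(intermediate):
    """Apply the generic style to every element that carries text."""
    lines = []
    for node in intermediate.iter():
        text = (node.text or "").strip()
        if text:
            lines.append(GENERIC_STYLE.get(node.tag, "{}").format(text))
    return "\n".join(lines)


native = ET.fromstring(
    "<recipe><title>Tea</title><ingredient>Water</ingredient>"
    "<step>Boil.</step></recipe>"
)
print(render(to_intermediate(native)))
```

Because the style is written against the intermediate roles rather than the native element names, supporting another vocabulary in this sketch would require only a new mapping, not a new stylesheet.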