1. Field of the Invention
This invention relates in general to printing systems, and more particularly to a method and apparatus for printing XML directly using a formatting template.
2. Description of Related Art
Extensible Markup Language (XML) is a text-based markup language that is designed to make information self-describing. XML is designed to improve the functionality of the Web by providing more flexible and adaptable information identification. It is called extensible because it is not a fixed format like HTML (a single, predefined markup language). Instead, XML is a “metalanguage”, i.e., a language for describing other languages, which lets you design customized markup languages for an unlimited variety of document types. XML can do this because it is defined as a subset of Standard Generalized Markup Language (SGML), the international standard metalanguage for text markup systems (ISO 8879).
XML is fast becoming the standard for data interchange on the Web. Indeed, since the World Wide Web Consortium (W3C) completed XML in early 1998, the standard has spread rapidly through the sciences and into industry. XML is intended to make it easy and straightforward to use SGML on the Web: easy to define document types, easy to author and manage SGML-defined documents, and easy to transmit and share them across the Web. XML defines an extremely simple dialect of SGML, which is completely described in the XML Specification. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. For this reason, XML has been designed for ease of implementation.
As with HTML, you identify data using tags (identifiers enclosed in angle brackets, e.g., < . . . >). Collectively, the tags are known as “markup”. But unlike HTML, XML tags tell you what the data means, rather than how to display it. Where an HTML tag says something like “display this data in bold font” (<b> . . . </b>), an XML tag acts like a field name in your program. It puts a label on a piece of data that identifies it (for example: <message> . . . </message>).
In the same way that programmers define field names for a data structure, they are free to use any XML tags that make sense for a given application. Naturally, though, for multiple applications to use the same XML data, those applications must agree on the tag names to be used.
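To make the field-name analogy concrete, the following minimal Python sketch reads a small document with the standard library's `xml.etree.ElementTree` module. It is an illustration only, not part of the invention: the `<note>`/`<to>`/`<message>` vocabulary and the `read_fields` helper are hypothetical. The point is that the program looks data up by the tag names the application chose, just as it would read named fields from a record.

```python
import xml.etree.ElementTree as ET

# A small document in a hypothetical vocabulary: the tag names (<note>,
# <to>, <message>) were chosen by the application author, not by XML itself.
DOCUMENT = """
<note>
  <to>Operations</to>
  <message>Printer 3 is back online.</message>
</note>
"""

def read_fields(xml_text: str) -> dict:
    """Return each child element's text keyed by its tag name,
    much as one would read named fields from a record."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}

fields = read_fields(DOCUMENT)
print(fields["message"])  # Printer 3 is back online.
```

Nothing in the tags says how the message should look; any presentation has to come from elsewhere, which is the role the stylesheets discussed below play.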
A structured document formed of predetermined elements, such as a document described in XML, contains only content and structure information. The style definition information for the document, such as the format and attribute information required for display or printing, is defined and administered independently of the document content information.
To display such a structured document on a displaying apparatus, or to print the document on a printing apparatus, information about the structure of the document (“structure information”) is first analyzed, and separately defined information about style definition (“style definition information”) is then obtained. The style definition information depends on the analyzed structure and, once obtained, is set as display or print attribute information.
The result of such a structure analysis may be represented in a tree structure. The independently defined style definition information is often defined in relation to a set of identifiers (hereinafter referred to as “tags”) that indicate the elements of the document structure in terms of the various units of the document.
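The two steps above (analyze the structure into a tree, then look up independently defined style information by tag) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the method of any particular product: `STYLE_DEFINITIONS`, `DEFAULT_STYLE`, and `assign_print_attributes` are hypothetical names, and the style values are arbitrary examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical style definition information, kept separate from the
# document and keyed by tag name (the "identifiers" of the structure).
STYLE_DEFINITIONS = {
    "title": {"font": "AvantGarde", "size_pt": 18, "bold": True},
    "para": {"font": "Garamond", "size_pt": 11, "bold": False},
}
# Assumed fallback for tags that have no style definition.
DEFAULT_STYLE = {"font": "Courier", "size_pt": 10, "bold": False}

def assign_print_attributes(xml_text):
    """Analyze the document structure (parse it into a tree), then walk the
    tree in document order and pair each element's text with the style
    definition registered for its tag.

    Simplification: only the text directly inside an element, before its
    first child, is considered; tail text is ignored."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for element in root.iter():  # depth-first, document order
        text = (element.text or "").strip()
        if text:
            style = STYLE_DEFINITIONS.get(element.tag, DEFAULT_STYLE)
            items.append((element.tag, text, style))
    return items

REPORT = ("<article><title>Quarterly Report</title>"
          "<para>Sales rose.</para><note>Draft</note></article>")
for tag, text, style in assign_print_attributes(REPORT):
    print(tag, style["font"], style["size_pt"], text)
```

Because the styles live in a separate table, the same tree can be given different print attributes simply by swapping that table, without touching the document.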
Authors and providers can design their own document types using XML, instead of being limited to HTML. Document types can be explicitly tailored to an audience, so the cumbersome workarounds required with HTML can become a thing of the past. Thus, authors and designers are free to invent their own markup elements. Moreover, information content can be richer and easier to use because the descriptive and hypertext linking abilities of XML are much greater than those of HTML. XML can also provide more and better facilities for presentation using stylesheets such as CSS and XSL.
In HTML, default styling is built into browsers because the tagset of HTML is predefined and hardwired into browsers. In XML, where you can define your own tagset, browsers cannot know what names are going to be used and what they will mean, so a stylesheet is needed if the formatted text is to be displayed. For example, browsers which read XML will accept and use a CSS stylesheet at a minimum, but you can also use the more powerful XSLT stylesheet language to transform your XML into HTML.
The Cascading Style Sheets specification (CSS) provides a simple syntax for assigning styles to elements and has been implemented in most browsers. The Extensible Stylesheet Language (XSL) has been created specifically for use with XML. XSL uses XML syntax (an XSL stylesheet is an XML file) and has widespread support from several major vendors, although current browser support is limited. XSL comes in two parts. XSL Formatting Objects (XSL-FO) is a pure formatting language and needs a formatter such as Formatting Objects Processor (FOP) or PassiveTeX to create printable output (both can produce PDF). XSLT (the T stands for Transformation) is a language for specifying transformations of XML into HTML, either inside the browser or at the server before transmission. It can also specify transformations from one XML vocabulary to another, and from XML to plain text.
Stylesheets originated in publishing and document management applications. However, XML applications go beyond traditional document management, and stylesheets are useful for these applications as well. The SGML approach was to separate the document from its presentation, so that a document can be published in different forms on any medium: for example, a hardcover edition, a pocket edition, and a CD-ROM edition. In fact, with SGML, documents are “retargetable”: the same document can be published automatically on different media, including paper and electronic media. The operative word is automatically. To achieve this goal, SGML (and XML) encode high-level semantic information. For example, XML markup would identify the title, the paragraphs, and the keywords in a document. The markup is specifically not concerned with whether the title is in Garamond or AvantGarde font. The font, the size, and the color are properties of a published document on a given medium. More importantly, these properties can be automatically deduced from the high-level semantic markup, so software can automatically prepare documents for publishing. When printing, the title may be typeset in AvantGarde and the paragraphs in Garamond; keywords require no special formatting but are compiled into an index. When publishing on the Web, the title may be a graphic in its own frame and the list of keywords may be an index with hyperlinks.
Specific instructions on how to prepare the document for particular media are collected in stylesheets. Print and the Web use different stylesheets, and different stylesheets may even be used for the hardcover and pocket editions.
An XSL stylesheet is a set of rules, each of which specifies how to format certain elements in the document. In the example above, the stylesheet would have rules for the title, the paragraphs, and the keywords. With XSL, these rules are powerful enough not only to format the document but also to reorganize it, e.g., by moving the title to the front page or extracting the list of keywords. This opens up applications of XSL outside the realm of traditional publishing. For example, XSL can be used to convert documents between a company-specific markup and a standard one.
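The rule-based model described above can be illustrated with a small Python sketch. It is not XSL: it only loosely mirrors how an XSL template matches an element, and `RULES`, `apply_stylesheet`, and `text_of` are hypothetical names. It shows both uses of the rules: formatting each element by its tag, and reorganizing the document by moving the title to the front and extracting the keywords into an index.

```python
import xml.etree.ElementTree as ET

def text_of(element):
    """All text inside an element, including the text of nested children."""
    return "".join(element.itertext()).strip()

# Hypothetical rule set: each rule maps one tag to a formatting function.
RULES = {
    "title": lambda e: f"== {text_of(e).upper()} ==",
    "para": lambda e: text_of(e),
}

def apply_stylesheet(xml_text, rules):
    root = ET.fromstring(xml_text.strip())
    # Reorganization 1: the title is emitted first, wherever it appears.
    titles = [rules["title"](e) for e in root.iter("title")]
    # Formatting: every other top-level element that has a rule is formatted by it.
    body = [rules[child.tag](child) for child in root
            if child.tag != "title" and child.tag in rules]
    # Reorganization 2: keywords are extracted into a sorted, de-duplicated index.
    index = sorted({text_of(k) for k in root.iter("keyword")})
    return titles + body + ["Index: " + ", ".join(index)]

# The title deliberately appears after the paragraph in the source.
ARTICLE = ("<doc><para>XML separates <keyword>content</keyword> from "
           "<keyword>presentation</keyword>.</para>"
           "<title>Retargetable Documents</title></doc>")
for line in apply_stylesheet(ARTICLE, RULES):
    print(line)
```

Passing a different rule set to `apply_stylesheet` renders the same document differently, which is the property the next paragraph relies on.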
As discussed above, stylesheets are separate from documents. Therefore, one document can have more than one stylesheet and, conversely, one stylesheet can be shared among several documents. The ability to associate several stylesheets with a single document means that the same document can be rendered differently depending on the medium. The ability to share a stylesheet among several documents makes it easier to enforce a corporate style.
As the Web became more commercial, publishers wanted the same control over quality of output that they had with the printed medium. This gradually led to increasing use of concrete presentation controls, such as explicit fonts and absolute positioning of material on the page. The unfortunate but entirely predictable side effect was that it became increasingly difficult to deliver the same content to alternative devices such as digital TV sets and Wireless Application Protocol (WAP) phones.

Until now, content providers have used stylesheets, as described above, to control printing as well as the rendering of a Web document (e.g., fonts, colors, leading, margins, typefaces, and other aspects of style) without compromising its structure. Printing XML data involves applying a stylesheet, such as XSL, to the data using an XSLT processor. This processor outputs formatting objects, which are then input to a composer that generates final-form pages. These pages are then converted into a page description language (PDL) such as PostScript, Portable Document Format (PDF), or Advanced Function Presentation (AFP). Nevertheless, stylesheets are cumbersome and do not allow a user to print XML data quickly and efficiently. In a production printing system, for example, print speeds can exceed 1000 pages per minute, and transforming XML data with an XSL stylesheet is too processing-intensive to support such speeds.
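The conventional print path described above can be sketched as a chain of stages, together with the per-page time budget a production printer imposes. This is a minimal Python sketch: the stage names `transform`, `compose`, and `to_pdl` are hypothetical stand-ins that do only toy work, not a real XSLT processor, composer, or PDL generator. The arithmetic is the point: at 1000 pages per minute, all stages combined have 60,000 ms / 1000 = 60 ms per page.

```python
import time

# Hypothetical stages: each stands in for a real component of the
# conventional print path and does only toy work here.
def transform(records):
    """XSLT-processor stand-in: wrap each record as a 'formatting object'."""
    return [{"block": record} for record in records]

def compose(formatting_objects, blocks_per_page=2):
    """Composer stand-in: group formatting objects into final-form pages."""
    blocks = [fo["block"] for fo in formatting_objects]
    return [blocks[i:i + blocks_per_page]
            for i in range(0, len(blocks), blocks_per_page)]

def to_pdl(pages):
    """PDL-generator stand-in: emit one toy text line per page
    (not real PostScript, PDF, or AFP)."""
    return ["PAGE " + " | ".join(page) for page in pages]

def run_pipeline(records, stages):
    """Pass the data through each stage in order; return the output and the
    average wall-clock time spent per output page, in milliseconds."""
    start = time.perf_counter()
    data = records
    for stage in stages:
        data = stage(data)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return data, elapsed_ms / max(len(data), 1)

def per_page_budget_ms(pages_per_minute):
    """Time available per page if the pipeline is to keep pace with the printer."""
    return 60_000.0 / pages_per_minute

output, ms_per_page = run_pipeline(["a", "b", "c"], [transform, compose, to_pdl])
print(output)                    # ['PAGE a | b', 'PAGE c']
print(per_page_budget_ms(1000))  # 60.0
```

In a real system, `ms_per_page` from the transformation and composition stages would be compared against `per_page_budget_ms(...)`; a pipeline that cannot stay under the budget cannot keep the printer running at rated speed.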
It can be seen then that there is a need for a method and apparatus for printing XML directly using a formatting template.