1. Field of the Invention
The present invention is directed to an automated method of preparing content with design in a presentation that is suitable for printing and/or electronic publishing.
2. Description of the Related Art
The reference to any prior art in this specification is not, and should not be taken as, an acknowledgment or any form of suggestion that the prior art forms part of the common general knowledge.
Most document production is achieved without using any kind of structure or automation. To improve the efficiency of the document production process, varying degrees of automation are available. Current electronic typesetting, document layout and publishing systems for printing and/or electronic publishing offer automation features that utilize different types of data to produce a completed work. The content is generally produced separately from the design or stylistic content which gives the finished work a particular appearance. This appearance may be common to a group of works across a series, lending the series a consistent format that is often designed to appeal to potential purchasers.
The creator of the content, hereinafter called the content creator, writes the text of the work. A particular work may also require the production of other material such as drawings and other graphical figures. These may be created or prepared by the same content creator who prepares the written content, or by another content creator such as a technical illustrator or an artist. The raw text and other material are hereinafter termed the content and are not necessarily formatted for the final appearance of the work.
The stylistic appearance is generally controlled by a graphic, document or Web designer. The designer is charged with the task of creating an aesthetically pleasing or efficient design that may be intended either for print or for electronic publishing in page form or in some other geometric space. We will henceforth refer to the output of the design as a partial page, a full page or a series of pages, although it may include other display spaces such as computer monitors or other display devices.
The designer typically prepares sample pages and/or produces written guidelines which dictate the finished appearance of the work. The sample pages and guidelines may be created using a known desktop publishing software package such as Adobe PageMaker, Adobe InDesign or QuarkXPress, Web page content creation software, or recorded using a word-processing system or other data-processing system. The stylistic information is hereinafter called the design.
Once the design has been approved, and the content has been completed, both are sent to an operator who prepares the presentation of the work by manually combining the content with the design and layout rules specified in the design. The process is a manually-intensive one, with scope for error and misunderstanding. A typical work such as a reference book containing several hundred pages may have a fairly complex layout including sidebars, drawings, photographs, graphs and tables, and may take an operator from several weeks to several months to prepare manually.
The process is very subjective, and even by using a number of positioning rules which define how the positions of certain objects interrelate, it is possible that two different operators working independently on the same material would produce two very different results.
On completion of this process the work is typically published in printed or electronic form by a publisher. The publisher may be a commercial publisher, a society, a corporation, an individual, or any other disseminator of the work.
The content and the design information created in this process are typically stored in a computer-readable file or files, a data stream, or one or more database records, hereinafter called data sets, and in all instances may include structural tagging such as is present in XML, SGML, HTML and other tagging specifications.
The design generally includes several different parts that provide structure to the published work:
Paragraph styles: These are applied to paragraphs within the content and specify information such as the fonts and font sizes to be applied to various elements within the work including the main body text, section headings, sidebar headers, sidebar text, captions, running headers and lists. Type specifications may also be detailed separately from the paragraph styles and include rules to provide stylistic control to the typesetter such as the use of hyphens within the final document. Paragraph styles deal primarily with the format of the paragraph. They do not generally provide any guidance on the relative or absolute positioning of paragraphs, although a style may provide some control over the number of lines allowed to exist in isolation from the rest of the paragraph when a paragraph is forced to break into two or more parts. The latter is known within the art as “keep” options, or “widow/orphan” control. A paragraph style may also define relationships to the preceding or subsequent paragraph, or specify whether the entire paragraph must appear on a single page.
Master pages: These are document specifications that are used as the template for a defined display area such as pages within a work. For example, in printed works master pages typically include elements whose positions and characteristics rarely if ever change, allowing these pages to be predefined. These pages may include background graphics used on part title pages, running heads and footers used on the main text pages, background shading behind page margins and placeholders for things such as page numbers and chapter titles. Many publication designs specify multiple master pages for different display styles.
Elements: These are items that change in terms of both position and content. They are defined by the designer, and may be illustrated with sample text and images (in the case of Figures, for example), and they may have associated positioning rules such as “always place at the top of the display area”. Elements include logos, advertisements, menus, sidebars, tables, figures and other items relevant to the work's purpose and design.
Document type definitions (DTDs) and schemas: These are definitions of the structural tags that may be used to describe a particular type of content. DTDs and schemas are generally expressed as a sequential or nested series of structural entities that are then applied to the content. For example, a schema with a heading entity allows the content to be described as a heading entity. DTDs and schemas are typically derived in a manual analytical process or derived from other DTDs and schemas. They relate to both the content and the structural design of that content, but the act of preparing a DTD or schema is usually carried out as a separate process from that of preparing the design and the content.
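By way of illustration only, the idea of a schema as a nested series of structural entities applied to content can be sketched in a few lines of Python. The entity names (chapter, heading, para, sidebar), the SCHEMA table and the find_violations function are hypothetical and are not taken from any particular DTD or schema language; a real DTD or schema would also express ordering and repetition, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: each structural entity maps to the set of
# entities that are allowed to appear directly inside it.
SCHEMA = {
    "chapter": {"heading", "para", "sidebar"},
    "sidebar": {"heading", "para"},
    "heading": set(),
    "para": set(),
}


def find_violations(element, schema=SCHEMA, path=""):
    """Return messages for tags that the schema does not allow."""
    here = f"{path}/{element.tag}"
    if element.tag not in schema:
        return [f"{here}: unknown entity"]
    problems = []
    for child in element:
        if child.tag not in schema[element.tag]:
            problems.append(
                f"{here}/{child.tag}: not allowed inside <{element.tag}>"
            )
        else:
            # The child is permitted here, so check its own children.
            problems.extend(find_violations(child, schema, here))
    return problems


def check(xml_text):
    """Parse tagged content and check it against the sketch schema."""
    return find_violations(ET.fromstring(xml_text))


GOOD = (
    "<chapter><heading>Intro</heading><para>Text.</para>"
    "<sidebar><para>Note.</para></sidebar></chapter>"
)
BAD = "<chapter><para><heading>Misplaced</heading></para></chapter>"

if __name__ == "__main__":
    print(check(GOOD))
    print(check(BAD))
```

In this sketch the first sample conforms to the table, while the second is reported because a heading entity is nested inside a para entity, which the hypothetical schema does not permit.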
Some content may be created and stored in a database system. This content may derive from secondary databases or may be entered directly into the database. This type of content typically includes information related to commercial products such as product descriptions and specifications. Some database systems are able to apply stylistic tags to the content in the database and/or publish that data in a structured fashion. Some of these database publishing systems include the ability to express dynamic data from multiple databases.
Related to database publishing systems is a class of automated software used primarily for report generation and transactional documents such as invoices, insurance documents and prospectuses. These systems are primarily focused on high-speed large volume data processing and have always been limited in their graphical sophistication. They are not suitable for high-quality commercial publishing applications.
Some document automation systems utilize software that augments the function of page layout applications such as QuarkXPress, Adobe PageMaker, or Adobe InDesign to allow them to function as database publishing systems. These systems are limited to highly structured data and have limited ability to deal with any variability in data length or appearance.
Several desktop publishing packages also offer automatic alignment features that can move an element, for example, to the top or bottom of a page, or maintain its position with a specific reference point in the content. These systems have limited capacity to resolve complex conflicting positional requirements. Many desktop publishing packages also offer scripting or other programmatic systems which allow a certain amount of control over the layout process to be exercised by a suitably skilled programmer. This functionality provides a method for developing a semi-automated or even a fully-automated layout system. However, there are drawbacks to these systems. A full layout can only be achieved if the intended result is relatively basic or with very significant programmatic development. More complex layouts can be achieved by skilled programmers, but each new design typically requires extensive additional development to accommodate features that are unique to that design. These limitations often render the scripting or programmatic method uneconomical when compared to the manual process that it is intended to replace. Typically an automated template is only developed for books or documents whose basic design will be used in many titles, such as in a series of works, where the total title count will number in the dozens or hundreds of examples.
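The difficulty of conflicting positional requirements can be illustrated with a deliberately naive sketch in Python. The Element class, the "top" and "bottom" rule vocabulary and the first-come, first-served stacking strategy are all assumptions made for this example; they do not describe the behavior of any named desktop publishing package.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    height: int   # arbitrary units, for example points
    anchor: str   # "top" or "bottom" (hypothetical rule vocabulary)


def place(elements, page_height):
    """Stack "top" elements downward and "bottom" elements upward.

    Returns (placements, unplaced). placements maps an element name to
    the offset of its upper edge from the top of the page; unplaced
    lists the names of elements for which no room was left.
    """
    top_cursor, bottom_cursor = 0, page_height
    placements, unplaced = {}, []
    for el in elements:
        if el.anchor not in ("top", "bottom"):
            raise ValueError(f"unknown anchor: {el.anchor!r}")
        if el.height > bottom_cursor - top_cursor:
            # Conflict: the rule cannot be honored on this page.
            unplaced.append(el.name)
        elif el.anchor == "top":
            placements[el.name] = top_cursor
            top_cursor += el.height
        else:
            bottom_cursor -= el.height
            placements[el.name] = bottom_cursor
    return placements, unplaced


if __name__ == "__main__":
    page = [
        Element("figure", 300, "top"),
        Element("table", 250, "top"),
        Element("sidebar", 200, "bottom"),
    ]
    print(place(page, page_height=700))
```

With a page height of 700 units, the figure and the table are stacked from the top, leaving only 150 units, so the sidebar cannot be placed. A simple rule set of this kind can detect the conflict, but deciding how to resolve it (moving an element to another page, resizing it, or relaxing a rule) is where such systems typically require design-specific development.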
Attempts have been made to fully automate the typesetting and/or layout process. These include the development of typesetting software systems such as TeX, Penta and Advent 3B2. These systems provide extensive programmatic support for defining automated templates. However, creating a template for a book that will be commercially-attractive can take up to several months of intensive development. These systems may also provide solutions where templates are rigidly defined using numerical constraints either defined using a series of often lengthy parametric dialog boxes within the user interface, or by some other expression of these parameters via a control file, or via a programmatic interface. There have been attempts to include a graphical user interface in these systems, but the complexity of the code structure makes the interaction with the interface limited in its functionality. The time and cost involved in developing a new specification for a complex template imbues the systems with a level of complexity that makes them inaccessible to the general graphic design and publishing market. The templates that are created for these systems are generally economically non-viable for one-off publications such as a unique book format with a short print run, and are difficult to adjust to the requirements of highly variable content. These systems are also very expensive, both in terms of software and the cost of training operators, and provide no significant cross-media functionality.
A related class of automated publishing system was created to deal primarily with office documents and some technical documentation. Examples include Interleaf and Adobe FrameMaker. These systems have been focused more on document management and production than on sophisticated graphic design presentation.
To support the requirements of complex technical document production, a structural tagging system called SGML was developed. Some publishing systems began to support SGML, often in a limited fashion and generally with disappointing results. The major drawback of SGML is its enormous complexity and the extent of the structures it tries to encode. A much simpler markup language based on SGML, called HTML, was then developed to enable publishing over the Internet. HTML is extremely limited in its capacity to support the structural requirements of complex documents and visual structures.
In an attempt to retain the benefits of SGML while reducing its complexity, XML was developed by a committee of the W3C. XML and its derivatives have begun to drive a range of publishing systems, but these systems do not yet feature the graphical complexity and ease of use that would ensure their robust acceptance in the market. Typically XML publishing is driven by programmers rather than graphic designers. Future attempts to improve the interface to XML publishing systems are likely to increase its use. One direction being taken in XML publishing is via XSL-FO, a method of defining the appearance of structural elements within a defined space such as a printed page. XSL-FO is not yet able to provide the graphical sophistication required by most professional print and online publishers.
In summary, the current state of the art is defined by manual desktop publishing systems, semi-automated desktop publishing systems, and fully automated publishing systems. The content and the design information are both stored in data sets which may be a computer-readable file or files, a data stream, or one or more database records, and in all instances may include XML or other tagging. In general these systems align along two axes: the simplest-to-use systems offer the highest and most flexible presentation sophistication but offer the lowest degree of automation; the most complex systems offer a high degree of automation traded off against a lower level of presentation sophistication.