Composition programs such as word processing, desktop publishing, and typesetting or layout programs are used to create, edit, store, and output textual documents on a variety of digital computers. These computers include, but are not limited to, large mainframes connected to terminals, desktop or laptop personal computers, and handheld communication or digital devices. One of the functions of a composition program is to determine the position, organization, and arrangement of text for printing, display on a computer screen, or electronic storage. Documents often have a characteristic design defined by a combination of positive space and negative space. Positive space refers to the areas where ink, text, or objects are placed; negative space refers to the white areas of a page without ink, text, or objects, including areas between and surrounding text and objects. For a primarily text-based document, three factors that affect the appearance of positive space are the size of the text on each line (type size), the space between lines (leading), and the choice of typeface. Other factors that affect the appearance of text include the use of negative space around text blocks (areas where text is not placed, such as inter-paragraph space, paragraph indents, and document margins) and the use of textual or graphic elements to create visual points of interest, such as a different typeface or type size for a chapter number or chapter title.
Desktop publishing programs are used to create most modern printed materials, for example, books, magazines, and newspapers. Such programs, for example Adobe InDesign® or QuarkXPress®, use binary files, that is, computer files that store data encoded in binary form. Binary files can contain just text, or can include text together with other data, such as images. Files that contain only text data, without other data, are called plain text files.
Desktop publishing software programs such as Adobe InDesign® or QuarkXPress® allow designers to create positive spaces that are uniform in appearance, and to use standard typesetting techniques such as hyphenation and justification to achieve an even density of text across a document, a page, or a paragraph. For documents published in print, such as books, considerable effort goes into designing their appearance to match the stylistic preferences of the publisher. This includes choices in the size and shape of pages; the size and shape of the “live area” (which can include text blocks, images, and other data) within the page; and the typeface, type size, and leading of text within text blocks or between text blocks. It can also include typographic design choices about how details such as hyphens, dashes, and ellipses are used. Books of a specific imprint, by a specific author, or in a specific series may be designed to share certain design choices, so that a book series, for example, is recognizable as belonging together but is still easily differentiated from other series. In short, while books and other documents are essentially the same generic form at a functional level (text on a page), much effort is put into customizing or individualizing details within the documents, and this is why books and magazines have such a variety of appearance in printed form.
With the prevalence of computers and the use of desktop publishing software to create printed documents, the contents of books are readily available in digital form. Electronic books (e-books) make use of this readily available digital content, and many printed works are available in electronic form on the web or for viewing on a specific device, such as a personal computer, handheld device, or smartphone. E-books preserve the informational content (i.e., the plain text) of the printed form, but the details of the printed form, that is, metadata such as typeface, type size, leading, and indents, are removed or altered to a more generic form, such as plain text. Modern typesetting software programs have data models that allow text content to be differentiated from design details, and this feature facilitates extracting the text content for use in other electronic forms, such as e-books. The downside to this approach is that design information and other metadata are not preserved, and the content appears ‘generic’ and identical in appearance to other e-books regardless of the design considerations that were incorporated into the source document used for print publication.
E-books that use generic text store the text without metadata or with minimal metadata in a text or binary file. A software application reads the content from the file and composes the text for display on the device. Because the text or binary file that stores the text content has little or no metadata, the software application itself performs the computations that determine the layout (arrangement of text on screen) and typographic details such as the typeface, the type size, and the size of negative space such as margins. In this “Dumb File and Smart Application” approach to e-books, the appearance of the electronic version of the book is generated on the fly by the application from generic text content. A “Dumb File and Smart Application” approach allows for user-controlled typographic details (e.g., type size); however, because composition is computed on the fly, the display of text on the screen is limited by the processing speed of the device and the size of the document being rendered. On-the-fly rendering increases the latency of navigating through an e-book, which gives the user the experience of waiting for the screen to render as they read. Another limitation of on-the-fly rendering in the “Dumb File and Smart Application” approach is that text composition is limited to the fonts (digital typefaces) available to the Smart Application or to the operating system of the device itself. This limitation may cause the “look and feel” (“design”) of “Dumb File and Smart Application” e-books to differ substantially from the design of their print book sources.
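To make the on-the-fly composition step concrete, the following is a minimal Python sketch of what a smart application might do with a dumb file: break generic text into lines for a given display width, then group those lines into screens. It assumes a crude metric in which every character is half an em wide; the `Typography` class, `compose_lines`, and `paginate` are hypothetical names for illustration, and a real e-book reader would use actual font metrics and a more sophisticated line-breaking algorithm.

```python
from dataclasses import dataclass


@dataclass
class Typography:
    """Hypothetical settings chosen by the smart application, not stored in the file."""
    type_size_pt: float = 12.0   # user-adjustable type size
    leading_pt: float = 14.4     # baseline-to-baseline distance
    char_width_em: float = 0.5   # assumed average glyph width, as a fraction of the type size


def compose_lines(text: str, width_pt: float, t: Typography) -> list[str]:
    """Greedy (first-fit) line breaking: fill each line with as many words as fit."""
    max_chars = max(1, int(width_pt // (t.char_width_em * t.type_size_pt)))
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars or not current:
            current = candidate  # word fits (an over-long word gets a line to itself)
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def paginate(lines: list[str], height_pt: float, t: Typography) -> list[list[str]]:
    """Group composed lines into screens by how many baselines fit in the height."""
    per_screen = max(1, int(height_pt // t.leading_pt))
    return [lines[i:i + per_screen] for i in range(0, len(lines), per_screen)]


text = "the quick brown fox jumps over the lazy dog"
lines = compose_lines(text, 60, Typography())
print(lines)
# ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']
print(len(paginate(lines, 30, Typography())))
# 3
```

Changing `type_size_pt` forces the whole text to be recomposed, which is where the latency described above comes from: with a 60 pt line, moving from 12 pt to 24 pt type takes the sample sentence from five lines to nine.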
The use of digital typefaces in printed books is a critical aspect of the appearance of the printed material, not only for the main text itself but also for other functional text elements that serve an ornamental function, such as the book title, chapter titles, and drop capitals. Different typefaces connote different emotional responses and meanings in addition to the primary, literal meaning of the textual content itself. Book and other document designers customize the “feel” of a book by the selection of typefaces and by where and how they are used within the document. This allows designers to impart subtle but tangible meaning, and it is used creatively to set the tone and differentiate genres of content.
Binary application files are the computer files used by desktop publishing applications such as Adobe InDesign® or QuarkXPress® to digitally design, compose, and output the exact appearance of the final printed material, such as a book. One method, the “Dumb File and Smart Application” approach, illustrated in FIG. 1, extracts text from the binary application files in a form free of metadata so that the plain text content can be used for display on multiple devices such as a Sony Reader or an Apple iPhone®. A rendering of the file that stores the data content is shown in 110. This file could be a desktop publishing application file or the output of such a file in a smart file format such as PDF (described below). This binary file is used to design the appearance of the document and arrange it for print either directly or via a PDF file. The content is extracted to a dumb file 120 that is “dumb” in that it contains only the textual content and has little or no metadata on the appearance of the text for presentation, e.g., a text (.txt) file. Because the file is “dumb,” a smart application 130 on the e-book device, such as an iPhone® or other handheld computer, is required to compute where on the device display the text from the dumb file will appear and how it will be rendered (typeface, size, etc.). The rendered result 140 may differ substantially in appearance from the source (WYSIWYG) binary desktop publishing file 110 because the metadata present in the source file is absent from the dumb file 120.
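The loss of metadata during extraction to a dumb file can be sketched in a few lines of Python. The `StyledRun` record below is a hypothetical, greatly simplified stand-in for text held in a desktop publishing file (it is not the InDesign® or QuarkXPress® file format), and the typeface names are illustrative. Extraction keeps only the characters, ordered top to bottom and then left to right, assuming a coordinate system whose y axis increases down the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyledRun:
    """Hypothetical, simplified run of text as a layout program might hold it."""
    text: str
    typeface: str
    type_size_pt: float
    x_pt: float  # assumed: distance from the left edge of the page
    y_pt: float  # assumed: distance from the top edge (y grows downward)


def extract_dumb_text(runs: list[StyledRun]) -> str:
    """Produce 'dumb' plain text: typeface, size, and position are discarded."""
    # Approximate reading order: top to bottom, then left to right.
    ordered = sorted(runs, key=lambda r: (r.y_pt, r.x_pt))
    return "\n".join(r.text for r in ordered)


source = [
    StyledRun("It was a dark night.", "Garamond", 11.0, 72.0, 110.0),
    StyledRun("Chapter One", "Didot", 24.0, 72.0, 72.0),
]
print(extract_dumb_text(source))
# Chapter One
# It was a dark night.
```

In the output, the chapter title and the body text are indistinguishable: nothing in the dumb file records that one was set in a display typeface at 24 pt and the other in a text typeface at 11 pt, so the smart application must invent that presentation itself.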
PDF or other smart file viewer applications typically have several generic properties: (A) generic files: viewers can view any PDF file of any size; (B) generic display: viewers can display PDF files on any size screen, with options to scroll, zoom in or out to view the contents, and view multiple documents at the same time, and document size and shape typically differ from display size and shape; and (C) generic platform: PDF viewers function with the same user interface across multiple devices and multiple operating systems. Some PDF viewers allow the user to edit or search the PDF content. Critically, the viewer must be multi-purpose and generic enough to handle a variety of documents in a variety of conditions, and there is no tailoring of the user interface, display functions, or navigation functions to the device, document, or content.
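The mismatch between page size and display size noted in (B) can be quantified with a simple fit-to-screen calculation. The sketch below assumes the page is measured in PDF points (1/72 inch) and, for simplicity, that one display pixel corresponds to one point at 100% zoom; `fit_scale`, `apparent_type_size`, and the 320 × 480 screen are hypothetical choices for illustration.

```python
def fit_scale(page_w: float, page_h: float, screen_w: float, screen_h: float) -> float:
    """Largest uniform zoom at which the whole page fits on the screen."""
    return min(screen_w / page_w, screen_h / page_h)


def apparent_type_size(type_size_pt: float, scale: float) -> float:
    """Size at which text set at type_size_pt appears after zooming by scale."""
    return type_size_pt * scale


# A US Letter page (612 x 792 pt) on a hypothetical 320 x 480 handheld screen.
scale = fit_scale(612, 792, 320, 480)
print(round(scale, 3), round(apparent_type_size(11, scale), 2))
# 0.523 5.75
```

At full-page fit, 11 pt body text appears at under 6 pt, which is why a generic viewer falls back on scrolling and zooming rather than presenting the page as designed.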
In current print production, desktop publishing programs output instructions to printers in a “smart” file format called Portable Document Format (PDF), which represents the printing instructions for composition of the text and other content in a format that is independent of the desktop publishing software, computer hardware, and operating system. A smart file contains the text content (just like a plain text file) and information about the appearance (typographic details such as the typeface, type size, kerning, and character scaling) and position (exact location of lines, words, letters, etc.) of text and other content. A smart file thus predetermines the location and appearance of content.
A PDF file represents text content as text elements, which specify the position on a page where characters should be drawn. Characters are specified using a font resource, a description of a digital typeface, which is either unembedded or embedded. Unembedded font resources rely on the host computer system to supply the digital typeface, whereas embedded fonts are encoded within the PDF itself at the cost of increasing the file size. The benefit of embedded fonts is that they allow the PDF to be viewed and printed, with its fonts intact, on computers and devices other than the one used to create the document. PDF is widely used as a cross-platform method to view documents destined for print on a computer display exactly as they would appear in final printed output because of its “what you see is what you get” (WYSIWYG) properties. Thus, the example of a page from a book shown in 110 illustrates a rendering of a WYSIWYG desktop publishing program binary file, a PDF that encodes these WYSIWYG properties for both computer display and printing, and also actual printed output.
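A text element and its font resource can be illustrated with a small Python sketch that emits the PDF text operators for a single string: `BT` and `ET` begin and end a text object, `Tf` selects a font resource and size, `Td` moves to the starting position, and `Tj` draws a literal string. The dataclasses, the font-resolution policy in `resolve_typeface`, and the `Helvetica` fallback are hypothetical simplifications for illustration; they do not describe how any particular viewer resolves fonts, and a complete PDF would also need page, resource, and font dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontResource:
    resource_name: str  # name the content stream uses, e.g. "F1"
    base_font: str      # the digital typeface this resource describes
    embedded: bool      # True if the font program is stored inside the PDF


@dataclass(frozen=True)
class TextElement:
    text: str
    font: FontResource
    size_pt: float
    x_pt: float  # PDF default user space: origin at the lower-left corner
    y_pt: float


def escape_pdf_literal(s: str) -> str:
    """Escape backslashes and parentheses, which are special in PDF literal strings."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def to_content_stream(el: TextElement) -> str:
    """Emit one text object: select font and size, move to (x, y), draw the string."""
    return (
        f"BT /{el.font.resource_name} {el.size_pt:g} Tf "
        f"{el.x_pt:g} {el.y_pt:g} Td ({escape_pdf_literal(el.text)}) Tj ET"
    )


def resolve_typeface(font: FontResource, host_fonts: set[str], fallback: str = "Helvetica") -> str:
    """Hypothetical viewer policy: an embedded font renders as designed; an
    unembedded one renders as designed only if the host has it installed."""
    if font.embedded or font.base_font in host_fonts:
        return font.base_font
    return fallback


f1 = FontResource("F1", "Garamond", embedded=False)
title = TextElement("Chapter 1 (draft)", f1, 12.0, 72.0, 712.0)
print(to_content_stream(title))
# BT /F1 12 Tf 72 712 Td (Chapter 1 \(draft\)) Tj ET
print(resolve_typeface(f1, host_fonts=set()))
# Helvetica
```

On a host without Garamond, the unembedded resource falls back to a substitute typeface, whereas an embedded copy of the same font would render as designed; that is the trade-off against file size described above.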