Field of the Invention
This invention relates to document processing, and in particular, it relates to processing a page-image based document and re-targeting it for display on different devices.
Description of Related Art
Methods have been developed to convert a page-image based document, such as a PDF (Portable Document Format) document generated by scanning a printed document, to a form suitable for reformatting and displaying on desired electronic display devices. For example, US Pat. Appl. Pub. No. 2011/0289395 describes a method that “converts a document originating in a page-image format into a form suitable for an arbitrarily sized display, by reformatting or ‘re-flowing’ of the document to fit an arbitrarily sized display device. A two-stage system analyzes, or ‘deconstructs,’ page image layout. The deconstruction includes both physical (geometric) and logical (functional) segmentation of page images. The segmented image elements may include blocks, lines, and/or words of text, and other segmented image elements. The segmented image elements are synthesized and converted into an intermediate structure. The intermediate data structure is then distilled or converted or redisplayed into any number of standard print formats.” (Abstract.) FIG. 1 of that publication shows an example of an intermediate data structure (XHTML in this case) for a page image. “Reflowing is a process that moves text elements (often words) from one text-line to another so that each line of text can be contained within given margins. Reflowing typically breaks or fills lines of text with words, and may re-justify column margins, so that the full width of a display is used and no manual ‘panning’ across the text is needed.” (Id., para. [0009].)
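The reflowing described in the quoted passage can be illustrated with a minimal sketch. It is not the method of the cited publication; it is a simple greedy line-filling routine, assuming each word is represented by a hypothetical (label, pixel width) pair and that words are separated by a fixed, assumed gap.

```python
from typing import List, Tuple

# Hypothetical representation: each word image is a (label, width in pixels) pair.
Word = Tuple[str, int]


def reflow(words: List[Word], max_width: int, space: int = 4) -> List[List[str]]:
    """Greedily place words, in reading order, on lines no wider than max_width.

    `space` is an assumed fixed gap in pixels between adjacent words.
    A word wider than max_width is placed alone on its own line.
    """
    lines: List[List[str]] = []
    current: List[str] = []
    used = 0  # pixels consumed on the current line
    for label, width in words:
        # Width of the line if this word were appended to it.
        needed = width if not current else used + space + width
        if current and needed > max_width:
            # The word does not fit: close the line and start a new one.
            lines.append(current)
            current, used = [label], width
        else:
            current.append(label)
            used = needed
    if current:
        lines.append(current)
    return lines


words = [("a", 30), ("b", 30), ("c", 30), ("d", 30)]
narrow = reflow(words, max_width=70)   # e.g., a small display
wide = reflow(words, max_width=100)    # e.g., a larger display
```

The same word sequence thus breaks into different lines depending only on the available width, so no horizontal panning is needed on either display.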
When presenting the contents of webpages, it is common to change the style of the presentation depending on whether the content is to be displayed on different types of electronic display devices or printed out. For example, webpages are often presented differently for on-screen display vs. printing, and for display on laptop or desktop computers vs. mobile devices such as tablet computers or smart phones. In some examples, when a webpage is printed, various navigation tools and links, background images, etc. that are presented for on-screen display are not presented in the printed format. In other examples, when a webpage is displayed on a mobile device as opposed to a laptop or desktop computer, the width (e.g., number of characters per line) typically becomes narrower, positions of images are often changed (e.g., an image presented on one side of the page is often moved to the center), navigation tools are often hidden or removed, sections of text may be made collapsible/expandable under the section header, etc. These changes can be accomplished by using style sheets, such as Cascading Style Sheets (CSS). Using this approach, the document itself is not re-written; rather, different style sheets are applied to it to create different presentations for different devices.
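The style sheet approach can be sketched as follows. This is an illustrative example only: the target names, the CSS rules, and the render function are assumptions made for the sketch and are not taken from any particular website or tool. The point shown is that the document body stays unchanged while a different style sheet is selected per presentation target.

```python
# Hypothetical style sheets, one per presentation target (illustrative rules only).
STYLE_SHEETS = {
    "screen": "nav { display: block; } body { max-width: 60em; }",
    "print": "nav { display: none; } body { background: none; }",
    "mobile": "nav { display: none; } img { display: block; margin: 0 auto; }",
}


def render(document_body: str, target: str) -> str:
    """Wrap the unchanged document body with the style sheet for `target`."""
    css = STYLE_SHEETS[target]
    return (
        "<html><head><style>" + css + "</style></head>"
        "<body>" + document_body + "</body></html>"
    )


body = "<nav>Home</nav><p>Some text.</p><img src='figure.png'>"
pages = {target: render(body, target) for target in STYLE_SHEETS}
```

Each entry in pages contains the identical body markup; only the embedded style rules differ, which is what allows navigation elements to be hidden for printing or images to be centered on a mobile device without rewriting the document.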