This invention relates to the fields of computer systems and data processing. More particularly, methods and apparatus are provided for converting a document from a fixed-layout format into a portable non-fixed layout format, with high fidelity.
Many office and presentation document software programs are designed to save documents in a manner that exactly preserves their page layout, size, font and positioning information, using a fixed layout. One example is the Portable Document Format (PDF) offered by Adobe Systems Incorporated.
Fixed-layout formats contrast with non-fixed layout formats, such as HTML (HyperText Markup Language), that do not preserve spacing, size, font and layout properties across the various programs and browsers that are used to display documents having such a format. A common example of a document that does not have a fixed layout is a typical webpage, which may appear visually different across different web browsers, operating systems and mobile devices, while containing all of the original semantic information.
Whereas a fixed-layout document may retain a great deal of data to allow a program displaying the document to adjust many characteristics of the document in order to present the document with the desired appearance, a webpage typically does not. Although this may help reduce the size of the webpage, and therefore allow it to be transmitted faster, the appearance of the webpage when presented will depend on the web browser program that presents it, the platform (e.g., a smart phone, a portable computer) and/or the operating system of the platform.
Viewing or manipulating fixed-layout documents often requires installation of proprietary software, which is not as widely or freely available as software that works with non-fixed layout documents. The requirement that readers of a fixed-layout document have special software makes it more difficult to share the document, because not everyone with whom the document should be shared may have the software. This can significantly limit the distribution of the document. In contrast, software for viewing a non-fixed layout document, such as a web browser for viewing HTML files, may be installed on just about every computer, tablet and smart phone that has Internet access.
Several attempts have been made to increase the portability of documents and, in particular, to make fixed-layout documents accessible via a browser.
One attempted solution involves the use of a browser plug-in, such as Adobe® Flash®, to render an original document (e.g., a Microsoft® Word document) in the desired format (e.g., HTML) by taking advantage of features not widely available in the output format. Such a solution is generally not available on all computing and communication platforms. For example, many mobile telephones are limited to using standard HTML, or cannot operate the necessary browser plug-in for some other reason. Further, browser plug-ins often perform poorly with software designed for HTML, such as search engine spiders or screen readers for the visually impaired.
A second solution is to render the original document as a series of images. However, rendering the document as images normally results in the loss of all semantic content. Thus, a viewer of the resulting images will not be able to search for or copy any textual content that was in the original document. In addition, the images often do not scale well to small or large sizes. For example, if the output image is relatively small and is stretched to appear larger, undesired visual artifacts may appear, text and objects may not look smooth and the overall aesthetic appeal may suffer. Further, a set of images representing the original document may occupy a lot of storage space, which can slow transmission and loading.
A third solution involves abandoning the original font and page layout information, and instead rendering only the most semantically relevant information, usually the text. Although the semantic content is retained, all aesthetics are lost, usually making the result visually unappealing. Such output will usually be unsuitable for advertisements, for documents used in a presentation and for other uses in which visual appearance matters.