PDF is a file format which is widely used for documents that include text as well as graphics. PDF files are not easily rendered in Web browsers, which often do not support direct display of PDF file content. While the content of PDF files can be readily viewed using publicly available viewers, such viewers are normally standalone applications which need to be executed outside of a Web browser, making viewing of PDF documents using a Web browser a difficult experience. One reason for the failure of many browsers to directly support PDF documents is that the processing required to render such documents can, in many cases, make for a somewhat unsatisfactory experience when PDF documents are to be retrieved and viewed on a mobile device.
Scalable Vector Graphics (SVG) is an Extensible Markup Language (XML)-based vector image format for two-dimensional graphics that supports interactivity and animation. The SVG specification is a standard developed by the World Wide Web Consortium (W3C) with the expectation that it will be used by Web browsers and for viewing of content via a Web browser.
Various open source, publicly available utilities have been developed for converting PDF documents to SVG documents. One such utility is pdftocairo, which converts PDF files into SVG markup that can be rendered by a browser. The SVG which pdftocairo creates includes two main parts: 1) a definition of symbols, each symbol definition describing the individual line drawing commands required to draw a particular symbol, e.g., a letter, in a particular font, style, and size, and 2) the document content information, which references these symbols to draw shapes which appear as lines of text on the page. Because pdftocairo relies on line drawings for generating SVG content, the SVG content is often larger in terms of file size than might be the case if text were recognized in the PDF file and then converted to an SVG file using text fonts.
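The two-part structure described above can be illustrated with a minimal Python sketch. The SVG sample below is hand-written for illustration and is not actual pdftocairo output; the symbol identifiers (`glyph-a`, `glyph-b`), the path data, and the helper name `summarize_symbol_usage` are assumptions introduced here. The sketch merely lists the symbol definitions and counts how often the content portion references each one.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hand-written, illustrative SVG (an assumption, not real converter output):
# part 1 defines symbols as line drawings, part 2 references them by id.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="20">
  <defs>
    <symbol id="glyph-a" overflow="visible">
      <path d="M 0 0 L 4 -8 L 8 0 Z"/>
    </symbol>
    <symbol id="glyph-b" overflow="visible">
      <path d="M 0 0 L 0 -8 L 6 -4 Z"/>
    </symbol>
  </defs>
  <g>
    <use xlink:href="#glyph-a" x="0" y="10"/>
    <use xlink:href="#glyph-b" x="10" y="10"/>
    <use xlink:href="#glyph-a" x="20" y="10"/>
  </g>
</svg>"""


def summarize_symbol_usage(svg_text):
    """Return (symbols, uses): drawing commands per symbol id, and a
    count of how many <use> elements reference each symbol id."""
    root = ET.fromstring(svg_text)

    # Part 1: symbol definitions, each holding path drawing commands.
    symbols = {}
    for sym in root.iter(f"{{{SVG_NS}}}symbol"):
        paths = [p.get("d", "") for p in sym.iter(f"{{{SVG_NS}}}path")]
        symbols[sym.get("id")] = paths

    # Part 2: content that places symbols on the page via references.
    uses = Counter()
    for use in root.iter(f"{{{SVG_NS}}}use"):
        href = use.get(f"{{{XLINK_NS}}}href") or use.get("href") or ""
        uses[href.lstrip("#")] += 1

    return symbols, uses


symbols, uses = summarize_symbol_usage(SAMPLE_SVG)
for symbol_id, paths in symbols.items():
    print(symbol_id, "paths:", len(paths), "references:", uses[symbol_id])
```

In this sketch each character on the page costs one reference, while each distinct symbol costs a full set of drawing commands, which suggests why line-drawing output tends to be larger than font-based text.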
While identifying text represented by a PDF file and then converting the text into an SVG file might seem like a practical approach, it requires knowledge of the text content in the PDF file or the ability to reliably recognize text in the PDF file. While some PDF files include text information, others represent the text using drawing information, making the text difficult to recognize.
In view of the above discussion, it should be appreciated that it would be desirable to be able to convert PDF documents including text from the PDF format to an SVG format in a reliable manner, without the need to identify text in the PDF document, while at the same time avoiding some of the disadvantages of the readily available conversion utilities. Such utilities generate SVG files primarily using line drawings without the use of fonts, which can result in a large file that can be difficult to render on a device because the SVG content must be rendered as drawings.