Systems and methods herein generally relate to automated creation of tables of contents of documents, and more particularly to methods and devices for dynamically generating tables of contents for printable or scanned content.
A table of contents is a useful part of a document. For example, a table of contents helps outline and organize the content of the document, gives the reader a high-level view of that content, etc.
Some document creation/editing applications (such as word processors, spreadsheet programs, presentation programs, graphic programs, etc.) include tools for automatically creating a table of contents. Such tools are commonly based on analyzing the electronic text representation, to determine text size, text styles, etc. However, for such tools to operate, the text must be in the form of an electronic code that represents characters. The electronic text representation can be with or without formatting.
With respect to text in electronic form, the American Standard Code for Information Interchange (ASCII) is a character-encoding scheme originally based on the English alphabet that encodes 128 specified characters: the digits 0-9, the letters a-z and A-Z, some basic punctuation symbols, some control codes, and a blank space. In a character-encoding scheme, a series of 0's and 1's represents a character electronically. ASCII codes represent text in computers, communications equipment, and other devices that use text. Most modern character-encoding schemes are based on ASCII, though they support many additional characters.
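As an illustration of how a series of 0's and 1's can represent a character, the following minimal Python sketch prints the 7-bit ASCII pattern for each character of a short string. The function name `ascii_bits` is hypothetical and is used here only for illustration.

```python
def ascii_bits(text: str) -> list[str]:
    """Return the 7-bit ASCII pattern for each character in text."""
    # str.encode("ascii") raises UnicodeEncodeError if text contains
    # a character outside the 128 ASCII code points.
    codes = text.encode("ascii")
    # Each ASCII code fits in 7 bits, so pad the binary form to width 7.
    return [format(code, "07b") for code in codes]


if __name__ == "__main__":
    for char, bits in zip("Toc", ascii_bits("Toc")):
        print(char, bits)
    # T 1010100
    # o 1101111
    # c 1100011
```

A character outside the ASCII range, such as an accented letter, causes the encode step to raise an error, which reflects the limited character set that later encoding schemes extend.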
As noted, the electronic text representation can be with or without formatting. For example, plain text is a pure sequence of character codes. In contrast, styled text, also known as rich text, is any electronic text representation containing plain text supplemented by information such as a language identifier, font size, color, hypertext links, etc.
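To illustrate how a conventional tool might derive a table of contents from styled text, the following is a minimal Python sketch and not the implementation of any particular application. It assumes a hypothetical `StyledRun` record holding a line of text, its font size, and its page number. It also assumes that the most common font size is body text and that any larger size marks a heading.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StyledRun:
    """Hypothetical record: one line of text plus formatting information."""
    text: str
    font_size: float
    page: int


def build_toc(runs: list[StyledRun]) -> list[tuple[int, str, int]]:
    """Return (level, heading text, page) entries based on font size alone."""
    if not runs:
        return []
    # Assumption: the most frequently used font size is the body text size.
    body_size = Counter(r.font_size for r in runs).most_common(1)[0][0]
    # Larger sizes are treated as headings; the largest size is level 1.
    heading_sizes = sorted(
        {r.font_size for r in runs if r.font_size > body_size}, reverse=True
    )
    level = {size: i + 1 for i, size in enumerate(heading_sizes)}
    return [
        (level[r.font_size], r.text, r.page)
        for r in runs
        if r.font_size > body_size
    ]


if __name__ == "__main__":
    runs = [
        StyledRun("Background", 18, 1),
        StyledRun("A table of contents is useful.", 11, 1),
        StyledRun("Character encoding", 14, 2),
        StyledRun("ASCII encodes 128 characters.", 11, 2),
    ]
    for lvl, text, page in build_toc(runs):
        print("  " * (lvl - 1) + f"{text} ... {page}")
```

The sketch depends entirely on font-size information being available in electronic form, so it has nothing to work with when the input is plain text or pixels.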
However, for scanned topical items or topical items that are in print ready form (in connected component pixel format, rasterized format, etc.), such automated table of contents tools require that any text be converted into the electronic text representation (through, for example, optical character recognition (OCR) processing, etc.). Such conversion to electronic text representation is cumbersome, utilizes resources, has accuracy limitations, and loses any graphic topical items that accompany the text.