Zooming versus index and contents
Faced with complex material, one wants to see both detail and an overview. For the surface of our planet, Google Earth (http://earth.google.com/) perfectly exemplifies ‘zooming’, which gives smooth transition from an overview to finer and finer detail, or from a close-up to broader and broader context. One can use its search index: type in India or Bangalore and the map will slide and expand to those locations. The interface is designed, however, for a more natural and more powerful navigation. Use the mouse to slide the globe around, the scroll wheel or a slider to shrink or grow the view. One navigates by visible landmarks: go eastward past the bulk of Africa to the triangle of the sub-continent, zoom toward the visible tip, and cities start to appear. They appear first, while still too small to notice in a satellite photograph, as labels. A label, like an entry in an index or a table of contents, is text, but it is text in the same place as the item it identifies.
The usual view of a document does not allow this kind of exploration. The most widely used editor has a function called “zoom”, but while layout is clear with eight pages shrunk to one window (Drawing 1), all content is hidden. (Even with a far higher resolution display, most eyes would fail to read it.) One cannot at a glance identify the appearance of an important character, or a chapter on buttons. One cannot move swiftly past an immediately identified main part to the start of another: in Drawing 1 one must choose a page to enlarge and examine the text. In full-sized text one must scroll or leaf through, looking for the page with the actual title of the second part, or perhaps a change of page headers. (Published books too often use the book title as a page header, unchanging and thus useless within the book. Many digital documents have no headers at all.)
One sometimes has a table of contents (TOC), which in hard copy sets one leafing through for a page number. Digitally one can sometimes click for a direct jump, but the TOC remains a separated set of a few pointers, not a visibly embedded system of markers. Useful as it is when a good TOC is provided by the author, it remains a separated piece of apparatus. Quoting Microsoft Office Help, “When you click a heading in the Document Map, Microsoft Word jumps to the corresponding heading in the document, displays it at the top of the window,” not centered in its before-and-after context, “and highlights the heading in the Document Map.” It is thus an external piece of apparatus controlling what is seen in the document proper, which is shown in uncompressed detail. The TOC display permits some zooming (“You can choose the level of detail to display in the Document Map. For example, you can display all headings or only top-level headings, or show or hide detail for individual headings. You can also set the font and size of the headings in the Document Map and change the highlight color of the active heading.”). The document itself does not. The inability of the Document Map to zoom locally—for example showing all top-level headings, but the next level only for one section—is not inevitable with this approach: the similar “navigation pane” of Adobe Acrobat Reader allows local expansion. However, seeing the actual text uncompressed is a constant. One can thus compare (for instance) two paragraphs separated by more than a page of text only if one splits the window, losing a sense of what lies between them. Even in the separate TOC, the two locations are not simultaneously shown.
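By way of illustration only, the difference between a global level-of-detail setting and local expansion can be sketched as follows. The `Heading` structure, the `expanded` flag and the `render` function are hypothetical names introduced for this sketch; they are not taken from Microsoft Word, Adobe Acrobat Reader or any other product. Each heading carries its own switch, so one section can be opened while its siblings stay collapsed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Heading:
    """One entry in a document outline (hypothetical structure for illustration)."""
    title: str
    children: List["Heading"] = field(default_factory=list)
    expanded: bool = False  # local switch: show this heading's children or not


def render(nodes, depth=0):
    """Return the visible outline lines, descending only into expanded headings."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + node.title)
        if node.expanded:
            lines.extend(render(node.children, depth + 1))
    return lines


# All top-level headings are shown, but the next level only for Part II.
outline = [
    Heading("Part I", [Heading("Chapter 1"), Heading("Chapter 2")]),
    Heading("Part II", [Heading("Chapter 3"), Heading("Chapter 4")], expanded=True),
]

print("\n".join(render(outline)))
```

Even with local expansion, such a view lists headings only; the text of the document itself is still shown elsewhere and uncompressed.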
Indexing and Contents
The functions of an Index and a TOC are quite distinct. An Index exists to help find specific items. A TOC is for general navigation. Readers complain if only an Index is present: the Index directs them well to where a key word is mentioned, but poorly to where a concept is discussed. The typical user interface, rather than exploit digital means to provide better navigation than both of these, is weaker than either. There is a need for something better than both.
For any digital document, whether on line or local, the Search function serves as a kind of Index. In one sense it is faster than an Index to use (click and arrive at a text location of the term), but it is usually necessary to type the search term, rather than to simply see it. No list is there to glance over. This means both that an American may search a UK document (not identified as British by the Web) for “color” or “windshield”, missing its mentions of “colour” or “windscreen”, and that one cannot look over a word list to get a sense of the topics discussed. The way that an Index half-substitutes for a TOC is lost. Neither bit of apparatus has ever been helpful in looking through the text itself for a remembered discussion, though each is sometimes helpful in suggesting a place to start looking. (A major motif in The Feynman Lectures on Physics is frequent use of the mathematics of two coupled oscillators, but this threads through the chapters, is not a recurring heading, and is hard to trace in the Index.)
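The spelling problem can be made concrete with a minimal sketch. The sample text, the function names and the small `variants` table are assumptions made for illustration; no particular search tool is being described. A literal search succeeds only if the reader already types the spelling the document happens to use.

```python
import re

# Hypothetical British-English sample text.
text = (
    "The colour of the windscreen tint is regulated. "
    "A cracked windscreen must be replaced."
)


def find_all(term, document):
    """Return start offsets of every case-insensitive literal match of term."""
    return [m.start() for m in re.finditer(re.escape(term), document, re.IGNORECASE)]


# Hand-made table of spelling variants (an assumption of this sketch).
variants = {
    "color": ["color", "colour"],
    "windshield": ["windshield", "windscreen"],
}


def find_with_variants(term, document):
    """Search for term and for every variant spelling listed for it."""
    hits = []
    for spelling in variants.get(term, [term]):
        hits.extend(find_all(spelling, document))
    return sorted(hits)


print(find_all("color", text))                 # literal American spelling
print(find_all("windshield", text))            # literal American spelling
print(find_with_variants("windshield", text))  # with the variant table
```

The two literal searches return empty lists, while the search through the variant table finds both mentions of "windscreen". The table, however, must come from somewhere: a reader glancing over a printed Index would simply see "windscreen" in the list.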
Automatic indexing is becoming available (see for example http://www.eprecis.com), but this does not replace the TOC. For example, suppose one wishes to know how volume data such as CT scans are turned into viewable images, and one suspects that a large Medical Imaging document, currently open, discusses this. A search for ‘volume’, or an Index, will yield hits on ‘volume’ as in a book, hits on ‘volume’ of sound, hits (often in an imaging context) on the volume of blood pushed out by a heartbeat, hits on ‘large volume of data’, hits on the desired discussion, and hits on how the creation of such viewable images is used (in brain surgery, for instance) rather than on how the creation is achieved. If one knows the technical term ‘volume rendering’ for the creation, one can avoid the first four classes of unwanted hit, but not the last. Moreover, using either a connected “volume rendering” search term, or the words as a separable pair, one misses a sentence like “We must render the data as a viewable image”. Some book indices use boldface or underlining to mark the more important occurrences and/or the defining occurrence, but this is far beyond what a digital search tool can do. What we are left with as a TOC substitute is somewhat weaker than a paper Index, which for books is already inadequate.
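The behaviour described above can be reproduced with a small sketch. The five sentences are invented stand-ins for the classes of hit just listed, and the function names are hypothetical; the sketch shows only plain substring and word-set matching, not the workings of any actual search tool.

```python
import re

# Invented sample sentences, one per class of hit discussed in the text.
sentences = [
    "This volume of the handbook covers ultrasound.",
    "Stroke volume is the blood ejected per heartbeat.",
    "Volume rendering turns CT data into a viewable image.",
    "Surgeons rely on volume rendering to plan brain surgery.",
    "We must render the data as a viewable image.",
]


def phrase_hits(phrase, docs):
    """Indices of sentences containing the phrase as a connected string."""
    return [i for i, s in enumerate(docs) if phrase.lower() in s.lower()]


def all_words_hits(terms, docs):
    """Indices of sentences containing every term as a separate word."""
    wanted = {t.lower() for t in terms}
    return [
        i for i, s in enumerate(docs)
        if wanted <= set(re.findall(r"[a-z]+", s.lower()))
    ]


print(phrase_hits("volume", sentences))
print(phrase_hits("volume rendering", sentences))
print(all_words_hits(["volume", "rendering"], sentences))
```

The single word matches the first four sentences, wanted or not. The connected phrase and the separable pair both match only the third and fourth, so the account of use in surgery is still returned, and the last sentence, which describes the operation with "render", is found by none of the three searches.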
Structural navigation
To state the value of a TOC more explicitly, it outlines the structure of a document (rather than presenting the document as a random-access heap of paragraphs). It does not show the document itself within that view, in the way that a planet is itself part of the overview from space. Such structure is often important. There are web pages that make significant use of XML, reshuffling output according to users' needs, but they remain few. This is due at times to the difficulty of true hypertext construction, but often to the fact that it is unwise. Humans are used to structured narrative, whether in a tale with events, or a logical account of a body of knowledge. An opera is poorly represented by the favourite arias from it, out of context. One of the most dramatic speeches in Macbeth occurs when a confused messenger tells Macbeth that “As I did stand my watch upon the hill, I look'd toward Birnam, and anon, methought, The wood began to move.” This is gibberish to someone who does not know of the witches, or what they said to Macbeth, or why he went to them. Starting here, his reacting with death threats and despair is incomprehensible. A hypertextual edition could link to a witch's telling him “Macbeth shall never vanquish'd be, until Great Birnam wood to high Dunsinane hill shall come against him,” and gradually the reader untangles the logic, with about as much pleasure as in reading a detective story backwards. A play or a novel or an exposition can often have a sequential structure, which the reader needs both to appreciate and to use, in finding wanted material. There is a function of ‘leafing through’ to appreciate that structure, and place material in context, that is not replaced by point search.
One traditional display of structure, beside the TOC, is ‘running heads’, at the top of a page. Even when these merely reproduce a heading from the TOC, they make exploration easier, but a TOC is not always available. The King James Version of the Bible even more helpfully points to matters within each page: “A cloud guideth the Israelites” or “Christ feedeth five thousand”. It would be hard algorithmically to reproduce such headings, via natural language processing (NLP) to decide what is happening, what is most important, and how to describe it. These are matters of judgement and interpretation. (The page from the Song of Songs with “he shall lie all night between my breasts” is headed “Christ's love to the Church”. Since neither of the nouns Christ or Church appears within the page, complex contextual reasoning was needed to arrive at this heading.) Such exegesis is beyond the current scope of computing devices, and to add it by hand to every document extant or in draft is impractical. Ideally, one would summarise the material at various levels (chapter, section, paragraph . . . ). Digital summarization is a vigorous research area, but asked to compress the paragraph above by 75%, a typical application (http://home.hccnet.nl/m.b.wieling/autosummarizer.html) produces “A hypertextual edition could link to a witch's telling him ‘Macbeth shall never vanquish'd be, until Great Birnam wood to high Dunsinane hill shall come against him,’ and gradually the reader untangles the logic, with about as much pleasure as reading a detective story backwards. There is a function of ‘leafing through’ to appreciate that structure, and place material in context, that is not replaced by point search.” The identical result is created by the ‘AutoSummarize’ tool in Microsoft™ Word, at the same size setting.
As a summary, this is barely readable without reference to the original text. Its first sentence makes us ask “telling whom?”, since the “to Macbeth . . . he went . . . his reacting” chain of pronoun consistency following a by-name reference is lost. After the previous paragraph in full, it would read strangely: it no longer mentions the value of a TOC, that paragraph's stated theme (which similarly vanishes in a 75% reduction). As a summary, this just does not work. Human glances at the original would provide better understanding, faster. Key sentences vanish for reasons of space, and vanish without trace because mere traces can never be autonomously comprehensible. The sentences that remain, however, both fail in comprehensibility for lack of context, and occupy too much space for an overview.
There is thus a need for automatic assistance in viewing documents that does not rely on unavailable automatic summarization, and that does not have self-contained comprehensibility as a goal, but rather focuses on navigation within the document. This is notably relevant where a reader needs to decide which part to read more carefully, or to edit text (perhaps as one among a group of writers), rather than read for the first time. A re-reading user knows most of what is being said, but can find it hard to track where it is said, with changes in the presentation or the flow of argument.
Moreover, contrary to the ambitions of software designers, many documents have headings and titles that are not marked according to any but a visual standard. Document scanning software often includes optical character recognition (OCR) to turn pixel images of letters into searchable strings of standard codes for letters, but optical layout recognition that classifies headers is rarer. A LaTeX document usually does have entries like \section{Background of the Invention} which produce not only headers but TOC entries. Web page creators often avoid the HTML level-2 heading <H2>Background of the Invention</H2> and use <br><br><big><b>Background of the Invention</b></big><br>, which directly specifies a double line break followed by four large words in bold face, on a separate line. Similar ‘mere typography’ is common in “.doc” files, using only the boldface and size tools and the carriage return, without entering a named section break. This is particularly natural when creating a document with a “what you see is what you get” (WYSIWYG) editor, since the user may not see a difference. It happens equally with authors who never learn that the sectioning apparatus exists, or cannot decipher the manual, and with authors whose revulsion at the visual results of a program's layout scheme makes them prefer direct control. (LaTeX, used by technically minded authors who do learn such things, produces relatively clean results. Its \section{ } command is more regularly used.)
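The consequence for any software that relies on structural mark-up can be sketched briefly. The class and function names below are hypothetical; the sketch uses only Python's standard html.parser module and collects text found inside the tags h1 to h6. It illustrates the problem, and is not a method of heading detection proposed here.

```python
from html.parser import HTMLParser


class HeadingFinder(HTMLParser):
    """Collect text marked with structural heading tags (h1 to h6) only."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._open = None   # name of the heading tag currently open, if any
        self._buffer = []   # text pieces gathered inside that tag

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lower case, so <H2> arrives as "h2".
        if tag in self.HEADING_TAGS and self._open is None:
            self._open = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            self.headings.append("".join(self._buffer).strip())
            self._open = None


def structural_headings(html):
    """Return the headings a mark-up based tool would find in html."""
    finder = HeadingFinder()
    finder.feed(html)
    finder.close()
    return finder.headings


marked_up = "<H2>Background of the Invention</H2><p>Some text.</p>"
typographic = (
    "<br><br><big><b>Background of the Invention</b></big><br><p>Some text.</p>"
)

print(structural_headings(marked_up))
print(structural_headings(typographic))
```

The first call returns the heading; the second returns an empty list, although a human reader sees the same four large bold words on a separate line in both cases.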
A related application (“A Method and System for Facilitating the Production of Documents”, by the same inventors, hereby incorporated by reference) teaches the utility of a ‘zoomed out’ single-page display in which it is made clear—for instance—where a moved paragraph was, versus where the paragraph is now. To construct such a view is the aim of the present invention. The detection, presentation and management of changes, in some embodiments using a view constructed as here disclosed, is the scope of its companion.
An analogy is of interest here, between DNA as recorded and used in a genome database, and as used in and by the living cell. One may roughly compare a base in DNA to a letter, a triplet coding for an amino acid to a word, a gene (the sequence of triplets specifying a protein) to a paragraph, and the rest to whitespace and mark-up. A DNA molecule coils up to make a thicker strand, which in turn coils, and supercoils, many times until the large ‘chromosome’ structure is created: only a small part can uncoil at a time, given the constraints of fitting into the cell. In genome software the single most frequent action is to seek an exact or acceptable match for a given base or triplet sequence. The cell, not studying itself, has less reason for such searches. To activate a gene it uncoils a part of the chromosome, un-sub-coils a section of this, un-sub-sub-coils a subsection of this, etc.; the chromosome locally ‘puffs out’ until the gene itself uncoils and can be ‘read’ into messenger RNA and used in protein synthesis. The cell does not scan the whole chromosome each time. It efficiently navigates the hierarchy of coils, to find the paragraph it must now use, rather than to find instances of a particular sequence. Human readers of digitally stored text navigate less smoothly, ‘puffing out’ into a browser window far more material than they currently wish to read.
There is a need to make it easy to see what is being discussed in different parts of a text, and to zoom in and out on what is being said. Smooth and easy zooming is critical, in enabling a human reader to decide (more quickly than by skimming the full text) which parts are to be read more carefully in a literature search, a patent search, or any other task which involves a large body of documents to be quickly digested with respect to their relevance to a matter in hand. Moreover, if the user is accessing them over the Web via limited bandwidth, beginning with a zoomed-out view of text can greatly reduce waiting time. More detail should be called up as needed, and can be loaded in background and available instantly when the user has decided from the overview to ask for it.
As much of a document as possible should appear in the limited viewport a computer offers, whilst a) making structure apparent at each detail level; and b) enabling navigation between higher and lower levels of detail, globally or partially. The user should descend smoothly from a high-level view to a detailed one (as in Google Earth) and back: not point at a map to call up a separate view of a street scene. However, a document is a sequential or 1D entity, not a 2D surface, and recognizing a chapter on the Law of Least Action is very different from recognizing a high-level view of Africa. The problems and opportunities are thus very different. The present invention seeks to meet them.