Because of the increased availability and use of computers and improved methods of communication between them, it has become common to use non-paper media for transmitting and storing documents. Such media include magnetic and optical disks, tapes, and other storage systems. Documents developed and transmitted in such form (hereinafter called electronic documents) are often also viewed on computer display devices and must be rendered, or displayed, on a computer screen or other output device in a readable, or formatted, form. These systems have become popular for, and are particularly useful with, very large documents that may be used by many people. Such documents include large system manuals, engineering designs, and the like.
Many currently available computer systems, such as word processors having "what-you-see-is-what-you-get" (WYSIWYG) displays, hypertext systems, and desktop publishing systems, format and display electronic documents. These systems permit only one view, or display, of a document at a time. Moreover, currently available systems embed formatting specifications in the internal, electronic representation of a document and require the whole document to be reformatted if a different format, such as one that hides or emphasizes different portions of the text, is desired. Thus, state-of-the-art display processors are not used to their fullest capabilities.
Moreover, most current systems, specifically information retrieval systems, treat text as a stream of graphic display instructions rather than as a hierarchy of objects of various types whose formatting properties may be changed. Without the ability to change the formatting properties of a document, the document is less useful. For example, the document may not be transferable between different types of computer systems. Furthermore, even those systems that allow changes to the formatting properties of a document require time proportional to the document length for reformatting. Although this delay may be acceptable for small documents, it becomes objectionable during the display of very large documents.
Electronic documents are often developed and viewed with systems having tools for assisting navigation within the document. Such tools include full text indexing and retrieval (i.e., searching) engines and, particularly for large documents, tables of contents similar to those in printed books.
Full text indexing and retrieval engines normally index every word found in a document and record the number of occurrences of each word and its location(s) within the document. However, most current systems identify the total number of occurrences of a word at only one level, or division, of a document. For example, a system may record the total for a book, or the total for each paragraph in a book. Some systems report totals for a few selected levels within a document, but not cumulative totals over all levels of the document. Other systems report only whether a word occurs in a given division of a document, such as a paragraph (by indicating "yes" or "no"), and accumulate the number of paragraphs in which the word occurs rather than the number of occurrences of the word. These systems fail to take full advantage of more advanced document structures to help a user find relevant portions of a document.
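The difference between a total recorded at a single level and cumulative totals over all levels of a document can be illustrated with a minimal sketch. It assumes a hypothetical tree of named divisions (`Node`) and a simple lowercase tokenizer; neither is drawn from any particular indexing engine, and the sketch is offered only to make the distinction concrete.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical document division (book, chapter, section, paragraph)."""
    title: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: words are runs of lowercase letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cumulative_counts(node: Node, word: str, out: dict[str, int], path: str = "") -> int:
    """Record, for every division, the occurrences of `word` in that division
    and all of its subdivisions; return the total for `node`."""
    label = f"{path}/{node.title}" if path else node.title
    total = tokenize(node.text).count(word)
    for child in node.children:
        total += cumulative_counts(child, word, out, label)
    out[label] = total
    return total


# Illustrative two-chapter manual (hypothetical content).
manual = Node("Manual", children=[
    Node("Ch1", children=[
        Node("P1", "Install the pump. The pump is heavy."),
        Node("P2", "Check the valve."),
    ]),
    Node("Ch2", children=[Node("P1", "Prime the pump before use.")]),
])

counts: dict[str, int] = {}
cumulative_counts(manual, "pump", counts)
for label, n in sorted(counts.items()):
    print(label, n)
```

Every division, from the whole manual down to each paragraph, receives a total, so a reader can see at a glance which chapter, and then which paragraph within it, concentrates the occurrences of a word.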
It is also common to use a thesaurus, Boolean logic, and context-based retrieval mechanisms along with such indexing and retrieval engines. However, when separated from document structure, such mechanisms do little to improve the determination of the relevance of portions of a document. Moreover, such additional searching procedures, especially those that incorporate a thesaurus, require additional setup and time, which may be objectionable to a user.
Tables of contents are also used to assist navigation of a document in current systems; however, these systems lack more advanced structures that would further assist a user in finding relevant portions of a document.
As described above, current systems have failed to provide the fullest capability for a user to navigate an electronic document readily and to manipulate such a document efficiently on a variety of output devices. This failure is due primarily to the conception of text formatting as a sequence of formatting instructions, and to the representation of an electronic document that results from such a conception. For example, in current systems, format specifications are normally integrated with a document to create a document containing a sequence of display instructions. These format specifications also normally include pagination. However, with electronic and other systems that do not depend on paper, pagination is neither necessary nor desirable. Such systems fail to separate the text content from the text form.
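The separation of text content from text form can be pictured with a small, hypothetical example. In this sketch, content is stored once as elements tagged by type (`Element`), and each view is a separate format specification mapping element types to styles; the element types, style names, and `render` function are illustrative assumptions only, not the method of the present invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A piece of content tagged with its type; it carries no formatting."""
    kind: str  # hypothetical element types: "heading", "para", "note"
    text: str


# Content is stored once, independent of any format specification.
CONTENT = (
    Element("heading", "Pump Maintenance"),
    Element("para", "Check the seals monthly."),
    Element("note", "Internal: confirm torque values."),
)


def render(content, spec):
    """Apply one format specification (element type -> style) to produce a view.

    Unlisted types default to "plain"; "hidden" omits an element and
    "emphasis" is approximated here by upper-casing its text.
    """
    lines = []
    for el in content:
        style = spec.get(el.kind, "plain")
        if style == "hidden":
            continue
        if style == "emphasis":
            lines.append(el.text.upper())
        else:
            lines.append(el.text)
    return lines


# Two views of the same content, each defined only by its specification.
reader_view = {"heading": "emphasis", "note": "hidden"}
editor_view = {"heading": "plain", "note": "emphasis"}

print(render(CONTENT, reader_view))
print(render(CONTENT, editor_view))
```

Switching views changes only the small specification dictionary; the stored content is never rewritten, and two views can be produced side by side from the same elements.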
Accordingly, it is an object of the present invention to provide a data processing system and method which permits simultaneously displaying multiple views of various portions of an electronic document, each having its own (possibly distinct) format specification.
It is another object of the present invention to provide a data processing system and method of rendering documents that treats text separately from its formatting properties.
It is a further object of the present invention to provide a data processing system and method for rendering an electronic document that allows changes to the specified format of the document and immediately displays the document in the changed format, from a selected viewing location, without reformatting the whole document.
It is another object of the present invention to provide a data processing system and method for indexing electronic documents that reports, for each selected word, the number of occurrences of that word within each section and subsection of the document.
It is another object of the present invention to provide a data processing system and method for enhancing the ability of a user to determine the relevant portions of a document.
It is another object of the present invention to provide a data processing system and method for generating a representation of an electronic document which enables immediate display and formatting of the document for multiple views, improved determination of relevant portions of the document, simple selection of portions of the document for viewing, and the attachment of private and public annotations.