1. Technical Field
The present invention relates to the display of electronic documents and, in particular, to a technique for pre-computing and encoding an electronic document to minimize run-time computational requirements for parsing and search operations.
2. Related Information
For many years, printed paper has been the medium by which individuals read books and other print media. Today, however, individuals can read books and other written publications using electronic reading devices. These reading devices have viewing screens where the user may view electronic text and graphics. These devices may be hand-held devices or may be traditional computing devices such as personal computers. Examples of such devices are the “ROCKET EBOOK” device by NuvoMedia, Inc. and the “SOFTBOOK READER” device by Softbook Press, Inc. Application programs running on these reading devices may be used to read electronic publications. These programs will soon include, for example, the “READER” brand viewing program, published by Microsoft Corporation of Redmond, Wash. The electronic publications for use with reading devices are commonly referred to as electronic books or “e-books”.
E-book content is typically in the form of a markup language format, such as HyperText Markup Language (HTML) or eXtensible Markup Language (XML).
Other markup languages in use today include, for example, Standard Generalized Markup Language (SGML), eXtensible HyperText Markup Language (XHTML), and Synchronized Multimedia Integration Language (SMIL). As another example, e-books may be formatted in a general format in accordance with an Open eBook standard. This standard is set forth in Open eBook Publication Structure 1.0, which can be found at www.openebook.org. This publication is incorporated herein by reference in its entirety. These markup languages allow e-books to have added functionality over other traditional formats, including, for example, providing links to jump to another document or performing a specified function.
Although the markup language is useful and necessary for displaying the e-book on a reading device, it also has its limitations. In particular, the markup language significantly increases the computational burden for various run-time processes such as parsing. When the markup language document is displayed, tags within the document must be processed to display the document in accordance with the parameters set by the tags. Tags are generally commands written between less-than (<) and greater-than (>) signs. Typically, an opening tag and a closing tag surround a piece of text. Attributes may also be placed within the tag, namely between the less-than and greater-than signs, to provide added functionality. These tags are heavily intermixed with content. This results in substantial run-time computational work to parse markup language documents to distinguish tags from content. Moreover, these tags are non-integer variables within the markup language document. Accordingly, during run-time, these tags require additional processing to identify the function to be achieved for each tag.
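By way of illustration only, the run-time work described above can be sketched as follows. The sketch is a simplified example, not a description of any particular reading device or viewing program; the function name tokenize and the regular expression are assumptions introduced here, and real markup (comments, entities, quoted attribute values containing ">") would require a fuller parser.

```python
import re

# Matches either a whole tag (<...>) or a run of text containing no "<".
# This pattern is an illustrative assumption, not a complete markup grammar.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def tokenize(markup):
    """Split markup into ("tag", name) and ("text", content) pairs."""
    tokens = []
    for piece in TOKEN.findall(markup):
        if piece.startswith("<"):
            # The tag name is a character string, so it must be extracted
            # and normalized at run time before its function can be looked up.
            inner = piece[1:-1].strip()
            name = inner.split()[0].lower() if inner else ""
            tokens.append(("tag", name))
        else:
            tokens.append(("text", piece))
    return tokens


if __name__ == "__main__":
    for kind, value in tokenize('<p>Hello <b>world</b></p>'):
        print(kind, repr(value))
```

Every character of the document is scanned to separate tags from content, and each tag name is then handled as a string, which is the kind of per-display work the paragraph above describes.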
The markup language also increases the computational burden for run-time search operations. For example, in markup language documents, tags may be placed between syllables within a word to provide added functionality; others are placed between words. During run-time searching, therefore, additional processing is required to determine which of these tags are word separators. As another example, markup language documents may include content that is not to be displayed. During run-time searching, it is computationally wasteful to search such content. As yet another example, the markup language document may include Uniform Resource Locator (URL) references. These references, however, must be processed at run-time to determine the location of the document identified by the URL. In the case where the URL references its own document or where the URL references a document residing locally, significant run-time processing must typically be performed to locate the referenced document.
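The word-separator problem described above can be illustrated with a short sketch. It is a simplified example under stated assumptions: the set INLINE_TAGS (tags treated as not separating words) and the function name searchable_text are hypothetical choices made here for illustration, and the sketch does not address non-displayed content or URL resolution.

```python
import re

# Matches either a whole tag (<...>) or a run of text containing no "<".
TOKEN = re.compile(r"<[^>]*>|[^<]+")

# Assumption for illustration: these tags may appear inside a word and
# therefore do not separate words; every other tag is treated as a separator.
INLINE_TAGS = {"b", "i", "em", "span"}


def searchable_text(markup):
    """Return the text of the markup with tags resolved into word boundaries."""
    parts = []
    for piece in TOKEN.findall(markup):
        if piece.startswith("<"):
            # Each tag must be examined at run time to decide whether it
            # breaks a word or merely decorates part of one.
            inner = piece[1:-1].strip().lstrip("/")
            name = inner.split()[0].lower() if inner else ""
            if name not in INLINE_TAGS:
                parts.append(" ")
        else:
            parts.append(piece)
    # Collapse runs of whitespace so words are separated by single spaces.
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    print(searchable_text("<p>un<b>believ</b>able</p><p>story</p>"))
```

Treating every tag as a separator would split "unbelievable" into three fragments, while treating none as a separator would fuse the last word of one paragraph with the first word of the next; the per-tag decision is what adds to the search cost.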
Unfortunately, the run-time processing required to perform any of the above functions can be highly time-consuming and is likely to be unacceptably slow for larger e-book files. The problem is often exacerbated on a reading device having a slow processor. It is therefore desirable to provide a technique to reduce computational requirements for run-time parsing and other forms of processing.