In a typical word processing system, each paragraph exists internally as one or more strings of characters and must be broken into lines before it can be displayed or printed. The typical line-breaking algorithm has a main inner loop that adds the width of the current character to the sum of the widths of the previous characters and compares the new total to the desired line width. The program executes this loop until the accumulated width of the characters in the line exceeds the width available in the line. At this point, the program can either end the line with the last full word, or hyphenate the current word and put the portion of the word after the hyphen at the beginning of the next line.
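The character-at-a-time loop described above can be sketched as follows. This is only an illustrative sketch, not code from any particular word processor: the function name `break_lines`, the `char_width` callback, and the default width of one unit per character are assumptions, and hyphenation is omitted (a word wider than the line is simply split).

```python
def break_lines(text, line_width, char_width=lambda ch: 1):
    """Greedy character-at-a-time line breaking, without hyphenation.

    char_width is a hypothetical callback returning the width of one
    character; by default every character is one unit wide.
    """
    lines = []
    start = 0        # index of the first character of the current line
    total = 0        # summed width of the characters scanned on this line
    last_space = -1  # index of the most recent space seen on this line
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == " ":
            last_space = i
        total += char_width(ch)      # the inner loop runs once per character
        if total > line_width:       # this character overran the margin
            if last_space > start:
                # End the line with the last full word; skip the space.
                cut, nxt = last_space, last_space + 1
            else:
                # No word boundary on this line: split the word itself,
                # always consuming at least one character.
                cut = nxt = max(i, start + 1)
            lines.append(text[start:cut])
            start = i = nxt          # rescan from the start of the new line
            total = 0
            last_space = -1
            continue
        i += 1
    if start < len(text):
        lines.append(text[start:])   # whatever remains is the final line
    return lines
```

Note that every character is visited at least once, and the characters of a word that overruns the margin are visited again when the next line is scanned.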
Two problems with this process cause it to run too slowly. First, the inner loop must be executed for every character in the line. Second, if hyphenation is enabled, the context of the character that overran the margin must be deduced; that is, a determination must be made whether the character is a space, a punctuation mark, or part of a word. In general, all operations that require processing of each character, such as pagination and scrolling through the document, are very slow. In addition, operations that depend on the interpretation of the document as a sequence of words, such as hyphenation, spell-checking, and search and replace, are also very slow.
U.S. Pat. No. 4,181,972 (Casey) relates to a means and methods for automatic hyphenation of words and discloses a means responsive to the length of input words, rather than characters. However, the Casey patent does not store the word length obtained for future use; at the time that hyphenation is requested, the Casey method scans the entire text character by character. The Casey patent also does not compute breakpoints based on the whole word length. Instead, Casey teaches the use of a memory-based table of valid breakpoints between consonant/vowel combinations.
U.S. Pat. Nos. 4,092,729 (Rosenbaum et al.) and 4,028,677 (Rosenbaum) relate to methods of hyphenation that are also based on a memory table of breakpoints. Rosenbaum '729 accomplishes hyphenation based on word length (see claim 6), but the method disclosed differs from the invention disclosed here. In Rosenbaum '729, words are assembled from characters at the time hyphenation is requested, and then compared to a dictionary containing words with breakpoints. The invention disclosed here assembles the words at the time the document is encoded, and does not use a dictionary look-up technique while line breaks are computed.
What is required is a better method of representing the text for document processing. A natural approach to reducing the computational intensity of the composition function is to create data structures that enable computation a word at a time rather than a character at a time. The internal representation of the text, in this case, is a token, which is defined as the pair: