This invention relates to document processing systems, and in particular, to computer-implemented methods for updating breakpoints in a paragraph following a change to the paragraph.
When we type text into a word processor, we take it for granted that at some point, when the insertion point becomes too close to the right margin, the line will break and any additional text we type will appear on the following line. This typically results in one or more paragraphs, each of which is broken into one or more lines defined by breakpoints marking the beginning and ending of each line. One common problem that arises in word processors and other document processing systems is determining where in the paragraph to place these breakpoints.
Conventional line breaking algorithms, often referred to as “greedy algorithms,” seek to pack as much text as possible into a line without having the line exceed the maximum line length. Because of their relative simplicity, these algorithms have the advantage of being easy to implement and quick to execute. They are therefore particularly adapted to real-time editing with conventional interactive word processors, in which prompt response to user input is essential.
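By way of illustration only, a greedy line breaker of the kind described above might be sketched as follows. The function name `greedy_breaks`, the use of character counts as widths, and the single space between words are assumptions made for this sketch and are not taken from any particular word processor.

```python
def greedy_breaks(words, max_len):
    """Pack as many words as fit on each line (hypothetical helper).

    Assumes widths are character counts and words are separated by one space.
    A word longer than max_len is placed alone on its own line.
    """
    lines = []
    current = []       # words on the line being filled
    current_len = 0    # length of that line, including spaces
    for word in words:
        # Length of the line if this word were appended to it.
        needed = len(word) if not current else current_len + 1 + len(word)
        if current and needed > max_len:
            # The word does not fit: commit the line and start a new one.
            lines.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len = needed
    if current:
        lines.append(" ".join(current))
    return lines


print(greedy_breaks(["aaa", "bb", "cc", "ddddd"], 6))
```

Each break is committed as soon as the next word fails to fit, so no later line can influence an earlier decision.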
A disadvantage of such greedy algorithms, however, is that they optimize only one line at a time. These algorithms are generally incapable of considering how the position of a breakpoint on a particular line may affect the lengths of all other lines in the document (hence the term “greedy”). For example, it may be the case that by placing slightly less text on one line, many other lines can be made to more closely approach the desired line length. A greedy algorithm, because it optimizes only the current line, does not recognize this. As a result, the breakpoints selected by a greedy algorithm can result in paragraphs having a decidedly unattractive appearance, particularly when the paragraphs are to be set in narrow columns. For example, the paragraph may have lines that deviate significantly from the desired line lengths. In some cases, the last line of a paragraph may have only a single word. In the case of justified paragraphs, certain lines may have excessively large gaps between words.
Because of its ability to store the entire paragraph in memory, a computer need not be confined to considering only one line at a time. The presence of the entire paragraph in memory can, in principle, enable the computer to examine the entire paragraph before committing to a particular set of linebreaks.
As a result, the computer should, in principle, be able to consider the effect of a line break on the overall appearance of the paragraph. In some cases, a computer might refrain from packing as many words as possible into a line in order to generate a more aesthetically pleasing paragraph. For example, a short preposition might easily fit on the first line of a paragraph. However, placing it there might result in the last line of the paragraph having only one word, in which case moving the preposition to the second line can produce a better paragraph overall.
This ability to consider the entire paragraph at once before committing to any linebreaks gives rise to a second class of algorithms in which breakpoints are selected on the basis of a global parameter that measures the effect of the entire set of breakpoints on the paragraph as a whole. The leading algorithm of this type, which is referred to hereafter as the KP algorithm, is described in Knuth and Plass, “Breaking Paragraphs into Lines,” Software-Practice and Experience, 11:1119-1184 (1981), the teachings of which are herein incorporated by reference.
Throughout the specification, it will be necessary to refer to directions and locations within a paragraph. In keeping with the preferred direction for reading and writing English text, a paragraph is considered to begin at its leftmost end and to end at its rightmost end. The upstream direction is the direction towards the beginning of the paragraph; and the downstream direction is the direction towards the end of the paragraph. The adjectives “earlier” and “later” are used in the specification to refer to locations that are, respectively, upstream or downstream from other locations. The method of the invention, however, does not depend on these definitions and can be applied to a paragraph that begins at its rightmost end and ends at its leftmost end.
Referring to FIG. 1, the KP algorithm considers a paragraph 10 to have a set of legal breakpoints 12, each of which has a cost 14 associated with it. Some of these legal breakpoints are “feasible breakpoints.” A feasible breakpoint is a legal breakpoint that results in a line having a length that is within a predefined tolerance of a target line length. From this set of legal breakpoints 12, the KP algorithm selects a first set of feasible breakpoints 16 for the first line, as shown in FIG. 2.
Each feasible breakpoint in this set of feasible breakpoints for the first line generates a set of feasible breakpoints for the second line. For example, if the first line breaks at b1, the second line can break at either b3 or b4. A break earlier than b3 will result in a second line that is too short relative to a desired line length, whereas a break later than b4 will result in a second line that is too long relative to the desired line length. If, on the other hand, the first line breaks at b2 instead of at b1, breaking the second line at b3 results in a second line that is too short. A second line break following b4 results in a second line that is too long. Hence, the only feasible break for the second line is at b4.
It is readily apparent that the above procedure continues until the end of the paragraph, with the feasible breakpoints for any line being determined by the selected feasible breakpoint for the immediately preceding line. Each feasible breakpoint for the first line thus generates a finite set of feasible breakpoint sequences for all subsequent lines. By adding the costs associated with each feasible breakpoint in a sequence of breakpoints, one can obtain a cumulative cost for each of these feasible breakpoint sequences. The KP algorithm is an efficient way to select the feasible breakpoint sequence having the lowest such cost.
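The selection of a lowest-cost sequence of feasible breakpoints can be illustrated with a simplified dynamic program. The sketch below is not the KP algorithm itself: it assumes item widths given as plain numbers, a fixed inter-item space, a cost equal to the squared deviation of a line from the target length, and a final line that may be short at no cost. The name `optimal_breaks` and its parameters are hypothetical.

```python
def optimal_breaks(widths, target, tolerance, space=1):
    """Return (breaks, cost) for the cheapest feasible breakpoint sequence.

    A break at index j means a line ends just before item j. A line is
    feasible when its length is within `tolerance` of `target`; the last
    line is only required not to be too long (an assumption of this sketch).
    """
    n = len(widths)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: lowest cumulative cost of a break at j
    prev = [None] * (n + 1)  # prev[j]: preceding break on that cheapest path
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] == inf:
                continue  # no feasible sequence of lines ends at i
            length = sum(widths[i:j]) + space * (j - i - 1)
            is_last = j == n
            if length > target + tolerance:
                continue  # line too long
            if length < target - tolerance and not is_last:
                continue  # line too short
            cost = 0 if is_last else (target - length) ** 2
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                prev[j] = i
    if best[n] == inf:
        raise ValueError("no feasible sequence of breakpoints")
    breaks = []
    j = n
    while j > 0:  # walk back along the cheapest path
        breaks.append(j)
        j = prev[j]
    return breaks[::-1], best[n]


print(optimal_breaks([3, 2, 2, 5], target=7, tolerance=2))
```

Because `best[j]` depends only on the best costs at earlier breakpoints, every breakpoint is evaluated against the paragraph as a whole rather than against the current line alone.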
Although the KP algorithm is efficient, it is apparent that if one were to change a paragraph, for example by inserting or deleting text, the algorithm would be forced to regenerate all the feasible breakpoint sequences in the resulting changed paragraph. As a result, application of the KP algorithm to real-time editing, as it is commonly performed in modern word processors, results in undesirable delays caused by the need to re-evaluate all possible breaks in the changed paragraph following each insertion or deletion.
It is thus desirable in the art to provide a globally optimizing linebreaking algorithm that meets the stringent performance standards associated with real-time editing in a modern word processor.
The method of the invention presupposes that a particular paragraph has been operated upon by a dynamic programming line breaking algorithm such as the KP algorithm described above. As shown in FIGS. 3 and 4, the operation of the KP algorithm generates a network of connecting arcs 18 from one feasible breakpoint 16 to the next, as well as costs associated with each arc. This network of connecting arcs, and their associated costs, will be collectively referred to as “auxiliary information.”
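One possible in-memory form of such auxiliary information is a mapping from each arc to its cost. The following sketch is offered under stated assumptions only: breakpoints are item indices, the cost of an arc is the squared deviation of its line from the target length, the final line may be short at no cost, and all widths are positive. The name `build_arcs` is hypothetical, and the specification does not prescribe this representation.

```python
def build_arcs(widths, target, tolerance, space=1):
    """Return {(i, j): cost} for arcs between feasible breakpoints.

    Breakpoint 0 is the start of the paragraph. An arc (i, j) is recorded
    only when i is itself reachable from the start through feasible lines.
    """
    n = len(widths)
    reachable = {0}
    arcs = {}
    for i in range(n):
        if i not in reachable:
            continue  # no feasible line sequence ends at i
        length = -space  # cancels the space added before the first item
        for j in range(i + 1, n + 1):
            length += widths[j - 1] + space
            if length > target + tolerance:
                break  # any later j only makes the line longer
            if j == n or length >= target - tolerance:
                arcs[(i, j)] = 0 if j == n else (target - length) ** 2
                reachable.add(j)
    return arcs


print(build_arcs([3, 2, 2, 5], target=7, tolerance=2))
```

A mapping keyed by the pair of breakpoints makes it straightforward to retain or discard individual arcs when part of the paragraph changes.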
Although the ordinary meaning of a “paragraph” is that of a distinct division of a written work, usually marked by beginning on a new and indented line, as used herein, a “paragraph” refers to an ordered sequence of items which is to be divided into an ordered set of subsequences called “lines.” Hence, by this definition, an entire book can be considered as a single paragraph. The items themselves need not be alphanumeric characters. For example, the items might be frames in a comic strip or musical symbols in a composition. The method of the invention is sufficiently general to apply to any ordered sequence of items that is to be divided into a similarly ordered set of subsequences of such items.
When a user makes a change to a paragraph, the change can permeate the entire network of arcs. Breakpoints which were once feasible can be rendered unfeasible; previously unfeasible breakpoints can be made feasible; and the cost associated with a path between two feasible breakpoints can change. These changes can, in turn, precipitate the need to make wholesale changes to the distribution of linebreaks throughout the paragraph. Under these circumstances, it may be necessary to run the KP algorithm on the entire paragraph in order to update the auxiliary information and the distribution of linebreaks in the paragraph.
However, in many cases, a change to the paragraph results in highly localized changes to the distribution of linebreaks. For example, the insertion of one or two words into a paragraph can result in linebreaking changes to one or two lines following the insertion point but no changes anywhere else in the paragraph. The insertion or deletion of a punctuation mark may involve no changes at all to the distribution of linebreaks. Under these circumstances, large portions of the auxiliary information previously computed for the paragraph may still be valid and should therefore not have to be recomputed.
The method of the invention takes advantage of the fact that many changes to a paragraph require linebreaking changes only in the immediate neighborhood of the change. To do so, the method of the invention provides for caching auxiliary information that represents the underlying graph structure of a paragraph. This auxiliary information includes information on feasible breakpoints of the original paragraph as well as information concerning costs associated with different paths from one feasible breakpoint to another.
Following a change to the paragraph, the method of the invention identifies a changed section and an unchanged section of the underlying graph associated with the resulting changed paragraph. The method then processes only the changed section, thereby generating changed section information. This changed section information can include a set of changed feasible breakpoints corresponding to the changed section of the paragraph. The method then selects, from the auxiliary information, a selected portion associated with the unchanged section of the paragraph. This selected portion can include a reusable set of feasible breakpoints associated with the unchanged section of the paragraph. From the selected auxiliary information and the set of changed section information, the method obtains the optimal break for the changed paragraph.
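The reuse of cached auxiliary information can be illustrated, in a deliberately restricted setting, by the sketch below. It assumes that the change replaces a single item in place (so that breakpoint indices do not shift), that arc costs are squared deviations from the target length, and that the final line may be short at no cost. All function names are hypothetical, and the sketch is one possible reading of the steps above rather than the procedure of the invention.

```python
def line_cost(widths, i, j, target, tolerance, space=1):
    """Cost of a line holding items i..j-1, or None if it is infeasible."""
    length = sum(widths[i:j]) + space * (j - i - 1)
    last = j == len(widths)
    if length > target + tolerance:
        return None
    if length < target - tolerance and not last:
        return None
    return 0 if last else (target - length) ** 2


def all_arcs(widths, target, tolerance):
    """Compute every feasible arc from scratch (the full, non-incremental pass)."""
    n = len(widths)
    arcs = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost = line_cost(widths, i, j, target, tolerance)
            if cost is not None:
                arcs[(i, j)] = cost
    return arcs


def update_arcs(cached, widths, k, target, tolerance):
    """Reuse cached arcs whose lines exclude item k; recompute the rest."""
    n = len(widths)
    # Unchanged section: lines ending at or before k, or starting after k.
    arcs = {(i, j): c for (i, j), c in cached.items() if j <= k or i > k}
    # Changed section: every line i..j-1 that contains item k.
    for i in range(k + 1):
        for j in range(k + 1, n + 1):
            cost = line_cost(widths, i, j, target, tolerance)
            if cost is not None:
                arcs[(i, j)] = cost
    return arcs


def cheapest_path(arcs, n):
    """Return the breaks of the lowest-cost path from 0 to n, or None."""
    best = {0: (0, None)}  # breakpoint -> (cumulative cost, previous break)
    for (i, j) in sorted(arcs):  # arcs into i sort before arcs out of i
        if i in best:
            candidate = best[i][0] + arcs[(i, j)]
            if j not in best or candidate < best[j][0]:
                best[j] = (candidate, i)
    if n not in best:
        return None
    breaks, j = [], n
    while j:
        breaks.append(j)
        j = best[j][1]
    return breaks[::-1]


old = [3, 2, 2, 5]
cached = all_arcs(old, target=7, tolerance=2)
new = [3, 2, 2, 9]  # item 3 was replaced by a wider item
updated = update_arcs(cached, new, k=3, target=7, tolerance=2)
print(cheapest_path(cached, len(old)), cheapest_path(updated, len(new)))
```

In this sketch only the arcs whose lines contain the replaced item are recomputed; the search for the cheapest path is rerun over the combined arcs, which a fuller implementation might also restrict to the affected region.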
By reusing previously obtained auxiliary information concerning the underlying graph structure of the original paragraph, the method of the invention provides an efficient way to incrementally update the linebreaks of a paragraph in a globally optimized manner.