Phrase-formatting is a typographic technique for improving the reading experience in which the phrases in a sentence are emphasized, often by making the word spaces larger between phrases and smaller within a phrase. This asymmetric word space sizing provides visual cues in the text that help the reader chunk the units of meaning. Manual, semi-automated, and automated use of this technique has been demonstrated to improve reading comprehension, speed, and enjoyment.
One system and method of phrase-formatting (Bever and Robbart, 2006) uses an artificial neural network with a three-layer connectionist model: an input layer, a “hidden” layer, and an output layer. This artificial neural network trains on text input data, extracts patterns such as the likelihood of a phrase break, and builds a file of weights and connections for the units of the model, which is stored in a library. The artificial neural network uses a library of punctuation and function words as starting data and analyzes text from a parser by examining a sliding window of three-word sequences across the text input.
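The sliding window of three-word sequences can be illustrated with a minimal sketch. This is not the implementation of Bever and Robbart (2006); the function name three_word_windows and the whitespace tokenization are assumptions made only for illustration.

```python
from typing import Iterator, List, Tuple


def three_word_windows(words: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield every consecutive three-word sequence, left to right.

    Hypothetical helper: the source only says that a sliding window of
    three-word sequences is examined, not how the window is produced.
    """
    for i in range(len(words) - 2):
        yield words[i], words[i + 1], words[i + 2]


# Naive whitespace tokenization (an assumption; the source uses a parser).
words = "the quick brown fox jumps".split()
windows = list(three_word_windows(words))
```

In such a sketch, the middle element of each tuple is the word whose phrase-boundary likelihood would be classified, with the first and third words serving as context.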
During this training analysis, it learns to classify the likelihood that the second word of the three-word sequence is at the end of a sentence. If it finds punctuation, an article, or a function word, it takes note of the first and third words and adds information to the data models in the library. Otherwise, it examines the stored data model. Next, based on the outcome of the examination of the three-word sequence, the neural network assigns to the spaces between the words likelihood values that the word is the beginning or end of a phrase.
Once trained on a corpus of text, the neural network can be used to format text. After the text to be formatted is input, the neural network is run to determine “C” values ranging from 0 to 3, with “3” indicating end-of-phrase punctuation, “2” indicating a major phrase break, “1” indicating a minor phrase break, and “0” assigned to all other breaks. Once these phrase boundaries have been established, text margins are formatted line by line in reverse line order. Next, the available space in each line is determined; then, using the phrase boundary values and the available space, relative space values are assigned.
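The final step, turning “C” values and available space into relative space values, can be sketched as follows. The source does not state the allocation rule, so the proportional rule below, the function name assign_space_widths, and the base parameter are all assumptions chosen for illustration.

```python
from typing import List


def assign_space_widths(
    c_values: List[int], available: float, base: float = 1.0
) -> List[float]:
    """Return one width per word space on a line.

    Assumed rule (not from the source): every space starts at `base`,
    and the `available` extra width is shared in proportion to each
    space's C value (0 to 3).
    """
    if any(c not in (0, 1, 2, 3) for c in c_values):
        raise ValueError("C values must be 0, 1, 2, or 3")
    total = sum(c_values)
    if total == 0:
        # No phrase breaks on this line: spread the extra width evenly.
        share = available / len(c_values) if c_values else 0.0
        return [base + share for _ in c_values]
    return [base + available * c / total for c in c_values]


# Five word spaces on one line, with 3.0 units of extra width to place.
widths = assign_space_widths([0, 2, 0, 1, 3], available=3.0)
```

Under this assumed rule, spaces with a C value of 0 keep the base width, while spaces at phrase boundaries grow with their C value, which yields the asymmetric spacing described above.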
Another system and method (Bever et al., 2012) computes the informativeness of extra-lexical information (such as punctuation and spaces) adjacent to lexical items (words) to adjust character prominence. In this method, the informativeness of a space at the beginning or end of a word is proportional to the frequency of a space character relative to the frequency of non-space punctuation characters. Bever et al. (2012) also describe a second method, in which the informativeness of punctuation is calculated using the predictability of punctuation after the lexical unit and the predictability of punctuation before the next lexical unit.
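One way to picture the first method is to count, for each word in a corpus, how often it is followed by a space and how often by a punctuation character, and then take the ratio. The sketch below is only one possible reading of the description and is not the procedure of Bever et al. (2012); the function names, the regular expression, and the add-one smoothing in the denominator are assumptions.

```python
import re
from collections import Counter
from typing import Tuple


def following_char_counts(text: str) -> Tuple[Counter, Counter]:
    """Count, per word, how often it is followed by a space or by punctuation."""
    space_after: Counter = Counter()
    punct_after: Counter = Counter()
    # A word is a run of letters/apostrophes; capture the character after it.
    for match in re.finditer(r"([A-Za-z']+)([^A-Za-z'])", text):
        word, nxt = match.group(1).lower(), match.group(2)
        if nxt == " ":
            space_after[word] += 1
        elif not nxt.isspace() and not nxt.isalnum():
            punct_after[word] += 1
    return space_after, punct_after


def space_ratio(word: str, space_after: Counter, punct_after: Counter) -> float:
    """Frequency of a following space relative to following punctuation.

    Add-one smoothing in the denominator is an assumption made here to
    avoid division by zero; it is not taken from the source.
    """
    return space_after[word] / (punct_after[word] + 1)


corpus = "the cat sat. the cat ran, and the dog sat."
spaces, puncts = following_char_counts(corpus)
ratio_the = space_ratio("the", spaces, puncts)
ratio_sat = space_ratio("sat", spaces, puncts)
```

In this toy corpus, "the" is always followed by a space and "sat" always by punctuation, so the two words fall at opposite ends of the ratio.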
It would be desirable to have systems and methods for asymmetrically formatting the width of between-word spaces without: (1) determining the likelihood that a word is the beginning or end of a phrase, (2) using an artificial neural network, or (3) using punctuation to determine the end of a phrase or to compute informativeness.