The present invention relates to the storage and retrieval of data defining the points at which a word can properly be hyphenated. More specifically, the invention relates to data structures that include hyphenation points.
Data defining hyphenation points are conventionally included in dictionaries and similar printed lists of words. In a conventional full text dictionary, each acceptable hyphenation point of each defined word is indicated by a mark such as a hyphen, a dot, or an accent, and the entry of a defined word may show the hyphenation points of inflected forms of that word in a similar manner. A user can retrieve the hyphenation points of a word by finding the entry showing that word's hyphenation points. Since the entries of a dictionary are conventionally alphabetized according to the word defined, the user will usually find the entry based on the spelling of the word whose hyphenation points are sought.
Rosenbaum et al., U.S. Pat. No. 4,092,729, describe a hyphenation technique that uses a storage dictionary. As described at col. 2, lines 8-44 and col. 4, lines 12-16, the technique calculates a vector magnitude and angle for a given word, and the magnitude serves as an address in the dictionary at which the angle for the word is stored if the word is correctly spelled. As shown in Table 1, the technique combines a hyphenation byte with each angle representation in the dictionary so that hyphenation points for a correctly spelled word can be retrieved based on magnitude and angle. Herzik et al., U.S. Pat. No. 4,456,969, describe a hyphenation system that uses the technique of Rosenbaum et al. to hyphenate a multi-lingual document, with a dictionary being available for each language as shown and described in relation to FIG. 9 and at col. 4, lines 4-21. Similarly, Carlgren et al., U.S. Pat. No. 4,574,363, describe an enhanced hyphenation function that uses the technique of Rosenbaum et al. for words in the dictionary, but uses an algorithmic search for hyphen breaks for words not in the dictionary, as shown and described in relation to FIG. 5.
Rosenbaum, U.S. Pat. No. 4,028,677, describes a previous hyphenation technique that uses a digital reference hyphenation matrix (DRHM) approach, as shown and described in relation to FIG. 2. As described at col. 2, line 33-col. 3, line 26, the DRHM contains a representation of all legal hyphenations of words that might be anticipated, based on defining the hyphen as a valid character. The alpha word vector representation technique shown in Table 1 is applied to a hyphenation dictionary. No matter where the hyphen resides in a word, the word's magnitude remains unchanged, but the word's angle changes uniquely based on the location of the hyphen, so that all hyphenation possibilities for a given word are stored in one row of memory by using the magnitude as an address and storing all the corresponding angles at that address. In order to hyphenate an input word, a hyphen is added to the word and its magnitude is calculated. Memory is accessed at an address equal to the magnitude, and if that address is not found, the word cannot be legally hyphenated. If the address is found, the corresponding angles, which represent legal hyphenations of the word, are compared with test words generated by sequentially inserting hyphens in the input word. Hyphenated versions of the input word that result in equal compares are gated to the output line.
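The lookup just described can be illustrated with a minimal sketch. The actual magnitude and angle calculations of Rosenbaum are not reproduced here; the functions `magnitude` and `angle`, the character codes, and the helper names below are hypothetical stand-ins chosen only so that the magnitude ignores where the hyphen sits while the angle depends on it. The sketch handles lowercase letters a-z only and makes no attempt to prevent collisions between different words.

```python
def char_code(ch):
    # Assumed coding for illustration: a-z -> 1..26, hyphen -> 27.
    return 27 if ch == "-" else ord(ch) - ord("a") + 1


def magnitude(word):
    # Stand-in "magnitude": a sum of squares, so character order
    # (and therefore hyphen position) does not affect the value.
    return sum(char_code(c) ** 2 for c in word)


def angle(word):
    # Stand-in "angle": a position-weighted sum, so moving the hyphen
    # within the same word changes the value.
    return sum((i + 1) * char_code(c) for i, c in enumerate(word))


def build_table(hyphenated_words):
    # One row per magnitude; each row holds the angles of the legal
    # single-hyphen forms that share that magnitude.
    table = {}
    for entry in hyphenated_words:
        table.setdefault(magnitude(entry), set()).add(angle(entry))
    return table


def legal_hyphenations(word, table):
    # Add a hyphen to the word and use the resulting magnitude as the address.
    row = table.get(magnitude(word + "-"))
    if row is None:
        return []  # address not found: no legal hyphenation is stored
    results = []
    # Insert a hyphen at each interior position and keep the test words
    # whose angle matches an angle stored in the row.
    for k in range(1, len(word)):
        candidate = word[:k] + "-" + word[k:]
        if angle(candidate) in row:
            results.append(candidate)
    return results


table = build_table(["hy-phen", "hy-phenate", "hyphen-ate"])
print(legal_hyphenations("hyphenate", table))
print(legal_hyphenations("xyz", table))
```

In this sketch each legal hyphenation point is stored as a separate single-hyphen form, so a word with two legal break points contributes two angles to the same row.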
Damerau, U.S. Pat. No. 3,537,076, describes an early scheme for automatic hyphenation that does not involve storing hyphenation points. The letters of a word to be analyzed are converted into a reduced character set. The converted word is then broken up in various ways into all of its possible syllable combinations, and out of these possible syllabifications the most probable syllable pattern is selected. At col. 1, lines 47-72, Damerau discusses disadvantages of using a dictionary in which all possible words are stored for look-up to find the word and its proper hyphenation pattern. At col. 4, lines 1-9 and col. 5, lines 1-23, Damerau also describes the use of a small dictionary of most frequent errors that is checked before the statistical procedure is initiated.
Casey, U.S. Pat. No. 4,181,972, describes a similar approach to hyphenation in which most words are hyphenated logically based on vowel and consonant patterns. An exception table is also employed for situations that do not fit the logical hyphenation approach.
Dolby et al., U.S. Pat. No. 3,439,341, describe another early hyphenation machine in which a series of letters in a word is analyzed according to the positions of vowels and consonants, as well as letter groups that indicate split points for hyphenation.
It is also known to store a list of words compactly. Tague et al., WO-A 85/01814, describe a data compression technique that uses a dictionary containing a list of words. As described at page 3, lines 11-23 and at page 11, lines 3-32, the dictionary is compressed by storing the words in alphabetical order and taking advantage of the redundancy in characters that results. If two consecutive entries begin with the same letters, the second entry can be stored as one character representing the number of letters common to both entries, followed by the remaining characters not common to both entries.
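The prefix-sharing idea in the preceding paragraph can be illustrated with a minimal sketch. It is not the encoding of Tague et al.; the function names `front_encode` and `front_decode` and the use of (count, suffix) pairs rather than a single count character are assumptions made for readability.

```python
def front_encode(words):
    # Words are assumed to be in alphabetical order already.
    # Each entry becomes (letters shared with the previous entry, rest).
    encoded = []
    previous = ""
    for word in words:
        shared = 0
        limit = min(len(previous), len(word))
        while shared < limit and previous[shared] == word[shared]:
            shared += 1
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def front_decode(encoded):
    # Rebuild each word from the previous word's prefix plus the stored rest.
    words = []
    previous = ""
    for shared, suffix in encoded:
        word = previous[:shared] + suffix
        words.append(word)
        previous = word
    return words


words = ["hyphen", "hyphenate", "hyphenation", "hypothesis"]
encoded = front_encode(words)
print(encoded)
print(front_decode(encoded) == words)
```

Because each entry depends on the one before it, decoding proceeds sequentially from the start of the list, which is the usual trade-off of this kind of compaction.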