In the field of data entry devices, such as devices that use handwriting recognition, speech recognition and other data entry techniques, there is a need to store large volumes of data to assist in recognition, disambiguation, or word selection and processing. In mobile computing and mobile communications, memory space is very limited or expensive, and there is a need to minimize the space occupied by such data.
In the field of data entry, arrangements are known (for example as described in U.S. patent application Ser. No. 08/754,453 of Balakrishnan, filed on Nov. 21, 1996, assigned to the assignee of the present invention and incorporated herein by reference) in which a reduced keyboard or keypad is used for character entry. In such an arrangement each key ambiguously represents more than one character, and disambiguation software is used to disambiguate a key entry, identifying the probable intended character from the various ambiguous possibilities. In such a scheme, dictionary, word or n-gram data is necessary to perform the disambiguation, and large amounts of such data are required to enable satisfactory disambiguation.
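By way of illustration only, the dictionary-based disambiguation outlined above may be sketched as follows. The sketch is not taken from the referenced application. It assumes a conventional telephone keypad layout, and the word counts are hypothetical values chosen solely for the example. The names KEYPAD, build_index and disambiguate are likewise illustrative.

```python
# Assumed layout: a conventional telephone keypad (not specified in the text).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Return the ambiguous key sequence that would be pressed for a word."""
    return "".join(LETTER_TO_KEY[ch] for ch in word)


def build_index(word_counts):
    """Group dictionary words by key sequence, most frequent word first."""
    index = {}
    for word, count in word_counts.items():
        index.setdefault(key_sequence(word), []).append((count, word))
    for candidates in index.values():
        candidates.sort(reverse=True)
    return index


def disambiguate(keys, index):
    """Return candidate words for a key sequence, most probable first."""
    return [word for _, word in index.get(keys, [])]


# Hypothetical word counts, for illustration only.
WORD_COUNTS = {"good": 80, "home": 50, "gone": 30, "hood": 5}
INDEX = build_index(WORD_COUNTS)
print(disambiguate("4663", INDEX))
```

All four example words share the key sequence 4663, so the dictionary data alone decides their ordering. This is why the quality of disambiguation depends on how much word data can be stored.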
Data compression techniques exist for purposes such as bulk data storage. An example is gzip compression, which is suitable for compression of alphabetical text, and is explained here by way of background. The Roman alphabet comprises 26 letters a through z, each of which can readily be represented as a byte of eight bits of data. Eight bits of data allow one bit for a start-of-word indicator and seven bits for 128 character values (2^7). Accordingly, such a scheme has 102 unused byte values (unused in the sense of being unnecessary for coding of 26 characters). In gzip compression, the additional 102 byte values are used to encode character pairs. By way of example, if it is assumed that values 0-25 are used for a through z, value 26 can be assigned to mean "ba", value 27 to mean "ca", etc., using 102 character pairs selected as the most common character pairs in the language in question (e.g. American English).
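By way of illustration only, the character-pair substitution scheme described in the preceding paragraph may be sketched as follows. The sketch makes several assumptions not stated in the text: the pair table is derived from a small sample word list rather than from language-wide statistics, encoding is greedy from left to right, and words are non-empty and consist solely of lowercase letters a through z. The function names are hypothetical.

```python
from collections import Counter
import string

LETTERS = string.ascii_lowercase   # values 0-25 represent a through z
MAX_PAIRS = 128 - len(LETTERS)     # 102 spare seven-bit values
START_OF_WORD = 0x80               # eighth bit flags the first byte of a word


def build_pair_table(words):
    """Select up to 102 of the most frequent adjacent letter pairs."""
    counts = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    return [pair for pair, _ in counts.most_common(MAX_PAIRS)]


def encode_word(word, pairs):
    """Encode a word, using a pair value (26-127) wherever a pair matches."""
    pair_code = {p: len(LETTERS) + i for i, p in enumerate(pairs)}
    out, i = [], 0
    while i < len(word):
        code = pair_code.get(word[i:i + 2])
        if code is None:
            code = LETTERS.index(word[i])   # single letter, value 0-25
            i += 1
        else:
            i += 2
        out.append(code)
    out[0] |= START_OF_WORD
    return bytes(out)


def decode_word(data, pairs):
    """Reverse the encoding by masking off the start-of-word bit."""
    symbols = list(LETTERS) + list(pairs)
    return "".join(symbols[b & 0x7F] for b in data)


SAMPLE = ["banana", "cabana", "band"]   # illustrative sample only
PAIRS = build_pair_table(SAMPLE)
ENCODED = encode_word("banana", PAIRS)
print(len(ENCODED), decode_word(ENCODED, PAIRS))
```

With this sample, the six letters of "banana" occupy three bytes, since each of the pairs "ba" and "na" is replaced by a single value.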
There is a need for an improved method of storage of dictionary or other data suitable for data entry disambiguation.