1. Technical Field
This invention relates to finite state dictionaries, and particularly, though not exclusively, to electronic linguistic dictionaries for use in data processing.
2. Description of the Related Art
Finite state processing is the dominant technology in linguistic products for dictionary look-up.
Finite state processing was first applied to the processing of natural and artificial (computer) languages almost fifty years ago. Twenty years ago it had a rebirth in natural language processing applications, and during the last decade it has become, by and large, an industry standard for dictionary look-up. The main efforts in this field have concentrated on designing increasingly complicated finite state nets for solving specific problems, and on reducing the number of states in these nets to overcome the main inherent problem of the finite state processing approach: the prohibitively large amount of memory required. The gain in speed provided by this approach per se, combined with the steady increase in computer performance, was sufficient for spell checking, hyphenation, and other linguistic applications typical of word processors. Optimization of finite state nets for speed was considered only at the macro level of the topology of the nets.
Finite state processing involves a computer representation of a ‘net’ made up of nodes and links between these nodes, also known as states and transitions respectively. Currently known dictionary tools use a predefined, fixed format for representing nodes and links. The oldest and best known of such methods is the TRIE structure (the name is taken from the word ‘reTRIEval’). This method provides fast run-time access but requires considerable memory, and is therefore not typically used for processing natural languages.
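By way of illustration only, a TRIE of the general kind described above may be sketched as follows. The sketch is not taken from any particular known dictionary tool; the names TrieNode, Trie, insert, contains and node_count, and the sample word list, are assumptions chosen for this example. Each node (state) holds a table of outgoing links (transitions) keyed by character, together with a flag marking the end of a dictionary word.

```python
class TrieNode:
    """One state of the net (hypothetical layout, for illustration)."""
    __slots__ = ("children", "is_final")

    def __init__(self):
        self.children = {}      # transition label (character) -> next state
        self.is_final = False   # True if a dictionary word ends at this state


class Trie:
    """A minimal character-level TRIE; names are assumptions of this sketch."""

    def __init__(self, words=()):
        self.root = TrieNode()
        self.node_count = 1     # counts states, starting with the root
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                # No transition for this character yet: create a new state.
                nxt = TrieNode()
                node.children[ch] = nxt
                self.node_count += 1
            node = nxt
        node.is_final = True

    def contains(self, word):
        # Look-up follows one transition per character of the input word.
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_final


lexicon = Trie(["walk", "walked", "walking", "talk", "talked", "talking"])
print(lexicon.contains("walked"))   # word present in the sample list
print(lexicon.contains("wal"))      # a prefix only, not a stored word
print(lexicon.node_count)           # number of states allocated
```

In this sketch a look-up visits one state per input character, which is consistent with the fast run-time access noted above. The endings "ed" and "ing" are nevertheless stored separately beneath each stem, so the number of states grows with the total length of the non-shared parts of the word list, which illustrates the memory cost of the structure.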
The rise of text data mining and knowledge management, instigated to some extent by the pervasive spread of Internet/intranet technologies, places new demands on the speed of text processing; these applications require high speed for tokenization, morphological identification, lemmatisation, and key word extraction.
A need therefore exists for a finite state dictionary and method of production thereof wherein the abovementioned disadvantage(s) may be alleviated.