Recently, natural language processing, and more specifically, morphemic analysis, has been applied to advanced document retrieval and the like. For this purpose, large quantities of sentences must be morphemically analyzed, and hence demands have arisen for an increase in the speed of morphemic analysis. In dictionary retrieval in morphemic analysis, word data corresponding to a character string are retrieved from an enormous number of words. This operation occupies much of the processing time. In addition to dictionary retrieval in natural language processing, it takes much processing time to retrieve desired information from a database in various database retrieval operations as the database increases in size. As the information-oriented society continues to evolve, in particular, it becomes easy to acquire information. On the other hand, as databases grow excessively, it takes much time to retrieve information.
Under the circumstances, there have been increasing demands for means for retrieving information from databases (including language dictionaries) at high speed.
Approaches for realizing high-speed database retrieval can be roughly classified into two types. The first is a software approach of realizing high-speed retrieval by devising a retrieval algorithm. The second is a hardware approach of increasing the processing speed by effectively using hardware resources.
A practical example of the first approach will be described in detail below. The trie method is regarded as an effective retrieval algorithm in database retrieval. This method uses a kind of tree structure. According to the method, state transitions are made by using the respective characters of a character string as search keys, starting from the head of the string, to track data. This is a high-speed retrieval method capable of acquiring all data matching the character string from the head of the string by one scanning operation.
In order to make a transition from the root node of a tree structure to a child node corresponding to a search key, a search is made for a transition key matching the search key from a set of transition keys associated with child nodes. A transition is then made to the child node corresponding to the matched transition key. After the state transition, the next character of the character string is used as the next retrieval key to perform the same processing as described above.
This trie method is known as a high-speed retrieval method, and more specifically, an effective method for dictionary retrieval in language analysis. In practice, however, the processing speed is influenced by the manner of searching for child nodes at the respective nodes, the structure of a database, and the like. If, for example, child nodes are set at the character code positions of transition keys in the arrays of a database to increase the speed of processing for searches for child nodes at the respective nodes, a child node having a transition key matching a search key can be tracked in a short period of time. In this case, however, since an array corresponding to the size of a character code must be prepared at each node, the database becomes very large. This method is therefore not feasible.
In practice, therefore, transition keys and a set of child nodes corresponding to the transition keys are stored in advance in arrays equal in number to the child nodes, and a search for a transition key matching a search key is made in array number order or a search method such as the hash method or binary search method is used to search for a child node corresponding to the search key. The trie structure, hash method, and binary search method are disclosed in Yoshiyuki Kondo, “Algorithms and Data Structures for C Programmers” (SOFTBANK BOOKS) and the like.
A practical example of the second approach will be described in detail next. As an effective method of increasing processing speed in retrieving information from a database such as a language dictionary at high speed, a method of loading the database from the disk into the memory in advance and always retrieving information from the database held on the memory has been proposed. This method eliminates the necessity of cumbersome operation of loading the database from the disk for every retrieval, and can increase the processing speed, thus realizing high-speed retrieval.
Both the above two approaches, however, involve several problems. In the first approach, for example, the above child node search methods, i.e., the hash method and binary search method, have their own characteristics, which are not always suitable for searches for child nodes at all nodes. More specifically, when there are many child nodes, the binary search method is faster than the method of searching for a child node in array number order. It, however, takes much processing time because a transition must be repeated many times to check the match between a search key and a transition key.
In contrast to this, in the hash method, a hash value is obtained by a hash function to perform a search. This method allows a fast search when there are many child nodes. If, however, the number of keys is small, the binary search method, which is simple in terms of processing, allows a faster search than by calculating a hash function and searching a table in which hash values are arrayed.
As described above, optimal search methods at the respective nodes in a trie structure differ depending on the number of child nodes. For this reason, in order to maximize the search speed, therefore, optimal search methods are preferably selected for the respective nodes.
With regard to the second approach, if many data are registered in a database or the size of each data is large, the size of the database to be held on the memory becomes excessive. This may lead to a difficulty in holding all the data.
Furthermore, database users are in various hardware environments. Some users can prepare memories large enough to hold a large database, and demand fastest retrieval (high-speed retrieval), whereas other users cannot prepare sufficient memories and hence require fastest retrieval within a range in which the memory consumption can be minimized (memory-saving retrieval).
That is, users have different needs depending on hardware environments for database retrieval.