In speech recognition or language analysis, a tree structure dictionary containing phonemes or phoneme sequences as nodes is used to quickly search for the information of a word. For example, in speech recognition, features are extracted from input speech, and the output probability of each acoustic model constituting a word is obtained in accordance with the words and acoustic models registered in a recognition dictionary. A search method such as Viterbi search is then used to compute the likelihood of each state of a word or of each phoneme constituting the word, thereby performing speech recognition. U.S. Pat. No. 6,507,815 discloses a technique of decreasing the likelihood calculation count for a portion which can be shared by words, by using a tree structure dictionary, in order to reduce the amount of computation required for likelihood calculation for each phoneme of a word in each time interval of input speech.
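As a minimal sketch of the search step described above, the following Python fragment computes the best-path log-likelihood of a word modeled as a left-to-right sequence of phoneme states using Viterbi search. The function name, the `output_logprob` callback, and the transition penalties are illustrative assumptions; a real recognizer would obtain the output probabilities from an acoustic model.

```python
def viterbi_word_loglik(num_states, num_frames, output_logprob,
                        self_loop=-0.5, advance=-1.0):
    """Viterbi search over a left-to-right phoneme-state sequence.

    output_logprob(state, frame) -> acoustic log output probability
    (a stand-in for a real acoustic model). self_loop and advance are
    hypothetical transition log probabilities.
    """
    NEG = float("-inf")
    # delta[s] = best log-likelihood of being in state s at the current frame
    delta = [NEG] * num_states
    delta[0] = output_logprob(0, 0)  # paths start in the first state
    for t in range(1, num_frames):
        new = [NEG] * num_states
        for s in range(num_states):
            stay = delta[s] + self_loop                      # remain in state s
            move = (delta[s - 1] + advance) if s > 0 else NEG  # advance from s-1
            best = max(stay, move)
            if best > NEG:
                new[s] = best + output_logprob(s, t)
        delta = new
    return delta[num_states - 1]  # best path ending in the final state
```

With uniform output probabilities (log probability 0), the result is just the best transition-penalty total; for 3 states over 5 frames the best path uses two advances and two self-loops, giving -3.0.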
FIG. 9 shows an example of a conventional tree structure dictionary used for speech recognition. Referring to FIG. 9, reference numeral 901 denotes a node formed by a phoneme or phoneme sequence of a word. Such nodes are shared by a plurality of words to form a tree structure. Reference numeral 902 denotes a link for establishing a parent-child relationship between nodes; 903, a node number serving as the identifier of the node; and 904, data stored in a tree structure dictionary. When the tree structure is traced from the node at the top (root node) to a node at the bottom (leaf node), the phoneme sequence contained in the traced nodes represents a word, and the data 904 of the word can be acquired from the leaf node.
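The structure of FIG. 9 can be sketched in Python as follows, using one character per phoneme for simplicity. The class and field names are illustrative only: each node holds a phoneme (901), child links (902), and the data of any word ending at it (904), and tracing from the root to a leaf spells out a word.

```python
class Node:
    """One node of the tree structure dictionary (illustrative)."""
    def __init__(self, phoneme=""):
        self.phoneme = phoneme
        self.children = {}   # phoneme -> child Node (the links 902)
        self.data = []       # data of words ending at this leaf (904)

def build_dictionary(words):
    """Build the tree from (word, data) pairs; shared prefixes share nodes."""
    root = Node()  # root node holds no phoneme
    for word, data in words:
        node = root
        for ph in word:  # extend or reuse the path of phoneme nodes
            node = node.children.setdefault(ph, Node(ph))
        node.data.append(data)  # the leaf node carries the word's data
    return root

def lookup(root, word):
    """Trace root -> leaf; returns the word's data, or None if absent."""
    node = root
    for ph in word:
        node = node.children.get(ph)
        if node is None:
            return None
    return node.data or None
```

Building the tree from the four words of FIG. 9 yields a single child "k" under the root, shared by all four words, and `lookup` retrieves each word's data from its leaf.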
Likelihood calculation in speech recognition basically needs to be performed for each phoneme in each node. In the case of a tree structure dictionary, however, since a node is shared by a plurality of words as described above, likelihood calculation for a phoneme in the shared node can also be shared. For example, referring to FIG. 9, likelihood calculation for “k” with node number 2 can be shared by the four words “kawano”, “kimura”, “kimoto”, and “kijima”. For this reason, the likelihood calculation needs to be performed only once, whereas without a tree structure dictionary it must be performed four times, once for each word whose likelihood is to be calculated; hence high-speed operation can be realized.
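The saving can be counted with a short, self-contained sketch: each new node in the tree corresponds to one likelihood calculation, whereas without sharing every phoneme of every word must be calculated separately. The dict-of-dicts tree and the variable names are illustrative.

```python
def count_with_sharing(words):
    """Count likelihood calculations when shared nodes are calculated once."""
    trie = {}
    nodes = 0
    for w in words:
        t = trie
        for ph in w:  # one character stands in for one phoneme
            if ph not in t:
                t[ph] = {}
                nodes += 1  # a new node means one new likelihood calculation
            t = t[ph]
    return nodes

words = ["kawano", "kimura", "kimoto", "kijima"]
flat = sum(len(w) for w in words)   # one calculation per phoneme per word: 24
shared = count_with_sharing(words)  # shared nodes counted once: 18
```

For these four words, sharing reduces the count from 24 to 18; in particular the "k" node is calculated once instead of four times.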
In order to form a tree structure dictionary, each node 901 needs to have the information of each child node to which a transition is made from the node. FIG. 10 is a view showing the node information held by each node of the tree structure dictionary, and more specifically, showing the node information of a node 14 and that of a node 17 in FIG. 9 as typical examples. As shown in FIG. 10, the node information includes “phoneme count”, “phoneme”, “child node count”, “child node number”, “data count”, and “data number”. In this case, “phoneme count” and “phoneme” are the information necessary to perform likelihood calculation for each node, and “child node count” and “child node number” are the information necessary for node transition. “Data count” indicates the number of data linked to the corresponding node, i.e., the number of words with identical phoneme sequences. Note that a data count of 0 indicates that the corresponding node is not a leaf node. In addition, “data number” is the information necessary to acquire the information of the corresponding word from the leaf node representing the word end.
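The node information of FIG. 10 can be sketched as a flat record, assuming one storage word per field entry; the field names follow the figure, while the storage model and the example contents are illustrative assumptions, not the exact contents of nodes 14 and 17.

```python
from dataclasses import dataclass

@dataclass
class NodeInfo:
    """Node information record (field names follow FIG. 10)."""
    phonemes: list            # "phoneme count" is len(phonemes)
    child_node_numbers: list  # "child node count" is len(child_node_numbers)
    data_numbers: list        # "data count" is len(data_numbers); 0 -> not a leaf

    def storage_words(self):
        # One storage word for each of the three count fields, plus one per
        # list entry: a rough model of the capacity each node consumes.
        return (3 + len(self.phonemes)
                + len(self.child_node_numbers)
                + len(self.data_numbers))
```

Under this model, an internal node with one phoneme and two children occupies six storage words, and the total grows with every node added to the dictionary, which motivates the storage-capacity concern discussed next.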
When a tree structure dictionary is to be implemented in hardware, a problem arises in terms of the storage capacity required for node information. That is, as the number of words registered in a tree structure dictionary increases, the number of nodes increases, resulting in an increase in the amount of data necessary for the storage of node information. There is therefore a requirement for a reduction in the storage capacity necessary for node information.