1. Field of the Invention
The present invention relates to a speech recognition apparatus and method using a tree-structure word dictionary.
2. Related Background Art
In speech recognition using sound models such as Hidden Markov Models (HMMs), processing is carried out according to the following procedure. First, features of the speech input are extracted. Next, output probabilities of the sound models constituting each word are acquired according to the relationship between words and sound models described in the word dictionary. Then likelihoods of the respective states of each word, or of each sound model (for example, a phoneme) forming the word, are obtained using a search technique such as the Viterbi search, and speech recognition is carried out based on those likelihoods.
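The likelihood computation in the last step can be illustrated with a minimal Viterbi sketch in the log domain. The two-state left-to-right model and all probability values below are hypothetical and are chosen only to keep the example small; they are not taken from the invention.

```python
import math

NEG_INF = float("-inf")

def viterbi_log(log_init, log_trans, log_emit):
    """Return (best_log_likelihood, best_state_path).

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log output probability of frame t in state s
    """
    n_states = len(log_init)
    # Scores after the first frame.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace the best path back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path

# Hypothetical two-state left-to-right model observed over three frames.
LOG_INIT = [0.0, NEG_INF]
LOG_TRANS = [[math.log(0.5), math.log(0.5)],
             [NEG_INF, 0.0]]
LOG_EMIT = [[math.log(0.9), math.log(0.1)],
            [math.log(0.2), math.log(0.8)],
            [math.log(0.1), math.log(0.9)]]

BEST_SCORE, BEST_PATH = viterbi_log(LOG_INIT, LOG_TRANS, LOG_EMIT)
```

Working in the log domain turns products of probabilities into sums, which avoids numerical underflow over long utterances.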
FIG. 1 is an explanatory diagram showing an example of the word dictionary. FIG. 1 shows <TOKYO>, <TOKAI>, and <TOHOKU> as examples of words (also called recognition object words) described in the word dictionary. In this example, phonemic models are used as sound models. Each word is expressed by a connection of phonemic models. For example, <TOKYO> is constructed of the phonemic models (Japanese language phonemic models) “t,” “o,” “o,” “k,” “y,” “o,” and “o.”
In large-vocabulary speech recognition, many words have the same phonemic models at their head part, as in the example illustrated in FIG. 1. A word dictionary in which the head portions of words are not shared, as in FIG. 1, is called a linear lexicon. In contrast, a word dictionary in which the head portions of some words are shared is called a tree-structure word dictionary (also called a tree lexicon). Since the sound likelihoods of the shared portions are equal, the tree lexicon allows the sound likelihood of a shared portion to be computed once rather than repeated for every word.
FIG. 2 is a diagram to explain the tree lexicon using the words listed in FIG. 1. As illustrated in FIG. 2, the tree lexicon is comprised of nodes representing the phonemic models, and arcs connecting the nodes to each other.
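The difference between the two lexicon types can be sketched as follows. The phoneme sequence for <TOKYO> is the one given in the text; the sequences for <TOKAI> and <TOHOKU> are assumptions made for illustration, and the `Node` class and function names are hypothetical.

```python
# TOKYO is spelled out in the text; TOKAI and TOHOKU are assumed sequences.
LEXICON = {
    "TOKYO": ["t", "o", "o", "k", "y", "o", "o"],
    "TOKAI": ["t", "o", "o", "k", "a", "i"],
    "TOHOKU": ["t", "o", "o", "h", "o", "k", "u"],
}

class Node:
    """One node of the tree lexicon, representing one phonemic model."""
    def __init__(self, phoneme):
        self.phoneme = phoneme
        self.children = {}  # arcs: next phoneme -> child Node
        self.word = None    # set on the node that ends a word

def build_tree(lexicon):
    """Merge words that share head phonemes into a single tree."""
    root = Node(None)
    for word, phonemes in lexicon.items():
        node = root
        for p in phonemes:
            if p not in node.children:
                node.children[p] = Node(p)
            node = node.children[p]
        node.word = word
    return root

def count_nodes(node):
    """Count all nodes below the given node."""
    return sum(1 + count_nodes(child) for child in node.children.values())

ROOT = build_tree(LEXICON)
# A linear lexicon needs one node per phoneme of every word.
LINEAR_NODES = sum(len(p) for p in LEXICON.values())
# The tree lexicon shares the common head part "t", "o", "o" (and "k").
TREE_NODES = count_nodes(ROOT)
```

Under these assumed sequences the linear lexicon needs 20 phoneme nodes and the tree lexicon 13, which is the saving in sound likelihood computation that sharing provides.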
A technique proposed for searching such a tree-structure word dictionary for the word that best matches the speech input uses both language likelihoods, acquired from language models represented by word chain probabilities (N-gram) or the like, and sound likelihoods computed from sound models. This technique is known to improve recognition performance and to reduce the search space, among other effects.
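How a language likelihood can narrow the search may be sketched as below. The log-domain sum, the `lang_weight` parameter, the beam-pruning rule, and all numeric values are assumptions introduced for illustration; the text does not specify how the two likelihoods are combined.

```python
import math

def combined_score(sound_logl, lang_logl, lang_weight=1.0):
    """Combine sound and language log likelihoods (assumed weighted sum)."""
    return sound_logl + lang_weight * lang_logl

def prune(hypotheses, beam):
    """Keep only hypotheses whose score is within `beam` of the best one."""
    best = max(hypotheses.values())
    return {w: s for w, s in hypotheses.items() if s >= best - beam}

# Hypothetical log likelihoods for three competing word hypotheses.
SOUND = {"TOKYO": -10.0, "TOKAI": -10.5, "TOHOKU": -14.0}
LANG = {"TOKYO": math.log(0.6), "TOKAI": math.log(0.1), "TOHOKU": math.log(0.3)}

SCORES = {w: combined_score(SOUND[w], LANG[w]) for w in SOUND}
SOUND_ONLY_SURVIVORS = prune(SOUND, beam=2.0)
COMBINED_SURVIVORS = prune(SCORES, beam=2.0)
```

With these values, pruning on sound likelihood alone keeps two hypotheses, while pruning on the combined score keeps one, which illustrates the reduction of the search space.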
Use of the tree-structure word dictionary, however, poses the following problem.
When a linear lexicon is used as the word dictionary, a word can be specified at its head part, so the language likelihood of that word can be referenced when computing the likelihood of the first state of the sound model (i.e., the phoneme) at the head of that word.
When a tree lexicon is used as the word dictionary, however, a word cannot be specified until arrival at the node immediately after the last branching node. In the example of FIG. 2, the words are first specified upon arrival at nodes 201 to 203, indicated by bold circles. The language likelihoods are therefore applied late, and the search space is not reduced sufficiently.
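The delay described above can be made concrete by computing, for each word, the number of phonemes that must be traversed before the word is uniquely specified. The sequences for <TOKAI> and <TOHOKU> are assumed for illustration, and the function name is hypothetical.

```python
# TOKYO is spelled out in the text; TOKAI and TOHOKU are assumed sequences.
LEXICON = {
    "TOKYO": ["t", "o", "o", "k", "y", "o", "o"],
    "TOKAI": ["t", "o", "o", "k", "a", "i"],
    "TOHOKU": ["t", "o", "o", "h", "o", "k", "u"],
}

def first_unique_depth(lexicon):
    """Map each word to the prefix length at which no other word shares it.

    This is the depth of the node immediately after the last branching
    node on the word's path. A word that is a complete prefix of another
    word never becomes unique and is left out of the result.
    """
    result = {}
    for word, phonemes in lexicon.items():
        for n in range(1, len(phonemes) + 1):
            sharers = [w for w, p in lexicon.items() if p[:n] == phonemes[:n]]
            if sharers == [word]:
                result[word] = n
                break
    return result

DEPTHS = first_unique_depth(LEXICON)
```

In a linear lexicon every word is known at depth 1. Under the assumed sequences, the tree lexicon identifies <TOHOKU> only at depth 4 and <TOKYO> and <TOKAI> only at depth 5.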
An object of the present invention is to solve the above-described problem.
Another object of the invention is to provide a speech recognition apparatus and method for efficient recognition of words matching with speech input by use of the tree-structure word dictionary.
As a preferred embodiment for such objects, the present invention discloses a speech recognition apparatus comprising:
(a) holding means for holding a tree-structure word dictionary in which sound models at head part of words are shared among the words, wherein said tree-structure word dictionary is comprised of nodes representing said sound models; and
(b) searching means for searching for a word corresponding to speech input, using word information given to predetermined nodes forming said tree-structure word dictionary, wherein said word information is information to specify a word group reachable from each of said predetermined nodes.
As another embodiment, the present invention discloses a speech recognition method comprising the steps of:
(a) holding a tree-structure word dictionary in which sound models at head part of words are shared among the words, wherein said tree-structure word dictionary is comprised of nodes representing said sound models; and
(b) searching for a word corresponding to speech input, using word information given to predetermined nodes forming said tree-structure word dictionary, wherein said word information is information to specify a word group reachable from each of said predetermined nodes.
As another embodiment, the present invention discloses a computer-readable medium storing a program, said program comprising the steps of:
(a) holding a tree-structure word dictionary in which sound models at head part of words are shared among the words, wherein said tree-structure word dictionary is comprised of nodes representing the sound models; and
(b) searching for a word corresponding to speech input, using word information given to predetermined nodes forming said tree-structure word dictionary, wherein said word information is information to specify a word group reachable from each of said predetermined nodes.
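The searching step in the embodiments above relies on word information that specifies the word group reachable from a node. This section does not state how that information is used during the search. One possibility, assumed here purely for illustration, is to apply at each node the largest language likelihood among the reachable words, so that a language score is available before the word is uniquely specified. The phoneme sequences for <TOKAI> and <TOHOKU>, the probability values, and the function names are all hypothetical.

```python
import math

# TOKYO is spelled out in the text; TOKAI and TOHOKU are assumed sequences.
LEXICON = {
    "TOKYO": ["t", "o", "o", "k", "y", "o", "o"],
    "TOKAI": ["t", "o", "o", "k", "a", "i"],
    "TOHOKU": ["t", "o", "o", "h", "o", "k", "u"],
}
# Hypothetical language probabilities for the three words.
LANG_PROB = {"TOKYO": 0.6, "TOKAI": 0.1, "TOHOKU": 0.3}

def reachable_words(lexicon, prefix):
    """Word group reachable from the node identified by a phoneme prefix."""
    n = len(prefix)
    return sorted(w for w, p in lexicon.items() if p[:n] == prefix)

def group_log_likelihood(lexicon, lang_prob, prefix):
    """Assumed use: best language log likelihood within the word group."""
    group = reachable_words(lexicon, prefix)
    if not group:
        return float("-inf")  # no word can be reached from this prefix
    return math.log(max(lang_prob[w] for w in group))

GROUP_AT_K = reachable_words(LEXICON, ["t", "o", "o", "k"])
SCORE_AT_K = group_log_likelihood(LEXICON, LANG_PROB, ["t", "o", "o", "k"])
SCORE_AT_H = group_log_likelihood(LEXICON, LANG_PROB, ["t", "o", "o", "h"])
```

In this sketch the word group is recomputed from the lexicon on every call; storing it on the node in advance, as the embodiments describe, avoids that cost at search time.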
As another embodiment, the present invention discloses an apparatus for producing a word dictionary used in speech recognition, said apparatus comprising:
(a) sorting means for sorting a plurality of words, based on sound models representing the words;
(b) generating means for generating a tree-structure word dictionary in which sound models at head part of the words are shared among the words, wherein said tree-structure word dictionary is comprised of nodes representing said sound models; and
(c) providing means for providing predetermined nodes forming said tree-structure word dictionary with word information to specify a word group reachable from each of said predetermined nodes.
As another embodiment, the present invention discloses a method for producing a word dictionary used in speech recognition, comprising the steps of:
(a) sorting a plurality of words, based on sound models representing the words;
(b) generating a tree-structure word dictionary in which sound models at head part of the words are shared among the words, wherein said tree-structure word dictionary is comprised of nodes representing said sound models; and
(c) providing predetermined nodes forming said tree-structure word dictionary with word information to specify a word group reachable from each of said predetermined nodes.
As still another embodiment, the present invention discloses a computer-readable medium, said medium storing a program for producing a word dictionary used in speech recognition, said program comprising the steps of:
(a) sorting a plurality of words, based on sound models representing the words;
(b) generating a tree-structure word dictionary in which sound models at head part of the words are shared among the words, wherein said tree-structure word dictionary is comprised of nodes representing said sound models; and
(c) providing predetermined nodes forming said tree-structure word dictionary with word information to specify a word group reachable from each of said predetermined nodes.
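The sorting, generating, and providing steps of the dictionary-producing embodiments can be sketched as follows. The representation of the word information as a (first, last) index range into the sorted word list is an assumption; this section only states that the information specifies the reachable word group. The sketch also annotates every node rather than only predetermined nodes, and the phoneme sequences for <TOKAI> and <TOHOKU> are assumed.

```python
# TOKYO is spelled out in the text; TOKAI and TOHOKU are assumed sequences.
LEXICON = {
    "TOKYO": ["t", "o", "o", "k", "y", "o", "o"],
    "TOKAI": ["t", "o", "o", "k", "a", "i"],
    "TOHOKU": ["t", "o", "o", "h", "o", "k", "u"],
}

def new_node():
    return {"children": {}, "range": None}

def build_annotated_tree(lexicon):
    # (a) Sort the words by their sound-model (phoneme) sequences.
    words = sorted(lexicon, key=lambda w: lexicon[w])
    # (b) Generate a tree in which head-part sound models are shared.
    root = new_node()
    for idx, word in enumerate(words):
        node = root
        for p in lexicon[word]:
            node = node["children"].setdefault(p, new_node())
            # (c) Provide the node with its word information. Because the
            # words are sorted, all words passing through a node are
            # adjacent, so an assumed (first, last) index range suffices.
            first = idx if node["range"] is None else node["range"][0]
            node["range"] = (first, idx)
    return words, root

def reachable(words, root, prefix):
    """Look up the word group stored at the node reached by `prefix`."""
    node = root
    for p in prefix:
        node = node["children"][p]
    first, last = node["range"]
    return words[first:last + 1]

WORDS, ROOT = build_annotated_tree(LEXICON)
```

Sorting first is what makes a compact range sufficient in this sketch: without it, the words reachable from a node would not necessarily be contiguous, and each node would need an explicit list of words.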
Still other objects of the present invention, and the advantages thereof, will become fully apparent from the following detailed description of the embodiments.