Priority is claimed to Korean Patent Application No. 00-60262 filed on Oct. 13, 2000, herein incorporated by reference.
1. Field of the Invention
The present invention relates to a method and apparatus for recognizing unknown information using the Hidden Markov Model (HMM), and more particularly, to an information search method and apparatus using an Inverse Hidden Markov Model (IHMM) and the Viterbi algorithm, which can be used for finding the most likely path through the vocabulary network for a given utterance.
2. Description of the Related Art
Various information recognition methods are used for recognizing various types of information. In case that the information is in the form of speech, a speech recognition method is used for recognizing the information of speech. This will now be explained.
Speech recognition technology refers to technology for developing and implementing methods for enabling a machine or a computer to recognize human speech. To implement an interface which enables natural and smooth communications between a machine and a human being through speech, speech signal processing technology, particularly speech recognition technology, is necessary. Thanks to innovative developments in the fields of computer and signal processing technology, studies on speech recognition technology are making rapid progress. Also, the introduction of stochastic techniques, use of affluent speech data, combination of speech and language knowledge, and use of high-speed search algorithms have boosted this research.
In addition, due to the development of the semiconductor industry, very small-sized systems, which can be used for a long time with low power, have become commercially available. In communications fields, particularly the field of wireless communications, power consumption and device sizes can be reduced with hardware, i.e., by using a dedicated chip which performs speech recognition.
Because of their robust modeling capability and high recognition accuracy for speech signals, HMM-based speech recognition methods are extensively used in the speech recognition field. Speech recognition methods involve computing acoustic probabilities, and finding a best matching word using the acoustic probabilities. In particular, isolated word recognition using the HMM consists of two phases: a training phase and a search (recognition) phase. In the training phase, for each word in a predetermined dictionary to be used, the HMM parameters are estimated, and a distinct HMM is built up for each word using a training set of observations. In the search phase, the probability that a given utterance is similar to each word model of the dictionary is computed, and the highest likelihood word model is selected as the recognized word. The Viterbi algorithm is an efficient search technique for finding a best matching word by comparing an input utterance with each of the word models, i.e., reference speech models, in the dictionary, and thus is generally used in the search phase.
A typical speech recognition method is divided into a pre-process step, a core recognition step, and a post-process step. Detailed techniques applied to each of the steps will be described using an example. For the pre-process step, feature parameters which form reference utterances are estimated from an input speech signal. One of the methods to accomplish the pre-process step consists of linear predictive coding (LPC) and a filter bank front-end process.
The core recognition step involves matching and training processes. The estimated parameters express utterance features to match an input speech signal on a phonetic notation and word level. The utterance features of the input speech signal are expressed with the matching and training data set. The post-process step, as a search process, finds a best matching utterance sequence through the vocabulary network for a given utterance.
FIG. 1 is an exemplary state lattice of a conventional method illustrating a path along which an input speech utterance is recognized using a HMM. The state lattice includes a plurality of states 1 through 24 which are searched in order from state 1 to state 24.
For example, if a word xe2x80x9cabbatxe2x80x9d is uttered, the state lattice for the word is expressed, as shown in FIG. 1, by the general HMM. The HMM applied for speech recognition has local transition paths. Transition to a next state is determined by a previously accumulated state probability (Stxe2x88x921(i)), a transition probability (aij), and a next state observation probability (bij(W(t)), where t is a time variable, i is a variable denoting a state searched at time t in the lateral direction of the state lattice of FIG. 1, and j is a variable denoting a state searched at time txe2x88x921 in the lateral direction of the state lattice of FIG. 1. The next state observation probability is determined by a probability density function (P(y|s(i),xcfx86)). State transition to a next state is performed by a conventional speech information search method that uses a Viterbi algorithm based on the maximum likelihood.
A conventional speech information search method will be described with reference to FIG. 2. FIG. 2 shows a pseudo code for illustrating the conventional speech information search method consisting of steps 1 through 12. Referring to FIG. 2, in step 1, a variable Max which indicates the maximum likelihood score is initialized to 0. In step 2, a variable all_reference_sequence, which indicates the location of each of the reference speech models stored in a dictionary, is set to 1. The variable t is set to 1 in step 3, and i is set to 1 in step 4. In step 5, the maximum state probability (xcfx86i(t)) for the i-th state at time t is initialized to 0. In step 6, j is set to 1.
After all the variables are initialized, the maximum state probability (xcfx86i(t)) for the i-th state at time t is computed in consideration of the accumulated state probability for the j-th state searched at time txe2x88x921(steps 6 through 8). In step 8, xe2x80x9cendforxe2x80x9d means that the process goes back to step 6 if jxe2x89xa0N, and goes to step 9 if j=N. Steps 5 through 8 are iterated for all the states from 1 to N, so that N maximum state probabilities are obtained (steps 4 through 9). The maximum state probabilities are computed for a period of time T to find an optimal path for a particular reference speech model, so that the similarity between the given reference speech model and an input unknown speech signal is obtained (steps 3 through 10). Steps 3 through 10 are iterated for all the reference speech models (steps 2 through 12) stored in the dictionary. In step 11, xcfx86allxe2x80x94referencexe2x80x94sequence, which means the optimal path value computed for a particular reference speech model, is compared with the maximum likelihood score (Max), and the maximum likelihood score (Max) is updated with the greater of the two. With the updated maximum likelihood score (Max), steps 2 through 12 are iterated for another reference speech model.
The conventional speech information search method illustrated in FIG. 2 needs probability computations for all states to obtain the maximum likelihood score. Due to the need for much computation, the conventional speech information search method is unsuitable for high-speed searching. For the conventional speech information search method, all the states are searched, and all the paths for each state must be stored, so that many hardware resources are consumed and many computations are inefficiently required. For these reasons, the conventional speech information search method has problems of increasing power consumption and computation time.
To solve the above problems, it is a first object of the present invention to provide an information search method using an Inverse Hidden Markov Model (IHMM), which finds an optimal path in a Hidden Markov Model (HMM) state lattice using a minimum unlikelihood score, rather than the maximum likelihood score, and using the Viterbi algorithm, to recognize unknown information, so that unnecessary computations are avoided.
It is a second object of the present invention to provide an information search apparatus using the IHMM used for the information search method described above.
To achieve the first object of the present invention, there is provided an information search method using an IHMM, for stochastically searching for a reference information model among a plurality of predetermined reference information models obtained by training that best matches unknown information which can be expressed by a HMM chain, the information search method comprising the steps of: (a) obtaining a minimum state probability xcfx86xe2x80x2i(t) for a particular state searched at a current time t in a HMM state lattice of a reference information model by using minimum state probabilities of effective states, each of which is a probability accumulated along a search path up to a previous time txe2x88x921, and then updating the obtained minimum state probability xcfx86xe2x80x2i(t) with a predetermined value if the minimum state probability xcfx86xe2x80x2i(t) is greater than a minimum unlikelihood score; (b) obtaining an optimal path value corresponding to the lowest of a plurality of minimum state probabilities obtained after step (a) is performed for a period of time T which is greater than t, and updating the minimum unlikelihood score with the optimal path value if the minimum unlikelihood score is greater than the optimal path value; (c) determining whether or not the optimal path value has been obtained for each of the predetermined reference information models, and if not, iterating steps (a) and (b) for another reference information model for which the optimal path value has not been obtained; and (d) if the optimal path value has been obtained for each of the reference information models, determining that the reference information model yielding the optimal path value with which the minimum unlikelihood score was last updated best matches the unknown information, wherein for the period of time T, step (a) is performed for each of one or more states searched at every current time t=1, 2, . . . T, the effective states are states searched at previous time txe2x88x921 which have minimum state probabilities less than the minimum unlikelihood score, and from which a transition can be made to the particular state searched at time t, and the minimum state probability of the particular state corresponds to the lowest among state probabilities of the particular state for making a transition to the particular state from the effective states.
It is preferable that the minimum state probability of the particular state corresponds to the lowest probability among the state probabilities obtained, equal in number to the effective states, by using the minimum state probabilities of the effective states, transition probabilities from the effective states to the particular state, and observation probabilities of the particular state, and the predetermined value is greater than the minimum unlikelihood score.
To achieve the second object of the present invention, there is provided an information search apparatus using an Inverse Hidden Markov Model (IHMM), for performing the information search method of any one of claims 1 through 4, the information search apparatus comprising: a storage unit for storing the minimum unlikelihood score, a first value related to the transition probability, and a second value related to the state observation probability; first through N-th processing units; first through N-th comparing and selecting units; a control unit for outputting a control signal in response to a comparison flag output from each of the first through the N-th comparing and selecting units, controlling reading of data stored in the storage unit, and passing values output from the first through the N-th comparing and selecting units; and a buffer for receiving and buffering values passed by the control unit and outputting each of the buffered values to a corresponding processing unit among the first through the N-the processing units, in response to the control signal, wherein an i-th processing unit processes a value output from the buffer, and the first value and the second value provided from the storage unit, compares the processed results, selects and outputs the smallest processed result in response to the control signal, and an i-th comparing and selecting unit compares the smallest processed result output from the i-th processing unit with the minimum unlikelihood score provided from the storage unit, selects one of the predetermined value and the smallest processed result in response to the result of the comparison, and outputs the selected value and a comparison flag having a level corresponding to the result of the comparison to the control unit.