Many electronic devices need to determine a "most likely" path of a received signal. For example, in speech, text, or handwriting recognition devices, a recognized unit (e.g., a sound, syllable, letter, or word) of a received signal is determined by identifying the sequence of states that has the greatest probability of having been received. This determination may be made by viewing the received signal as generated by a hidden Markov model. A discussion of Markov models and hidden Markov models is found in Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE, Vol. 77, No. 2, February 1989. Also, this signal may be viewed as generated by a Markov model observed through a "noisy" process. This is discussed in Forney, "The Viterbi Algorithm", Proceedings of the IEEE, Vol. 61, No. 3, March 1973. The contents of these articles are incorporated herein by reference.
Briefly, a Markov model is a system which may be described as being in any one of a set of N distinct states. At regularly spaced time intervals, the system makes a transition between states (or remains in the same state) according to a set of transition probabilities. A simple three-state Markov model is illustrated in FIG. 1.
FIG. 1 shows a three-state transition model 15. In this model, it is assumed that any state may follow any other state, including the same state repeated. For each state, there is a known probability that it will follow any other state. For example, in the English language, this probability may be statistically determined by determining how often each letter is followed by another letter (or itself). In this illustration, assume that state 1 is the letter A, state 2 is the letter B, and state 3 is the letter C. Probabilities are assigned to the likelihood that any one of these letters will follow the same or another letter. In this example, an illustrative probability of 0.1 has been assigned to the likelihood that A will be followed by another A, 0.4 that A will be followed by a B, and 0.5 that A will be followed by a C. The same is done for the letters B and C, resulting in a total of nine probabilities. In this model, the state is apparent from the observation; that is, the state is either A, B, or C in the English language.
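As a minimal sketch of the nine transition probabilities described above, the three-state model may be written as a table indexed by the previous and the next letter. Only the row for A (0.1, 0.4, 0.5) comes from the text; the rows for B and C, and the names `TRANSITION` and `sequence_probability`, are hypothetical and are chosen here only so that each row sums to 1.

```python
# Transition probabilities for the three-state model of FIG. 1.
# Row "A" is taken from the text; rows "B" and "C" are ASSUMED values
# used purely for illustration.
TRANSITION = {
    "A": {"A": 0.1, "B": 0.4, "C": 0.5},
    "B": {"A": 0.6, "B": 0.1, "C": 0.3},  # assumed
    "C": {"A": 0.7, "B": 0.2, "C": 0.1},  # assumed
}


def sequence_probability(sequence, transition=TRANSITION):
    """Probability of the transitions in `sequence`, given its first state."""
    prob = 1.0
    for prev, cur in zip(sequence, sequence[1:]):
        prob *= transition[prev][cur]
    return prob


# Each row must be a probability distribution over the next state.
for row in TRANSITION.values():
    assert abs(sum(row.values()) - 1.0) < 1e-9

# Probability that A is followed by A and then by C: 0.1 * 0.5.
print(sequence_probability("AAC"))
```

Under these assumed values, the probability of any letter sequence is the product of the transition probabilities along it, because in this model each state is directly observable.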
Often the states of the model generating the observations cannot be observed directly, but may only be ascertained by determining the probabilities that the observations were generated by particular states of the model. For instance, in the example of FIG. 1, assume that due to "noise", there is a known probability that in state A the symbol may be corrupted to appear to be a B, and a known probability that in state A the symbol will be corrupted to appear as a C. The same is true for B and C. To determine the best state sequence associated with the observations of this "noisy" state sequence, the text recognition device must determine, through probabilities, which letters are most likely to be in the sequence.
FIG. 2 is a block diagram of a text recognition device 20, comprising a document scanner 22 and a text recognition processor 24. The text recognition processor comprises a first input/output (I/O) device 26 connected to a bus 28. A central processing unit (CPU) 30, a memory 32 and a second input/output device 34 are also connected to the bus 28. The second input/output device may also be connected to a display device 36 such as a computer monitor or LCD display. The device 20 may operate as follows. A document is scanned in the scanner 22, which sends electronic information about the scanned document to the first I/O 26. The first I/O 26 sends the electronic information to the bus 28, which sends the information to CPU 30 for processing. The CPU 30 may retrieve instructions or other data from software residing in the memory 32, such as a random access memory (RAM). This information is delivered to the CPU via the bus 28. The CPU may also store some of the electronic information in memory 32. Once the text has been processed (i.e., recognized), the recognized text may be sent to the second I/O 34 for delivery to the display 36.
FIG. 3 is a block diagram of a continuous speech recognition device 40, comprising a microphone 42, an analog-to-digital (A/D) converter 44, and a speech recognition processor 24'. A speech signal may be detected by the microphone 42 and converted into a digital signal by A/D converter 44 for use by the speech recognition processor 24'. This processor 24' may have components similar to those of the text recognition processor 24, except that different software resides in the memory 32. The detected speech signal is processed and a recognized utterance may be displayed by the display 36.
In this continuous speech recognition device 40, for example, the probability that observed sounds and/or words at a particular time are a particular state is reached by considering three probabilities described in relation to FIG. 4. FIG. 4 illustrates a three state lattice 50 at a third time period t.sub.3. The probability that an observation is in state s at time t is determined as follows. There are N possible paths which may reach state s at time t (for example, s.sub.1 at time t.sub.3 in FIG. 4). That is, there is a possible path ending at each state at time t-1 (i.e., states s.sub.1, s.sub.2, s.sub.3 at time t.sub.2 in FIG. 4). Each of these paths has a probability assigned to it that it is the most likely path so far. This is called the probability score. The probability score for each state at time t-1 is multiplied by the known transition probability into state s to determine the total probability that that path is likely to have transitioned to state s at time t. The maximum of these products is taken and multiplied by an observation probability to determine the probability that state s was observed at time t.
This probability for determining the best path from the initial time period to the current time period may be expressed as:
p[s,t] = max(p[ps,t-1] * b[ps,s]) * P[O.sub.t,s], where the maximum is taken over ps = 1, 2, . . . , N
where:
s is the current state;
ps is the previous state;
p[s,t] is the probability score of the best path ending at state s at time t;
p[ps,t-1] is the probability score of the best path ending at state ps at time t-1;
b[ps,s] is the known probability of the state ps preceding state s (this is the state transition probability);
P[O.sub.t,s] is the observation probability for current state s at time t; and
N is the total number of states.
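The recurrence defined above may be sketched in code as follows. This is an illustrative sketch only, not an implementation taken from this document: the function names `viterbi_step` and `best_path`, the zero-based state numbering, the list-based data layout, and the back-pointer bookkeeping used to recover the path are all assumptions.

```python
def viterbi_step(prev_scores, b, obs_prob):
    """One time step of p[s,t] = max(p[ps,t-1] * b[ps,s]) * P[O_t,s].

    prev_scores[ps] : p[ps,t-1], best-path score ending in ps at time t-1
    b[ps][s]        : transition probability from state ps to state s
    obs_prob[s]     : P[O_t,s], observation probability for state s at time t
    Returns (scores, back), where back[s] is the best previous state for s.
    """
    n = len(prev_scores)
    scores, back = [], []
    for s in range(n):
        # Examine all N candidate previous states for the current state s.
        best_ps = max(range(n), key=lambda ps: prev_scores[ps] * b[ps][s])
        scores.append(prev_scores[best_ps] * b[best_ps][s] * obs_prob[s])
        back.append(best_ps)
    return scores, back


def best_path(initial_scores, b, obs_probs):
    """Apply the recurrence for every row of obs_probs, then trace back.

    obs_probs[k][s] holds the observation probability of state s at the
    k-th time step after the initial one.
    """
    scores = list(initial_scores)
    backpointers = []
    for obs_prob in obs_probs:
        scores, back = viterbi_step(scores, b, obs_prob)
        backpointers.append(back)
    # Start from the highest-scoring final state and follow back-pointers.
    state = max(range(len(scores)), key=lambda s: scores[s])
    best_score = scores[state]
    path = [state]
    for back in reversed(backpointers):
        state = back[state]
        path.append(state)
    path.reverse()
    return path, best_score
```

Note that the inner `max` examines every previous state for every current state, so each call to `viterbi_step` evaluates N times N products.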
It may be observed from FIG. 4 that to determine the most likely state path for each possible state at time t (i.e., states s.sub.1, s.sub.2, s.sub.3 of FIG. 4), the product p[ps,t-1]*b[ps,s] must be calculated N.sup.2 times per time interval, that is, once for each of the N previous states ps for each of the N current states s. For this illustrative example where N=3, nine expressions need to be evaluated. In the speech recognition system of FIG. 3, for example, this calculation may be performed several times to determine a single spoken phrase. This calculation may be performed at each time interval using a hidden Markov model, which models phonemes or words, to determine the most likely state sequence at time t. Furthermore, it may be performed again to determine if, in context with surrounding words, the detected word was probably the uttered word. In an actual signal processor, such as the continuous speech recognition system of FIG. 3, the number of possible words may be on the order of 1000. Thus, a single best path calculation may require as many as 1,000,000 calculations (i.e., 1000.sup.2) per time interval. Consequently, it may take several million calculations to recognize a single sentence.
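The counts quoted above follow directly from the N.sup.2 growth, as the following short sketch shows. The function names are hypothetical, and the figure of five time intervals in the last line is an assumed value used only to show how the total scales with the length of the signal.

```python
def products_per_interval(n_states):
    """Number of p[ps,t-1] * b[ps,s] products evaluated in one time interval."""
    # Each of the N current states s examines each of the N previous states ps.
    return n_states ** 2


def products_for_signal(n_states, n_intervals):
    """Total number of products over n_intervals time intervals."""
    return products_per_interval(n_states) * n_intervals


print(products_per_interval(3))      # 9, the three-state lattice of FIG. 4
print(products_per_interval(1000))   # 1000000, a vocabulary of about 1000 words
print(products_for_signal(1000, 5))  # 5000000, assuming 5 time intervals
```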
Previous attempts at reducing the number of calculations to determine the best state sequence have been suboptimal. For example, "pruning" has been suggested as a method for reducing the number of calculations. Pruning is a process where only the most probable "branches" of the lattice are thoroughly investigated, and less probable "branches" are "pruned". This process is suboptimal because it abandons "branches" that begin as less probable, but which several states later may begin to appear as highly probable. For example, in an illustrative model of an English language text recognition device having 26 letters as states, it is unlikely, but not impossible, for an A to be preceded by an A. Thus, a suboptimal solution may prune this state sequence. However, if the word "aardvark", for example, is considered, an incorrect result would probably occur.
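The loss of optimality may be illustrated with a small sketch. The pruning rule used here (keeping only the `beam_width` highest-scoring states after each time step) is one simple form of pruning assumed for illustration; it is not necessarily the rule used by any particular prior device. The two-state model, all of its probabilities, and the function names are likewise hypothetical.

```python
def prune(scores, beam_width):
    """Zero out all but the beam_width highest scores (None keeps all)."""
    if beam_width is None:
        return list(scores)
    order = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    keep = set(order[:beam_width])
    return [p if s in keep else 0.0 for s, p in enumerate(scores)]


def best_path(initial_scores, b, obs_probs, beam_width=None):
    """Best-path search with optional pruning after every time step."""
    scores = prune(initial_scores, beam_width)
    n = len(scores)
    backpointers = []
    for obs_prob in obs_probs:
        new_scores, back = [], []
        for s in range(n):
            best_ps = max(range(n), key=lambda ps: scores[ps] * b[ps][s])
            new_scores.append(scores[best_ps] * b[best_ps][s] * obs_prob[s])
            back.append(best_ps)
        scores = prune(new_scores, beam_width)
        backpointers.append(back)
    state = max(range(n), key=lambda s: scores[s])
    best_score = scores[state]
    path = [state]
    for back in reversed(backpointers):
        state = back[state]
        path.append(state)
    return path[::-1], best_score


# Hypothetical two-state model: state 1 starts out less probable than
# state 0, but the path that stays in state 1 is the best path overall.
initial = [0.6, 0.4]
b = [[0.5, 0.5], [0.1, 0.9]]
obs = [[0.1, 0.9]]

full_path, full_score = best_path(initial, b, obs)
pruned_path, pruned_score = best_path(initial, b, obs, beam_width=1)
print(full_path, pruned_path)
assert pruned_score < full_score
```

With these assumed numbers, the exhaustive search keeps the path through state 1, while the pruned search discards state 1 at the first time step and can only return a lower-scoring path through state 0.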
Therefore, it is an object of the present invention to provide a method and device for reducing the average number of calculations needed for a best path calculation without sacrificing optimality.