The present invention relates to a pattern recognition apparatus which performs pattern matching between an input pattern as a recognition target and standard patterns described in a plurality of net structure dictionaries using beam search, thereby obtaining a recognition candidate.
Conventionally, when the number of objects to be recognized is very large, as in voice recognition or the like, pattern matching using beam search is used, which enables pattern recognition with a small-capacity RAM (Random Access Memory) and a small quantity of calculation, as disclosed in H. Sakoe et al., “A High Speed DP-Matching Algorithm Based on Synchronization, Beam Search and Vector Quantization”, THE TRANSACTIONS OF THE INSTITUTE OF COMMUNICATION ENGINEERS, Vol. J71-D, No. 9, pp. 1650-1659, September 1988 (reference 1).
Beam search is a technique of removing matching paths that do not affect the recognition result using a net structure dictionary. To avoid influence on the recognition result, many paths must be left at the initial stage, as pointed out in Japanese Patent Laid-Open No. 10-153999 (reference 2). However, when the standard patterns of the recognition target are described in the net structure dictionary, the number of matching paths to be searched is small by itself, and the number of paths to be left can also be relatively small. Hence, when a recognition apparatus is formed using a net structure dictionary, the capacity of a storage means for storing paths can be small.
For pattern recognition, the recognition rate becomes high when the number of recognition targets is small. For this reason, preferably, the use conditions are finely sorted, and a small number of targets are recognized in units of finely sorted conditions. On the other hand, in some cases, recognition targets under different use conditions, e.g., place names in each administrative district and nationwide place names may be simultaneously recognized. If it is known that only a place name in a specific administrative district is to be recognized, a net structure dictionary related to this specific administrative district is used as a recognition target, thereby improving the recognition rate. If a nationwide place name is to be recognized, the place name dictionary for each administrative district is simultaneously used together as a recognition target.
In this case, the dictionary for the nationwide place names is unnecessary. Unlike a case wherein nationwide and administrative district dictionaries are independently prepared, the capacity of a storage means (memory) for storing the dictionaries is halved. Thus, when a plurality of dictionaries can be simultaneously used as recognition targets, the dictionary storage capacity can be reduced.
A conventional pattern recognition apparatus which performs pattern matching by beam search, using a plurality of net structure dictionaries as recognition targets, will be described next. As shown in FIG. 4, this pattern recognition apparatus comprises two net structure dictionaries, namely a net structure dictionary (A) 401 and a net structure dictionary (B) 402, a beam search pattern matching section 403, an input section 404, and a display section 405.
Using the beam search algorithm described in reference 1, the beam search pattern matching section 403 obtains the pattern distance between an input pattern input from the input section 404 and a standard pattern described in each of the net structure dictionaries 401 and 402 and outputs to the display section 405 a recognition target in a dictionary, which gives the minimum pattern distance, as a recognition result.
The operation of the conventional pattern recognition apparatus will be described below in more detail.
Prior to the description of the operation, a general net structure dictionary will be described. A net structure dictionary is a set of recognition target words and is designed to connect an arc 501 corresponding to a tone of a recognition target word to a numbered node 502 so as to form each recognition target word, as shown in FIG. 5.
Letting v be the dictionary number, and λv be the net structure dictionary, the net structure dictionary having the structure shown in FIG. 5 is described as follows. Note that Av is the set of arcs representing standard patterns, Nv is the set of nodes connecting arcs, Wv is the recognition target set, ENv is the end node set, PAv is the partial arc set, and WNv is the recognition target node set.
λv=(Nv, Av, Wv, ENv, PAv, WNv) {v=1, 2, . . . , V}
Nv={nv_i: i=0, 1, . . . , Iv}
Av={av_j: j=1, 2, . . . , Jv}
Wv={wv_k: k=1, 2, . . . , Kv}
ENv(av_j): the end node of an arc av_j
PAv(nv_i): the set of arcs having a node nv_i as a start node
WNv(wv_k): a node representing the end of a word wv_k
With this description, a net structure dictionary shown in FIG. 6 is constructed. As shown in FIG. 6, in a net structure dictionary A, an arc “A(2)” common to recognition target words “AO (“blue” in Japanese)” and “AKA (“red” in Japanese)” is connected to arcs “O(3)” and “KA(4)” by node #2, thereby forming a net structure. Additionally, an arc “I(5)” common to recognition target words “IKE (“pond” in Japanese)” and “ISHI (“stone” in Japanese)” is connected to arcs “KE(6)” and “SHI(7)” by node #5, thereby forming a net structure.
In a net structure dictionary B, an arc “A(2)” common to recognition target words “AKI (“autumn” in Japanese)” and “ASA (“morning” in Japanese)” is connected to arcs “KI(3)” and “SA(4)” by node #2, thereby forming a net structure. Similarly, an arc “U(5)” common to recognition target words “UE (“upside” in Japanese)” and “USU (“mortar” in Japanese)” is connected to arcs “E(6)” and “SU(7)” by node #5, thereby forming a net structure.
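The dictionary description above can be sketched as a small data structure. The following is only an illustration under stated assumptions: the class name `NetDictionary`, its field names, and the `spell` helper are hypothetical, and the node layout (arc j ends at node j, with a leading arc 1 running from node 0 to node 1 and given the placeholder label "#") is inferred from the arc and node numbers quoted for FIG. 6 rather than taken from the figure itself.

```python
from dataclasses import dataclass


@dataclass
class NetDictionary:
    arcs: dict          # A:  arc number -> standard-pattern label
    end_node: dict      # EN: arc number -> end node of that arc
    partial_arcs: dict  # PA: node -> arcs that start at that node
    word_node: dict     # WN: word -> node that ends the word

    def spell(self, word):
        """Walk back from the word's end node to node 0, collecting arc labels."""
        start_of = {a: n for n, out in self.partial_arcs.items() for a in out}
        arc_into = {n: a for a, n in self.end_node.items()}
        labels, node = [], self.word_node[word]
        while node != 0:
            arc = arc_into[node]
            labels.append(self.arcs[arc])
            node = start_of[arc]
        return labels[::-1]


# Assumed layout for dictionary A: arc j ends at node j, and a leading
# arc 1 (labelled "#", a made-up placeholder) runs from node 0 to node 1.
labels_a = {1: "#", 2: "A", 3: "O", 4: "KA", 5: "I", 6: "KE", 7: "SHI"}
dict_a = NetDictionary(
    arcs=labels_a,
    end_node={j: j for j in labels_a},
    partial_arcs={0: [1], 1: [2, 5], 2: [3, 4], 5: [6, 7]},
    word_node={"AO": 3, "AKA": 4, "IKE": 6, "ISHI": 7},
)

print(dict_a.spell("AKA"))      # arc labels on the path to "AKA"
print(dict_a.partial_arcs[2])   # arcs branching from node #2
```

Storing PA as a node-to-arcs map makes the branching at nodes #2 and #5 explicit, which is the property the net structure relies on to share the common arcs "A" and "I" between words.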
In the above net structure, the independent beam search pattern matching section 403 (FIG. 4) obtains a recognition result by the following procedure. In the following description, let X=(x0x1x2 . . . xt . . . xT) (t is time) be the input pattern, d(t, v_j) be the local pattern distance between the input pattern xt at time t and the arc av_j, g(t, ·) be the accumulated distance of local pattern distances until time t, and J(t) be the set of standard patterns of arcs on the search matching path at time t. A standard pattern of an arc is represented by a standard pattern of DP-matching described in reference 1.
Letting S be the maximum number of search matching paths, a recognition result w is obtained. Additionally, let min[ ] be a calculation that gives the minimum value, and argmin[g(·, k)|S] be a calculation for acquiring the value k that gives the Sth value g in the ascending order.
<Initial Settings>
Step S20
J(t=0)={0-1}
g(t=0, v_k)=∞ {k=0, 1, . . . , Jv, v=1, . . . , V}
g(t=0, 0-1)=d(0, 0-1)
t=1
<Processing Main Body>
Step S21
g(t, v_k)=min[g(t−1, v_k), g(t−1, v_j)]+d(t, v_k) {av_k ∈ PAv(EN(av_j)), v_j ∈ J(t−1)}
J(t)={argmin[g(t, v_k)|S]}
Step S22
If t < T, the flow returns to step S21.
Step S23
v_m=argmin[g(T, v_k)|1]
v_k ∈ J(t), EN(av_k) ∈ WNv, v=1, 2, . . . , V
Recognition result: w that satisfies WNv(w)=EN(av_m)
END
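The procedure of steps S20 to S23 can be sketched roughly as follows. This is a minimal illustration and not the implementation of reference 1: the function name `beam_search`, the dictionary layout (keys "label", "EN", "PA", "WN"), and the 0/1 local distance are assumptions made for the example. The initial setting is also simplified so that the search starts on every arc leaving node 0 rather than on a single initial arc.

```python
import math


def beam_search(dicts, frames, S):
    """Toy time-synchronous beam search over net structure dictionaries.

    dicts  : list of dicts with keys "label", "EN", "PA", "WN"
    frames : one label per time step (stand-in for the input pattern X)
    S      : maximum number of search matching paths kept per time step
    """
    def d(t, v, k):
        # Assumed local distance: 0 on a label match, 1 otherwise.
        return 0.0 if dicts[v]["label"][k] == frames[t] else 1.0

    # Step S20 (simplified): start on every arc leaving node 0.
    g = {(v, k): d(0, v, k)
         for v, dic in enumerate(dicts) for k in dic["PA"].get(0, [])}
    beam = sorted(g, key=g.get)[:S]

    for t in range(1, len(frames)):
        new_g = {}
        for v, j in beam:
            # Step S21: stay on arc j or move to an arc in PA(EN(a_j)).
            for k in [j] + dicts[v]["PA"].get(dicts[v]["EN"][j], []):
                cand = g[(v, j)] + d(t, v, k)
                if cand < new_g.get((v, k), math.inf):
                    new_g[(v, k)] = cand
        g = new_g
        beam = sorted(g, key=g.get)[:S]  # J(t): the S best paths

    # Step S23: best surviving arc whose end node closes a word.
    finals = [(g[(v, k)], w)
              for v, k in beam
              for w, n in dicts[v]["WN"].items() if n == dicts[v]["EN"][k]]
    return min(finals)[1] if finals else None


# Dictionary A, with arc j assumed to end at node j.
dict_a = {
    "label": {2: "A", 3: "O", 4: "KA", 5: "I", 6: "KE", 7: "SHI"},
    "EN": {j: j for j in range(2, 8)},
    "PA": {0: [2, 5], 2: [3, 4], 5: [6, 7]},
    "WN": {"AO": 3, "AKA": 4, "IKE": 6, "ISHI": 7},
}
result = beam_search([dict_a], ["A", "A", "KA", "KA"], S=3)
print(result)
```

Only the S entries with the smallest accumulated distance survive each time step, so the working memory grows with S rather than with the total number of arcs in the dictionaries.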
However, when a plurality of dictionaries are used, as described above, a plurality of search matching paths with the same connection of standard patterns are present although the dictionaries are net structure dictionaries. For this reason, unless the number of paths to be left without being removed is increased, the recognition result is affected, as will be described below.
For example, when the two net structure dictionaries A and B having the structures shown in FIG. 6 are used, search matching paths pass through the same arcs in the two dictionaries. In this case, the matching processing shown in FIG. 7 is executed.
More specifically, at time t=1, the independent beam search pattern matching section 403 (FIG. 4) loads, to the internal RAM, the arc (1) connected to node #1 of the net structure dictionary (A) 401 and the arc (1) connected to node #1 of the net structure dictionary (B) 402. Next, the independent beam search pattern matching section 403 selects the arcs (5), (2), and (1) connected to nodes #5, #2, and #1 of the net structure dictionary (A) 401 and the arcs (5), (2), and (1) connected to nodes #5, #2, and #1 of the net structure dictionary (B) 402 as next recognition candidates.
At time t=2, the independent beam search pattern matching section 403 selects and loads, to the internal RAM, the arc (2) connected to node #2 of the net structure dictionary (A) 401 and the arc (2) connected to node #2 of the net structure dictionary (B) 402 as arcs (search matching path) matching the input recognition target. Next, the independent beam search pattern matching section 403 selects the arcs (4), (3), and (2) connected to nodes #4, #3, and #2 of the net structure dictionary (A) 401 and the arcs (4), (3), and (2) connected to nodes #4, #3, and #2 of the net structure dictionary (B) 402 as next recognition candidates.
Finally, at time t=3, the independent beam search pattern matching section 403 selects the arc (3) connected to node #3 of the net structure dictionary (A) 401 and the arc (3) connected to node #3 of the net structure dictionary (B) 402 as arcs matching the input recognition target, connects them to the arcs that have already been loaded to the internal RAM, and displays the result on the display section 405 as a recognition candidate.
In the above-described recognition, when the voice to be recognized is “IKE (pond in Japanese)”, a recognition error occurs in selection at time t=2. If a recognition error occurs at the initial stage, “IKE (pond in Japanese)” as a recognition target word is not recognized when the two net structure dictionaries are used, as described above.
On the other hand, as shown in FIG. 8, when a single net structure dictionary is used, the arcs “O”, “KA”, “KI”, and “SA” are connected, through node #2, to the arc “A” common to the recognition target words “AO (blue in Japanese)”, “AKA (red in Japanese)”, “AKI (autumn in Japanese)”, and “ASA (morning in Japanese)”, thereby forming a net structure. In addition, the arcs “KE” and “SHI” are connected, through node #7, to the arc “I” common to the recognition target words “IKE (pond in Japanese)” and “ISHI (stone in Japanese)”, thereby forming a net structure. Furthermore, the arcs “E” and “SU” are connected, through node #10, to the arc “U” common to the recognition target words “UE (upside in Japanese)” and “USU (mortar in Japanese)”, thereby forming a net structure.
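To make the difference between the two separate dictionaries and the single dictionary concrete, the following sketch builds each net as a nested-dict trie and counts its arcs. The helper names `build_net` and `count_arcs` are hypothetical, the split of each word into arc labels is assumed from the arcs named in the text, and the leading arc (1) is left out of the count.

```python
def build_net(words):
    """Build a nested-dict net; each key is one arc label."""
    root = {}
    for units in words.values():
        node = root
        for unit in units:
            # Reuse an existing arc when the prefix is shared.
            node = node.setdefault(unit, {})
    return root


def count_arcs(node):
    """Count every arc reachable from this node."""
    return sum(1 + count_arcs(child) for child in node.values())


words_a = {"AO": ["A", "O"], "AKA": ["A", "KA"],
           "IKE": ["I", "KE"], "ISHI": ["I", "SHI"]}
words_b = {"AKI": ["A", "KI"], "ASA": ["A", "SA"],
           "UE": ["U", "E"], "USU": ["U", "SU"]}

separate = count_arcs(build_net(words_a)) + count_arcs(build_net(words_b))
merged = build_net({**words_a, **words_b})

print(separate)             # arcs held when A and B are kept apart
print(count_arcs(merged))   # arcs in the single dictionary
print(sorted(merged["A"]))  # arcs branching after the shared arc "A"
```

In the single net the arc "A" appears once and fans out to four child arcs, whereas the two separate dictionaries each carry their own copy of "A". This sketch does not mark word ends, which is sufficient here only because no word in the example is a prefix of another.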
When the single net structure dictionary shown in FIG. 8 is used, matching processing shown in FIG. 9 is executed. In this case, first at time t=1, the independent beam search pattern matching section receives the arcs connected to node #1 into the internal RAM. Next, the independent beam search pattern matching section selects the arcs (10), (7), (2), and (1) connected to nodes #10, #7, #2 and #1 as next recognition candidates.
At time t=2, the independent beam search pattern matching section selects and loads, to the internal RAM, the arcs (7) and (2) connected to nodes #7 and #2 as arcs (search matching path) matching the input recognition target. The independent beam search pattern matching section 403 selects arcs (9) to (2) connected to nodes #9, #8, #7, #6, #5, #4, #3, and #2 as next recognition candidates.
Finally, at time t=3, the independent beam search pattern matching section selects the arcs (8) and (3) connected to nodes #8 and #3 as arcs matching the input recognition target and connects them to the arcs that have already been loaded to the internal RAM, thereby obtaining a recognition candidate result. In this case, even when a recognition error occurs at time t=2, “IKE (pond in Japanese)” is finally selected as a recognition candidate.
Referring to FIGS. 7 and 9, the phase where one arc branches to a plurality of arcs represents not that the arcs are selected but that they are matching targets according to the equation in step S21. Conversely, in the phase where the number of arcs decreases, the arcs are selected.
As described above, when a plurality of net structure dictionaries are used, as in the prior art, a plurality of search matching paths pass through the same standard patterns up to a midway point. Hence, the recognition performance degrades as compared to use of a single dictionary.
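The effect of duplicated paths on pruning can be illustrated with a single selection step. The accumulated distances below are invented for the illustration and the helper `prune` is hypothetical; the point is only that, for the same maximum number of paths S, duplicates of the arc "A" from dictionaries A and B crowd out the arc "I".

```python
def prune(hypotheses, S):
    """Keep the S hypotheses with the smallest accumulated distance."""
    return sorted(hypotheses, key=lambda h: h[0])[:S]


# Each hypothesis: (accumulated distance, dictionary, arc label).
# The distance values are made up for this example.
two_dicts = [(0.2, "A", "A"), (0.2, "B", "A"),
             (0.5, "A", "I"), (0.9, "B", "U")]
single = [(0.2, "AB", "A"), (0.5, "AB", "I"), (0.9, "AB", "U")]

S = 2
kept_two = {label for _, _, label in prune(two_dicts, S)}
kept_single = {label for _, _, label in prune(single, S)}

print(kept_two)     # distinct standard patterns left with two dictionaries
print(kept_single)  # distinct standard patterns left with one dictionary
```

With two dictionaries both surviving paths carry the same standard pattern "A", so a word beginning with "I" such as "IKE" can no longer be reached. With the single dictionary the same S leaves both "A" and "I" on the search matching paths.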
Degradation in recognition performance in use of a plurality of net structure dictionaries can be compensated for by increasing the maximum number of search matching paths. In this case, however, the number of accumulated distances and arcs that must be held for the search matching paths increases in proportion to the maximum number of search matching paths, so the capacity of RAM for storing these pieces of information increases as compared to use of a single dictionary.
On the other hand, when a single net structure dictionary is used, as described above, the recognition performance can be maintained without increasing the maximum number of search matching paths. However, in the case of the single net structure dictionary, since the quantity of data of the dictionary is large, generation of the dictionary is time-consuming, resulting in a long time required for recognition. In addition, when a single dictionary is used, undesirable arcs are also recognized, and the recognition rate becomes poor.
To solve this problem, when the single dictionary is stored in a ROM (Read Only Memory) in advance, the dictionary generation time can be shortened. However, when the dictionary is formed from a number of arcs, the number of combinations of arcs is very large. For this reason, the capacity of the ROM required to form the net structure dictionary also becomes large.
Summarizing the problems of the prior art, when a plurality of net structure dictionaries are used to suppress degradation in recognition performance, the RAM requires a large capacity. On the other hand, when a single dictionary is used, a long time is required for recognition, or the ROM requires a very large capacity.
It is an object of the present invention to provide a pattern recognition apparatus capable of simultaneously using a plurality of net structure dictionaries to set a recognition target with a variety of choices.
It is another object of the present invention to provide a pattern recognition apparatus which requires no large-capacity RAM for compensating the recognition performance.
In order to achieve the above objects, according to the present invention, there is provided a pattern recognition apparatus comprising a plurality of net structure dictionaries each having a net structure formed by connecting, through a node, a parent arc representing a common portion of a plurality of standard patterns as recognition units of a recognition target to a child arc representing a remaining portion, reconstruction means for extracting a common structure from the plurality of net structure dictionaries and reconstructing the net structure dictionaries to generate a reconstructed dictionary, first storage means for storing original dictionary connecting information representing a relationship between the extracted common structure and the plurality of net structure dictionaries, and pattern matching means for referring to the plurality of net structure dictionaries, the reconstructed dictionary generated by the reconstruction means, and the original dictionary connecting information stored in the first storage means and performing pattern matching processing between an input pattern and recognition targets formed from standard patterns using beam search, thereby selecting a recognition candidate from the recognition targets.