A speech recognition system compares the input voice with a plurality of previously stored pattern candidates and provides, as the result of recognition, the candidate having the highest degree of matching. Such a system is used, for example, in a navigation system so that a user can input by voice the name of a place to be set as a destination. In particular, when a driver uses a mobile navigation system while driving, voice input is safe and effective because it does not require the driver to operate buttons or watch a display.
To serve this purpose, it must be easy to designate a place at a sufficient level of detail. In practice, the user must be able to input a place not only at the level of Prefecture and City but down to the level of Town or Street (the smallest unit of area) under the City name. Moreover, when a user wishes to set the destination as “Showa-Town, Kariya-City, Aichi-Prefecture”, for example, it is very troublesome to be required to pronounce the destination separately for each level of Town, City and Prefecture, such as “Showa-Town”, “Kariya-City” and “Aichi-Prefecture”. It is therefore preferable that the user can input the series of words of the address continuously (continuous input).
In Japan, an address is expressed starting from the highest hierarchical level, which consists of the Metropolis of Tokyo, Hokkaido, Osaka-Fu, Kyoto-Fu, and 43 Prefectures, and the number of branches increases in the sequence of the voice input, namely City, Town, and house number. It is therefore effective to execute the speech recognition by using a recognition dictionary having a tree-structure for such recognition words. FIG. 6 shows an example of the tree-structure dictionary for the recognition of addresses in Japan. In this case, an address is first branched, as explained above, at the highest hierarchical level (for example, Aichi-Prefecture, Gifu-Prefecture, . . . ), then branched by City (Town, Village) under each entry of the highest hierarchical level, then branched by name of Town under each City, and so on. Namely, when an address expressed in the Japanese style is considered in the sequence of the voice input, the number of branching points increases as the address level becomes lower.
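As a rough illustration of such a tree-structure dictionary, the following Python sketch stores each address as a path from the highest hierarchical level downward and counts the nodes at each level. It is only a minimal sketch under stated assumptions: the function names `build_tree` and `nodes_per_level` are hypothetical, and apart from the names quoted in the text the entries (`City-C`, `Town-B`, and so on) are placeholders, not data from FIG. 6.

```python
def build_tree(addresses):
    """Build a nested-dict tree; each address is a list of words in spoken order."""
    root = {}
    for words in addresses:
        node = root
        for word in words:
            # Reuse the existing child for a shared prefix, or create a new branch.
            node = node.setdefault(word, {})
    return root


def nodes_per_level(tree):
    """Return the number of tree nodes at each level below the root."""
    counts = []
    level = [tree]
    while True:
        level = [child for node in level for child in node.values()]
        if not level:
            return counts
        counts.append(len(level))


# Toy addresses in the Japanese order (Prefecture, City, Town).
# Placeholder names are assumptions made for illustration only.
ADDRESSES = [
    ["Aichi-Prefecture", "Kariya-City", "Showa-Town"],
    ["Aichi-Prefecture", "Kariya-City", "Town-B"],
    ["Aichi-Prefecture", "City-C", "Town-D"],
    ["Gifu-Prefecture", "City-E", "Town-F"],
]

tree = build_tree(ADDRESSES)
print(nodes_per_level(tree))  # [2, 3, 4]
```

In this toy dictionary the node count grows from 2 to 3 to 4 as the level becomes lower, which is the forward-branching shape described above.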
However, in many countries other than Japan, for example the USA and European countries, an address often starts from the house number, which is the lowest hierarchical level, and is then expressed in the reverse of the Japanese sequence, such as Name of Street→Name of City→Name of State. Therefore, if a tree-structure recognition dictionary is generated for such addresses, it takes the form of a so-called “Backward Tree-structure”, in which the branches are combined and the number of branching points decreases as the address level becomes higher. As a result, an address branches to the next level at a very large number of branching points from the first hierarchical level (the lowest hierarchical level); in the USA, for example, the number of such branching points ranges from several hundred thousand to about several million. The load of the matching process is therefore likely to increase, and the recognition time is likely to become longer.
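The difference between the two shapes can be made concrete with a small Python sketch that counts the nodes at each level for the same toy address list, once as a backward tree (spoken from the house number upward) and once as a forward tree (spoken from the state downward). This is a sketch under assumptions: the function names are hypothetical, the addresses are invented toy entries of equal length, and a backward-tree node is modeled here as a distinct suffix of the word sequence so that branches merge toward the last word.

```python
def forward_level_counts(addresses):
    """Nodes per level of a forward tree: distinct prefixes of each length."""
    depth = max(len(a) for a in addresses)
    return [
        len({tuple(a[:k + 1]) for a in addresses if len(a) > k})
        for k in range(depth)
    ]


def backward_level_counts(addresses):
    """Nodes per level of a backward tree: distinct suffixes from each position,
    so branches that share the remaining words are combined."""
    depth = max(len(a) for a in addresses)
    return [
        len({tuple(a[k:]) for a in addresses if len(a) > k})
        for k in range(depth)
    ]


# Invented toy addresses in the spoken order house number, street, city, state.
US_STYLE = [
    ["12", "Main Street", "Springfield", "Illinois"],
    ["34", "Main Street", "Springfield", "Illinois"],
    ["56", "Oak Avenue", "Springfield", "Illinois"],
    ["78", "Oak Avenue", "Chicago", "Illinois"],
]

# Same entries re-ordered from the highest level down, as in the Japanese style.
TOP_DOWN = [a[::-1] for a in US_STYLE]

print(backward_level_counts(US_STYLE))  # [4, 3, 2, 1]
print(forward_level_counts(TOP_DOWN))   # [1, 2, 3, 4]
```

With the backward ordering, every entry is already a separate branch at the first level, which is why the matching load at the start of the utterance grows with the size of the dictionary.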
Moreover, as a method of reducing the processing load, it is known to introduce a “cut of branch” process that narrows down the candidates during recognition with a tree-structure dictionary. However, when such a “cut of branch” process is applied to a backward tree-structure dictionary, there is a high possibility that the branch containing the correct word is cut. Cut of branch therefore cannot be used effectively, and recognition performance deteriorates. Accordingly, since there is no particular merit in generating the recognition dictionary in a tree-structure, it has been difficult to apply the continuous speech recognition technique to voice input having a backward tree-structure.
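The effect can be illustrated with a small Python sketch in which “cut of branch” is modeled as a simple beam search that keeps only the best few partial hypotheses at each level. This is one possible reading, not a procedure specified in the text: the function `beam_search`, the beam width, the toy addresses, and the per-word scores (higher is better) are all assumptions made for illustration, and all addresses are assumed to have the same number of words.

```python
def beam_search(addresses, word_score, beam_width):
    """Match level by level, keeping only the `beam_width` best partial paths."""
    beam = [((), 0.0)]
    depth = len(addresses[0])  # assumes equal-length addresses
    for k in range(depth):
        extended = {}
        for prefix, score in beam:
            for a in addresses:
                if tuple(a[:k]) == prefix:
                    # Extend the surviving hypothesis by its next word.
                    extended[prefix + (a[k],)] = score + word_score[a[k]]
        # "Cut of branch": discard everything outside the beam.
        ranked = sorted(extended.items(), key=lambda kv: kv[1], reverse=True)
        beam = ranked[:beam_width]
    return beam[0][0]


# Invented toy entries in the spoken order street, city, state.
BACKWARD = [
    ["Main Street", "Springfield", "Illinois"],  # assumed correct answer
    ["Maine Street", "Portland", "Oregon"],
    ["Mane Street", "Austin", "Texas"],
    ["Oak Avenue", "Chicago", "Illinois"],
]
FORWARD = [a[::-1] for a in BACKWARD]  # state, city, street

# Hypothetical matching scores: similar-sounding street names score close together.
SCORES = {
    "Main Street": -1.0, "Maine Street": -0.8, "Mane Street": -0.9, "Oak Avenue": -5.0,
    "Springfield": -0.5, "Portland": -4.0, "Austin": -4.0, "Chicago": -3.0,
    "Illinois": -0.5, "Oregon": -4.0, "Texas": -4.5,
}

print(beam_search(BACKWARD, SCORES, beam_width=2))
# ('Maine Street', 'Portland', 'Oregon'): the correct branch was cut at the first level
print(beam_search(FORWARD, SCORES, beam_width=2))
# ('Illinois', 'Springfield', 'Main Street'): the correct branch survives
```

In the backward ordering the beam must choose among all entries at the very first word, before the later words can disambiguate them, so the correct branch is lost; widening the beam to cover every first-level branch recovers it, but only by giving up the reduction in load that the cut was meant to provide.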
This problem is not limited to addresses; it arises in any recognition dictionary having a backward tree-structure.