1. Field of the Invention
The present invention relates to a speech recognition apparatus which uses grammar segments in which a phrase expressed on the basis of grammar is divided into one or more phrase segments when carrying out a speech recognition process for a phrase to be recognized expressed on the basis of the grammar.
2. Description of the Related Art
In the field of speech recognition, speech recognition apparatuses are widely used which carry out speech recognition by decoding speech features of supplied speech against speech feature models corresponding to a phrase to be recognized, the phrase being described on the basis of grammar.
Grammars used to describe the vocabularies to be recognized include expressions based on a phrase network, Context-Free Grammar (CFG), and Finite State Grammar (FSG).
These grammars are stored in a predetermined storage device in the interior of the speech recognition apparatus, and are referred to by a decoder as the speech recognition process progresses. The reference may be carried out a plurality of times during the speech recognition process. Therefore, in order to carry out the speech recognition process rapidly, the grammar should be stored in a rapidly accessible storage device. When the number of vocabularies to be recognized increases, the size of the grammar in which those vocabularies are described increases correspondingly. Therefore, in order to carry out the speech recognition process with a large number of vocabularies, a large-capacity storage device for storing the grammar is required. In other words, the speech recognition apparatus is required to have a storage device that is both rapidly accessible and of large capacity in order to carry out the speech recognition process with a large number of vocabularies rapidly.
However, a rapidly accessible storage device costs more than a storage device which enables only slow access. Since the cost of a storage device also increases with its capacity, the cost of a rapidly accessible, large-capacity storage device is very high. Therefore, the speech recognition apparatus may not be provided with such a storage device because of this cost problem. In such a case, a storage device which enables only slow access but has a large capacity may be used instead. Such a storage device has a problem that the speed at which the decoder references the grammar is lowered, and hence the speed of the speech recognition process is lowered.
As a method for solving the problem, a technology disclosed in Japanese Patent No. 3546633 (see P. 14, FIG. 1) (hereinafter, referred to as “related art”) has been proposed. In the related art, grammar is divided into grammar segments each including a group of grammar rules encapsulated therein, and all these grammar segments are stored in a storage device which enables only slow access but has a large capacity (storage device 1: HDD for example). The grammar segment that the decoder refers to during the recognition process is stored in a storage device which is rapidly accessible but has only a small capacity (storage device 2: RAM for example). Then, according to the progress of the process carried out by the decoder, only the grammar segment that the decoder should refer to is transferred from the storage device 1 to the storage device 2. In other words, when a grammar segment that the decoder should newly refer to is not stored in the storage device 2, the process carried out by the decoder is temporarily stopped, and the corresponding grammar segment is transferred from the storage device 1 to the storage device 2. When the transfer is completed, the process carried out by the decoder is resumed.
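The on-demand transfer described above can be illustrated with a minimal sketch. It is not taken from the related art itself: the class name `SegmentStore`, the segment identifiers, the capacity limit, and the oldest-first eviction policy are all assumptions made for illustration only. The counter `stalls` records how many times the decoder would have had to stop and wait for a transfer.

```python
class SegmentStore:
    """Hypothetical model of the two storage devices in the related art."""

    def __init__(self, slow_storage, fast_capacity):
        self.slow_storage = slow_storage    # storage device 1: holds every grammar segment
        self.fast_storage = {}              # storage device 2: holds only segments in use
        self.fast_capacity = fast_capacity  # assumed limit on segments in storage device 2
        self.stalls = 0                     # times the decoder had to wait for a transfer

    def get(self, segment_id):
        """Return a segment from fast storage, transferring it first if absent."""
        if segment_id not in self.fast_storage:
            self.stalls += 1  # the decoder stops here until the transfer completes
            if len(self.fast_storage) >= self.fast_capacity:
                # Assumed policy: evict the segment that was transferred earliest.
                oldest = next(iter(self.fast_storage))
                del self.fast_storage[oldest]
            self.fast_storage[segment_id] = self.slow_storage[segment_id]
        return self.fast_storage[segment_id]


# Illustrative segment contents (assumed, not from the related art).
slow = {
    "prefecture": ["Kanagawa-ken", "Tokyo-to"],
    "city:Kanagawa-ken": ["Kawasaki-shi", "Yokohama-shi"],
    "city:Tokyo-to": ["Hachioji-shi"],
}
store = SegmentStore(slow, fast_capacity=2)
store.get("prefecture")         # not in fast storage: transfer, decoder waits
store.get("city:Kanagawa-ken")  # not in fast storage: transfer, decoder waits
store.get("city:Kanagawa-ken")  # already in fast storage: no transfer
```

In this sketch every first reference to a segment counts as one stall, which corresponds to the decoder being stopped until the transfer from the storage device 1 to the storage device 2 is completed.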
According to the related art, only the grammar segments that the decoder needs to refer to must be stored in the storage device 2, and hence the capacity of the storage device 2 can be reduced.
For example, consider a case in which the vocabularies to be recognized have a hierarchical structure, such as addresses in Japan, and grammar segments are prepared for the respective groups (Prefecture, City, Town) of the respective hierarchies. In this case, the grammar segments that the decoder should refer to are limited to the grammar segments relating to candidates to be recognized in the respective hierarchies. In other words, in the process carried out by the decoder, when “Kanagawa-ken” emerges as a candidate for the name of the prefecture to be recognized, only the grammar segments describing the names of cities and towns relating to “Kanagawa-ken” need be referred to in the subsequent process. As a result, the number of grammar segments to be stored in the storage device 2 described above is limited, and hence the capacity of the storage device 2 may be further reduced. Since the grammar segment to be referred to by the decoder is stored in the rapidly accessible storage device 2, reference to the grammar segment by the decoder remains rapid. In other words, according to the related art, the speech recognition process is carried out rapidly while restraining the cost increase associated with an increase in the capacity of the storage device 2.
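The way a candidate in one hierarchy limits the grammar segments needed in the next hierarchy can be sketched as follows. The table `SEGMENTS`, the segment identifiers, the city names, and the function `segments_needed` are hypothetical and serve only to illustrate the address example; they do not reproduce the data format of the related art.

```python
# Hypothetical segment table: each segment maps a word to the identifier of the
# next-level grammar segment, or to None when no further level exists.
SEGMENTS = {
    "prefecture": {
        "Kanagawa-ken": "city:Kanagawa-ken",
        "Tokyo-to": "city:Tokyo-to",
    },
    "city:Kanagawa-ken": {"Kawasaki-shi": None, "Yokohama-shi": None},
    "city:Tokyo-to": {"Hachioji-shi": None},
}


def segments_needed(segment_id, candidates):
    """Return the next-level segments reachable from the surviving candidates."""
    words = SEGMENTS[segment_id]
    return sorted(
        words[w] for w in candidates if w in words and words[w] is not None
    )


# Only the city-level segment for the surviving prefecture candidate is required.
needed = segments_needed("prefecture", ["Kanagawa-ken"])
```

With “Kanagawa-ken” as the only surviving candidate, the segment for cities in “Tokyo-to” never has to be placed in the rapidly accessible storage device.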
However, in the related art, when a grammar segment to be newly referred to by the decoder is not stored in the storage device 2, the decoder temporarily stops the process until the corresponding grammar segment has been transferred from the storage device 1 to the storage device 2, and resumes the process when the transfer is completed. In this case, when the transfer of the grammar segment takes time, the process carried out by the decoder, that is, execution of the speech recognition process, is delayed by the waiting time required for transferring the grammar segment.
In general, transferring data from one storage device to another generates a lag time which depends on the speed of the transfer path between the storage devices. In the related art, since the grammar segment is assumed to be transferred from the storage device 1, which is accessible only slowly, a lag time corresponding to the time required to read out the grammar segment from the storage device 1 is also generated. In the related art, the sum of these lag times becomes a lag time of the speech recognition process, and hence the speed of the speech recognition process is lowered correspondingly.
In other words, in the related art, there is a problem that the speed of the speech recognition process is lowered due to the waiting time required for transferring the grammar segment, so that the speech recognition process with a large number of vocabularies cannot be carried out rapidly.
In view of the above, it is aimed to provide a speech recognition apparatus, and a method of the same, in which lowering of the speed of a speech recognition process due to the waiting time required for transferring a grammar segment is prevented so that a rapid speech recognition process is achieved.