1. Field of the Invention
The present invention relates to a speech recognition system, a speech recognition method and a speech recognition program, suitable for large vocabulary continuous speech recognition (LVCSR) with high accuracy and at high speed.
2. Related Art
As described in Non-Patent Document 1, attempts to realize highly accurate, high-speed LVCSR have been actively pursued in recent years. In LVCSR, the search space becomes very large, so the design of the search algorithm is important. A widely known search technique is “acoustic lookahead” (hereinafter referred to as “lookahead”), which takes into account not only the score accumulated up to a node on the trellis but also an estimate of the score accumulated after that node. FIG. 5 shows a speech recognition system relating to this technique as a first conventional example. The following description is based on FIG. 5.
A speech recognition system 700 of the first conventional example includes a data processor 710 and a data storage device 720. The data storage device 720 includes a speech buffer 721, in which speech spanning a plurality of frames is accumulated, and a lookahead value buffer 722, which stores lookahead values created by processing the speech accumulated in the speech buffer 721 in the reverse direction. The data processor 710 includes a distance calculation/lookahead unit 711, which creates the lookahead values and stores them in the lookahead value buffer 722, and a distance calculation/word string matching unit 712, which performs general word matching processing by using the values in the speech buffer 721 and the lookahead value buffer 722.
The speech recognition system 700 operates as follows. First, the distance calculation/lookahead unit 711 waits until a temporal sequence of input speech features has been accumulated in the speech buffer 721. It then processes the speech data in temporally reverse order to create a lookahead value for each frame, and accumulates these values in the lookahead value buffer 722. When processing of the speech buffer 721 is completed, it notifies the distance calculation/word string matching unit 712 of that fact. Next, the distance calculation/word string matching unit 712 performs continuous word matching with reference to the speech buffer 721 and the lookahead value buffer 722. When processing of the speech buffer 721 and the lookahead value buffer 722 is completed, it notifies the distance calculation/lookahead unit 711 of that fact. The distance calculation/lookahead unit 711 then waits again until data is accumulated in the speech buffer 721, and the same processing is repeated. The continuous word matching result is held in the distance calculation/word string matching unit 712, and the result is output when all speech data has been processed.
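The alternating operation described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions and not the implementation of the first conventional example: each frame carries a single hypothetical cost instead of a set of trellis-node scores, the lookahead value of a frame is taken to be the sum of the costs of the later frames in the same buffer, and the names `lookahead_values` and `process_blocks` are introduced here for illustration only.

```python
from typing import List, Sequence


def lookahead_values(frame_costs: Sequence[float]) -> List[float]:
    """Backward pass over one buffer (role of unit 711 in this sketch).

    lookahead[t] is the summed cost of the frames after t in the buffer.
    Assumption: one scalar cost per frame stands in for per-node scores.
    """
    n = len(frame_costs)
    lookahead = [0.0] * n
    # Walk the buffer in temporally reverse order.
    for t in range(n - 2, -1, -1):
        lookahead[t] = lookahead[t + 1] + frame_costs[t + 1]
    return lookahead


def process_blocks(stream: Sequence[float], block_size: int) -> List[float]:
    """Alternate the two phases block by block.

    Phase 1 fills the lookahead buffer; phase 2 (role of unit 712 in this
    sketch) walks the same block forward and combines the score accumulated
    so far with the lookahead estimate for that frame.
    """
    combined_scores: List[float] = []
    accumulated = 0.0
    for start in range(0, len(stream), block_size):
        speech_buffer = stream[start:start + block_size]
        # Phase 1: the forward pass cannot start until this has finished.
        lookahead_buffer = lookahead_values(speech_buffer)
        # Phase 2: forward pass using both buffers.
        for cost, ahead in zip(speech_buffer, lookahead_buffer):
            accumulated += cost
            combined_scores.append(accumulated + ahead)
    return combined_scores
```

In this sketch the two phases never overlap in time: the forward pass over a block begins only after the backward pass over that block has finished, and the next backward pass begins only after the forward pass has finished, mirroring the notifications exchanged between units 711 and 712.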
Next, FIG. 6 shows a speech recognition system described in Patent Document 1 as a second conventional example. The following description is based on FIG. 6.
A speech recognition system 800 of the second conventional example has three stages of processing units: an analyzer 801, a plurality of word level processors 821 to 823, and a plurality of sentence level processors 861 and 862. In the speech recognition system 800, each of the word level processors 821 to 823 and each of the sentence level processors 861 and 862 performs input and output synchronously with the speech signal input to the analyzer 801; that is, the processors operate in parallel. Processing can therefore be performed at a higher speed than in the case where the whole processing is performed by a single processor. Reference numeral 804 denotes a data transfer unit, 807 denotes a transfer instruction unit, 808 denotes a priority change unit, and 831, 832, 833, 851 and 852 denote FIFOs.
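The staged, FIFO-connected arrangement can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the design of Patent Document 1: each stage is reduced to a single worker thread, the data transfer unit 804, transfer instruction unit 807 and priority change unit 808 are not modelled, and the names `stage`, `run_pipeline` and `_DONE` are hypothetical. The stage functions passed in by the caller stand in for the analyzer, word level and sentence level processing.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

# Sentinel marking the end of the input stream (hypothetical name).
_DONE = object()


def stage(work: Callable[[Any], Any],
          inbox: "queue.Queue[Any]",
          outbox: "queue.Queue[Any]") -> None:
    """Read items from one FIFO, process them, and write to the next FIFO."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the end marker downstream
            return
        outbox.put(work(item))


def run_pipeline(frames: Iterable[Any],
                 stages: List[Callable[[Any], Any]]) -> List[Any]:
    """Run the stages concurrently, connected by FIFO queues."""
    fifos: List["queue.Queue[Any]"] = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=stage, args=(work, fifos[i], fifos[i + 1]))
        for i, work in enumerate(stages)
    ]
    for worker in workers:
        worker.start()
    # Feed the first FIFO; later stages start as soon as items arrive.
    for frame in frames:
        fifos[0].put(frame)
    fifos[0].put(_DONE)
    results: List[Any] = []
    while True:
        item = fifos[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

For example, `run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])` passes each item through both stages in order. The sketch shows only the structure of stages decoupled by FIFOs; in CPython, threads of this kind would not by themselves speed up CPU-bound processing.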
Non-Patent Document 1: “A Study on a Phoneme-graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition”, by Takaaki Hori, Naoki Oka, Masaharu Katoh, Akinori Ito and Masaki Kohda, Information Processing Society of Japan Journal, vol. 40, No. 4, April 1999
Patent Document 1: Japanese Patent Application Laid-Open No. 4-232998, “SPEECH RECOGNITION DEVICE”