A known technique performs automatic speech interpretation (speech translation) by performing speech recognition on an utterance input as speech and translating the recognition result. In speech translation, importance is placed on outputting a translation result with as little delay as possible. If, for example, the start and end points of input speech, i.e., an utterance, can be designated (set) by the system or by an instruction from the user, translation processing can be performed in the designated unit. Shortening this unit makes it possible to obtain a translation result sooner. In contrast, when speech translation is performed on sequentially and continuously input speech, for example, speech communication over telephones, the start and end points of an utterance cannot be designated by an instruction from the user or the like. In such a case, speech translation is simply performed after waiting for a temporary interruption of the speech communication. This, however, leads to excessively long waiting times. At present, few techniques and methods have been developed or proposed for sequentially performing speech translation in this case.
To address this problem, a method has been proposed that performs speech recognition using a multipass search system, in which the first recognition pass is performed at predetermined time intervals, stable sections are confirmed and output at predetermined time intervals in the second recognition pass, and speech recognition results are sequentially output (see patent literature 1). A method has also been developed that estimates the timing for driving the second recognition pass in accordance with frame reliability, in order to cut the waste in speech recognition caused by always performing the second recognition pass at predetermined time intervals (see patent literature 2).
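The idea of confirming and outputting only stable sections can be illustrated with a minimal sketch. It is not the method of patent literature 1 or 2: it replaces the second recognition pass with a much simpler assumption, namely that a word is stable once the last few periodic hypotheses agree on it. The class name `StablePrefixCommitter` and the parameter `min_agreement` are hypothetical names introduced only for this illustration.

```python
from typing import List


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Return the number of leading words shared by two hypotheses."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class StablePrefixCommitter:
    """Commit words once consecutive periodic hypotheses agree on them.

    Assumption (not from the source): agreement across the last
    `min_agreement` hypotheses stands in for the stability check that
    a real second recognition pass would perform.
    """

    def __init__(self, min_agreement: int = 2) -> None:
        self.min_agreement = min_agreement
        self.history: List[List[str]] = []
        self.committed: List[str] = []

    def update(self, hypothesis: List[str]) -> List[str]:
        """Feed one periodic hypothesis; return the newly committed words."""
        self.history.append(hypothesis)
        self.history = self.history[-self.min_agreement:]
        if len(self.history) < self.min_agreement:
            return []
        # Length of the prefix shared by all hypotheses in the window.
        stable = len(self.history[0])
        for other in self.history[1:]:
            stable = min(stable, common_prefix_len(self.history[0], other))
        # Output only the part of the stable prefix not yet committed.
        # Committed words are never retracted in this sketch.
        new_words = self.history[0][len(self.committed):stable]
        self.committed.extend(new_words)
        return new_words
```

Each call to `update` corresponds to one recognition pass at a predetermined interval. A sketch like this outputs words early, but, as with the recognition techniques above, the committed spans are not guaranteed to be units suitable for translation.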
The above techniques are, however, speech recognition techniques, and the literature makes no mention of how to combine them with translation processing, which is a separate process applied to the contents of utterances after speech recognition. Furthermore, the recognition results obtained by the above techniques do not always correspond to units suitable for translation.
Another available method copes with continuous inputs by performing syntax analysis after speech recognition and assigning start and end points to sentences based on syntax restrictions (see patent literature 3). This method, however, increases the amount of processing because syntax analysis is additionally performed after speech recognition, and it degrades the real-time performance of recognition result output.
Another available method learns, in speech recognition, where periods occur by using a language model, empirical rules, and pause lengths, estimates sentence boundaries by inserting the learned periods into the recognition results, and outputs the recognition results in units suitable for translation processing (see patent literature 4). This method, however, gives no consideration to real-time performance when sequentially outputting recognition results or performing translation processing on continuous inputs.
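As a rough illustration of boundary estimation from pause lengths, the following sketch splits a time-stamped recognition result wherever the silence between two words reaches a threshold. It is not the method of patent literature 4: the language model and empirical rules are omitted, and the function name `split_on_pauses`, the word tuple layout, and the 0.5-second default are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical word record: (text, start time in seconds, end time in seconds).
Word = Tuple[str, float, float]


def split_on_pauses(words: List[Word], min_pause: float = 0.5) -> List[List[str]]:
    """Split recognized words into segments at pauses of at least `min_pause`.

    Assumption (not from the source): pause length alone decides a
    boundary; a fuller system would also weigh language model scores
    and empirical rules.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    prev_end = None
    for text, start, end in words:
        # A long enough gap since the previous word closes the segment.
        if prev_end is not None and current and start - prev_end >= min_pause:
            segments.append(current)
            current = []
        current.append(text)
        prev_end = end
    if current:
        segments.append(current)
    return segments
```

Note that a boundary is only known after the pause has elapsed, so the choice of `min_pause` trades output delay against segmentation accuracy, which relates to the real-time concern raised above.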