In recent years, spoken language processing technology has improved, and speech interpretation apparatuses, which convert speech audio in a first language into output in a second language, have been receiving attention. Such a speech interpretation apparatus can be applied to displaying translation subtitles and adding interpretation audio at conferences or lectures. For example, a conference system has been proposed which displays bilingual subtitles including both a recognition result of speech audio in a first language and a translation result in a second language corresponding to the recognition result.
However, the delay from the start of speaking to the start of output of the translation result corresponding to the spoken words sometimes causes a problem. A translation result needs to be output continuously for a certain time so that a viewer can understand its meaning. Thus, when translation results are long, the delay may accumulate as speaking continues. For example, in a lecture or the like, when a speaker continues to speak, the display of translation subtitles corresponding to the words of the speaker may gradually lag, making it difficult for the audience to understand the meaning. On the other hand, simply reducing the output duration of the translation result may also make the meaning difficult to understand, because the number of letters or words which a viewer can understand in a certain time is limited. Therefore, when the output duration of a translation result is short, there is a concern that the output may be terminated before the viewer understands the meaning (or finishes reading).
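The accumulation of delay described above can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from any particular apparatus: each translation result becomes available when the corresponding utterance ends, only one subtitle is displayed at a time, and each subtitle stays on screen for its length divided by an assumed reading rate. The function name `subtitle_delays`, the parameter `chars_per_second`, and the sample values are hypothetical.

```python
def subtitle_delays(utterances, chars_per_second):
    """Return the delay (display start minus speech start) for each utterance.

    utterances: list of (speech_start, speech_end, translation_length) tuples,
    with times in seconds and translation_length in characters.
    chars_per_second: assumed reading rate used to set the display duration.
    """
    delays = []
    display_free_at = 0.0  # time at which the previous subtitle leaves the screen
    for speech_start, speech_end, n_chars in utterances:
        # A subtitle cannot start before its translation exists (assumed to be
        # at the end of the utterance) or before the previous subtitle finishes.
        display_start = max(speech_end, display_free_at)
        display_duration = n_chars / chars_per_second
        display_free_at = display_start + display_duration
        delays.append(display_start - speech_start)
    return delays


# Three consecutive 4-second utterances, each translated into 60 characters.
utterances = [(0.0, 4.0, 60), (4.0, 8.0, 60), (8.0, 12.0, 60)]

# At 10 characters per second each subtitle needs 6 seconds, which is longer
# than the 4 seconds of speech, so the delay grows with every utterance.
print(subtitle_delays(utterances, chars_per_second=10.0))  # [4.0, 6.0, 8.0]

# Halving the display time (20 characters per second) keeps the delay constant,
# but the viewer then has only 3 seconds to read each 60-character subtitle.
print(subtitle_delays(utterances, chars_per_second=20.0))  # [4.0, 4.0, 4.0]
```

Under these assumptions, the delay grows whenever the display duration exceeds the speaking duration, while shortening the display duration removes the growth only at the cost of reading time, which is the trade-off described above.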