Use of conventional voice recognition has mostly been limited to utterances composed of predetermined input information, such as voice commands and set phrases. However, the expansion of computational resources has allowed the use of large-scale corpora, and algorithms capable of handling large-scale data are being widely developed. In addition, frameworks such as Software as a Service (SaaS) and Platform as a Service (PaaS) are now available. With these changes, interfaces and applications for dialogue using voice recognition are today in widespread use. Voice recognition is used, for example, for real-time captioning, which directly displays speech as text.
Conventional technologies, however, have difficulty in properly correcting a voice recognition result in real time. For example, in correcting caption information displayed in a time series, an editor needs to follow the target range to be corrected, designate the range to be corrected, input corrected text, and finalize the corrected content. Furthermore, after the corrected result is displayed, semantic and time-series discrepancies between the text before correction and the most recently presented text may prevent viewers from understanding the text content.
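The manual workflow described above (following the target range, designating the range, inputting corrected text, and finalizing) can be sketched in code as follows. This is only an illustrative model; the `Caption` data class and `correct_range` helper are assumptions made for this sketch and do not correspond to any particular system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    """One entry in the time-series caption display (illustrative assumption)."""
    index: int            # position in the displayed time series
    text: str             # recognized text currently shown to viewers
    finalized: bool = False


def correct_range(captions: List[Caption], start: int, end: int, new_text: str) -> None:
    """Apply one manual correction to captions with indices in [start, end]."""
    # Steps 1-2: follow the stream and designate the range to be corrected.
    target = [c for c in captions if start <= c.index <= end and not c.finalized]
    if not target:
        raise ValueError("no editable captions in the designated range")
    # Step 3: input the corrected text over the designated range.
    target[0].text = new_text
    for c in target[1:]:
        c.text = ""       # the remaining entries in the range are cleared
    # Step 4: finalize the corrected content.
    for c in target:
        c.finalized = True
```

Because each correction requires all four steps, the corrected result may appear only after later captions have already been displayed, which is the source of the semantic and time-series discrepancy noted above.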