Hitherto, there has been known a technology in which, during reproduction of voice storage data (e.g., moving image data) storing a plurality of voices to be output sequentially, character information (e.g., subtitles) representing each voice is displayed every time that voice is output.
For example, Patent Literature 1 describes a system configured to create character information representing the voices of the cast of a live TV program and to provide viewers with the character information. In this system, a TV editor who has heard a voice in the live program manually inputs the corresponding characters. Thus, even when the intervals between the output timings of voices and the intervals between the display timings of pieces of character information are roughly synchronized with each other, the display timing of each piece of character information is delayed overall relative to the output timing of the corresponding voice by the period of time required for manually inputting the characters. To address this, in the technology of Patent Literature 1, when a live TV program is recorded, the delay time is estimated based on a genre code of the TV program, and, at the time of recording, the display timings of the character information are shifted overall to an earlier time by the delay time that depends on the genre code.
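The overall shift attributed to Patent Literature 1 can be illustrated by the following minimal sketch in Python. It is not taken from Patent Literature 1. The genre codes, the delay values, the default delay, the `Caption` type, and the `shift_captions` function are all hypothetical and chosen only for illustration. Clamping the shifted timing at zero is likewise an assumption made here.

```python
from dataclasses import dataclass, replace

# Hypothetical genre codes and estimated delays in seconds (illustrative values only).
GENRE_DELAY_SECONDS = {"news": 3.0, "sports": 5.0}
# Hypothetical fallback used when a genre code has no estimate.
DEFAULT_DELAY_SECONDS = 4.0


@dataclass(frozen=True)
class Caption:
    display_time: float  # seconds from the start of the recording
    text: str


def shift_captions(captions, genre_code):
    """Move every caption earlier by the single delay estimated for the genre."""
    delay = GENRE_DELAY_SECONDS.get(genre_code, DEFAULT_DELAY_SECONDS)
    # Assumption: clamp at zero so that no caption precedes the start of the recording.
    return [
        replace(c, display_time=max(0.0, c.display_time - delay))
        for c in captions
    ]
```

In this sketch, every piece of character information in a program is moved by the same amount, which corresponds to the display timing being changed overall by a delay time that depends only on the genre code.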