For example, with advances in natural language processing technology, such as technology using artificial intelligence (AI), voices recorded in a video can be converted into text with a high degree of accuracy. The text obtained by the conversion can then be used as subtitles for the video. However, subtitles based on text generated by natural language processing are less readable than subtitles produced by a person. Thus, in that regard, there is room for improvement.
There is a known technology related to a subtitle generation device that generates subtitles that reduce a sense of discomfort for users (for example, see Japanese Laid-open Patent Publication No. 2015-018079). In this technology, the sense of discomfort for the users is reduced by reflecting the speaking style of a person in the subtitles.
The voices recorded in a video include words that are frequently seen or heard, as well as words that are rarely seen or heard or that are seen or heard for the first time. When subtitles are made for words that are frequently seen or heard, the degree of readability is considered to be high. On the other hand, when subtitles are made for words that are rarely seen or heard or that are seen or heard for the first time, the degree of readability is considered to be low. Thus, there is room for improvement in the readability of the subtitles.