As an example of a technology that supports presentations, an information processing method that associates image data with sound data has been proposed.
In this information processing method, to associate the image data with the sound data, a character region is detected in the image data and the characters in that region are recognized. Separately, a sound section is detected in the sound data and the speech in that section is recognized. The recognized characters and the recognized speech are then associated with each other by comparing and collating either (a) the character string of the recognized characters with a character string converted from the speech, or (b) a phonetic string converted from the recognized characters with a phonetic string of the speech. Finally, a frame is added to the part of the still image that corresponds to the sound section, and the still image is displayed.
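The association step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited documents: the `CharRegion` and `SoundSection` structures and the `similarity` and `associate` functions are hypothetical, character recognition and speech recognition are assumed to have already produced plain strings, and the ratio from Python's `difflib.SequenceMatcher` stands in for the unspecified comparison/collation of character strings. Collation of phonetic strings, alternative (b), would follow the same pattern after both sides are converted to phonetic strings.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class CharRegion:
    """Character region detected in a still image (hypothetical structure)."""
    text: str    # character string recognized in the region
    bbox: tuple  # (x, y, width, height) of the region, used to place the frame


@dataclass
class SoundSection:
    """Sound section detected in the sound data (hypothetical structure)."""
    start: float     # start time in seconds
    end: float       # end time in seconds
    transcript: str  # character string converted from the recognized speech


def similarity(a: str, b: str) -> float:
    # Assumption: difflib's ratio (0.0 to 1.0) stands in for the unspecified
    # comparison/collation of the two character strings.
    return SequenceMatcher(None, a, b).ratio()


def associate(regions, sections):
    """Pair each sound section with the character region whose text matches it best."""
    return [
        (section, max(regions, key=lambda r: similarity(r.text, section.transcript)))
        for section in sections
    ]


regions = [
    CharRegion("market share", (10, 10, 200, 30)),
    CharRegion("sales forecast", (10, 60, 200, 30)),
]
sections = [
    SoundSection(0.0, 3.2, "sales forecast for next year"),
    SoundSection(3.2, 6.0, "our market share"),
]
for section, region in associate(regions, sections):
    # A frame would be drawn at region.bbox while this sound section is played.
    print(f"{section.start:.1f}-{section.end:.1f}s -> {region.text!r} at {region.bbox}")
```

With these inputs, the first sound section is framed at "sales forecast" and the second at "market share". Choosing the single best-scoring region per section is the simplest policy; it has no memory of the previous choice, which is the property that later allows the highlight to jump back and forth.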
Moreover, in this information processing method, the candidates of character information and the candidates of sound information are weighted based on, for example, the recognition probability of each character-information candidate and the recognition probability of each sound-information candidate, and a degree of correlation between the candidates is calculated. The character-information candidates and the sound-information candidates are then associated with each other based on that degree of correlation.
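A minimal sketch of such a weighted correlation is shown below. The source does not give the formula, so this assumes, purely for illustration, that the degree of correlation is the product of the two recognition probabilities and a `difflib` string similarity; `Candidate`, `correlation`, and `best_pair` are hypothetical names.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str    # candidate string produced by a recognizer
    prob: float  # recognition probability of this candidate


def correlation(char_cand: Candidate, sound_cand: Candidate) -> float:
    """Degree of correlation between a character candidate and a sound candidate.

    Assumption for illustration: string similarity weighted by the product of
    both recognition probabilities. The source does not specify the formula.
    """
    sim = SequenceMatcher(None, char_cand.text, sound_cand.text).ratio()
    return char_cand.prob * sound_cand.prob * sim


def best_pair(char_cands, sound_cands):
    """Return the (character, sound) candidate pair with the highest correlation."""
    return max(product(char_cands, sound_cands), key=lambda pair: correlation(*pair))


char_cands = [Candidate("profit", 0.9), Candidate("prophet", 0.1)]
sound_cands = [Candidate("prophet", 0.6), Candidate("profit", 0.4)]
chosen_char, chosen_sound = best_pair(char_cands, sound_cands)
print(chosen_char.text, chosen_sound.text)  # profit profit
```

Here the speech recognizer's top candidate ("prophet", 0.6) loses to the pair ("profit", "profit"), with a correlation of 0.36 against about 0.33, because the character recognizer is confident in "profit". The weighting thus lets one recognizer correct the other, but it still cannot remove errors that both recognizers share.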
Patent document 1: Japanese Laid-open Patent Publication No. 2004-7358
Patent document 2: Japanese Laid-open Patent Publication No. 2005-150841
Patent document 3: Japanese Laid-open Patent Publication No. 6-223104
Patent document 4: Japanese Laid-open Patent Publication No. 2005-173109
However, the above-described technology has a problem in that the highlight display may flap due to false recognition.
That is, the information processing method uses speech recognition to associate image data with sound data, and speech recognition inevitably has limited accuracy. When a recognition error occurs, the highlight display may move away from the part the presenter is describing and then return to it; repetition of this back-and-forth movement makes the highlight display flap. To suppress this flapping, each speech recognition result could be given a weight inversely proportional to its distance from the currently highlighted part. However, this causes a different problem when the part the presenter is describing moves to a distant part. For example, movement of the highlight display may be greatly delayed, or the part the presenter is describing may not be determined to be the destination of the highlight display at all.
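The trade-off described above can be illustrated with a toy model. In this sketch, which is an assumption for illustration only, each describable part of the slide has a one-dimensional position, speech recognition yields a match score per part, and the weight is 1 / (1 + distance) rather than a literal 1 / distance so that the currently highlighted part (distance 0) receives a finite weight. `next_highlight` and all of the data are hypothetical.

```python
def next_highlight(raw_scores, positions, current, use_weighting=True):
    """Choose the part to highlight next.

    raw_scores: part -> speech-recognition match score (hypothetical)
    positions:  part -> 1-D position on the slide (hypothetical layout)
    current:    part that is highlighted now
    """
    def weight(part):
        if not use_weighting:
            return 1.0
        distance = abs(positions[part] - positions[current])
        # 1 / (1 + d) instead of 1 / d so the current part (d = 0) stays finite.
        return 1.0 / (1.0 + distance)

    return max(raw_scores, key=lambda part: raw_scores[part] * weight(part))


positions = {"A": 0, "B": 3, "F": 10}

# Case 1: the presenter stays on A, but a recognition error favors nearby B.
noisy = {"A": 0.4, "B": 0.5, "F": 0.1}
print(next_highlight(noisy, positions, "A", use_weighting=False))  # B: highlight flaps away
print(next_highlight(noisy, positions, "A"))                       # A: weighting suppresses it

# Case 2: the presenter really moves to the distant part F.
jump = {"A": 0.3, "B": 0.1, "F": 0.9}
print(next_highlight(jump, positions, "A", use_weighting=False))   # F
print(next_highlight(jump, positions, "A"))                        # A: highlight fails to follow
```

In case 2, with the highlight on A, the distant part F is selected only if its raw score exceeds 11 times that of A, so the same weighting that suppresses flapping also delays or prevents movement of the highlight to a far part.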