1. Field of the Invention
The present invention relates to an information processing method and apparatus for associating image data and audio data.
2. Description of the Related Art
Recently, technologies for associating image data and audio data have been developed, e.g., taking a still image using a digital camera and recording comments on that still image using an audio memo function. For example, a standard format for digital camera image files, called Exif (Exchangeable Image File Format), allows audio data to be associated, as additional information, with a single still image file. The audio data associated with a still image is not merely appended to the still image, but can also be recognized and converted into text information by speech recognition, so that a desired still image can be searched for among a plurality of still images using text or audio as a key.
Digital cameras having a voice recorder function or voice recorders having a digital camera function are capable of recording a maximum of several hours of audio data.
In the related art, one or a plurality of pieces of audio data can be associated only with the entirety of a single still image; a particular portion of a single still image cannot be associated with a corresponding portion of audio data. The present inventor has not found a technique for associating a portion of a still image taken by a digital camera with a portion of audio data recorded by a voice recorder.
For example, suppose that a presenter gives a presentation of a product using a panel in an exhibition hall. During the presentation, members of the audience may record the speech of the presenter using a voice recorder and may also take still images of the exhibited posters (e.g., the entirety of the posters) using a digital camera. After the presentation, a member of the audience may play back, at home, the still images and speech that were taken and recorded during the presentation, and may wish to listen to the part of the presentation relating to a portion of the taken still images (e.g., the presentation relating to “product features” given in a portion of the exhibited poster).
In this case, the member of the audience must search the recorded audio data for the audio recording of the desired portion, which is time-consuming. A listener who did not attend the presentation would not know which portion of the recorded audio data corresponds to the presentation of the desired portion of the taken posters, and must therefore listen to the audio recording from the beginning in order to find the desired speech portion.