1. Field of the Invention
The present invention relates to a technique for processing image data captured by an image capturing apparatus that can input audio data.
2. Description of the Related Art
In recent years, digital cameras serving as image capturing apparatuses have advanced remarkably. For example, digital cameras with an audio function that can input, process, and output audio data have appeared.
As an audio function in a digital camera, for example, a voice memo function is known. With this function, a microphone is connected, and a memo of audio data input by the user via the microphone is appended to captured image data.
As another audio function, an audio shutter function described in Japanese Patent Laid-Open No. 2001-305642 is available. The audio shutter function automatically releases the shutter when the digital camera recognizes a specific utterance by the user, such as “Say cheese!” or “Smile!”. This function has already been adopted in actual products, and is effective, for example, when the photographer cannot reach the shutter button because the photographer himself or herself is an object, or when the photographer wants to prevent camera shake caused by pressing the shutter button.
Furthermore, with the advent of digital cameras with such audio functions, a function that uses the corresponding audio data to process image data captured by an image capturing apparatus has also become available.
In general, many users upload captured image data to an apparatus such as a personal computer (hereinafter abbreviated as a PC) or a Set Top Box (hereinafter abbreviated as an STB), and then browse, edit, print, or otherwise process the data there. For this reason, the function of processing image data using audio data is often implemented on such apparatuses.
More specifically, Japanese Patent Laid-Open No. 2006-164229 and Japanese Patent Laid-Open No. 2005-12674 disclose techniques that output specific audio data (predetermined BGM or a voice memo) when image data captured by an image capturing apparatus is fetched into a PC and played back as a slideshow.
Also known is a technique that, when image data captured by an image capturing apparatus is uploaded to a PC or STB, identifies the speaker from a voice memo appended to the image data and records the identification result as photographer information in association with the image data. With this technique, uploaded image data can be searched based on the photographer information.
However, in order to process image data using a voice memo appended to the captured image data, the user needs to append a voice memo to each item of image data in advance, which is inconvenient.
Moreover, because a voice memo is normally input after the image data has been captured, outputting it while the image data is played back in a slideshow gives a poor sense of reality.