1. Field of the Invention
The present invention relates to a device and method for managing image data on a computer connected to an image pickup apparatus via a network or the like.
2. Description of the Related Art
Digital cameras have come into widespread use in recent years. Users generally manage digital images captured with a portable image pickup apparatus, such as a digital camera, on a PC or server. For example, captured images can be organized into folders on a PC or server, and specific images can be printed or inserted into greeting cards. Moreover, images managed on a server can be made available to other users.
In such cases, the user needs to search for a desired image. If only a small number of images must be searched, all of them can be displayed as thumbnails, allowing the user to browse them easily and find the desired image. However, if hundreds of images must be searched, or if the target images are divided among different folders, the user cannot easily find the desired image just by browsing.
One solution is to add voice annotations for use in retrieval to images on the image pickup apparatus. For example, if an image of a mountain is captured, a voice annotation such as "hakone-no-yama" (meaning a mountain in Hakone, a geographic name) is added to the image. The voice data is paired with the captured image data and stored in the image pickup apparatus. The voice data is then subjected to speech recognition, either in the image pickup apparatus or on a PC to which the image is uploaded, and converted into text data. Once the annotation data has been converted into text, the image can be found with keywords such as "yama" (meaning mountain) and "hakone" using a typical text search method.
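The final retrieval step described above, finding images by keyword once their voice annotations have been converted into text, can be illustrated with a minimal sketch. It assumes speech recognition has already produced romanized annotation text for each image; the `AnnotatedImage` class, the `search_by_keyword` function, and the file names are hypothetical, and a plain case-insensitive substring match stands in for whatever text search method a real system would use.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedImage:
    image_path: str       # hypothetical file name of the captured image
    annotation_text: str  # text produced by speech recognition of the voice annotation


def search_by_keyword(images, keyword):
    """Return the images whose recognized annotation text contains the keyword."""
    kw = keyword.lower()
    return [img for img in images if kw in img.annotation_text.lower()]


# Hypothetical catalog pairing each image with its recognized annotation text.
catalog = [
    AnnotatedImage("IMG_0001.JPG", "hakone-no-yama"),
    AnnotatedImage("IMG_0002.JPG", "kamakura-no-umi"),
]

print([img.image_path for img in search_by_keyword(catalog, "yama")])
```

Both "yama" and "hakone" retrieve the first image here. Substring matching is chosen only for brevity; it would also match keywords embedded inside longer words, which a tokenized index would avoid.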
Techniques using such voice annotations are disclosed in Japanese Patent Laid-Open No. 2003-219327, Japanese Patent Laid-Open No. 2002-325225, and Japanese Patent Laid-Open No. 9-135417. In these techniques, the user adds a voice annotation to an image during or after image capture. The voice data is then converted using known speech recognition techniques and used in image retrieval.
Because speech recognition imposes an extremely heavy processing load, it is not practical to execute it on currently available portable image pickup apparatuses. It is therefore desirable to upload an image and the voice data attached to it from the image pickup apparatus to a PC or server and to execute speech recognition there.
As described above, techniques have been proposed and implemented in which the image pickup apparatus only acquires voice annotations, and speech recognition is executed on the PC or server to which the image data and voice data are uploaded. However, these techniques either do not clearly specify when speech recognition is performed on the voice annotation data attached to captured images, or perform it only in response to a request from the user after the image data and voice data have been uploaded.
The user must therefore perform two separate operations, "uploading images" and issuing a "speech recognition order", which is cumbersome.