The present invention relates to digital images, and more particularly to a method and system for automatically categorizing, storing, and presenting the images using a speech-based command language on a web server and a digital camera.
As digital photography and the digitization of old photographs become more and more prevalent, the number of digital images that are stored and archived will increase dramatically. Whether the digital images are stored locally on a user's PC or uploaded and stored on a Web photo-hosting site, the number of images will make it increasingly difficult for a user to find desired images.
To alleviate this problem, some digital cameras allow a user to categorize images according to a single subject category, such that when the images are downloaded to a host computer, the images having the same category are stored in the same file folder (U.S. Pat. No. 5,633,678, Electronic Still Camera For Capturing And Categorizing Images).
Although categorizing images with a single subject-matter category is useful for very high-level sorting, multiple categories are required to search a large number of images and to support more powerful searches. However, selecting and/or entering information for multiple categories on a digital camera would be cumbersome and tedious for the user.
One solution is to first upload the images from the digital camera to a PC, and then categorize the images on the PC using an image management application, such as PhotoSee Pro by ACD Systems. Such image management applications typically display thumbnail images and allow the user to enter properties, such as caption, date, photographer, description, and keywords, for each thumbnail image. The user may then search the entire photo collection by entering desired properties.
Although programs such as PhotoSee Pro, and image database programs in general, allow the categorization of images using multiple categories, these programs have major drawbacks. One problem is that when categorizing the images, the user must retype the category information for each image. When categorizing a large number of images, manually entering category information for each image is extremely tedious and time-consuming.
Another problem with uploading the images to a PC and categorizing the images on the PC is that the user must remember all the pertinent information for each image, which may not be an easy task, especially if a significant amount of time has passed between capturing the images and categorizing them. A further problem is that all the category information entered for a series of images is generally only used for indexing. That is, it may be difficult for the user to view the category information when the images are presented for viewing and/or printing.
Accordingly, what is needed is an improved method for automatically categorizing, storing, and presenting digital images. The present invention addresses such a need.
The present invention provides a method for automatically storing and presenting digital images. The method includes capturing digital images with a digital camera and storing each image in an image file, where the file includes at least one speech field and at least one text-based tag. A categorization process is then initiated whereby a user speaks at least one category voice annotation into the digital camera to categorize an image, and the category voice annotation is stored in the speech field of the corresponding image file. The category voice annotation is then translated into a text annotation using voice recognition, and the image and the text annotation are automatically stored in a database. An album may then be dynamically created by retrieving selected images and corresponding text annotations from the database in response to a request from the user, and displaying each image on the album along with the text annotations.
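By way of illustration only, the flow summarized above (speech field, voice recognition, database storage, dynamic album creation) may be sketched as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: the names `ImageFile`, `open_database`, `store_image`, `build_album`, and `fake_recognizer` are hypothetical, the database is an in-memory SQLite store, the text-based tag of the image file is omitted for brevity, and the voice recognition step is replaced by a stand-in that simply decodes bytes so that the example can run without a speech engine.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ImageFile:
    """Hypothetical image file: image data plus one or more speech fields."""
    name: str
    pixels: bytes
    # Each entry holds one raw category voice annotation recorded on the camera.
    speech_fields: List[bytes] = field(default_factory=list)


def open_database() -> sqlite3.Connection:
    """Create an in-memory database (an assumption; any database would do)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE images (name TEXT PRIMARY KEY, pixels BLOB)")
    db.execute("CREATE TABLE annotations (name TEXT, text TEXT)")
    return db


def store_image(db: sqlite3.Connection, image: ImageFile,
                recognize: Callable[[bytes], str]) -> None:
    """Translate each voice annotation to text and store it with the image."""
    db.execute("INSERT INTO images VALUES (?, ?)", (image.name, image.pixels))
    for audio in image.speech_fields:
        db.execute("INSERT INTO annotations VALUES (?, ?)",
                   (image.name, recognize(audio)))
    db.commit()


def build_album(db: sqlite3.Connection,
                keyword: str) -> List[Tuple[str, List[str]]]:
    """Return (image name, text annotations) pairs for images matching keyword."""
    names = db.execute(
        "SELECT DISTINCT name FROM annotations WHERE text LIKE ? ORDER BY name",
        (f"%{keyword}%",),
    ).fetchall()
    album = []
    for (name,) in names:
        texts = [text for (text,) in db.execute(
            "SELECT text FROM annotations WHERE name = ? ORDER BY rowid",
            (name,))]
        album.append((name, texts))
    return album


def fake_recognizer(audio: bytes) -> str:
    """Stand-in for voice recognition: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


if __name__ == "__main__":
    db = open_database()
    store_image(db, ImageFile("img1.jpg", b"\x00", [b"birthday", b"Hawaii"]),
                fake_recognizer)
    for name, texts in build_album(db, "Hawaii"):
        print(name, texts)
```

In this sketch the album is assembled at request time from the database rather than stored, which mirrors the dynamic album creation described above; a real system would substitute an actual speech recognizer for `fake_recognizer`.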
According to the system and method disclosed herein, the present invention allows a user to categorize images at the time of capture with multiple categories of information by merely speaking into the camera. And since the user's voice annotations are automatically recognized, translated, and stored in a database, the need for the user to manually enter categorization information is eliminated.