With the rapid development of digital cameras and scanners, digital photographs are becoming a commodity, especially since taking a digital photo costs virtually nothing. As a result, digital photographs can accumulate rapidly, and automated tools for organizing them have become extremely desirable. However, available commercial products still fall short at effectively organizing digital photographs of individuals in everyday situations (e.g., photos in a family photo album). For example, traditional photo management systems use only the time or album/directory information, which can be reliably extracted, to help the user manage the photos. These types of information alone are insufficient to achieve good organization and search performance.
An intuitive way to organize digital photos is to annotate them using semantics relevant to the photographic content. For example, semantic keywords, such as who is in the photo, where and when the photo was taken, what happened, and what kind of photo it is (e.g., portrait, group, or scenery), may be used for photo organization. However, manually entering such semantic keywords is slow and tedious, given the large quantity of photos in most digital photo albums. On the other hand, due to technical difficulties in the automatic recognition of faces, scenes and events, existing methods cannot guarantee an accurate result if the annotation process is fully automatic.
Many contemporary software programs still rely on manual annotation. These programs allow users either to label photos one by one, or to manually select the photos that are believed to share the same label and then apply the label to the selected photos. Other programs offer a certain level of automation. For example, an automation feature known as batch annotation or bulk annotation is widely adopted in commercial photo album management software. Batch annotation is often combined with drag-and-drop style user interfaces to further facilitate the annotation process.
Although the existing automation techniques and photo annotation user interfaces do reduce the workload compared to annotating images one by one, users still have to manually select photos to put them in a batch before performing batch annotation. This is especially tedious for name annotation (annotation using a name identifier of a person who is in a photo), because before each annotation, users need to recognize every person in the candidate photos and manually verify that the selected photos do contain a certain person. People may have less motivation to invest the effort in annotating their photos due to the heavy workload of manual annotation. More likely, they will just store their photos on a hard disk without annotation.
There are also some existing methods that try to leverage the potential of the Internet to alleviate the pain of tagging photos, either by encouraging Web users to label photos online or by implicitly collecting keyword labels for photos through games. However, these Internet-based annotation methods still require extensive labeling and tagging. Moreover, people are not always willing (or able) to label photos of others, and some may be unwilling to share their family albums with the public over the Internet.
Some newer programs use time-based clustering to select photos automatically according to the times at which the photos were taken. This can be fairly easy to implement because digital photos tend to carry metadata, including the capture time, that can be easily extracted. Time-based clustering does help in organizing and browsing photos, but contributes little to annotating photos based on the photo content, such as the people and other subjects of the photo.
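As an illustration only, time-based clustering can be sketched in a few lines of Python. The text above does not specify how such programs form their clusters, so the fixed gap threshold used here is an assumption, and the names `cluster_by_time` and `max_gap`, as well as the sample file names and timestamps, are hypothetical. Capture times are assumed to have already been extracted from the photo metadata.

```python
from datetime import datetime, timedelta


def cluster_by_time(photos, max_gap=timedelta(hours=1)):
    """Group (name, capture_time) pairs into time-ordered clusters.

    A new cluster starts whenever the gap between two consecutive
    photos exceeds max_gap. The one-hour default is an illustrative
    assumption, not a value taken from any particular product.
    """
    ordered = sorted(photos, key=lambda photo: photo[1])
    clusters = []
    for name, taken in ordered:
        # Compare against the capture time of the last photo in the
        # most recent cluster, if there is one.
        if clusters and taken - clusters[-1][-1][1] <= max_gap:
            clusters[-1].append((name, taken))
        else:
            clusters.append([(name, taken)])
    return clusters


# Hypothetical sample data: file names and capture times are made up.
sample_photos = [
    ("c.jpg", datetime(2005, 7, 4, 15, 0)),
    ("a.jpg", datetime(2005, 7, 4, 10, 0)),
    ("d.jpg", datetime(2005, 7, 5, 9, 0)),
    ("b.jpg", datetime(2005, 7, 4, 10, 20)),
]
sample_clusters = cluster_by_time(sample_photos)
```

The sketch groups photos purely by when they were taken. It never inspects image content, which is why a grouping of this kind offers little help in deciding who or what appears in each photo.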
In recent years, remarkable progress has been made in computer vision. In particular, the performance of automatic face detection methods has improved significantly. This allows digital photo management programs to detect faces in photos and then enable users to label these faces directly, without first having to manually find them in the photo album. In these methods, however, each face still needs to be annotated one by one, thus requiring an amount of labor comparable to annotating the photos directly, except for the time saved in finding the photos with faces.