There is a trend towards increasing miniaturization of electronics, leading to devices that have more processing power and can therefore become smarter. Furthermore, the trend towards further technology integration will allow devices to integrate more and more technologies, such as wireless networking and sensor capabilities, into affordable products. The combination of these two trends will allow devices to become smart devices that are context aware and interact intelligently with other such devices in a network (ad-hoc, fixed or otherwise). Such devices can be portable or stationary. Makers of portable and stationary devices tend to differentiate their products, in both form and function, from other such products on the market. This will leave the user with even more such devices at home and on the move.
One use of such powerful devices is the storage and/or rendering of personal content. When using such devices, metadata for personal content, such as the user's own photos and videos, is important because it enables users to, for instance, easily organize, browse and retrieve their content. Manually annotating such content is very laborious for users. This is especially so since the amount of content produced, both commercially and personally, is ever increasing. It is therefore becoming virtually impossible to properly annotate all newly created content. Solutions are required that relieve users of such arduous tasks and enable them to start enjoying the content.
Whilst many solutions, using content analysis or otherwise, are being developed to help the user annotate content automatically, none is satisfactory. When considering personal content, the following types of metadata are generally found to be important:
Why was the content created? What is the “event”, e.g. Summer holidays
Who appears in the created picture or video? E.g. my wife
When was the content created? E.g. July, Summer
Where was the content created? E.g. In Italy
Further types of metadata, related to concepts and objects present in the content, such as “happy”, “beach” and “tree”, can also be of importance to the user.
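By way of illustration only, the metadata types listed above could be held in a simple record per content item. The following is a minimal sketch in Python; the class name `ContentMetadata`, its field names and the helper `missing_fields` are hypothetical and are not taken from any cited system.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ContentMetadata:
    """Hypothetical annotation record for one photo or video."""
    event: Optional[str] = None                        # why, e.g. "Summer holidays"
    people: List[str] = field(default_factory=list)    # who, e.g. ["my wife"]
    when: Optional[str] = None                         # when, e.g. "July"
    where: Optional[str] = None                        # where, e.g. "Italy"
    concepts: List[str] = field(default_factory=list)  # e.g. ["happy", "beach"]

    def missing_fields(self) -> List[str]:
        """Return the names of metadata types that have not been annotated yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example: a photo annotated with everything except concepts and objects.
photo = ContentMetadata(
    event="Summer holidays",
    people=["my wife"],
    when="July",
    where="Italy",
)
print(photo.missing_fields())
```

A record of this kind makes explicit which annotations are still outstanding for a given item, which is the part that would otherwise fall to the user to fill in manually.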
Concerning the recognition of who appears in photos and videos, a lot of literature is available; see for instance Marc Davis, Michael Smith, John Canny, Nathan Good, Simon King, and Rajkumar Janakiraman, “Towards Context-Aware Face Recognition,” Proceedings of the 13th Annual ACM International Conference on Multimedia (MM 2005) in Singapore, ACM Press, 483-486, 2005. This article specifically targets context-aware face recognition in personal photos created using mobile phones. A further example is provided in Ara V. Nefian, Monson H. Hayes III, 1999, “Face recognition using an Embedded HMM”, which describes a face recognition method.
To determine where a content item was created, it is widely known that a Global Positioning System (GPS) can be used at creation time. Further, systems have also been developed that analyse the created content to infer the place captured in it. For instance, in Risto Sarvas, Erick Herrarte, Anita Wilhelm, and Marc Davis, “Metadata Creation System for Mobile Images,” Proceedings of the Second International Conference on Mobile Systems, Applications, and Services (MobiSys2004) in Boston, Mass., ACM Press, 36-48, 2004, a created image may be uploaded to a server to be compared with other images. From such an analysis it can be derived, for instance, that an image was taken at the “Campanile” tower on the UC Berkeley campus, USA.
Furthermore, there are also many efforts to detect concepts and objects; see for instance Erik Murphy-Chutorian, Sarah Aboutalib, Jochen Triesch, “Analysis of a Biologically-Inspired System for Real-time Object Recognition”, Cognitive Science Online, Vol. 3.2, pp. 1-14, 2005, and I. Cohen, N. Sebe, A. Garg, M. S. Lew, T. S. Huang, “Facial Expression Recognition from Video Sequences”, IEEE International Conference on Multimedia and Expo (ICME'02), vol II, pp. 121-124, Lausanne, Switzerland, August 2002.
However, even given all of the work being done in content analysis, it has been found that content analysis cannot provide 100% accurate annotation results. Whilst there are also efforts to incorporate user feedback and learning algorithms, it remains an issue that the user is required to provide significant amounts of feedback.
The inventors, recognising this problem, devised the present invention.