Rapid progress in the development of hand-held portable devices such as smart phones, palmtop computers, portable media players, personal-digital-assistant (PDA) devices and the like has led to the proposed inclusion of novel features and applications involving image processing. In one such application, namely image annotation or captioning, a user points a portable device towards a scene, e.g. an alpine landscape, a building, or a painting in a museum, and the display shows the image together with superposed information concerning the scene. Such information can include names, e.g. for mountains and habitations, historical information for buildings, and commercial information such as advertising, e.g. a restaurant menu.
Annotation information can be supplied to portable devices by servers in a wireless communication network. Such a functional configuration of a communication network with servers and portable devices is designated here as an annotation system.
In an annotation system, the precision and robustness of annotation positions are specific concerns. Precision can be enhanced by simultaneous use of different techniques such as sensor- and image-based techniques, and robustness by choice of techniques for determining annotation positions. Once an image is acquired by a mobile device, different techniques can be used to determine the placement of annotations in the image. Examples of methods and systems using different techniques for annotating an image are described, among others, in U.S. Pat. No. 6,208,353 and in EP1246080.
EP1622081 describes a video object recognition device for recognizing an object contained in a video image and for annotating this object. Candidate searching means read the positional information of the object recognizing device and of geographical candidate objects stored in a database. This device then searches for geographical objects that have possibly been imaged and performs a visual comparison between those candidate objects and the image. Presence probability calculating means calculate the probability that an image of the candidate object is captured, and similarity calculating means calculate the similarity between the candidate object and a visual feature of the video image. The presence probability and the similarity are then used to determine whether an image of an object is captured or not. This method is useful for determining whether or not a particular object should be annotated, but does not indicate the most likely position of the salient point, or the position in the image where the annotation should be added.
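The decision step described for EP1622081 can be illustrated with a minimal sketch. The source only states that the presence probability and the similarity are both used; combining them as a product and comparing against a threshold is an assumption made here for illustration, as are the names `Candidate` and `is_captured` and the threshold value.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A geographical candidate object (hypothetical structure)."""
    name: str
    presence_probability: float  # probability the object was imaged, in [0, 1]
    similarity: float            # visual similarity to the image, in [0, 1]


def is_captured(candidate: Candidate, threshold: float = 0.5) -> bool:
    """Decide whether the candidate is considered captured in the image.

    Assumption: the two values are combined as a simple product and
    compared to a fixed threshold. The cited document may combine them
    differently.
    """
    score = candidate.presence_probability * candidate.similarity
    return score >= threshold
```

Note that the result is a yes/no decision per object only. As the paragraph above points out, nothing in such a decision indicates where in the image the annotation should be placed.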
WO05114476 describes a mobile image-based information retrieval system including a mobile telephone and a remote recognition server. In this system, the image taken with the camera of the mobile phone is transmitted to a remote server where the recognition process is performed. This leads to high bandwidth needs for transmitting the image, and to a delay for computing the annotations in the server and transferring them back to the mobile phone. Again, this system delivers a similarity score which is compared to a predefined threshold to decide whether or not an object is visible in the image, but does not indicate the most likely position of this object in the image.
WO2007/108200 describes a camera and an image processing program for inserting an inserting-image at an appropriate position of an image. It is concerned with identifying important objects in a scene that should not be obscured by an added annotation. The image plane is divided into 25 (5×5) small areas. The positioning of this inserting-image is related to an object distribution evaluation value calculated by the CPU of the camera, by using a face distribution evaluation value, a contrast distribution evaluation value and weights. The inserting position is selected from the small areas in the first row and the fifth row of the image plane, as the area having the minimum object distribution evaluation value.
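The selection rule described for WO2007/108200 can be sketched as follows. The source says the object distribution evaluation value is computed from a face distribution value, a contrast distribution value and weights; the weighted sum used here, the weight values, and the function names are assumptions for illustration only.

```python
GRID = 5  # the image plane is divided into 5 x 5 small areas


def object_distribution(face, contrast, w_face=0.6, w_contrast=0.4):
    """Combine per-area face and contrast values into one score per area.

    Assumption: a weighted sum; the cited document only says that both
    values and weights are used.
    """
    return [
        [w_face * f + w_contrast * c for f, c in zip(face_row, contrast_row)]
        for face_row, contrast_row in zip(face, contrast)
    ]


def select_insert_area(face, contrast, w_face=0.6, w_contrast=0.4):
    """Return (row, col) of the lowest-scoring area in the first or fifth row."""
    scores = object_distribution(face, contrast, w_face, w_contrast)
    # Only the top row (index 0) and bottom row (index 4) are eligible.
    candidates = [
        (scores[r][c], r, c) for r in (0, GRID - 1) for c in range(GRID)
    ]
    _, row, col = min(candidates)  # ties resolve to the smallest (row, col)
    return row, col
```

Restricting the candidates to the top and bottom rows means a low-scoring area in the middle of the image is never chosen, even if it contains no important object.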
In a system for automated annotation of images and videos, a user points a mobile device towards an object of interest, such as a building or landscape scenery, and the device displays an image of the scene with an annotation for the object. An annotation can include names, historical information, and links to databases of images, videos, and audio files. Different techniques can be used for determining positional placement of annotations, and, by using multiple techniques, positioning can be made more precise and reliable. The level of detail of annotation information can be adjusted according to the precision of the techniques used. Required computations can be distributed in an annotation system including mobile devices, servers and an inter-connecting network, allowing for tailoring of annotated images to mobile devices of different levels of complexity. A trade-off can be taken into account between precision of annotation and communication cost, delay and/or power consumption. An annotation database can be updated in a self-organizing way. Public information as available on the web can be converted to annotation data.
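As a rough illustration of how combining two techniques can tighten a position estimate, and how the level of detail might follow from the resulting precision, consider the sketch below. Inverse-variance weighting is a standard fusion technique chosen here as an assumption; the source does not specify how estimates are combined. The function names, the one-dimensional pixel position, and the threshold value are all hypothetical.

```python
def fuse_estimates(x_sensor, var_sensor, x_image, var_image):
    """Fuse two independent position estimates by inverse-variance weighting.

    Returns the fused position and its variance. The fused variance is
    always smaller than either input variance.
    """
    w_sensor = 1.0 / var_sensor
    w_image = 1.0 / var_image
    fused = (w_sensor * x_sensor + w_image * x_image) / (w_sensor + w_image)
    fused_var = 1.0 / (w_sensor + w_image)
    return fused, fused_var


def detail_level(variance, fine_threshold=25.0):
    """Pick an annotation detail level from the position variance.

    Assumption: a single threshold separates detailed from coarse
    annotation; a real system could use more levels.
    """
    return "detailed" if variance <= fine_threshold else "coarse"
```

For example, a sensor-based estimate and an image-based estimate with equal variance yield a fused position midway between them with half the variance, which may justify showing more detailed annotation than either technique alone would.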