Visual communication services for mobile communication devices are gaining increasing importance. One such service is so-called visual search: a camera (e.g. integrated within a mobile phone or any other suitable terminal) captures an image of a physical object, and corresponding data is sent to the network, which recognizes the image by means of suitable computer algorithms and returns useful information about the physical object to the user.
The aim of visual search is primarily to identify the physical object and thereby present the user with associated information. This information, also referred to as metadata, may take various formats, e.g. video files, audio files, web pages, images, animation files, etc.
When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (much data, but not much information), the input data is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the extracted features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.
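The idea of feature extraction can be illustrated with a minimal sketch. It assumes a grayscale image given as rows of integer pixel values in the range 0 to 255 and reduces it to a short normalized intensity histogram. The function name `intensity_histogram` and the choice of a global histogram are illustrative assumptions only; visual search systems typically rely on local features rather than on a single global histogram.

```python
def intensity_histogram(image, bins=8):
    """Reduce a grayscale image (rows of 0-255 ints) to a normalized histogram.

    Hypothetical helper for illustration: the full pixel array is replaced
    by a feature vector with only `bins` entries.
    """
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            # Map the pixel value to one of `bins` equally wide intervals.
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


# Eight pixels are reduced to a four-element feature vector.
image = [[0, 10, 200, 255], [30, 40, 220, 250]]
features = intensity_histogram(image, bins=4)
print(features)
```

Here the feature vector is much smaller than the input while still indicating that the image consists of equally many dark and bright pixels.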
Most current visual search systems adopt the so-called feature-based image matching approach. By representing images or objects using sets of local features, recognition can be achieved by matching features between the query image and candidate database images. Fast large-scale image matching is enabled using so-called vocabulary trees (VT). Features are extracted from the database of images, and a hierarchical clustering algorithm is applied to all of these features to generate the VT. Descriptors of the query image are also classified through the VT, and a histogram of the visits to the tree nodes is generated.
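The construction and use of a vocabulary tree may be sketched as follows. This is a simplified illustration, not a definitive implementation: it assumes hierarchical k-means as the clustering algorithm, two-dimensional tuples in place of real descriptors, and a deterministic initialisation (the first k distinct points). All function names (`kmeans`, `build_tree`, `visit_histogram`) and parameters are hypothetical.

```python
from collections import Counter
from itertools import count


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def assign(points, centers):
    """Group points by their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=10):
    # Assumption: deterministic start from the first k distinct points.
    centers = list(dict.fromkeys(points))[:k]
    for _ in range(iterations):
        clusters = assign(points, centers)
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(points, centers)


def build_tree(points, branching, depth, ids=None):
    """Recursively cluster the database descriptors into a tree of nodes."""
    ids = ids if ids is not None else count()
    node = {"id": next(ids), "children": []}
    if depth == 0 or len(set(points)) < branching:
        return node  # leaf node
    centers, clusters = kmeans(points, branching)
    for center, cluster in zip(centers, clusters):
        if cluster:
            child = build_tree(cluster, branching, depth - 1, ids)
            child["center"] = center
            node["children"].append(child)
    return node


def visit_histogram(tree, descriptors):
    """Pass each descriptor down the tree and count the nodes it visits."""
    visits = Counter()
    for d in descriptors:
        node = tree
        while node["children"]:
            node = min(node["children"], key=lambda c: sq_dist(d, c["center"]))
            visits[node["id"]] += 1
    return visits


database = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
tree = build_tree(database, branching=2, depth=2)
query_hist = visit_histogram(tree, [(0.2, 0.1), (9, 9), (10, 12)])
print(dict(query_hist))
```

Each descriptor is compared only against the children of the node it currently occupies, so the number of comparisons grows with the depth of the tree rather than with the number of leaves. This is what makes large-scale matching fast.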
Candidate images are then sorted according to the similarity between each candidate database image histogram and the query image histogram. Image capture and feature manipulations are proposed to be performed in the mobile terminal, while VT matching and geometric verification (GV) are performed on a server in the Internet.
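The sorting of candidate images by histogram similarity may be sketched as follows. The similarity measure is not specified above; cosine similarity is assumed here purely for illustration. The image names and histogram values are invented examples, with each histogram mapping a tree node identifier to its visit count.

```python
from math import sqrt


def cosine_similarity(h1, h2):
    """Cosine similarity of two sparse histograms (node id -> visit count)."""
    dot = sum(v * h2.get(k, 0) for k, v in h1.items())
    norm1 = sqrt(sum(v * v for v in h1.values()))
    norm2 = sqrt(sum(v * v for v in h2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def rank_candidates(query_hist, database_hists):
    """Return database image names, most similar to the query first."""
    scored = [(cosine_similarity(query_hist, h), name)
              for name, h in database_hists.items()]
    return [name for _, name in sorted(scored, key=lambda s: s[0], reverse=True)]


# Hypothetical histograms for a query image and three database images.
query = {1: 1, 2: 1, 4: 2}
database_hists = {
    "building": {4: 1, 5: 1},
    "logo": {7: 3},
    "poster": {1: 2, 2: 2, 4: 3},
}
ranking = rank_candidates(query, database_hists)
print(ranking)
```

In this example the poster shares all of its visited nodes with the query and is ranked first, while the logo shares none and is ranked last. The top-ranked candidates would then be passed on to a verification step.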
Augmented reality (AR) is an upcoming paradigm of presenting metadata of physical objects as an overlay over the image or video of a physical object in real time. Special applications called augmented reality browsers (AR browsers) are used in terminals, e.g. mobile phones, and are gaining popularity. AR browsers perform two main functions: visual search initiation and overlay display of metadata on the end user terminal display. The AR server incorporates elements of visual search and of an overlay object server. The visual search component matches an image against the dataset, and the file server sends the corresponding overlay data to the AR browser for display to the end user. It should be noted that the overlay data could range from simple text to a complex web page containing text, audio and video components. It may also be possible for the end user to interact further with the displayed overlay data, e.g. start/stop video, scroll text, enlarge image, etc. Overlay data is also called metadata of the physical object, and this is the term that will be used in this document. Businesses could take advantage of AR in a multitude of ways, such as personalized shopping that blends in location-dependent information, and blended branding.
Execution of the visual search algorithms in known visual search solutions is performed in the Internet, i.e. beyond the mobile network. However, communication with the Internet and execution within the Internet might involve (unpredictable) transmission delays, such that a reply to the user is (unpredictably) delayed, thus adversely affecting the quality of experience (QoE). This might lead to dissatisfied users refraining from using such services, which in turn makes it difficult for visual search and AR providers to deploy them.