Visual search is a concept in which an image of a physical object is captured with a camera (e.g. in a mobile phone), the object is recognized by computer algorithms, and useful information about the object is presented back to the user.
The primary aim of visual search is to identify the physical object and thereby present the user with information about it. This information is called metadata, and it can take various formats, e.g. video files, audio files, Web pages, images, animation files, etc.
When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (much data, but little information), the input data is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the features are carefully chosen, the feature set is expected to capture the relevant information in the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.
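As a minimal illustration of feature extraction, the sketch below reduces a grayscale image (assumed here to be a list of rows of 0-255 pixel values) to a short normalized intensity histogram. The function name `intensity_histogram` and the use of a histogram as the feature are assumptions made for illustration only; visual search systems normally use richer local descriptors.

```python
from typing import List


def intensity_histogram(image: List[List[int]], bins: int = 8) -> List[float]:
    """Reduce a grayscale image (pixel values 0-255) to a `bins`-long feature vector.

    The input holds width * height values; the output holds only `bins` values,
    each being the fraction of pixels that fall into one intensity range.
    """
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            # Map the 0-255 range onto `bins` equal-width intensity ranges.
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        return [0.0] * bins  # empty image: no information to summarize
    return [c / total for c in counts]


if __name__ == "__main__":
    img = [[0, 0, 255, 255], [0, 0, 255, 255]]  # 8 pixels
    print(intensity_histogram(img, bins=4))  # → [0.5, 0.0, 0.0, 0.5]
```

However large the image, the feature vector keeps a fixed length, which is the property the paragraph above relies on.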
Most current visual search systems adopt the feature-based image matching approach [see e.g. G. Takacs et al., “Outdoors augmented reality on mobile phone using loxel-based visual feature organization”, ACM International Conference on Multimedia Information Retrieval, Vancouver, Canada, October 2008]. By representing images or objects as sets of local features, recognition can be achieved by matching features between the query image and candidate database images. Fast large-scale image matching is enabled by a Vocabulary Tree (VT): features are extracted from the database images, and a hierarchical k-means clustering algorithm is applied to all of these features to generate the VT. The descriptors of the query image are then classified through the VT, and a histogram of the visits to the tree nodes is generated.
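The Vocabulary Tree steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: descriptors are short float tuples rather than real local descriptors, the branching factor and depth are tiny, and candidates are ranked by cosine similarity of the node-visit histograms. Cosine similarity is an assumed choice; practical vocabulary-tree systems often weight node visits (e.g. with TF-IDF weights) before comparing. All names (`VocabularyTree`, `rank_candidates`, etc.) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(centers: List[Vector], p: Sequence[float]) -> int:
    return min(range(len(centers)), key=lambda i: _dist2(p, centers[i]))


def kmeans(points: List[Vector], k: int, rng: random.Random, iters: int = 10) -> List[Vector]:
    """Plain Lloyd's k-means; the initial centres are k distinct sample points."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[_nearest(centers, p)].append(p)
        # Keep the old centre if a cluster becomes empty.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


class Node:
    def __init__(self, node_id: int) -> None:
        self.id = node_id
        self.centers: List[Vector] = []
        self.children: List["Node"] = []


class VocabularyTree:
    """Hierarchical k-means: each node splits its descriptors into `branching` clusters."""

    def __init__(self, descriptors: List[Vector], branching: int = 2, depth: int = 2, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.num_nodes = 0
        self.root = self._build(list(descriptors), branching, depth)

    def _build(self, points: List[Vector], branching: int, depth: int) -> Node:
        node = Node(self.num_nodes)
        self.num_nodes += 1
        if depth == 0 or len(points) < branching:
            return node  # leaf
        node.centers = kmeans(points, branching, self._rng)
        buckets: List[List[Vector]] = [[] for _ in node.centers]
        for p in points:
            buckets[_nearest(node.centers, p)].append(p)
        node.children = [self._build(b, branching, depth - 1) for b in buckets]
        return node

    def path(self, descriptor: Vector) -> List[int]:
        """Ids of the nodes visited when classifying one descriptor from root to leaf."""
        node, visited = self.root, [self.root.id]
        while node.children:
            node = node.children[_nearest(node.centers, descriptor)]
            visited.append(node.id)
        return visited

    def histogram(self, descriptors: List[Vector]) -> List[int]:
        """Histogram of node visits over all descriptors of one image."""
        hist = [0] * self.num_nodes
        for d in descriptors:
            for node_id in self.path(d):
                hist[node_id] += 1
        return hist


def cosine_similarity(h1: List[int], h2: List[int]) -> float:
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0


def rank_candidates(tree: VocabularyTree, query: List[Vector], database: Dict[str, List[Vector]]) -> List[str]:
    """Sort database images by histogram similarity to the query, most similar first."""
    q = tree.histogram(query)
    scores = {name: cosine_similarity(q, tree.histogram(desc)) for name, desc in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    image_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    image_b = [(10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)]
    tree = VocabularyTree(image_a + image_b, branching=2, depth=2)
    print(rank_candidates(tree, [(0.05, 0.05), (0.0, 0.02)], {"A": image_a, "B": image_b}))  # → ['A', 'B']
```

The tree is built once from all database descriptors; each query then costs only one root-to-leaf descent per descriptor, which is what makes large-scale matching fast.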
Candidate images are then sorted according to the similarity between each candidate database image histogram and the query image histogram. Geometric Verification (GV) is applied after feature matching [see S. S. Tsai, D. Chen, J. Singh, and B. Girod, “Rate-efficient, real-time CD cover recognition on a camera-phone”, ACM International Conference on Multimedia, Vancouver, Canada, October 2008] to eliminate false feature matches. In this process, features of the query object are matched with features of the database objects using the nearest descriptor or the ratio test. Then a geometric transformation between the locations of the features in the query object and the locations of the features in the database object is estimated using the RANdom SAmple Consensus (RANSAC) algorithm [see M. Fischler and R. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography”, Communications of the ACM, vol. 24, no. 6, pp. 381-395, 1981].
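The ratio test and RANSAC-based geometric verification described above can be sketched as follows. This is a minimal sketch, assuming 2D keypoint locations, toy descriptors, and a similarity transform (rotation, uniform scale and translation, written as w = a*z + t over complex numbers) as the geometric model; the cited work may use a different model such as an affine transform or a homography. The function names, the 0.8 ratio and the 1.0 inlier tolerance are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ratio_test_matches(query_desc: List[Sequence[float]], db_desc: List[Sequence[float]],
                       ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Keep (query_idx, db_idx) only if the nearest database descriptor is clearly
    closer than the second nearest one (distance ratio below `ratio`)."""
    matches = []
    for qi, q in enumerate(query_desc):
        ranked = sorted((math.dist(q, d), di) for di, d in enumerate(db_desc))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((qi, ranked[0][1]))
    return matches


def ransac_similarity(query_kp: List[Point], db_kp: List[Point], matches: List[Tuple[int, int]],
                      iters: int = 200, tol: float = 1.0, seed: int = 0):
    """Estimate w = a*z + t mapping query locations z to database locations w
    (points as complex numbers); return (model, inlier matches)."""
    pairs = [(complex(*query_kp[qi]), complex(*db_kp[di]), (qi, di)) for qi, di in matches]
    if len(pairs) < 2:
        return None, []  # a similarity transform needs two correspondences
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (z1, w1, _m1), (z2, w2, _m2) = rng.sample(pairs, 2)  # minimal random sample
        if z1 == z2:
            continue  # degenerate sample: the model is undetermined
        a = (w2 - w1) / (z2 - z1)  # rotation and scale
        t = w1 - a * z1  # translation
        inliers = [m for z, w, m in pairs if abs(a * z + t - w) <= tol]
        if len(inliers) > len(best_inliers):  # keep the model with the largest consensus
            best_model, best_inliers = (a, t), inliers
    return best_model, best_inliers


if __name__ == "__main__":
    db_desc = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
    db_kp = [(0, 0), (10, 0), (0, 10), (10, 10)]
    query_kp = [(0, 0), (5, 0), (0, 5), (40, -7)]  # last location is geometrically inconsistent
    matches = ratio_test_matches(db_desc, db_desc)
    model, inliers = ransac_similarity(query_kp, db_kp, matches)
    print(len(matches), inliers)  # → 4 [(0, 0), (1, 1), (2, 2)]
```

In the usage example all four descriptor matches pass the ratio test, but only three are consistent with a single transformation (scale 2, no rotation or translation), so GV discards the fourth as a false match.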
Image capture and feature processing are proposed to be performed in the mobile terminal, while VT matching and GV are performed on a server in the Internet.
Augmented reality (AR) is an emerging paradigm for presenting metadata of physical objects as an overlay on the image or video of the physical object in real time. Special applications called augmented reality browsers (AR browsers) are used in terminals, e.g. mobile phones, and are gaining popularity. AR browsers perform two main functions: visual search initiation and overlay display of metadata on the end user terminal display. The AR server incorporates elements of visual search and of an overlay object server. The visual search component matches an image against the dataset, and the overlay object server sends the corresponding overlay data to the AR browser for display to the end user. It should be noted that the overlay data can range from simple text to a complex webpage containing text, audio and video components. The end user may also be able to interact further with the displayed overlay data, e.g. start/stop video, scroll text, enlarge images, etc. Overlay data is also called metadata of the physical object, and this is the term used in this document. Businesses could take advantage of AR in a multitude of ways, such as:
Personalized shopping: walking around stores is made relevant through opt-in personalization and targeting. Here, information on discounted or personally relevant products can be delivered to potential customers who scan stores, streets or shelves.
Location layers: blended guides to new places, tourism, enhanced traveling or themed spaces.
Blended branding: the equivalent of virtual poster ads.
In all cases there is an entity that uses AR to deliver a service or an enhanced experience to an end user, who interacts with a physical object of that entity. This entity is called the service provider (SP) in this document.
FIG. 1 belongs to the prior art and discloses a system comprising an Internet network 12 attached to a mobile network 11. A mobile phone 1 within the cell area of a Radio Base Station (RBS) 2 is able to communicate via intermediate nodes 2, 3 and 4 with a server 5 in the Internet network. In this prior art example the intermediate nodes are the RBS 2, a Radio Network Controller (RNC) 3 and a Gateway GPRS Support Node (GGSN) 4. The server 5 comprises a first cache (i.e. storage) 5a of images or features, a second cache 5b of metadata, and a processor unit 5c. The Radio Base Station 2 is attached to a service provider SP, which in this example is a library. The service provider comprises different objects of which photos can be captured by a camera, for example books. In this prior art example, an image of a physical item is captured by the camera of the mobile phone 1. The image is then forwarded to the server 5 with a request for metadata related to the image. After the processor unit 5c has successfully matched the image against the cached features, the corresponding metadata is found, sent from the server 5 to the mobile phone 1, and presented on the phone, for example as augmented information.
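A toy model of the prior-art request/response flow of FIG. 1 is sketched below. It is hypothetical throughout: the class and function names, the string metadata, the distance threshold and the nearest-neighbour matching are assumptions, and the captured image is represented by an already-extracted feature vector for brevity. The intermediate nodes (RBS 2, RNC 3, GGSN 4) are not modelled; only the roles of caches 5a and 5b and processor unit 5c in server 5 are.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class Server:
    """Hypothetical model of server 5 in FIG. 1."""
    feature_cache: Dict[str, Tuple[float, ...]] = field(default_factory=dict)  # first cache 5a
    metadata_cache: Dict[str, str] = field(default_factory=dict)  # second cache 5b
    max_distance: float = 0.5  # assumed threshold for a "successful" match

    def handle_request(self, query: Sequence[float]) -> Optional[str]:
        """Role of processor unit 5c: match the query against cache 5a and,
        on success, return the corresponding metadata from cache 5b."""
        best_id, best_dist = None, math.inf
        for object_id, features in self.feature_cache.items():
            d = math.dist(query, features)
            if d < best_dist:
                best_id, best_dist = object_id, d
        if best_id is None or best_dist > self.max_distance:
            return None  # no successful match, so no metadata is sent back
        return self.metadata_cache.get(best_id)


def phone_request(server: Server, captured: Sequence[float]) -> str:
    """Mobile phone 1: send the capture to server 5 and present the reply."""
    metadata = server.handle_request(captured)
    return metadata if metadata is not None else "no match"


if __name__ == "__main__":
    library = Server(feature_cache={"book-42": (0.1, 0.9)},
                     metadata_cache={"book-42": "Hypothetical metadata for book 42"})
    print(phone_request(library, (0.12, 0.88)))  # → Hypothetical metadata for book 42
    print(phone_request(library, (5.0, 5.0)))  # → no match
```

Note that every request in this model travels to the Internet-hosted server and back, which is the traffic pattern discussed in the following paragraphs.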
Visual search for augmented reality will shift the direction of content flow in operator networks. Today content usually flows (as in a download scenario) from the Internet via the access network to the terminal. With visual search, the flow of content is reversed: from the terminal to the Internet via the operator's network.
New traffic flows are thereby introduced on top of the existing one: the upload of images and the return of the matched data (notably, these flows may be delay sensitive). These new traffic flows therefore add an extra burden to the network. In this light, operators would like:
(a) to control the traffic flow with the aim of reducing network utilization, and
(b) to monetize this new traffic flow.
With the current mobile network architecture, the operator has no means to perform either (a) or (b) for augmented reality applications, and this invention aims to provide a solution to this problem.
Another major problem with existing visual search solutions is that they usually execute the visual search algorithms in the Internet. A visual search system requires that the image or its features be sent to the Internet; if there are transmission problems in the Internet, the reply is delayed. This adversely affects the Quality of Experience (QoE) of visual search systems: the user is forced to wait a long time for the result and might eventually abandon the service.