Modern-era mobile phones, handsets, tablets, mobile terminals, mobile devices, or user equipment have evolved into powerful image- and video-processing devices, equipped with high-resolution cameras, color displays, and hardware-accelerated graphics. With the explosive growth of mobile devices such as Android phones and the iPhone, mobile-based multimedia visual services are undergoing intense innovation and development. Application scenarios of mobile visual search services include location-based services, logo search, and so on, where an image or other multimedia content sent from a mobile device is matched against content stored in a database or an image repository. The first deployments of mobile visual search systems include Google Goggles, Nokia Point and Find, Kooaba, and SnapTell.
Image queries sent by mobile devices over a wireless network are usually computationally expensive, incur prohibitively high communication cost, and cannot support real-time operation. In popular applications where a mobile device captures a picture of certain objects and sends it as a query over a wireless network to search a large repository, reducing the bit rate while preserving matching accuracy is a main concern and a main focus of the standardization effort under MPEG.
Visual descriptors, or image descriptors, are descriptions of the visual feature points of the contents of images and videos. They describe elementary characteristics such as shape, color, texture, or motion, among others, and they allow quicker and more efficient searches of audio-visual content. The standard that deals with audio-visual descriptors is MPEG-7, developed by the Moving Picture Experts Group (MPEG).
However, for mobile devices, visual descriptors are usually still very heavy, as they comprise hundreds of scale- and rotation-invariant feature points, as well as their locations. Sometimes the combined size of these scale-invariant feature points may be larger than the image itself. For example, a feature point of the scale-invariant feature transform (SIFT) comprises 128 dimensions and 2048 bits. As another example, a feature point of speeded up robust features (SURF) comprises 64 dimensions and 1024 bits.
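To make the size concern concrete, the following minimal Python sketch works through the arithmetic implied by the figures above. The per-point bit counts and dimensions are taken from the text; the feature-point counts (100, 500, 1000) and the function names are hypothetical, chosen only for illustration, and the storage needed for feature-point locations is not included.

```python
# Per-feature-point sizes as stated in the text.
BITS_PER_POINT = {"SIFT": 2048, "SURF": 1024}
DIMENSIONS = {"SIFT": 128, "SURF": 64}


def bits_per_dimension(kind: str) -> int:
    """Bits used per descriptor dimension, implied by the stated totals."""
    return BITS_PER_POINT[kind] // DIMENSIONS[kind]


def descriptor_payload_bytes(kind: str, num_points: int) -> int:
    """Bytes needed for num_points descriptors, excluding their locations."""
    return BITS_PER_POINT[kind] * num_points // 8


if __name__ == "__main__":
    # The feature-point counts below are hypothetical illustrations.
    for kind in ("SIFT", "SURF"):
        for num_points in (100, 500, 1000):
            size = descriptor_payload_bytes(kind, num_points)
            print(f"{kind}: {num_points} points -> {size} bytes "
                  f"({bits_per_dimension(kind)} bits per dimension)")
```

Under these assumptions, 500 SIFT feature points occupy 128,000 bytes before any location data is added, which can exceed the size of a compressed query image.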
Hence, a need exists for a model that reduces the size of the representation of the feature points of an image while preserving the matching performance for queries comprising the reduced feature point representations.