The disclosed subject matter relates to systems and methods for mobile search, including mobile product search using Bag of Hash Bits (BoHB) and boundary reranking.
Mobile devices, including smartphones, can perform various types of mobile visual search, including location search, product search, augmented reality and the like. Among these, mobile product search can be utilized to identify products for sale based on a query image. For mobile product search, a local feature (LF) detector, such as Scale-Invariant Feature Transform (SIFT) or Speeded Up Robust Features (SURF), can be suitable, at least in part because some global feature detectors can be unable to perform object-level matching, which can be useful for product search.
Similar to conventional desktop visual searching, mobile visual search can benefit from efficient indexing and fast searching. Mobile visual search can also present additional challenges, such as reducing the amount of data sent from the mobile device to the server, as well as operating within the relatively limited computation and memory capacity of the mobile device.
Certain mobile visual search systems can use the mobile device only to perform capture and display. As such, the mobile device can send the query image in a compressed format, like JPEG, to the server, and additional processing, like local feature extraction and searching, can be performed by the server. As the computation capacity of smartphones increases, extracting local features on the mobile device can be performed at a suitable speed. As such, certain other mobile visual search systems can extract local features on the mobile device. Further, such local features can be compressed before transmission; otherwise, the raw local feature data can be larger than the query image data. If local features are compressed, for example, to tens of bits each, and the compressed bits are sent to the server, the transmission cost and time can be reduced compared to sending the JPEG images.
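The effect of compressing local features on transmission size can be illustrated with simple arithmetic. The following is a minimal sketch only: the function names, the count of 500 features per image, the storage of one byte per descriptor dimension, and the budget of 80 bits per compressed feature are all assumptions chosen for illustration and do not come from the disclosed subject matter. The 128-dimension figure corresponds to a standard SIFT descriptor.

```python
def raw_payload_bytes(num_features, dims, bytes_per_dim):
    """Size in bytes of uncompressed local feature descriptors."""
    return num_features * dims * bytes_per_dim


def compressed_payload_bytes(num_features, bits_per_feature):
    """Size in bytes of compressed features, rounded up to whole bytes."""
    total_bits = num_features * bits_per_feature
    return (total_bits + 7) // 8


# Assumed, illustrative values (not taken from the source text).
num_features = 500      # hypothetical number of local features in one image
dims = 128              # dimensions of a standard SIFT descriptor
bytes_per_dim = 1       # assumes each dimension is stored in one byte
bits_per_feature = 80   # hypothetical "tens of bits" budget per feature

raw = raw_payload_bytes(num_features, dims, bytes_per_dim)
compressed = compressed_payload_bytes(num_features, bits_per_feature)

print(raw)         # 64000
print(compressed)  # 5000
```

Under these assumed values, the compressed payload is smaller than the raw descriptors by a factor of about 13. The actual ratio can vary with the number of features detected and the bit budget selected.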
One approach to compressing local features can be to quantize each local feature to a visual word on the mobile side, and then send the visual words to the server. However, certain quantization methods, including methods that use a relatively large vocabulary to improve search results, such as a vocabulary tree, can be unsuitable for mobile devices due at least in part to the relatively low memory and computation capacity of mobile devices.
Another approach to mobile visual searching can be to compress the local features on the mobile side using a coding technique, such as Compressed Histogram of Gradients (CHoG), in which the raw features can be encoded using an entropy-based coding method and decoded at the server to approximately recover the features. The server can then quantize the recovered features to visual codewords and, following a model such as “bag of words” (BoW), can represent each image as a collection of the visual words it contains.
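The quantization and bag-of-words steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any particular system: it uses a flat, brute-force nearest-codeword search rather than a vocabulary tree, and the function names, the two-dimensional toy descriptors, and the three-word vocabulary are hypothetical. Practical descriptors have many more dimensions and practical vocabularies can be far larger.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, vocabulary):
    """Return the index of the visual word nearest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda i: squared_distance(descriptor, vocabulary[i]),
    )


def bag_of_words(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word."""
    counts = Counter(quantize(d, vocabulary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]


# Hypothetical toy data: a 3-word vocabulary and 4 recovered descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.2), (0.9, 1.2), (4.8, 5.1), (1.1, 0.8)]

histogram = bag_of_words(descriptors, vocabulary)
print(histogram)  # [1, 2, 1]
```

In this sketch, the resulting histogram is the image representation: two images can be compared by comparing their histograms, without reference to the original descriptors. The brute-force search shown here scales linearly with vocabulary size, which is one reason a large vocabulary can be costly on a device with limited memory and computation.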