The widespread use of mobile devices equipped with high-resolution cameras is increasingly pushing computer vision applications into mobile scenarios. In the common paradigm, a user takes a picture of the surroundings with a mobile device to obtain informative feedback about those surroundings. This is the case, for example, in mobile shopping applications, where a user can shop just by taking pictures of desired products, or in landmark recognition applications that make it easier to visit places of interest. In these scenarios, visual search typically needs to be performed over a large image database, so applications on the mobile device communicate wirelessly with a remote server to send visual information and receive informative feedback. As a result, the bandwidth of the communication channel, including this wireless link, imposes a constraint: use of the channel ought to be carefully optimized to bound communication costs and network latency. For this reason, a compact but informative image representation is sent to the remote server, typically in the form of a set of local feature descriptors, such as scale-invariant feature transform (SIFT) and speeded up robust features (SURF) descriptors, which are extracted from the captured image.
Even when image content is summarized as local feature descriptors, in at least some applications state-of-the-art feature descriptors remain too large to satisfy the bandwidth requirements of the communications networks over which they must be transmitted for the desired visual search to be performed.
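To give a rough sense of scale for the bandwidth concern, the following minimal sketch estimates the size of an uncompressed descriptor payload and the time needed to upload it. A SIFT descriptor has 128 dimensions; every other figure here is an assumption chosen only for illustration and does not come from the text above: one byte per dimension, a hypothetical 1000 keypoints per image, and a hypothetical 1 Mbit/s uplink. The function names are likewise hypothetical.

```python
# Illustrative estimate only. Apart from the 128-dimensional SIFT layout,
# all figures below are assumptions, not values taken from the text.
SIFT_DIMS = 128       # length of a standard SIFT descriptor
BYTES_PER_DIM = 1     # assumption: each dimension quantized to 8 bits


def descriptor_payload_bytes(num_keypoints, dims=SIFT_DIMS,
                             bytes_per_dim=BYTES_PER_DIM):
    """Size in bytes of an uncompressed set of local feature descriptors."""
    return num_keypoints * dims * bytes_per_dim


def upload_seconds(payload_bytes, uplink_bits_per_second):
    """Time to send the payload, ignoring protocol overhead and latency."""
    return payload_bytes * 8 / uplink_bits_per_second


# Hypothetical query image with 1000 keypoints sent over a 1 Mbit/s uplink.
payload = descriptor_payload_bytes(1000)
seconds = upload_seconds(payload, 1_000_000)
print(f"{payload} bytes, about {seconds:.3f} s to upload")
```

Under these assumptions the descriptors alone occupy 128,000 bytes and take roughly one second to upload before any network latency or server-side processing is counted, which suggests why more compact representations may be desirable.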