With advances in computer vision technologies and increases in computing power, many user systems are now configured with clients that can quickly retrieve information or content related to an object in the real world simply by capturing and processing a digital image of the object. Instead of inputting a text query to obtain information about an object the user is looking at, the user can point the camera of a mobile device at the object to “scan” it and retrieve relevant information. These clients analyze the image frame(s) captured by the camera and provide the relevant information back to the user. For instance, the user system may detect and recognize that an object or element in the real world is present in an image frame. The system then retrieves and displays information that is contextually salient to the detected and recognized object.