People are increasingly interacting with computers and other electronic devices in new and interesting ways. For example, visual recognition and augmented reality systems have become more prevalent on mobile devices. A user can use their device to recognize and obtain information about objects in the field of view of a camera on the device, such as bar codes, product covers, and text, among other objects. In doing so, the user will generally attempt to center the object in the field of view of the camera. The user may additionally attempt to account for lighting and other factors to improve the accuracy of the visual recognition. After the object is recognized, a graphical overlay can be rendered on a display element of the device so that the user can see virtual information augmenting the on-screen imagery. Conventional approaches, however, inherently block the user's view of the object. Further, in many situations, there is visible lag in the process (i.e., image capture, visual recognition, and graphics rendering). Further still, the process is generally a single-user experience on a small screen, where multiple users have to squeeze in around the device to view its screen.