Recently, there has been considerable interest in augmented reality (AR), in which real-world and virtual elements are combined and which is interactive in real time. A commonly known example of AR is the yellow “first down” line seen in television broadcasts of American football games. The real-world elements are the football field and players, and the virtual element is the yellow line, which computers draw over the image of the field in real time. Similarly, rugby fields and cricket pitches are branded by their sponsors using augmented reality: giant logos are inserted onto the fields when viewed on television. In some current applications, such as cars or airplanes, “heads-up” displays integrated into the windshield overlay information on the real world seen by the driver or pilot.
AR applications typically rely on image data stored in backend databases for image detection and tracking. Image data typically comprises key points of raw images that have been pre-loaded into the backend databases, although additional image data can also be added to the database incrementally at any time.
As mobile devices, such as cell phones and the like, have become more ubiquitous, and their functionality and features have increased, uses of AR are being realized to an ever-increasing degree. One proposed AR system uses a camera built into a mobile phone. The user takes a picture, which is sent wirelessly to a server that matches the picture against database images. The server then returns a database image to the phone, where it may be superimposed onto the original image. The database image may contain, for example, information regarding the features seen in the original camera image, such as building names, histories of structures or other items in view, or the like.
Since mobile AR implementations rely on receiving image data and then performing image detection and matching locally against the camera view obtained on the mobile device, the image data is typically loaded into the RAM of the mobile device for best detection performance. Limited RAM and wireless bandwidth restrict the amount of image data that can be downloaded and used at any given time. To overcome some of these challenges, the location of the mobile device may be used to restrict the amount of image data downloaded and used at any given time. However, available systems may not provide a suitable combination of performance, bandwidth, and power usage.
One of the challenges in realizing an improved AR system is determining the location of the mobile device so that the server can efficiently perform the necessary image processing. The location of the mobile device may be determined, for example, by GPS positioning alone or with the assistance of cellular base station location information. Systems that have been proposed rely on rough location determinations, for example, accurate to within one or two city blocks.
In one system, the mobile device is configured to prefetch data related to its current location; the user then takes a photograph of the location of interest using the camera associated with the mobile device. The photograph is matched against the prefetched data, and the result is displayed to the user.
To manage the large quantities of data, it has been proposed to organize a global geo-coordinate space into cells, with limits on the number of images or key points considered within each AR cell. (The term “AR cell” in this context is not the same as the term “cell” in the context of a cellular phone system.) An AR cell is termed a “loxel,” indicating a location-based pixel storage model. A loxel is typically associated with a particular location and spans a particular area (usually defined in rectilinear coordinates for simplicity). A kernel refers to the area generally visible from a particular loxel and is usually defined as spanning particular loxels or a configuration of loxels.
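To make the loxel idea concrete, the following is a minimal Python sketch of mapping a position to its loxel. It assumes positions have already been projected into a local planar frame (meters east and north of an arbitrary origin), consistent with loxels being defined in rectilinear coordinates. The function name `loxel_of` is hypothetical, and the 30 m default simply matches the loxel size mentioned for one proposed system; neither is taken from a described implementation.

```python
import math

# Assumed loxel edge length in meters (hypothetical default; 30 m x 30 m
# is the size cited for one proposed system).
LOXEL_SIZE_M = 30.0


def loxel_of(east_m: float, north_m: float,
             size_m: float = LOXEL_SIZE_M) -> tuple[int, int]:
    """Map a position in a local planar frame (meters east/north of an
    arbitrary origin) to the integer (column, row) index of its loxel."""
    # math.floor (rather than int()) keeps negative coordinates in the
    # correct cell, e.g. -5 m maps to column -1, not column 0.
    return (math.floor(east_m / size_m), math.floor(north_m / size_m))


if __name__ == "__main__":
    print(loxel_of(45.0, 10.0))   # (1, 0)
    print(loxel_of(-5.0, 59.9))   # (-1, 1)
```

Using integer grid indices keeps loxel lookup and comparison cheap; a real system would also need a projection from latitude/longitude into such a frame, which is outside the scope of this sketch.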
Depending on the location of the user, image data corresponding to a kernel area of 3×3 loxels is sent to the client, with the center of the kernel being the user's present loxel. As the user enters a new loxel, additional image data corresponding to the new kernel area is sent. The loxel size that has been used is 30 meters by 30 meters. Although this technique significantly reduces the amount of image data that must be sent to the user at any given time, it still has several shortcomings.
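The 3×3 kernel can be expressed as the set of loxel indices within one step of the user's current loxel. The following is a minimal sketch, assuming loxels are identified by integer (column, row) pairs; `kernel` and its `radius` parameter are hypothetical names chosen for illustration.

```python
# A loxel is identified here by an integer (column, row) pair (an assumption).
Loxel = tuple[int, int]


def kernel(center: Loxel, radius: int = 1) -> set[Loxel]:
    """Return the square kernel of loxels centred on `center`.

    radius=1 yields the 3x3 kernel: the current loxel plus its 8 neighbours.
    """
    cx, cy = center
    return {(cx + dx, cy + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}


if __name__ == "__main__":
    k = kernel((0, 0))
    print(len(k))                   # 9
    # With 30 m x 30 m loxels, the kernel spans a 90 m x 90 m square.
    print(len(k) * 30 * 30, "m^2")  # 8100 m^2
```

With 30 m loxels, the 3×3 kernel therefore covers roughly 90 m by 90 m around the user.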
In some systems, the number of incremental loxels to be downloaded on each transition has been taken to be 3, which accounts only for motion of the mobile device along the 4 major directions. In reality, any given loxel has 8 adjacent loxels, so a movement into a diagonally adjacent loxel may require the data of up to 5 incremental loxels to be downloaded. Further, the next valid use of the AR application does not always occur in an adjacent loxel. Depending on the application and the mobility of the user, the next valid loxel may be non-adjacent, in which case more data must be downloaded.
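The incremental counts above follow from a set difference between the old and new kernels. The sketch below uses the same hypothetical (column, row) indexing and a `kernel` helper (redefined so the block stands alone); `incremental_loxels` is likewise a name chosen for illustration, not part of any described system.

```python
Loxel = tuple[int, int]


def kernel(center: Loxel, radius: int = 1) -> set[Loxel]:
    """Square kernel of loxels around `center` (3x3 when radius=1)."""
    cx, cy = center
    return {(cx + dx, cy + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}


def incremental_loxels(old: Loxel, new: Loxel) -> set[Loxel]:
    """Loxels in the new kernel whose data is not already held from the old one."""
    return kernel(new) - kernel(old)


if __name__ == "__main__":
    start = (0, 0)
    moves = [("orthogonal neighbour", (1, 0)),   # 3 new loxels
             ("diagonal neighbour", (1, 1)),     # 5 new loxels
             ("two loxels east", (2, 0)),        # 6 new loxels
             ("distant loxel", (10, 10))]        # 9: no overlap at all
    for label, dest in moves:
        print(label, len(incremental_loxels(start, dest)))
```

The diagonal case yields 5 because only a 2×2 block of the old kernel overlaps the new one, and any jump of two or more loxels increases the download further, up to a full 9-loxel kernel once the old and new kernels no longer overlap.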
It has also been assumed that incremental image data need only be downloaded after the user has entered a new loxel. This can lead to suboptimal latency in fetching the image data, which in turn causes an undesirable delay in the application that requested the data. This is especially true when the next valid loxel is not adjacent to the current loxel.
It has also been assumed that a 360-degree field of view is of interest, with the focal point lying no farther away than one of the adjacent loxels. While a 360-degree field of view may be necessary for panning while an AR application is in use, not all AR applications need it. Furthermore, because certain AR applications are used only intermittently, downloading data for a 360-degree field of view may be excessive in some situations. A cell phone camera has a field of view of at most 70 degrees, a typical value being about 55 degrees. This typically confines the visible area of a static camera view to a single loxel. Depending on the size of the loxel, the camera view may include one or more adjacent loxels, but, on average, the camera view still covers only 25% or less of what is assumed for a 360-degree field of view.
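The field-of-view comparison is simple angular arithmetic. The sketch below assumes, as a simplification, that coverage scales with the horizontal field of view alone and ignores range and occlusion; `fov_fraction` is a hypothetical helper.

```python
def fov_fraction(fov_deg: float) -> float:
    """Fraction of a full 360-degree panorama covered by a static camera view.

    Simplifying assumption: coverage is proportional to horizontal angle only.
    """
    if not 0 < fov_deg <= 360:
        raise ValueError("field of view must be in (0, 360] degrees")
    return fov_deg / 360.0


if __name__ == "__main__":
    for fov in (55.0, 70.0):
        # Prints "55 deg -> 15.3% ..." and "70 deg -> 19.4% ..."
        print(f"{fov:.0f} deg -> {fov_fraction(fov):.1%} of a 360-degree view")
```

Both the typical 55-degree view (about 15%) and the 70-degree upper bound (about 19%) fall under the 25% figure stated above.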
Typical systems proposed in the past do not provide a means for a mobile device to detect automatically when it arrives at a new loxel. By default, the mobile device must therefore continuously update the server with its new location, allowing the server to determine when to send the new data set. This approach consumes considerable power, especially if the AR applications are designed to run in the background for prolonged periods. Further, although multiple applications already do this, continuously reporting location to a server raises privacy concerns for the end consumer.
Another proposed approach for AR is based on potentially visible sets (PVS). The concept of potentially visible sets has long been known in computer graphics. PVS was designed to take obstacles into consideration when determining the set of objects visible to a camera or the human eye. The technique requires good training data on the obstacles and a reasonably precise determination of the position and orientation of the camera relative to the obstacles in order to establish whether a particular object is visible. Hence, in addition to image data, PVS also requires data on the positioning of the camera and the obstacles relative to the image in order to determine the potentially visible set of objects or images.
Efforts have been made to apply PVS to augmented reality environments in lieu of a purely cell-based organization of images. While the notion of obstacles is much more applicable to indoor environments (e.g., walls, visibility from inside a room, etc.), it does not map as easily to outdoor environments, in which obtaining training data for obstacles is often impractical. Further, arriving at a precise PVS for a given camera location and orientation is very complex and may require a nontrivial amount of data to be attached to key points in order to achieve acceptable matching rates. Hence, PVS may offer little benefit for mobile outdoor AR compared with a cell-based organization.