Modern portable electronic devices are becoming increasingly powerful and sophisticated. Not only do these devices run faster CPUs, they are also equipped with sensors that make them more versatile than traditional personal computers. The use of GPS receivers, gyroscopes, and accelerometers has made these devices location aware and has opened up a range of applications that did not seem possible before.
The standard definition of augmented reality is a live direct or indirect view of a physical real-world environment whose elements are augmented by virtual computer-generated imagery. Traditionally, augmented reality applications have been limited to expensive custom setups used in universities and academia, but with the advent of modern smartphones and powerful embedded processors, many of the algorithms that were once confined to the personal computer world are becoming a part of the mobile world. Layar and AroundMe are two such applications that are increasingly popular and have been ported to many smartphones (Layar is a product of the company Layar, of the Netherlands, and AroundMe is a product of the company Tweakersoft). Both the Layar and AroundMe applications use location data obtained from GPS sensors to overlay additional information such as the direction and distance of nearby landmarks.
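As an illustration of how the direction and distance of a landmark may be derived from two GPS fixes, the following minimal sketch uses the haversine formula and the standard initial-bearing formula. It is not taken from Layar or AroundMe; the function name `distance_and_bearing` and the spherical-Earth radius constant are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical approximation)


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return (distance in meters, initial bearing in degrees clockwise from north).

    Inputs are latitude/longitude in decimal degrees. Hypothetical helper,
    written only to illustrate the idea described in the text.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)

    # Haversine formula for the great-circle distance.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing from the device (point 1) toward the landmark (point 2).
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing
```

An application could compare the returned bearing with the device heading to decide where on the display a landmark label should be drawn.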
Typically, augmented reality implementations have relied on three elemental technologies:
(1) Sensing technologies to identify locations or sites in real space using markers, image recognition algorithms, and sensors.
(2) Information retrieval and overlay technologies to create virtual information and to overlay it on top of live images captured by the camera.
(3) Display technologies capable of integrating real and virtual information, including mobile phone displays, projectors, and augmented reality glasses.
In addition, mobile augmented reality techniques are roughly classified into two types based on the type of sensing technology used.
A. Location Based Augmented Reality
Location based augmented reality techniques determine the location or orientation of a device using GPS or other sensors, then overlay the camera display with information relevant to the place or direction. The four common sensor platforms used are described below:
GPS: The Global Positioning System provides worldwide coverage and measures the user's 3D position, typically within 30 meters for regular GPS and about 3 meters for differential GPS. It does not measure orientation. One of the major drawbacks of GPS based systems is that they require direct line-of-sight views to the satellites, which are commonly blocked in urban areas, canyons, etc. This severely limits their usability.
Inertial, geomagnetic, and dead reckoning: Inertial sensors are sourceless and relatively immune to environmental disturbances. Their main drawback, however, is that they accumulate drift over time. The key to using inertial sensors therefore lies in developing efficient filtering and correction algorithms that can compensate for this drift error.
Active sources: For indoor virtual environments, a common approach is the use of active transmitters and receivers (using magnetic, optical, or ultrasonic technologies). The obvious disadvantage of these systems is that modifying the environment in this manner outdoors is usually not practical, and the user is restricted to the location of the active sources.
Passive optical: This method relies on using video or optical sensors to track the sun, stars, or surrounding environment to determine a frame of reference. However, most augmented reality applications refrain from using these algorithms since they are computationally intensive.
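The drift compensation mentioned for inertial sensors may be illustrated with a complementary filter, one common correction technique. This is a sketch under stated assumptions rather than the method of any particular system: the function names, the blend factor `alpha`, and the use of a compass reading as the drift-free reference are all choices made for this example.

```python
def wrap_degrees(angle):
    """Wrap an angle difference into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def complementary_heading(prev_heading, gyro_rate_dps, compass_heading, dt, alpha=0.98):
    """Fuse a gyroscope rate with a compass heading (hypothetical helper).

    prev_heading     previous fused heading, degrees
    gyro_rate_dps    angular rate from the gyroscope, degrees per second
    compass_heading  absolute heading from the geomagnetic sensor, degrees
    dt               time since the previous update, seconds
    alpha            assumed weight given to the gyro prediction (0..1)
    """
    # Dead-reckon from the gyro: responsive, but accumulates drift.
    predicted = prev_heading + gyro_rate_dps * dt
    # Pull the prediction toward the compass: noisy, but free of drift.
    error = wrap_degrees(compass_heading - predicted)
    return (predicted + (1.0 - alpha) * error) % 360.0
```

With a biased gyroscope, integrating the rate alone would drift without bound, whereas the compass term in this sketch keeps the fused heading close to the reference. The value of `alpha` trades responsiveness against noise rejection and would need tuning for a real sensor.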
B. Vision Based Augmented Reality
Vision based augmented reality techniques attempt to model precise descriptions of the shape and location of the real objects in the environment using image processing techniques or predefined markers, and use the information obtained to align the virtual graphical overlay. These techniques may be subdivided into two main categories.
Marker based augmented reality: Marker based augmented reality systems recognize a particular marker, called an augmented reality marker, with a camera, and then overlay information on the display that matches the marker. These markers are usually simple monochrome patterns and may be detected fairly easily using relatively simple image processing algorithms.
Markerless augmented reality: Markerless augmented reality systems recognize a location or an object not by augmented reality markers but by image feature analysis, then combine information with the live image captured by the camera. Well-known examples of this image tracking approach are Parallel Tracking and Mapping (PTAM), developed by Oxford University, and Speeded Up Robust Features (SURF), which has recently been used by Nokia Research.
Even though these techniques have been deployed and used extensively in the mobile space, there are still several technical challenges that need to be addressed for a robust, usable augmented reality system.
There are three main challenges discussed hereafter:
I. Existing Mobile Rendering APIs are not Optimal
Existing mobile 3D solutions are cumbersome and impose limitations on seamless integration with live camera imagery. For complete integration between the live camera image and the overlaid information, the graphics overlay needs to be transformed and rendered in real time based on the user's position, orientation, and heading. The accuracy of the rendering is important since augmented reality applications offer a rich user experience by precisely registering and orienting overlaid information with elements in the user's surroundings. Precise overlay of graphical information over a camera image creates a more intuitive presentation, and user experience therefore degrades quickly when accuracy is lost. Several implementations have achieved fast rendering by using OpenGL, or by rendering the information remotely and streaming the video to mobile embedded devices. Most modern smartphones have graphics libraries such as OpenGL that use the built-in GPU to offload the more computationally expensive rendering operations so that other CPU-intensive tasks, such as the loading of points of interest, are not blocked. However, the use of OpenGL on smartphone platforms introduces other challenges. One of the biggest disadvantages of using OpenGL is that once perspective-rendered content is displayed onscreen, it is hard to perform hit testing because OpenGL ES 1.1 does not provide APIs for the “picking mode” or “selection” used to determine the geometry at particular screen coordinates. When controls are rendered in a perspective view, it is hard to determine whether touch events lie within the control bounds. Therefore, even though OpenGL supports perspective 3D rendering under the processing constraints typical of modern mobile smartphones, it is not optimal.
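To make the hit-testing difficulty concrete, the following sketch shows one way an application could test a touch against a perspective-rendered control in its own code: project the control's 3D corners to screen coordinates, then test whether the touch lies inside the resulting quadrilateral. This is an illustrative workaround, not an OpenGL ES API and not necessarily the approach of this application; the function names, the row-major 4x4 matrix layout, and the top-left pixel origin are assumptions.

```python
def project_to_screen(point, mvp, width, height):
    """Project a 3D point to pixel coordinates with a row-major 4x4 matrix.

    Returns None when the point lies on or behind the camera plane.
    """
    x, y, z = point
    clip = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in mvp]
    if clip[3] <= 0.0:
        return None
    ndc_x, ndc_y = clip[0] / clip[3], clip[1] / clip[3]
    # Map normalized device coordinates [-1, 1] to pixels, y pointing down.
    return ((ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height)


def point_in_convex_polygon(p, polygon):
    """Return True if p is inside or on the boundary of a convex polygon."""
    sign = 0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        # The cross product tells which side of edge a->b the point is on.
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        if cross != 0.0:
            s = 1 if cross > 0 else -1
            if sign == 0:
                sign = s
            elif s != sign:
                return False  # point is on different sides of two edges
    return True


def hit_test(touch, corners_3d, mvp, width, height):
    """Return True if a touch (pixels) falls on a planar quad given in 3D."""
    projected = [project_to_screen(c, mvp, width, height) for c in corners_3d]
    if any(p is None for p in projected):
        return False  # conservatively reject quads crossing the camera plane
    return point_in_convex_polygon(touch, projected)


# Usage: a control tilted away from the camera projects to a trapezoid.
MVP = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 0]]  # toy projection
QUAD = [(-1, -1, -2), (1, -1, -2), (1, 1, -4), (-1, 1, -4)]
inside = hit_test((50, 50), QUAD, MVP, 100, 100)
outside = hit_test((30, 40), QUAD, MVP, 100, 100)
```

In this toy setup the point (30, 40) lies within the trapezoid's axis-aligned bounding box but outside the trapezoid itself, which is why a simple rectangle test is not sufficient for controls drawn in perspective.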
II. Real-Time Marker/Markerless Systems are Too Complex
Real-time detection and registration of a frame of reference is computationally expensive, especially for markerless techniques. Mapping a virtual environment onto the real-world coordinate space requires complex algorithms. To create a compelling experience, the virtual viewport must update quickly to reflect changes in the camera's orientation, heading, and perspective as the user moves the camera. This makes it essential to gather information about the device's physical position in the environment in real time. Traditional techniques for frame of reference estimation depend on identifiable markers embedded in the environment or on computationally intensive image processing algorithms to extract registration features. Most of these image processing techniques need to be optimized extensively to fit within the hardware constraints imposed by mobile devices. For closed environments where markers may be placed beforehand, the use of identifiable markers for detection and frame of reference estimation is usually the most viable option. This approach, however, is less suitable for augmented reality applications in outdoor environments, since setting up the environment with markers prior to the application's use is rarely feasible. Attempts to perform real-time natural feature detection and tracking on modern mobile devices have largely proven impractical since they require large amounts of cached data and significant processing power.
III. Sensor Data for Location Based Systems is Inaccurate
For location based augmented reality systems, especially GPS based systems, sensor noise makes orientation estimation difficult. Modern mobile smartphones contain a number of sensors that are applicable to augmented reality applications. For example, cameras are ubiquitous, and accelerometers and geomagnetic sensors are available in most smartphones. Geomagnetic and gyroscope sensors provide information about the user's heading and angular rate, which may be combined with GPS data to estimate field of view and location. However, these sensors present unique problems, as they do not provide highly accurate readings and are sensitive to noise. To map the virtual augmented reality environment into a real-world coordinate space, sensor data must be accurate and free of noise that may cause jittering in rendered overlays. The reduction of noise thus represents a significant challenge confronting augmented reality software.
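As a simple illustration of noise reduction for heading data, the sketch below applies exponential smoothing to compass readings on the unit circle, so that readings on either side of north (for example 359 and 1 degrees) average toward north rather than toward 180 degrees. It is a generic low-pass filter chosen for this example, not the technique presented in this application; the function name `smooth_heading` and the smoothing factor `alpha` are assumptions.

```python
import math


def smooth_heading(samples_deg, alpha=0.2):
    """Exponentially smooth compass headings given in degrees.

    alpha is an assumed smoothing factor in (0, 1]; smaller values
    suppress more jitter but respond more slowly to real turns.
    """
    if not samples_deg:
        return []
    # Filter the sine and cosine components so the 359 -> 0 wrap is handled.
    s = math.sin(math.radians(samples_deg[0]))
    c = math.cos(math.radians(samples_deg[0]))
    smoothed = []
    for heading in samples_deg:
        r = math.radians(heading)
        s += alpha * (math.sin(r) - s)
        c += alpha * (math.cos(r) - c)
        smoothed.append(math.degrees(math.atan2(s, c)) % 360.0)
    return smoothed
```

Smoothing of this kind reduces jitter in rendered overlays at the cost of added lag, so the smoothing factor has to be balanced against how quickly the overlay must follow genuine device motion.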
This patent application provides viable approaches to solving these challenges and presents a practical implementation of those techniques on a mobile phone. A new methodology for localizing, tagging, and viewing video augmented with existing camera systems is presented. A smartphone implementation is termed “Looking Glass”.