Field of the Invention
The invention is directed to methods and systems for determining the pose of a camera with respect to at least one object of a real environment for use in an authoring, e.g. for geospatial databases, or augmented reality application, wherein at least one or two images are generated by the camera capturing a real object of a real environment. According to the determined pose of the camera, the image or images may be augmented with virtual objects according to the authoring or augmented reality technology.
Description of the Related Art
Applications are known which augment an image or images generated by at least one camera with virtual objects using the so-called Augmented Reality (AR) technology. In such an application, a camera coupled to a processing unit such as a microprocessor takes a picture of a real environment, wherein the real environment is displayed on a display screen and virtual objects may be displayed in addition to the real environment, so that the real environment displayed on the display screen is augmented with virtual objects of any kind. In such an application, in order to correctly augment the captured image with virtual objects, the microprocessor needs to determine the position and orientation (so-called pose) of the camera with respect to at least one object of the real environment. In this context, correctly augmenting the captured image with virtual objects means that the virtual objects are displayed such that they fit into the scene of the image in a perspectively and dimensionally correct fashion.
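The role of the pose in fitting a virtual object into the image can be illustrated with a minimal sketch. It assumes an ideal pinhole camera without lens distortion; the function name project_point and the parameters focal_px and principal_point are hypothetical and chosen for illustration only, and they are not taken from any of the cited systems.

```python
def project_point(point_world, rotation, translation, focal_px, principal_point):
    """Project a 3D point of a virtual object into the image of a pinhole camera.

    rotation (3x3, row-major nested lists) and translation (3 values) are the
    camera pose, mapping world coordinates to camera coordinates.
    Assumption: ideal pinhole model, square pixels, no lens distortion.
    """
    # Transform into camera coordinates: X_cam = R * X_world + t
    x_cam = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if x_cam[2] <= 0.0:
        return None  # point lies behind the camera and is not visible

    # Perspective division followed by the shift to pixel coordinates
    u = focal_px * x_cam[0] / x_cam[2] + principal_point[0]
    v = focal_px * x_cam[1] / x_cam[2] + principal_point[1]
    return (u, v)


# Usage: a camera at the world origin looking along the z axis
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point((1.0, 0.0, 2.0), identity, (0.0, 0.0, 0.0), 500.0, (320.0, 240.0))
```

If the pose passed to such a function is wrong, the projected pixel no longer coincides with the corresponding part of the real scene, which is why the pose has to be determined before augmenting.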
A known method for determining the pose of the camera uses a virtual reference model of a corresponding part of the real environment captured by the camera, wherein the virtual reference model is projected into the image, using an initially known approximation of the pose, and superimposed with the corresponding part of the real environment. A tracking algorithm of the image processing then uses the virtual reference model to determine the pose of the camera with respect to the real environment, for example by feature detection and comparison between the reference model and the corresponding part of the real environment.
Another known method for determining the pose of the camera uses a marker that is placed in the real environment and captured by the camera when taking the image. A tracking algorithm of the image processing then uses the marker to determine the pose of the camera with respect to the real environment, particularly by analysing the marker in the image using known image processing methods.
A disadvantage of the first of the above-mentioned methods is that a virtual reference model has to be conceived and stored first, which is very time and resource consuming and almost impossible if the AR technology is to be usable spontaneously in any real environment. With respect to using a marker, the user has to place the marker in the real environment in an initial step before taking the image, which is also time consuming and troublesome. For these reasons in particular, these methods can hardly be used in connection with consumer products, such as mobile phones having an integrated camera and display, or other mobile devices.
Moreover, so-called structure from motion and simultaneous localization and mapping (SLAM) methods are known from the prior art. All these methods serve for determining the position and orientation (pose) of a camera in relation to the real world or to part of the real world. If no prior information is available, in some cases it is not possible to determine the absolute pose of the camera in relation to the real world or part of the real world, but only the changes of the camera pose from a particular point in time. In the above-mentioned applications, SLAM methods may be used to obtain orientation from planar points, but a disadvantage is that it is not certain whether the ground plane or some other plane has been identified. Further, with such methods an initial scale may only be obtained by translating the camera, e.g. along a distance of 10 cm, and communicating the covered distance to the system. Moreover, SLAM methods need at least two images (so-called frames) taken at different camera poses, and a calibrated camera.
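The scale initialization described above can be sketched as follows. The sketch assumes that a reconstruction is available only up to an unknown scale factor and that the user communicates the actually covered camera translation; the names metric_scale and apply_scale are hypothetical and do not refer to any particular SLAM library.

```python
import math


def metric_scale(cam_pos_a, cam_pos_b, measured_distance_m):
    """Return the factor that converts reconstruction units to metres.

    cam_pos_a and cam_pos_b are the two camera positions in the arbitrary
    units of the reconstruction; measured_distance_m is the distance the
    user reports having moved the camera (e.g. 0.10 for 10 cm).
    """
    baseline = math.dist(cam_pos_a, cam_pos_b)
    if baseline == 0.0:
        raise ValueError("camera was not translated, scale is unobservable")
    return measured_distance_m / baseline


def apply_scale(points, scale):
    """Rescale reconstructed 3D points to metric units."""
    return [tuple(scale * c for c in p) for p in points]


# Usage: the reconstruction reports a baseline of 0.5 units for a 10 cm move
scale = metric_scale((0.0, 0.0, 0.0), (0.5, 0.0, 0.0), 0.10)
points_m = apply_scale([(1.0, 2.0, 3.0)], scale)
```

The error raised for a zero baseline reflects the limitation stated above: without a translation between at least two frames, no scale can be derived.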
Another known technology is disclosed in “Initialisation for Visual Tracking in Urban Environments”, Gerhard Reitmayr, Tom W. Drummond, Engineering Department Cambridge, University Cambridge, UK (Reitmayr, G. and Drummond, T. W. (2007) Initialisation for visual tracking in urban environments. In: 6th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2007), 13-16 Nov. 2007, Nara, Japan). The model-based tracking system is integrated with a sensor pack measuring 3D rotation rates, 3D acceleration and 3D magnetic field strength to be more robust against fast motions and to have an absolute orientation reference through gravity and the magnetic field sensor. Sensor fusion is implemented with a standard extended Kalman filter using a constant velocity model for the camera pose dynamics. Different inputs such as a camera pose from the tracking system or measurements from the sensor pack are incorporated using individual measurement functions in a SCAAT-style approach (Greg Welch and Gary Bishop. SCAAT: incremental tracking with incomplete information. In Proc. SIGGRAPH '97, pages 333-344, New York, N.Y., USA, 1997. ACM Press/Addison-Wesley Publishing Co.).
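The general idea of a constant velocity model with individual measurement functions can be illustrated by a strongly simplified sketch. It is not the implementation of the cited system: it assumes a one-dimensional state (position and velocity) instead of a full camera pose, and because this toy model is linear, the extended Kalman filter reduces to an ordinary Kalman filter. The function names predict and update_scalar and the noise values are hypothetical.

```python
def predict(x, P, dt, q):
    """Constant velocity prediction for the state x = [position, velocity]."""
    p, v = x
    x_new = [p + v * dt, v]
    # P_new = F P F^T + Q with F = [[1, dt], [0, 1]]; as a simplification,
    # process noise q * dt is added to the velocity variance only.
    P_new = [
        [P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1], P[0][1] + dt * P[1][1]],
        [P[1][0] + dt * P[1][1], P[1][1] + q * dt],
    ]
    return x_new, P_new


def update_scalar(x, P, z, h, r):
    """Incorporate one scalar measurement z = h . x + noise with variance r."""
    Ph = [P[0][0] * h[0] + P[0][1] * h[1], P[1][0] * h[0] + P[1][1] * h[1]]
    s = h[0] * Ph[0] + h[1] * Ph[1] + r          # innovation variance
    k = [Ph[0] / s, Ph[1] / s]                   # Kalman gain
    y = z - (h[0] * x[0] + h[1] * x[1])          # innovation
    x_new = [x[0] + k[0] * y, x[1] + k[1] * y]
    hP = [h[0] * P[0][0] + h[1] * P[1][0], h[0] * P[0][1] + h[1] * P[1][1]]
    P_new = [[P[i][j] - k[i] * hP[j] for j in range(2)] for i in range(2)]
    return x_new, P_new


# Usage: each sensor contributes through its own measurement row h
x, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
x, P = predict(x, P, dt=0.1, q=0.01)
x, P = update_scalar(x, P, z=0.2, h=[1.0, 0.0], r=0.05)   # position-like input
x, P = update_scalar(x, P, z=1.5, h=[0.0, 1.0], r=0.10)   # rate-like input
```

Applying each measurement on its own, whenever it becomes available, is the single-constraint-at-a-time idea: the filter does not wait for a complete set of sensor readings before updating the state.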
Another technique is disclosed in “Robust Model-based Tracking for Outdoor Augmented Reality”, Gerhard Reitmayr, Tom W. Drummond (Gerhard Reitmayr and Tom Drummond, Going Out: Robust Model-based Tracking for Outdoor Augmented Reality Proc. IEEE ISMAR′06, 2006, Santa Barbara, Calif., USA.) The tracking system relies on a 3D model of the scene to be tracked. In former systems the 3D model describes salient edges and occluding faces. Using a prior estimate of camera pose, this 3D model is projected into the camera's view for every frame, computing the visible parts of edges.
Another known application, “mydeco”, is available on the Internet and allows an image showing a real environment to be augmented with virtual objects. However, this system requires the user to set the rotation of the ground plane, which is quite cumbersome.
In “A Lightweight Approach for Augmented Reality on Camera Phones using 2D Images to Simulate 3D”, Petri Honkamaa, Jani Jaeppinen, Charles Woodward, ACM International Conference Proceeding Series; Vol. 284, Proceedings of the 6th international conference on Mobile and ubiquitous multimedia, Oulu, Finland, Pages 155-159, Year of Publication: 2007, ISBN:978-1-59593-916-6, it is described that using manual interaction for the initialization, particularly by means of a reference model and the user's manipulation thereof, is an appropriate approach because tracking initialization is an easy task for the user, whereas automating it would require pre-knowledge of the environment, considerable processing power and/or additional sensors. Furthermore, this kind of interactive solution is independent of the environment and can be applied “anytime, anywhere”.
In U.S. Pat. No. 7,002,551 there is disclosed a method and system for providing an optical see-through Augmented Reality modified-scale display. It includes a sensor suite that includes a compass, an inertial measuring unit, and a video camera for precise measurement of a user's current orientation and angular rotation rate. A sensor fusion module may be included to produce a unified estimate of the user's angular rotation rate and current orientation to be provided to an orientation and rate estimate module. The orientation and rate estimate module operates in a static or dynamic (prediction) mode. A render module receives an orientation; and the render module uses the orientation, a position from a position measuring system, and data from a database to render graphic images of an object in their correct orientations and positions in an optical display. The position measuring system is effective for position estimation for producing the computer generated image of the object to combine with the real scene, and is connected with the render module. An example of the position measuring system is a differential GPS. Since the user is viewing targets that are a significant distance away (as through binoculars), the registration error caused by position errors in the position measuring system is minimized.
Therefore, it would be beneficial to provide a method and a system for determining the pose of a camera with respect to at least one object of a real environment for use in an authoring or augmented reality application which may be performed with reduced processing requirements and/or at a higher processing speed and, more particularly, to provide methods of authoring 3D objects without knowing much about the environment in advance and, where necessary, being able to integrate user-interaction to serve the pose estimation.