The present disclosure relates to a method and a device for positioning an acquired image, in particular a street-level image, using a textured three-dimensional (3D) model.
In the prior art, geographical information systems providing digital maps are well known. Typically, 3D maps are formed on the basis of images captured (or acquired) by an airplane scanning a section of terrain to be modeled in 3D. When the images are captured, the camera position can be obtained by GPS, and the images are further time-stamped. The airplane is further equipped with an inertial measurement unit (IMU) such that the angles of rotation of the airplane, known as roll, pitch and yaw, can be recorded. Thus, both the position and attitude of the camera are recorded for each captured image.
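By way of illustration only, the recorded roll, pitch and yaw angles can be combined into a single rotation matrix describing the attitude of the camera. The following minimal sketch assumes the common aerospace convention in which yaw is applied about the z-axis, pitch about the y-axis and roll about the x-axis (R = Rz(yaw) Ry(pitch) Rx(roll)), with angles in radians. The disclosure does not prescribe a convention, and the function names are hypothetical.

```python
import math


def rotation_from_roll_pitch_yaw(roll, pitch, yaw):
    """Return a 3x3 rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll).

    Assumption: Z-Y-X (yaw-pitch-roll) convention, angles in radians.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def rotate(rotation, vector):
    """Apply a 3x3 rotation matrix to a 3-element vector."""
    return [sum(rotation[i][j] * vector[j] for j in range(3)) for i in range(3)]


# Example: a pure yaw of 90 degrees turns the x-axis onto the y-axis.
attitude = rotation_from_roll_pitch_yaw(0.0, 0.0, math.pi / 2)
forward = rotate(attitude, [1.0, 0.0, 0.0])
```

Together with the GPS position, such a matrix would give the full pose (position and attitude) of the camera for a given image.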
Stereo processing of captured images can be performed, taking into account the position and attitude of the camera (referred to as navigation data). An image pair comprising overlapping image data captured at substantially the same point in time is related to the navigation data, whereby each respective pixel of the overlapping images can be associated with a geographical coordinate on the ground. Stereo processing implies that only those parts of the images are used which match with a corresponding area in the other image of the image pair. By subsequently using trigonometric functions, the distance from the camera plane to a given point on the ground can be calculated and a 3D map representation can be created.
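As an illustrative example of the trigonometric relationship mentioned above, the distance to a ground point can be estimated from the disparity between its positions in the two images of a pair. The sketch below assumes an idealized, rectified stereo pair and a pinhole camera model, for which similar triangles give Z = f * B / d. The function and parameter names are hypothetical, and practical aerial processing would additionally account for the recorded attitude and for lens distortion.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Estimate distance from the camera plane to a point, in metres.

    Assumptions: rectified image pair, pinhole camera model.
    focal_length_px: focal length expressed in pixels
    baseline_m:      distance between the two camera positions in metres
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values chosen only to show the arithmetic.
distance_m = depth_from_disparity(1000.0, 100.0, 50.0)
```

The relationship also shows why accuracy falls as disparity becomes small: a one-pixel matching error then corresponds to a large change in the estimated distance.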
Aerial images can be positioned with high accuracy, due to good GPS signal reception conditions in the airspace as well as the post-processing possibilities gained with IMU equipment. 3D representations from such images result in high-accuracy geo-referenced 3D models with detailed high-resolution textures. However, in order to accomplish 3D models having an even more realistic appearance, aerial imagery can be supplemented with street-level images or, similarly, with images captured at intermediate altitude. This can be accomplished by texturing the 3D model with the street-level imagery as well as by using the street-level imagery in the 3D reconstruction process. For either of these to be possible, the street-level imagery must be positioned with high accuracy relative to the 3D model, and the underlying 3D model must have enough detail in texture and geometry for the street-level imagery to have sufficient correspondence with the 3D model. Sufficient geo-referenced detail in the underlying 3D model is difficult to obtain with a box-like building representation, as the geometry and texture of these models seldom represent the real world accurately enough. However, with high-detail aerial 3D models, positioning of street-level imagery with sufficient accuracy is possible. With an accurate street-level pose, merging of street-level imagery as well as 3D reconstruction of even more complex surfaces such as curved surfaces, balconies, decorations or elaborate window frames is possible. Thus, the authentic appearance of the aerial 3D representation is enhanced by adding details from the street-level images.
One prior approach to accomplishing this relies on a hybrid modeling system that fuses Light Detection And Ranging (LiDAR) data, aerial images, and ground-view images for creation of accurate 3D building models. Outlines for complex building shapes are interactively extracted from a high-resolution aerial image. Surface information is automatically fitted from LiDAR data using a primitive-based method, and high-resolution ground-view images are integrated into the model to generate fully textured CAD models.
While 3D modeling using aerial images generally results in high-quality positioning, street-level 3D modeling typically suffers from lower-quality positioning. Factors such as GPS signal shadows due to obstacles, signal distortions and the drifting of IMU data in the relatively varied motion of street-level vehicles deteriorate measurements at ground level. This causes the recorded position of street-level images to be inaccurate. Further, mechanical and optical properties of a given real camera differ from those of an assumedly identical camera, resulting in incorrect measurements. Yet a further problem is that alignment of images captured at greatly differing angles is troublesome, since it will be difficult to find overlapping image data. Thus, when projecting street-level images onto a 3D model derived from aerial images, there is a significant risk of mismatch, since the pose of the ground-level camera used for capturing the street-level images is not consistent with the geo-referenced details of the aerial 3D model.