It has long been a goal to extract 3D information, in the form of a 3D model of an object, from a 2D image, as a 3D model provides an entity with greatly enhanced capabilities for manipulation when compared with a point based model. In addition, 3D models of an object based on geometric primitives such as cubes, spheres and even conformable volumes made up of spline surfaces may be parameterized efficiently, thereby reducing the data storage required for these objects. Furthermore, 3D models based on geometric primitives allow the models, and hence the objects they represent, to be inserted into, and removed from, image sets and artificial environments such as video games or simulation systems.
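The storage argument can be illustrated with a minimal sketch. The `Box` class and the `sample_surface_points` helper below are hypothetical names introduced only for this illustration; they assume an axis-aligned box described by a centre and three side lengths, and compare that description with a point based description of the same surface.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Box:
    """Hypothetical axis-aligned box primitive: 6 numbers describe the whole solid."""
    center: tuple  # (x, y, z) of the box centre
    size: tuple    # (width, height, depth)

    def parameter_count(self):
        return len(self.center) + len(self.size)

    def corners(self):
        # Recover the 8 corners on demand from the 6 stored parameters.
        return [
            tuple(c + sign * s / 2 for c, sign, s in zip(self.center, signs, self.size))
            for signs in product((-1, 1), repeat=3)
        ]


def sample_surface_points(box, n):
    """Point based description of the same box: an n x n x n lattice, surface points only."""
    if n < 2:
        raise ValueError("n must be at least 2")
    points = []
    for idx in product(range(n), repeat=3):
        if any(i in (0, n - 1) for i in idx):  # keep lattice points lying on a face
            points.append(tuple(
                c + (i / (n - 1) - 0.5) * s
                for c, i, s in zip(box.center, idx, box.size)
            ))
    return points


box = Box(center=(0.0, 0.0, 0.0), size=(2.0, 2.0, 2.0))
cloud = sample_surface_points(box, 10)
print("primitive parameters:", box.parameter_count())
print("point-cloud values:  ", 3 * len(cloud))
```

With 10 samples per edge the lattice has 10^3 - 8^3 = 488 surface points, so the point based description stores 1464 coordinates where the primitive stores 6 parameters. A rotated box would need additional orientation parameters, which this sketch omits.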
In the case of a sequence of images such as those provided by a moving camera viewing a scene, structure from motion (SFM) techniques have been developed which attempt to automatically locate corresponding points across the sequence of images. A corresponding point is a set of image points, one per image, that all correspond to the same physical location in the scene.
These points are obtained by processing a first image to locate a point such as the corner of a building and then processing subsequent images to locate that same point. From this process, 3D information may be generated in the form of a point cloud that gives the 3D positions of points visible within the sequence of images. This process also provides relative camera location and orientation. Whilst this technique is able to provide some relevant 3D information, it fails to provide higher-level structural information about a scene; its output is only a cloud of points located in a scene space.
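How a single corresponding point yields one 3D point of the cloud can be sketched as follows. This is an illustration only: it uses simple midpoint triangulation of two viewing rays and assumes the camera centres and ray directions are already known, whereas an SFM process estimates camera pose and point positions together. The function and variable names are hypothetical.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _scale(a, k):
    return tuple(x * k for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point midway between the closest points of two viewing rays.

    c1, c2: camera centres; d1, d2: directions of the rays through the
    corresponding image points (assumed known for this sketch).
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = _add(c1, _scale(d1, s))
    p2 = _add(c2, _scale(d2, t))
    return _scale(_add(p1, p2), 0.5)


# Two cameras one unit apart both observe the same building corner.
corner = (1.0, 2.0, 5.0)
cam1, cam2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
estimate = triangulate_midpoint(cam1, _sub(corner, cam1), cam2, _sub(corner, cam2))
print(estimate)
```

Repeating this for every corresponding point produces the point cloud. The result is a set of isolated 3D positions with no indication of which points belong to the same surface or object.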
Methods exist which aid the SFM process by allowing an operator to manually identify corresponding points in a sequence of images, thereby constraining the SFM analysis. Reconstructed points generated through the resulting SFM process can then be joined by lines, and the lines can be further used to define surfaces and so on, to potentially generate a 3D model of an object within the scene. However, as would be apparent to those skilled in the art, this process is extremely manually intensive, and in many circumstances the point cloud will not encompass relevant objects of interest, as the initial image processing is unable to extract 3D point information for those objects.
In another example of a method that attempts to generate a 3D model from one or more 2D images that include associated 3D information, such as a point cloud generated from SFM techniques, an operator manually positions a geometric primitive to visually line up with an object of interest. Once this geometric primitive (e.g. a cube) has been manually aligned in one or more images, it is then taken to define a 3D model of that object. However, this technique suffers from the significant drawback that it relies on the operator to manually define and align the appropriate geometric primitive. Often this process must be carried out over multiple images in order to properly constrain the 3D model.
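What "visually lining up" a primitive amounts to can be sketched by projecting the corners of a candidate box into an image and measuring how far they fall from the image positions the operator is aiming for. The sketch assumes an ideal pinhole camera at the origin looking along the positive Z axis; the focal length, principal point and all function names are hypothetical values chosen for illustration.

```python
import math
from itertools import product


def box_corners(center, size):
    """8 corners of an axis-aligned box given its centre and side lengths."""
    return [
        tuple(c + sign * s / 2 for c, sign, s in zip(center, signs, size))
        for signs in product((-1, 1), repeat=3)
    ]


def project(point, focal, principal):
    """Ideal pinhole projection of a 3D point (camera coordinates) to pixels."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z + principal[0], focal * y / z + principal[1])


def alignment_error(corners_3d, observed_2d, focal, principal):
    """Mean pixel distance between projected corners and the observed image corners."""
    distances = [
        math.hypot(u - ou, v - ov)
        for (u, v), (ou, ov) in zip(
            (project(p, focal, principal) for p in corners_3d), observed_2d
        )
    ]
    return sum(distances) / len(distances)


focal, principal = 500.0, (320.0, 240.0)  # assumed camera intrinsics
corners = box_corners(center=(0.0, 0.0, 10.0), size=(2.0, 2.0, 2.0))
observed = [project(p, focal, principal) for p in corners]  # a perfectly aligned box
print(alignment_error(corners, observed, focal, principal))
```

In the manual technique the operator performs this comparison by eye, adjusting the position and size of the primitive until the projected outline matches the object. A single image leaves depth and scale ambiguous, which is why the alignment often has to be repeated in further images.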
Accordingly, there is a need to provide an alternative method for generating 3D models from 2D images.