1. Field of the Invention
The invention relates to image processing and more particularly to the construction of 3-D graphical models from 2-D images.
2. Background
Computer graphics has reached the stage where it is often difficult at first glance to tell computer-generated scenes from real ones. While many unsolved details remain, acceptable images of artificial virtual worlds can now be produced, and it is mainly a question of improving performance before a convincing real-time illusion can be achieved. Outside the realm of pure entertainment, the main obstacle to using the technology in practical applications is creating the models that inhabit these virtual worlds. Large amounts of effort have to go into creating what appear to the viewer to be disappointingly simple models, and the effort required to create realistic, complex models is very high indeed.
In entertainment, apart from effects such as metamorphosing real actors into unreal beings and the reverse, it is not usually necessary to model reality closely in the graphics. By contrast, in practical applications this is usually an essential part of the task. Most industrial design, for example, has to be fitted into a real environment, and the design can only accurately be judged in its natural context.
As an example, consider modelling the inside of a large retail store. Such stores are frequently rearranged to keep them interesting and to optimize sales. However, standard computer-aided design (CAD) packages are not very satisfactory for the purpose. The essence of the problem can be stated thus: "How do you model a rack of coats on a CAD system?"
In this example, the model to be handled is structurally complex but consists largely, if not entirely, of objects which already exist. In a store design, one may be dealing with large numbers of objects, such as clothes, which are very hard to model.
Virtual reality methods are inhibited in many such practical applications by the absence of easy and reliable methods to capture the complex shapes, textures and colors from the real world and to incorporate these into the virtual world along with new or hypothesized objects.
Computer animations in the film world which imitate reality are created patiently with much measurement, calibration and accurate modelling. CAD techniques are used, but on specialist systems. This degree of effort cannot be justified for more everyday applications and hand-building such models is not an attractive option if it can be avoided.
Existing methods of automatic capture can be divided into two broad classes: active methods and passive methods. In active sensing, the system creates and controls a signal (such as a light beam), directs it at the object and interprets the reflected signal. Passive sensing relies only on existing illumination.
Whilst active sensing methods give good results, the equipment required is generally specialized and expensive. Because the illumination has to be controlled, the environment is restricted, as is the range of object sizes that can be captured. Also, the devices do not capture the appearance of an object at the same time as its shape; this has to be added as a separate step.
Passive methods require only images of the object and so do not suffer from these disadvantages. Only a camera and a digitizer are needed to capture the data, and no special environment is required.
One such passive method is Volumetric Intersection. If a series of images of an object is taken against a contrasting background, the images can be thresholded to identify the profile of the object in each one. Considering just one of these images, the object must lie somewhere within the polygonal pyramid formed by all the points in space which project within the profile. An approximation to the object volume can be built by intersecting the pyramidal volumes from all the images.
Although this method has been classified as passive, it depends on being able to separate the object from the background in the images and so requires highly controlled lighting. Further, because of the way the volume is constructed, it cannot reproduce objects with concavities.
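Volumetric intersection can be sketched as voxel carving: a candidate point is kept only if it projects inside the object's profile in every image. The Python below is a minimal sketch under stated assumptions: known 3x4 camera projection matrices, binary silhouette masks that have already been thresholded, and a regular grid of candidate voxel centres. The names `project_points` and `visual_hull` are hypothetical, and the toy scene uses axis-aligned orthographic (affine) cameras rather than perspective ones, so each silhouette sweeps out a prism instead of a pyramid. The example object is a cube with a small pit in its top face, which also illustrates why concavities are lost.

```python
import numpy as np

def project_points(P, points):
    """Project N x 3 world points with a 3 x 4 camera matrix P.

    Returns pixel coordinates u, v and a mask of points in front of the camera.
    """
    homog = np.hstack([points, np.ones((len(points), 1))])  # N x 4 homogeneous points
    uvw = homog @ P.T                                        # N x 3 image-plane coordinates
    front = uvw[:, 2] > 0
    depth = np.where(front, uvw[:, 2], 1.0)                  # avoid dividing by zero behind the camera
    return uvw[:, 0] / depth, uvw[:, 1] / depth, front

def visual_hull(points, cameras):
    """Keep the candidate points that project inside the silhouette in every view.

    cameras: list of (P, mask) pairs; mask is a boolean H x W silhouette (True = object).
    """
    inside = np.ones(len(points), dtype=bool)
    for P, mask in cameras:
        u, v, front = project_points(P, points)
        u = np.round(u).astype(int)  # nearest pixel column
        v = np.round(v).astype(int)  # nearest pixel row
        h, w = mask.shape
        ok = front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(points), dtype=bool)
        hit[ok] = mask[v[ok], u[ok]]
        inside &= hit                # intersect with this view's silhouette volume
    return inside

# Toy scene on a 10 x 10 x 10 grid (indexed [x, y, z]): a 4 x 4 x 4 cube
# with a 2 x 2 pit cut into its top face, i.e. a concavity.
n = 10
occ = np.zeros((n, n, n), dtype=bool)
occ[2:6, 2:6, 2:6] = True
occ[3:5, 3:5, 5] = False

# Axis-aligned orthographic cameras written as 3 x 4 matrices (last row [0, 0, 0, 1]).
P_z = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]], float)  # (u, v) = (x, y)
P_x = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], float)  # (u, v) = (z, y)
P_y = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], float)  # (u, v) = (x, z)
# Silhouette masks are indexed [v, u], so transpose where needed.
cameras = [(P_z, occ.any(axis=2).T), (P_x, occ.any(axis=0)), (P_y, occ.any(axis=1).T)]

points = np.indices((n, n, n)).reshape(3, -1).T.astype(float)  # voxel centres, same order as occ.ravel()
hull = visual_hull(points, cameras)
print(occ.sum(), hull.sum())
```

All three silhouettes of this object are full squares, so the recovered volume contains 64 voxels while the true object has only 60: the four voxels of the pit are filled in, because no profile ever reveals them.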
Another approach is to construct depth maps by matching stereo pairs. The problem here is that depth cannot reliably be determined solely by matching pairs of images, as there are many potential matches for each pixel or edge element. Other information, such as support from neighbors and limits on the disparity gradient, must be used to restrict the search. Even with these, the results are not very reliable and a significant proportion of the features are incorrectly matched.
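The matching ambiguity can be illustrated on a single scanline. The sketch below assumes rectified images (corresponding points lie on the same row) and a sum-of-squared-differences (SSD) window cost; neither assumption comes from the text above, and `ssd_costs` is a hypothetical helper name. It computes the cost of matching one left-image window at each candidate disparity. On a repetitive texture several disparities match perfectly, so the correct one cannot be chosen from the image pair alone.

```python
import numpy as np

def ssd_costs(left, right, x, half, max_disp):
    """SSD cost of matching the window centred at column x of the left scanline
    against the right scanline at column x - d, for each disparity d in 0..max_disp."""
    patch = left[x - half : x + half + 1]
    costs = np.full(max_disp + 1, np.inf)  # inf marks windows that fall off the image
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0:
            continue
        cand = right[xr - half : xr + half + 1]
        costs[d] = np.sum((patch - cand) ** 2)
    return costs

left = np.tile([0.0, 0.0, 9.0, 9.0], 8)  # 32-pixel scanline with a texture repeating every 4 pixels
right = np.roll(left, -2)                # the same scene seen with a true disparity of 2 pixels
costs = ssd_costs(left, right, x=16, half=2, max_disp=10)
best = np.flatnonzero(costs == costs.min())
print(best)  # every disparity whose cost equals the minimum
```

Because the texture repeats every four pixels, disparities 2, 6 and 10 all give zero cost. Extra constraints, such as support from neighboring matches or a limit on the disparity gradient, are what a matcher would need to choose among them.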
Another problem with automated capture is finding a suitable way of storing and manipulating the captured 3-D data. Conventional computer graphics relies on texture mapping to obtain a realistic amount of visual complexity, since describing a complete scene in detail by geometry alone would be too large and inefficient. Geometry is usually used only for the basic shapes, with images mapped onto the surfaces to add the necessary small variations in color and orientation.
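To make this division of labor concrete, the sketch below maps a small texture image onto a single flat, screen-aligned quadrilateral whose four corners carry texture coordinates (0, 0) to (1, 1): the geometry is just four vertices, while the visible detail comes entirely from the image. This is an illustrative assumption, not the method of the invention; it uses bilinear interpolation between texels and ignores perspective correction, and `sample_bilinear` and `render_quad` are hypothetical names.

```python
import numpy as np

def sample_bilinear(texture, u, v):
    """Sample a 2-D texture at normalised coordinates (u, v) in [0, 1] with bilinear interpolation."""
    h, w = texture.shape[:2]
    x, y = u * (w - 1), v * (h - 1)           # continuous texel coordinates
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0                   # fractional offsets used as blend weights
    top = (1 - fx) * texture[y0, x0] + fx * texture[y0, x1]
    bottom = (1 - fx) * texture[y1, x0] + fx * texture[y1, x1]
    return (1 - fy) * top + fy * bottom

def render_quad(texture, out_h, out_w):
    """Fill an out_h x out_w screen-aligned quad (out_h, out_w >= 2) whose corners
    carry texture coordinates (0, 0), (1, 0), (0, 1) and (1, 1)."""
    out = np.empty((out_h, out_w) + texture.shape[2:])
    for r in range(out_h):
        for c in range(out_w):
            # Texture coordinates interpolate linearly across the flat quad.
            out[r, c] = sample_bilinear(texture, c / (out_w - 1), r / (out_h - 1))
    return out

texture = np.array([[0.0, 1.0], [2.0, 3.0]])  # a tiny 2 x 2 "photograph"
image = render_quad(texture, 3, 3)
print(image)
```

The pixel at the centre of the quad blends all four texels equally, giving 1.5, while the corners reproduce the texel values exactly.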
Almost universally in computer graphics, geometry is described by a collection of polygonal faces, often with colors associated with the vertices. While this is a convenient canonical form for computer-generated graphics, it is not ideal when capturing data from the real world. Natural objects are rarely described simply in polygonal terms. Moreover, man-made objects with regular shapes are usually decorated in complex ways that disguise their simplicity. In a manually entered design the basic shape and the superficial texture can be separated, but in automated capture this is more difficult. Even with objects generated by CAD systems, one can question the wisdom of reducing all the shape data to hundreds of thousands of polygons when the majority will occupy less than a pixel on the screen, if they appear at all.
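A rough, purely illustrative calculation shows how sub-pixel polygons arise. Assume (these figures are hypothetical, not taken from any particular model) that a CAD model of 500,000 polygons fills a 400 by 400 pixel region of the screen and that roughly half of its polygons face away from the viewer. The average screen area of a visible polygon is then

$$\frac{400 \times 400\ \text{pixels}}{0.5 \times 500\,000\ \text{polygons}} = \frac{160\,000}{250\,000} = 0.64\ \text{pixels per visible polygon},$$

so the typical visible polygon covers well under one pixel.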