The task of extracting three-dimensional (i.e., spatial) information, such as location, orientation and dimensions, of objects from two-dimensional (i.e., flat) photographs has been studied for many years. The general field can be divided into two major areas: camera reconstruction, which relates to the field of photogrammetry, and three-dimensional modeling, which is now a subset of the larger field of computer graphics, for example, as applied in computer-aided design of architecture, industrial design and construction. Each of these fields has developed certain techniques of interest to the present invention.
For example, in the science of photogrammetry, algorithms have been developed to extract information about the camera with which a picture has been taken. This includes so-called internal parameters, such as the focal length and distortions of the camera lens(es) and data regarding the imaging plane of the camera, as well as external parameters, such as the location and orientation of the camera. Generally, these techniques have been based on two or more images with a set of known points correlating in each image.
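By way of illustration only, the roles of internal and external camera parameters can be sketched with an ideal pinhole camera. The sketch below assumes a distortion-free lens, a focal length expressed in pixels and a rotation about a single axis. The names `rotation_y` and `project` are hypothetical and are not drawn from any of the techniques discussed here.

```python
import math


def rotation_y(angle_rad):
    """Rotation matrix for a turn of angle_rad about the vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def project(point_world, focal_length, principal_point, rotation, camera_position):
    """Project a 3D world point to 2D image coordinates (ideal pinhole camera).

    Internal parameters: focal_length (assumed to be in pixels) and
    principal_point (cx, cy). External parameters: rotation (3x3,
    world-to-camera) and camera_position (in world coordinates).
    Lens distortion is ignored in this sketch.
    """
    # Express the point relative to the camera centre, then rotate into camera axes.
    d = [p - c for p, c in zip(point_world, camera_position)]
    x, y, z = (sum(r * v for r, v in zip(row, d)) for row in rotation)
    if z <= 0:
        raise ValueError("point is behind the camera")
    cx, cy = principal_point
    # Perspective division by depth, then shift to the principal point.
    return (focal_length * x / z + cx, focal_length * y / z + cy)


# Example: an unrotated camera at the world origin.
u, v = project((1.0, 2.0, 10.0), 1000.0, (320.0, 240.0), rotation_y(0.0), (0.0, 0.0, 0.0))
```

Camera reconstruction, in this simplified picture, amounts to recovering `focal_length`, `principal_point`, `rotation` and `camera_position` from image points whose correspondences are known.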
The aim of photogrammetric methods and systems is to provide precise measurements of real world objects. To this end, stereoscopic cameras and reconstruction workstations have been developed. They often require specific (i.e., calibrated) cameras with known focal lengths, optical and spatial separations, projections and other characteristics to allow accurate reconstruction of the camera parameters. In addition, some conventional photogrammetric techniques require that one or more points in the "scene" shown in the photographs have locations or inter-point distances which are known in advance. Obtaining such measurements can be very difficult, as it may require access to the exterior of buildings, landmarks or other structures, and such access may be impossible. Perhaps most importantly, in terms of the drawbacks of such techniques as viewed from the focus of the present invention, photogrammetric schemes of the past typically do not provide outputs in the form of complete three-dimensional models and rarely, if ever, provide any texturing information for objects in a scene.
Three-dimensional modeling applications, on the other hand, have as a primary objective the production of such models. The need for such techniques is felt in a number of ways. For example, it is often desirable to have a three-dimensional model, complete with a description of shape, location, orientation and material surface properties (i.e., texture), in order to produce realistic renderings on a computer which can be used to document a new design of a city, a building or an object. The model can also be used for computer animations, virtual reality immersion of users in a scene or for manufacturing tasks.
However, constructing digital three-dimensional models is not a trivial task. In general, it requires considerable training and skill and there tend to be far fewer individuals who are capable of producing such models than are capable, say, of producing text documents or spreadsheets (or other computer generated outputs for that matter). Computer-assisted three-dimensional modeling techniques of the past tend to require users to construct the models from "scratch", starting with an empty scene and building up the model one object at a time. For each object to be placed in the scene, considerable data is required and such data must be accurately known in order to create a realistic and accurate model. For example, the shape, in terms of the form of points on the surface of the object to be modeled, the object's location, size and orientation and often its spatial relationship to other objects in the scene must all be known before a user begins to create the model. Further, when an object to be modeled does not yet exist in the real world, such as is the case with, say, a new building or interior design, the only way to create the scene is with a model. But, in many cases, other buildings, mechanisms or environments into which the new design will be introduced already exist and must be accounted for in the model. The previous modeling techniques thus require a great deal of measuring, data entry and checking before the model can be created.
In a few cases, some modeling applications allowed a finished model to be matched up with an underlaid photograph to give the impression that the new model is part of the photograph. However, the model itself still had to be constructed "in a vacuum", using the labor intensive techniques described above. Moreover, because the underlaid image or photograph was flat (i.e., two-dimensional), one could not create an animation or immersive virtual reality environment from it, as any changes in viewpoint would make the flatness of the underlaid image apparent. One example of such a product was Alias Upfront, which apparently was intended for architectural applications but which is believed to have been discontinued. In general, this software package allowed a user to first create a model of a scene using conventional computer-modeling techniques and then, as a post-process, allowed for positioning a primitive on top of a photograph underlaid in the model. Thus, a camera was roughly matched and the previously created three-dimensional model could be rendered superimposed on top of the photograph with correct perspective. No modeling was done from the photograph nor was the camera model very precise.
More recently, there have been attempts (both in the commercial and academic worlds) to help in the creation of three-dimensional models by using photographs of existing objects. In some cases, these techniques require the use of multiple photographs and/or explicitly marked points on the surfaces of the objects to be modeled. Such schemes thus require a user to first point out the significant (e.g., previously marked) points on the object's geometry. Next, for each such point, the corresponding point, if visible, in each other image must be marked. Following this inter-image correlation, edges between the points must be marked and, then, faces defined by the edges marked, e.g., by indicating loops of the edges. All of these matching and marking steps can be very labor intensive, akin to modeling objects from scratch.
In addition, previous approaches to the modeling problem also often involve algorithms that run in "batch" mode. That is, a user must create all of the input data (e.g., vertices, edges, associations, etc.) and then invoke the modeling method. The modeling algorithms then complete all of the required calculations before providing any feedback to the user. Sometimes, because of inconsistent or undetermined input information or due to singularities in the modeling algorithms themselves, these batch processes cannot return correct or even useful models. Even worse, such algorithms often provide little or no indication of what the cause of the problem was or where the user might correct the input information to resubmit to the batch process.
One recent software application, known as 3D Builder, available from 3D Construction Company, allows modeling of complex curved surfaces (e.g., human faces) but requires that users mark many points on the real world object to be modeled before taking a photograph thereof. During the modeling, a point is created in the model for each marked point on each photograph and the corresponding points in the different photographs must be associated with one another in the model. Once the points have been correlated, edges can be constructed between the points and faces created between the edges. This is a very labor intensive process and requires the use of a number of "atomic" entries (the points) to achieve high accuracy. Only after all of the points, edges and faces have been created does the modeling process run in a batch mode (as opposed to a user interactive mode) to (hopefully) generate the resulting three-dimensional model.
A recently published method (Paul Debevec et al., "Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach", University of California Berkeley Technical Report UCB-CSD-96-893, January 1996) somewhat simplifies this situation by not having to deal with geometry at a vertex, then edge, then face level, but rather with primitives such as boxes or cylinders. The method requires a user to first create a parameterized (or rough) model of the objects in the scene using a separate editor. Second, the user draws edges on top of one or more photographs. Third, the user marks each edge in each photograph as corresponding to a particular edge in the parameterized model. The method then calculates values for the parameters in the model. This work is based on concepts and mathematics from Camillo Taylor and David Kriegman of Yale University, as reported in "Structure and Motion from Line Segments in Multiple Images", Yale University, Technical Report #94026, January 1994. Although somewhat less labor intensive than previous techniques, the Debevec method (known as Facade) still requires three individually intensive steps, and the user must be skilled enough to build a parameterized model independent of the photographs.
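The notion of a parameterized primitive can be illustrated, in a purely hypothetical form, by a box whose vertices and edges are derived from a handful of parameters rather than entered one at a time. The class `Box` and its fields below are assumptions made for this sketch and do not reproduce the data structures of the cited method.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Box:
    """A hypothetical box primitive described by a few parameters."""
    width: float
    height: float
    depth: float
    origin: tuple = (0.0, 0.0, 0.0)  # position of the minimum corner

    def vertices(self):
        """Return the 8 corners, derived entirely from the parameters."""
        ox, oy, oz = self.origin
        return [(ox + i * self.width, oy + j * self.height, oz + k * self.depth)
                for i, j, k in product((0, 1), repeat=3)]

    def edges(self):
        """Return the 12 edges as pairs of vertex indices.

        Two corners share an edge when their (i, j, k) corner codes
        differ along exactly one axis.
        """
        codes = list(product((0, 1), repeat=3))
        return [(a, b)
                for a in range(8) for b in range(a + 1, 8)
                if sum(x != y for x, y in zip(codes[a], codes[b])) == 1]


# Changing one parameter updates every dependent vertex at once.
box = Box(width=2.0, height=3.0, depth=4.0)
corners = box.vertices()
```

With such a representation, fitting a model to a photograph reduces to finding values for `width`, `height`, `depth` and `origin`, instead of positioning eight independent vertices.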
Other reported methods, e.g., Michael Kass, "CONDOR: Constraint-Based Dataflow", SIGGRAPH '92, pp. 321-330 (Jul. 26-31, 1992) and Michael Gleicher and Andrew Witkin, "Through-the-Lens Camera Control", SIGGRAPH '92, pp. 331-340 (Jul. 26-31, 1992), use a data structure known as a dataflow network to create a required Jacobian matrix for providing iterative solutions to the modeling problem. For example, Gleicher and Witkin show how to apply traditional keyframing techniques to existing three-dimensional models and how to then solve for camera positions. However, in this technique, no modeling is done on top of an image nor is any texture extraction provided.
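The general idea of an iterative, Jacobian-based solution can be sketched as follows. This is not the dataflow-network approach of the cited works: it is a generic Gauss-Newton iteration with a finite-difference Jacobian, assuming an unrotated pinhole camera with unit focal length whose position is the only unknown. All function names are hypothetical.

```python
def project(point, cam, focal=1.0):
    """Ideal pinhole projection for an unrotated camera at `cam` looking along +z."""
    x, y, z = (p - c for p, c in zip(point, cam))
    return (focal * x / z, focal * y / z)


def residuals(cam, points, observed):
    """Stack the differences between predicted and observed image coordinates."""
    r = []
    for p, (u, v) in zip(points, observed):
        pu, pv = project(p, cam)
        r.extend([pu - u, pv - v])
    return r


def jacobian(cam, points, observed, eps=1e-6):
    """Finite-difference Jacobian of the residuals with respect to the camera position."""
    base = residuals(cam, points, observed)
    cols = []
    for k in range(len(cam)):
        bumped = list(cam)
        bumped[k] += eps
        r = residuals(bumped, points, observed)
        cols.append([(a - b) / eps for a, b in zip(r, base)])
    # Transpose the columns into rows: one row per residual.
    return [list(row) for row in zip(*cols)]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def gauss_newton(cam, points, observed, iterations=20):
    """Refine the camera position by repeatedly solving J^T J dx = -J^T r."""
    cam = list(cam)
    n = len(cam)
    for _ in range(iterations):
        r = residuals(cam, points, observed)
        j = jacobian(cam, points, observed)
        jtj = [[sum(row[a] * row[b] for row in j) for b in range(n)] for a in range(n)]
        jtr = [sum(row[a] * ri for row, ri in zip(j, r)) for a in range(n)]
        dx = solve(jtj, [-g for g in jtr])
        cam = [c + d for c, d in zip(cam, dx)]
    return cam


# Synthetic example: image points generated from a known camera position.
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (0.0, 1.0, 8.0),
          (1.0, 1.0, 11.0), (-1.0, 2.0, 9.0)]
true_cam = (0.5, -0.3, 1.0)
observed = [project(p, true_cam) for p in points]
estimate = gauss_newton((0.0, 0.0, 0.0), points, observed)
```

Each pass linearizes the projection around the current estimate and solves for a correction, which is the role the Jacobian matrix plays in such iterative schemes. A fixed iteration count is used here for brevity; a practical solver would also test for convergence and for a singular system.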
In light of the need for computer-generated three-dimensional models, but given the shortcomings of these and other prior schemes, it would be desirable to have an improved computer-assisted technique for constructing a three-dimensional model on top of one or more images such that the model's parameters automatically match those of the real world object depicted in the photograph(s).