1. Technical Field
The invention is related to a computer-implemented object modeling system and process, and more particularly to such a system and process that employs noise elimination and robust surface extraction techniques.
2. Background Art
Computer modeling of natural or manmade objects from images has found a wide range of applications, for example an electronic catalogue of products for Internet advertisement, or the visualization of museum objects on the Internet. The object could even be a human face, with applications including talking heads, games, and very low bandwidth video conferencing on the Internet.
There are several considerations in deciding what system to employ to accomplish the aforementioned object modeling. First, there are the image collection costs. Many object modeling applications do not lend themselves to the elaborate camera setups or customized object-holding fixtures that many of today's object modeling systems require in order to capture the needed images of the object being modeled. In many cases, the cost and difficulty of positioning the camera and/or the object are simply too great to make the modeling practical. It would be much more desirable for the object modeling system to be capable of using so-called casual images. Casual images of an object are those that can be captured without having to control the camera's position or orientation, and which do not require costly object-holding fixtures. One particularly attractive way of capturing casual images of an object is with a desktop digital camera. Desktop-type digital cameras are becoming cheap and ubiquitous, and so would be an excellent choice for capturing the images needed to model an object at minimum cost.
Another consideration in selecting an object modeling system is the data storage requirements of the system. If the storage requirements are extensive, the cost of the system's memory could become prohibitive. In addition, if large amounts of data must be handled, the resulting object model will not be suitable for remote presentation, such as on the Web.
Lastly, the complexity of the object modeling system should be considered. For example, if the processing requirements of the system are overly complex, then the time required to model an object may become excessive. In addition, special computing hardware may be needed, thereby driving up the cost of the system. Further, if the system requires a great deal of user interaction in the modeling process, it may be too daunting for many people to learn and use.
The current object modeling techniques can be roughly divided into two categories. The first category uses a 3D model-based representation. In this category, a CAD-like model is built, which is very concise. However, it is extremely difficult to obtain an accurate model because of imperfections in system calibration, uncertainty in feature extraction, and errors in image matching. In addition, these techniques often need to acquire and store a large number of images, and the collection cost is high because the camera's pose must be recorded for each image taken. The other category involves image-based representations. In this category, one needs to acquire and store nearly all images necessary for subsequent visualization. Visualization of an object model is therefore essentially a redisplay of images, yielding photorealistic results. However, a major drawback of this technique is that all images must be stored for future rendering, which requires a great deal of system memory.
Given the foregoing considerations, it is clear there is a need for an improved object modeling system and process.
It is noted that in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, "reference [1]" or simply "[1]". A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.
The present invention involves a new object modeling system and process that allows easy image capture, requires only a small amount of data storage, and constructs the desired model efficiently. The system and process first acquires images of an object that is to be modeled. These images can be captured in a variety of ways, but regardless of the procedure employed, the images should collectively depict every surface of the object that it is desired to model. One possible procedure for capturing the needed images of the object would be to use a stereo camera rig and obtain stereo images of the object (e.g., using a trinocular stereo system). Another, more standard, method would be to capture a series of 2D images of the object being modeled. One particularly attractive procedure for obtaining the 2D images is to capture the aforementioned casual images of the object. As explained previously, casual images are those that can be captured without having to control the camera's position or orientation, and which do not require costly camera equipment or object-holding fixtures.
The images of the object, regardless of how they were captured, are employed in the next phase of the object modeling process to compute a series of 3D reconstructions of the object. The reason for initially computing multiple 3D reconstructions from the images is to ensure every surface of the object being modeled is represented by some part of a reconstruction and to allow for significant overlap in the reconstructions to increase the accuracy of the modeling process. It is also preferred that a multiframe stereo reconstruction process be employed to increase the accuracy of the reconstructions. To this end, a small group of consecutive images (e.g., 5 frames) could be used to produce each reconstruction. There are many existing methods that can be used to compute the 3D reconstructions, any of which could be employed as desired. For example, if the acquired images are individual 2D views of the object, a standard feature point tracking and structure-from-motion approach could be used to produce the desired 3D reconstructions. It is noted that these standard reconstruction approaches using 2D images typically require that the intrinsic parameters associated with the camera used to capture the images be computed. Alternately, if the acquired images are captured using a stereo rig, such as for example a trinocular stereo system, the 3D reconstructions could be generated via a conventional 3D registration procedure.
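The grouping of consecutive frames into overlapping reconstruction windows can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `group_frames` and the default `step=3` (which makes neighbouring 5-frame groups share two frames) are hypothetical choices; only the group size of 5 comes from the text.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_frames(frames: Sequence[T], group_size: int = 5, step: int = 3) -> List[List[T]]:
    """Split an image sequence into overlapping groups of consecutive frames.

    Each group is intended to feed one multiframe stereo reconstruction.
    Choosing step < group_size makes neighbouring groups share frames, so the
    resulting reconstructions overlap. Sequences shorter than group_size
    produce no groups.
    """
    if group_size < 1 or step < 1:
        raise ValueError("group_size and step must be positive")
    if len(frames) < group_size:
        return []
    starts = list(range(0, len(frames) - group_size + 1, step))
    # Make sure the last frames are covered by one final group.
    if starts[-1] != len(frames) - group_size:
        starts.append(len(frames) - group_size)
    return [list(frames[s:s + group_size]) for s in starts]
```

For a 12-image sequence this yields four groups starting at frames 0, 3, 6 and 7, so every frame contributes to at least one reconstruction and adjacent reconstructions overlap.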
Regardless of how the 3D reconstructions are obtained, it is preferred that the reconstruction data be processed to eliminate noise effects before proceeding to the next phase of the object modeling process. This is especially desirable where stereo matching techniques have been used to generate the reconstructions, as such reconstructions tend to be noisy. One preferred noise elimination procedure is "automatic clustering". This method begins by calculating the mean and then the variance of the points in the point cloud in each orthogonal direction. A threshold is then applied in each direction to define a cuboid-shaped bounding box. Any point lying outside the bounding box is eliminated.
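Automatic clustering, with the bounding-box test repeated until the mean and variance stop changing, might look like the sketch below. The text does not say how the per-axis threshold is set; this sketch assumes a box of the mean plus or minus `k` standard deviations, and `k`, `tol` and `max_iter` are hypothetical parameters.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _mean_and_variance(points: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Per-axis mean and (population) variance of a 3D point set."""
    n = len(points)
    means = [sum(p[axis] for p in points) / n for axis in range(3)]
    variances = [sum((p[axis] - means[axis]) ** 2 for p in points) / n for axis in range(3)]
    return means, variances


def automatic_clustering(points: Sequence[Point], k: float = 2.0,
                         tol: float = 1e-9, max_iter: int = 100) -> List[Point]:
    """Iteratively discard points outside a mean +/- k*sigma box on each axis.

    Stops when the per-axis mean and variance change by no more than tol.
    """
    kept = list(points)
    if not kept:
        return kept
    means, variances = _mean_and_variance(kept)
    for _ in range(max_iter):
        half_widths = [k * math.sqrt(v) for v in variances]
        inside = [p for p in kept
                  if all(abs(p[a] - means[a]) <= half_widths[a] for a in range(3))]
        if not inside:  # degenerate threshold: keep the previous set
            break
        new_means, new_variances = _mean_and_variance(inside)
        converged = all(abs(new_means[a] - means[a]) <= tol and
                        abs(new_variances[a] - variances[a]) <= tol for a in range(3))
        kept, means, variances = inside, new_means, new_variances
        if converged:
            break
    return kept
```

A single far-away outlier inflates the first-pass variance, but it still falls outside the box and is removed; the second pass then confirms that the statistics have stabilized.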
The procedure is repeated until there is no longer any significant change in the mean and variance between iterations. While the automatic clustering method is good at eliminating extraneous reconstruction points outside the defined bounding boxes, it does nothing to eliminate extraneous points within the box, such as might exist in voids or holes associated with the object. A "3D spatial filtering" procedure can be used to remove such points from the reconstruction data. The 3D spatial filtering begins by dividing a 3D space containing all the reconstruction points into voxels. To minimize processing, an octree scheme [3] is employed so that only voxels containing reconstruction points are considered. For each point, the voxel containing the point is identified along with a prescribed number of its neighboring voxels. All the points contained in this block of voxels are counted, and if the total number of points exceeds a prescribed threshold, the point remains in the reconstruction data. Otherwise, the point is eliminated.
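The 3D spatial filter can be sketched with a hash map of occupied voxels standing in for the octree [3]: both avoid storing empty voxels, but the hash map is a substitute chosen for brevity, not the octree itself. The names `spatial_filter`, `voxel_size`, `min_count` and `radius` are hypothetical; `radius=1` gives a 3 by 3 by 3 block around each point's voxel.

```python
import math
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_index(p: Point, voxel_size: float) -> Voxel:
    """Integer index of the voxel containing point p."""
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


def spatial_filter(points: Sequence[Point], voxel_size: float,
                   min_count: int, radius: int = 1) -> List[Point]:
    """Keep a point only if its voxel block holds more than min_count points.

    The block is the point's voxel plus every voxel within `radius` steps on
    each axis. Only occupied voxels are stored in the counter.
    """
    counts = Counter(voxel_index(p, voxel_size) for p in points)
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    kept = []
    for p in points:
        vx, vy, vz = voxel_index(p, voxel_size)
        total = sum(counts.get((vx + dx, vy + dy, vz + dz), 0) for dx, dy, dz in offsets)
        if total > min_count:
            kept.append(p)
    return kept
```

Because the count is taken over a block rather than a single voxel, points on a sparse but genuine surface can survive as long as their neighbours are populated, while isolated points inside voids are dropped.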
Once the noise elimination processing is complete, the various individual 3D reconstructions are merged into one overall 3D reconstruction of the object via a standard registration process. This registration is required because the images used to compute the individual 3D reconstructions would have been captured at different orientations in relation to the object. Thus, the coordinate frames of the individual reconstructions may differ, and so to create an overall 3D reconstruction, the point sets associated with each of the individual reconstructions have to be aligned to a common coordinate frame.
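The text leaves the registration method open. One common building block is the least-squares rigid fit (the Kabsch/SVD method) sketched below, which assumes that corresponding point pairs between two overlapping reconstructions are already known, for example from tracked features shared by both frame groups. This is an assumed choice, not the specified procedure; the name `rigid_align` is hypothetical.

```python
import numpy as np


def rigid_align(source, target):
    """Least-squares rotation R and translation t with R @ s_i + t ~= t_i (Kabsch/SVD).

    source, target: (N, 3) arrays of corresponding points, N >= 3, not all collinear.
    """
    src = np.asarray(source, dtype=float)
    dst = np.asarray(target, dtype=float)
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    h = (src - src_centroid).T @ (dst - dst_centroid)
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_centroid - r @ src_centroid
    return r, t
```

Applying `points @ r.T + t` maps one reconstruction into the other's frame. When correspondences are not known in advance, iterative closest point (ICP) repeatedly re-pairs nearest neighbours and re-solves this same fit.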
The next phase of the object modeling process involves a surface extraction procedure designed to define the surface of the object based on the points associated with the previously-computed overall 3D reconstruction. One preferred procedure for accomplishing this task begins by dividing a 3D space containing all the reconstruction points associated with the overall 3D reconstruction into voxels using an octree approach, so that only those voxels containing at least one reconstruction point are identified. Each such voxel in turn undergoes a "signed distance computation" to define a plane which best represents the surface of the object in that voxel. Specifically, the signed distance computation begins by identifying a "neighborhood" of voxels associated with the voxel under consideration. For the purposes of the present object modeling process, a fixed neighborhood size was used, for example the 3 by 3 by 3 voxel block used in tested embodiments of the present invention. All the points contained within the identified voxel neighborhood are used to calculate a plane that represents the surface of the object contained within the voxel under consideration. This procedure is then repeated for all the voxels containing reconstruction points.
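Gathering the points of a voxel's 3 by 3 by 3 neighbourhood can be sketched as below. As with the filtering step, a dictionary keyed by integer voxel indices stands in for the octree, and `build_voxel_map` and `neighborhood_points` are hypothetical names.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def build_voxel_map(points: Sequence[Point], voxel_size: float) -> Dict[Voxel, List[Point]]:
    """Bucket points by voxel; only occupied voxels get an entry."""
    grid: Dict[Voxel, List[Point]] = defaultdict(list)
    for p in points:
        key = (math.floor(p[0] / voxel_size),
               math.floor(p[1] / voxel_size),
               math.floor(p[2] / voxel_size))
        grid[key].append(p)
    return dict(grid)


def neighborhood_points(grid: Dict[Voxel, List[Point]], voxel: Voxel,
                        half_width: int = 1) -> List[Point]:
    """All points in the (2*half_width+1)^3 block centred on voxel (3x3x3 by default)."""
    points: List[Point] = []
    for dx, dy, dz in product(range(-half_width, half_width + 1), repeat=3):
        points.extend(grid.get((voxel[0] + dx, voxel[1] + dy, voxel[2] + dz), ()))
    return points
```

The per-voxel loop is then `for voxel in grid: pts = neighborhood_points(grid, voxel)`, with `pts` passed to the plane fit. Iterating over the keys of `grid` visits only voxels that contain reconstruction points.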
Preferably, the plane for each voxel is defined by a normal thereof extending from the plane to a prescribed one of the vertices of the voxel under consideration. This normal is preferably established by first computing the centroid of all points in the previously identified voxel neighborhood. The covariance matrix of these points about the centroid is then computed, and the eigenvector corresponding to the smallest eigenvalue of the covariance matrix is designated as the normal vector of the plane, without yet specifying which of its two possible directions it points in. The distance from the plane to the prescribed vertex along the normal is also calculated to establish the magnitude of the normal vector. The direction of the normal for each voxel is preferably established by first identifying the direction from each point in the voxel under consideration to the optical center associated with the camera used to capture the original image from which the point was derived. This is repeated for each of the other voxels in the previously identified voxel neighborhood associated with the voxel under consideration. The vector from a reconstruction point to its associated optical center is referred to as the visibility vector. It is next determined whether the angle between the normal computed for each voxel in a voxel neighborhood and the visibility vector for each point contained in that voxel is less than 90 degrees by more than a prescribed threshold amount, greater than 90 degrees by more than the prescribed threshold amount, or within the prescribed threshold amount of 90 degrees. The normal vector of the voxel under consideration is assigned a positive direction (i.e., toward the prescribed vertex) if a majority of the angles between the visibility vectors and associated normals are less than 90 degrees by more than the threshold amount, and a negative direction (i.e., away from the prescribed vertex) if the majority of the angles are greater than 90 degrees by more than the threshold amount.
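The centroid, covariance and smallest-eigenvector steps map directly onto NumPy, as in the minimal sketch below. The helper names are hypothetical, and the choice of which voxel corner serves as the prescribed vertex is left to the caller.

```python
import numpy as np


def fit_plane(points):
    """Centroid and unit normal of the least-squares plane through points.

    The normal is the eigenvector of the covariance matrix with the smallest
    eigenvalue; its sign is arbitrary at this stage and is resolved later.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    centred = pts - centroid
    cov = centred.T @ centred / len(pts)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return centroid, eigvecs[:, 0]


def signed_distance(centroid, normal, vertex):
    """Signed distance from the plane to `vertex`, measured along `normal`."""
    return float(np.dot(np.asarray(vertex, dtype=float) - centroid, normal))
```

`numpy.linalg.eigh` is appropriate here because a covariance matrix is symmetric. The product `d * normal` is the vector from the plane to the vertex and does not depend on the arbitrary sign of the eigenvector, because flipping the normal also flips `d`.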
However, there can be regions of an object's surface where the plane normal of a voxel is almost perpendicular to most of the visibility vectors associated with the points contained in the voxel. In such regions it is ambiguous whether the normal points in the positive or negative direction. Thus, if the majority of the angles between the visibility vectors and associated normals are found to be within the prescribed threshold amount of 90 degrees, the normal of the voxel under consideration retains its undetermined direction status.
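The three-way visibility vote can be sketched as follows. Each input pair is (normal of the voxel holding a point, that point's visibility vector, i.e. camera optical center minus point). The 10 degree default threshold is a hypothetical value. The text does not cover the case where none of the three outcomes has a strict majority, and this sketch also treats that case as undetermined.

```python
import math
from typing import Iterable, Sequence, Tuple

Vec = Sequence[float]


def _angle_deg(a: Vec, b: Vec) -> float:
    """Angle between two non-zero vectors, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def vote_normal_direction(pairs: Iterable[Tuple[Vec, Vec]],
                          threshold_deg: float = 10.0) -> int:
    """Return +1 (positive direction), -1 (negative direction) or 0 (undetermined)."""
    toward = away = near_90 = 0
    for normal, visibility in pairs:
        if not any(normal) or not any(visibility):
            continue  # zero-length vector: no usable angle
        angle = _angle_deg(normal, visibility)
        if angle < 90.0 - threshold_deg:
            toward += 1
        elif angle > 90.0 + threshold_deg:
            away += 1
        else:
            near_90 += 1
    total = toward + away + near_90
    if 2 * toward > total:
        return 1
    if 2 * away > total:
        return -1
    return 0  # majority near 90 degrees, or no strict majority
```

Returning 0 instead of forcing a guess keeps grazing-angle voxels available for the later propagation and consistency passes.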
Once the foregoing procedure has been performed for all the voxels containing reconstruction points, the direction of the normal of any voxel still having an undetermined normal direction status is preferably determined by a back propagation procedure. The back propagation procedure begins by selecting a voxel that was marked as having a plane normal with an undetermined direction, and identifying which of the directly adjacent neighboring voxels has the largest absolute value of the cross product of the normal vector associated with the currently selected voxel and the normal vector associated with the neighboring voxel. If the neighboring voxel so identified has a previously determined direction for its plane normal, then the same direction is assigned to the plane normal of the "undetermined" voxel. However, if the identified neighboring voxel also has a plane normal with an undetermined direction, then it becomes the currently selected voxel and the process is repeated until a voxel with a determined normal direction is encountered or a prescribed propagation limit is reached. When a voxel having a determined normal direction is reached within the propagation limit, the direction associated with that voxel's normal is assigned to all the "undetermined" voxels traversed on the way to the "determined" voxel. If the prescribed propagation limit is reached before a voxel having a determined direction is encountered, then the undetermined normal direction status of the currently selected voxel, and that of any "undetermined" voxels traversed on the way to it, are retained.
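A sketch of the back propagation walk over face-adjacent voxels follows. The text selects the neighbour by the absolute value of a product of the two normals. This sketch scores candidates by the absolute dot product, i.e. it picks the most nearly parallel neighbouring normal. That is an interpretive assumption; replace `_abs_dot` with the cross-product magnitude if the literal wording is intended. The names and the `max_steps=10` propagation limit are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]
FACE_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def _abs_dot(a: Sequence[float], b: Sequence[float]) -> float:
    return abs(sum(x * y for x, y in zip(a, b)))


def back_propagate(signs: Dict[Voxel, int], normals: Dict[Voxel, Sequence[float]],
                   max_steps: int = 10) -> Dict[Voxel, int]:
    """Resolve undetermined (0) normal directions by walking to determined neighbours.

    signs maps each occupied voxel to +1, -1 or 0; it is updated in place and returned.
    """
    for start in [v for v, s in signs.items() if s == 0]:
        if signs[start] != 0:
            continue  # already resolved during an earlier walk
        path: List[Voxel] = [start]
        current = start
        found = 0
        for _ in range(max_steps):  # the propagation limit
            candidates = []
            for dx, dy, dz in FACE_OFFSETS:
                nb = (current[0] + dx, current[1] + dy, current[2] + dz)
                if nb in signs and nb not in path:
                    candidates.append(nb)
            if not candidates:
                break  # dead end: the walk stays undetermined
            here = normals[current]
            # Assumption: choose the neighbour whose normal is most nearly parallel.
            best = max(candidates, key=lambda nb: _abs_dot(here, normals[nb]))
            if signs[best] != 0:
                found = signs[best]
                break
            path.append(best)
            current = best
        if found:
            for voxel in path:  # every voxel traversed inherits the direction
                signs[voxel] = found
    return signs
```

Excluding voxels already on the current path prevents the walk from bouncing between two undetermined neighbours. Walks that hit the limit or a dead end leave their voxels at 0 for the consistency check.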
Once the back propagation procedure is complete, it is preferred that an additional check be made to ensure the derived normal directions are realistic and to establish a normal direction for any remaining "undetermined" voxels. This entails imposing a local consistency check where, for each voxel, the signs of the normals associated with the voxels in a prescribed voxel neighborhood associated with the voxel are identified, and if the voxel under consideration has a sign that is inconsistent with the majority of the neighborhood, or has an undetermined sign, it is changed to match the majority.
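The local consistency check amounts to a majority filter over the sign field. In the sketch below, undetermined (0) neighbours do not vote, a tie leaves the voxel unchanged, and all votes read the signs as they were before the pass, so the result does not depend on visiting order. These are assumed design choices, and `local_consistency` is a hypothetical name.

```python
from itertools import product
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]


def local_consistency(signs: Dict[Voxel, int], half_width: int = 1) -> Dict[Voxel, int]:
    """Return new signs where each voxel adopts its neighbourhood's majority sign.

    Neighbours are the other occupied voxels in the (2*half_width+1)^3 block.
    """
    offsets = [o for o in product(range(-half_width, half_width + 1), repeat=3)
               if o != (0, 0, 0)]
    result: Dict[Voxel, int] = {}
    for voxel, sign in signs.items():
        votes = sum(signs.get((voxel[0] + dx, voxel[1] + dy, voxel[2] + dz), 0)
                    for dx, dy, dz in offsets)
        majority = (votes > 0) - (votes < 0)  # +1, -1 or 0 (tie / no votes)
        result[voxel] = majority if majority != 0 else sign
    return result
```

A voxel whose sign disagrees with most of its neighbours is flipped, and an undetermined voxel with at least one decided neighbour majority receives that sign.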
A modified marching cubes approach, which employs the just-computed normal vector data, is then used to extract a triangle-mesh representation from the implicit surface of the object defined by the planes computed for each voxel. The standard marching cubes procedure is used, but it is modified to incorporate an octree-based approach. In the traditional marching cubes method, a voxel that contains a portion of the surface of the object is selected (i.e., the seed cube). The triangle-mesh representation of the surface is then constructed by, inter alia, identifying a neighboring voxel containing a part of the surface and "marching" voxel by voxel until the entire surface has been analyzed. However, if the object being modeled is made up of multiple, separated sections, then the traditional method will miss the disjoint portions of the object that are not connected to the surface containing the seed cube. In the modified procedure, this problem is resolved as follows. Any one of the previously defined voxels containing reconstruction points is selected. The triangle-mesh surface representation is then computed by proceeding as in the traditional marching cubes method, with the exception that an account is kept of each voxel processed. When all the voxels containing a section of the surface of the portion of the object associated with the initially selected voxel have been processed, it is determined whether any unprocessed voxels exist (as would be the case if the object is made up of disjoint surfaces). Whenever unprocessed voxels are found to exist, the foregoing procedure is repeated, starting from one of the unprocessed voxels, until no more unprocessed voxels remain. In this way every voxel containing reconstruction points is processed, and so every surface making up the object is modeled.
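The bookkeeping that distinguishes the modified procedure (keep an account of processed voxels and restart from any unprocessed one) can be sketched as a breadth-first traversal over occupied voxels. `polygonize` is a hypothetical placeholder for the per-cube marching cubes step, meaning the triangle-table lookup on that voxel's signed distances, which is not shown. Treating face-adjacent occupied voxels as the cells the march moves between is a simplifying assumption.

```python
from collections import deque
from typing import Callable, List, Set, Tuple

Voxel = Tuple[int, int, int]
FACE_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def march_all_components(occupied: Set[Voxel],
                         polygonize: Callable[[Voxel], None]) -> List[List[Voxel]]:
    """Visit every occupied voxel exactly once, restarting from unprocessed voxels.

    Returns the voxels of each connected surface component in processing order.
    """
    processed: Set[Voxel] = set()
    components: List[List[Voxel]] = []
    for seed in sorted(occupied):  # any unprocessed voxel may serve as the next seed
        if seed in processed:
            continue
        component: List[Voxel] = []
        queue = deque([seed])
        processed.add(seed)
        while queue:  # the ordinary march outward from the seed cube
            voxel = queue.popleft()
            polygonize(voxel)
            component.append(voxel)
            for dx, dy, dz in FACE_OFFSETS:
                nb = (voxel[0] + dx, voxel[1] + dy, voxel[2] + dz)
                if nb in occupied and nb not in processed:
                    processed.add(nb)
                    queue.append(nb)
        components.append(component)
    return components
```

With a single seed, only the component containing that seed would be meshed. The outer loop over all occupied voxels is what picks up the disjoint pieces.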
Finally, once the surface extraction procedure is complete, a texture mapping process can also be performed to create a photorealistic model of the object. One preferred way of accomplishing this texture mapping is as follows. For each of the triangular areas of the object's surface defined in the surface extraction procedure, the portions of the original images that depict that particular triangular area are identified. These image portions are then blended to create a composited representation of the area. Finally, this composited representation is assigned as the texture for that triangular area.
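The blending step can be sketched as a weighted per-texel average. This sketch assumes each image's view of a triangle has already been resampled into a shared texture layout (equal-length lists of RGB tuples). The text does not specify the weights; uniform weights are the default here, and a caller might, for example, give more weight to images that see the triangle more frontally. `blend_patches` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[float, float, float]


def blend_patches(patches: Sequence[Sequence[RGB]],
                  weights: Optional[Sequence[float]] = None) -> List[RGB]:
    """Weighted per-texel average of equally sized RGB patches of one triangle."""
    if not patches:
        raise ValueError("need at least one patch")
    if weights is None:
        weights = [1.0] * len(patches)
    if len(weights) != len(patches):
        raise ValueError("one weight per patch is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    size = len(patches[0])
    if any(len(p) != size for p in patches):
        raise ValueError("patches must have the same number of texels")
    blended: List[RGB] = []
    for i in range(size):
        channels = [sum(w * p[i][c] for w, p in zip(weights, patches)) / total
                    for c in range(3)]
        blended.append((channels[0], channels[1], channels[2]))
    return blended
```

Averaging across several views reduces the visible seams and lighting differences that would appear if each triangle simply took its texture from a single image.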
In addition to the just described benefits, other advantages of the present invention will become apparent from the detailed description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.
The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the U.S. Patent and Trademark Office upon request and payment of the necessary fee.