The teachings of this invention relate generally to computer vision and computer graphics and, more specifically, to techniques for acquiring silhouettes from an image.
A number of different techniques have been developed to compute shapes from silhouettes or contours in the field of computer imaging.
The teachings herein address the problem of acquiring a numerical description of the shape of an object. Given a numerical description of the object's shape it is possible, using well-known computer graphics algorithms, to generate images of the object from different points of view and under different lighting conditions. One important application of such synthetic imagery is in e-commerce, where the seller of an object allows potential customers to inspect a virtual copy of the object interactively using a computer. Numerical representations of objects can be used for other purposes, such as in CAD (computer-aided design) systems as a starting point for the design of new objects.
A class of popular methods for acquiring a numerical representation of an object's shape is known as shape from silhouette, also referred to by similar names such as shape from occluding contour or shape from boundaries. Shape from silhouette algorithms use an image of an object captured by a camera, or any other imaging device. Using the known position of the camera, and the silhouette of the object in the image (i.e., the curve that marks the boundary in the image between the object and the background), an estimate of the numerical shape can be made. A very crude estimate of shape can be obtained from a single image. An improved estimate is obtained using a number of silhouettes from images of the object in different positions relative to the camera.
Many algorithms have been devised to compute a numerical description of the three dimensional shape of an object from silhouettes. One class of algorithms is known as volumetric or space carving, as originally described by Martin and Aggarwal (Worthy N. Martin and J. K. Aggarwal, "Volumetric Descriptions of Objects from Multiple Views", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-5, No. 2, March 1983, pp. 150-158.) In this technique a volume of small boxes is numerically defined that completely encloses the object. For each image the boxes are projected onto an image plane. If the projection of a box falls outside of the object silhouette, it is marked as "outside" and is eliminated from a current estimate of the object shape. As each silhouette image is considered, more of the boxes are eliminated, or "carved away", from the initial volume. The boxes remaining after all of the silhouette images have been examined constitute the estimate of the object's shape. A smooth representation of the surface of the object can then be obtained by any well-known isosurface algorithm.
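The carving procedure described above can be sketched in a few lines of code. The following is a minimal illustration only, not the implementation of any cited reference: voxel (box) centers are projected into one calibrated view by an assumed projection function, and voxels whose projections fall outside the silhouette mask are carved away.

```python
import numpy as np

def carve(voxels, silhouette, project):
    """Keep only voxels whose projection lands inside the silhouette.

    voxels     : (N, 3) array of box-center coordinates
    silhouette : 2-D boolean array, True inside the object's silhouette
    project    : function mapping (N, 3) points to (rows, cols) integer
                 pixel coordinates for one calibrated view (assumed given)
    """
    rows, cols = project(voxels)
    h, w = silhouette.shape
    # Voxels projecting off the image are "outside" and carved away too.
    on_image = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    keep = on_image.copy()
    keep[on_image] = silhouette[rows[on_image], cols[on_image]]
    return voxels[keep]

# Toy example: an orthographic projection straight down the z axis.
def ortho(pts):
    return pts[:, 1].astype(int), pts[:, 0].astype(int)

grid = np.stack(np.meshgrid(np.arange(4), np.arange(4), np.arange(4)),
                axis=-1).reshape(-1, 3)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True          # a 2x2-pixel silhouette
remaining = carve(grid, mask, ortho)
```

In an actual system this carve step would be repeated once per silhouette image, each with its own calibrated projection, and the surviving voxels would then be passed to an isosurface algorithm.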
An alternative class of algorithms for extracting shape from silhouettes uses the variation of contour shape in successive images. An example is described by Zheng (Jiang Yu Zheng, "Acquiring 3-D Models from Sequences of Contours", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 2, February 1994, pp. 163-178.) In this method, many silhouette images are obtained as the object is rotated in front of the camera. An estimate of the 3D location of points on the object's surface is obtained from the location of silhouettes in the image relative to the projection of the axis of rotation, and the rate of change of these positions with respect to angular change.
There are fundamental limitations on the accuracy of the shape that can be recovered by shape from silhouettes, as discussed by Laurentini (Aldo Laurentini, "How Far 3D Shapes Can Be Understood from 2D Silhouettes", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 2, February 1995, pp. 188-195.) For example, object concavities will not appear in silhouettes, and so will not be captured. To provide the illusion of concavities, and to add color to the model, capture systems generally acquire color images of the object from known camera positions. These color images can be related to the captured geometry by the well-known computer graphics technique known as projective texture mapping. Geometries (generally in the form of triangular meshes) with texture maps can be displayed with hardware and software available on typical personal computers.
A basic operation required by either class of the shape from silhouette algorithms is the accurate extraction of the boundary between the object and the background. This is an example of the classic image segmentation problem from the field of image processing. Systems for extracting shape attempt to simplify the segmentation by designing a suitable backdrop. An example of such a design is illustrated in Jones and Oakley (M. Jones and J. P. Oakley, "Efficient representation of object shape for silhouette intersection", IEEE Proc.-Vis. Image Signal Process, Vol. 142, No. 6, December 1995, pp. 359-364.) The backdrop for the object is painted a uniform color (in the case of Jones and Oakley, "Chromakey Blue"). The silhouette is defined as the boundary of the image regions that are the uniform background color.
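The backdrop-color segmentation just described can be illustrated with a simple sketch; it is not taken from the cited reference, and the color tolerance used below is an assumed parameter. Pixels within a fixed color distance of the known backdrop color are classified as background, and the silhouette is the boundary of the remaining region.

```python
import numpy as np

def backdrop_mask(image, backdrop_rgb, tol=30.0):
    """Return True where a pixel matches the known backdrop color.

    image        : (H, W, 3) array of RGB values
    backdrop_rgb : the uniform backdrop color, e.g. chroma-key blue
    tol          : color-distance tolerance (an assumed, tunable value)
    """
    diff = image.astype(float) - np.asarray(backdrop_rgb, dtype=float)
    return np.linalg.norm(diff, axis=-1) < tol

# Toy image: a red object in front of a blue backdrop.
img = np.full((4, 4, 3), (0, 0, 255), dtype=np.uint8)   # backdrop blue
img[1:3, 1:3] = (200, 30, 30)                           # object pixels
object_mask = ~backdrop_mask(img, (0, 0, 255))
```

Note that exactly the failure modes listed later in this section apply: a glossy object reflecting backdrop color, color bleeding in the camera, or an object that happens to be backdrop-colored will all corrupt this mask.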
An alternative approach uses a large flat diffuse light source in place of the colored backdrop. The silhouette is defined as the boundary of the bright image regions, with the object itself generally appearing dark.
Shape from silhouettes, particularly with the addition of color textures, is a popular technique because it can be implemented inexpensively. The major cost of the system resides in the camera and in a mechanism to control the position of the object, such as a turntable. The implementation with volume carving is particularly attractive for applications because the method guarantees a closed surface.
An alternative and related method for capturing object shape is "shape from shadows", as described in U.S. Pat. Nos. 4,792,696 and 4,873,651. These methods are similar to shape from silhouettes, since a sharp shadow is the silhouette projected from a point light source. In both of these patents the camera is placed on the same side of the object as the direction of light incident on the object, and images are taken of the shadows cast by the object. In both of these patents it is assumed that the surface is a height field. That is, the object sits on a reference plane with locations on the plane specified by (x,y) Cartesian coordinates. The shape of the object is given by a third coordinate z that is descriptive of the height of the object surface above the reference plane. With this assumption, the shape of the object surface is inferred from where shadows begin and end, and from knowledge of the light source direction.
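The height-field inference described above reduces to simple trigonometry, which the following hypothetical example illustrates (it is not the procedure of either cited patent): under the height-field assumption, a surface point at height z above the reference plane casts a shadow whose horizontal extent d satisfies z = d * tan(theta), where theta is the elevation angle of the point light source.

```python
import math

def height_from_shadow(shadow_length, light_elevation_deg):
    """Infer surface height above the reference plane from the length
    of the shadow it casts, under the height-field assumption.

    shadow_length       : horizontal distance from the occluding edge to
                          the end of its shadow on the reference plane
    light_elevation_deg : elevation angle of the point light source
    """
    return shadow_length * math.tan(math.radians(light_elevation_deg))

# A 10 cm shadow under a light at 45 degrees elevation implies a
# 10 cm surface height; a lower light stretches the same shadow.
z = height_from_shadow(10.0, 45.0)
```

This also makes the stated limitation concrete: the relation assumes a single-valued height z per (x,y) location, so a shape such as a mug handle, where the surface folds over itself, violates the assumption.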
U.S. Pat. No. 4,604,807 employs a shadow that is observed using a camera on the opposite side of the object from the light source. In this patent the shadow is formed by pressing a relatively flat object, e.g., a person's foot, onto a translucent panel. The shadow is observed from the opposite side to obtain a numerical description of the two dimensional area of the foot, and is not used to estimate the three dimensional shape of the foot.
In an article by Leibe et al. (B. Leibe, T. Starner, W. Ribarsky, Z. Wartell, D. Krum, J. Weeks, B. Singletary and L. Hodges, "Toward Spontaneous Interaction with the Perceptive Workbench", IEEE Computer Graphics and Applications, November/December 2000, pp. 54-65.) a system is described that observes shadows cast by objects on a translucent table with a camera located underneath the table. The system can produce only a crude estimate of shape, because the object cannot be repositioned in a calibrated manner.
All of the prior art techniques known to the inventors assume that an accurate silhouette can be extracted from the image. However, if an accurate silhouette cannot be extracted, then the shape of the object will be inaccurate.
The segmentation approach fails if the object is shiny, transparent, or is the same color as the background. Segmentation can also fail even with the use of a large diffuse light source.
A number of other problems are encountered with the prior art techniques for finding object silhouettes. First consider the approach of using a background of known color. The silhouette is detected where the backdrop color ends in the image. This method fails for glossy objects that reflect some of the background color in the direction of the camera, and for objects which transmit light. This method also fails when camera characteristics cause "bleeding" of color from one region of the image to another. The method can also fail if inter-reflections on the object cast color from the background onto the object. The method also fails if the object happens to be the same color as the backdrop.
Some methods attempt to avoid these problems by taking an image of the backdrop alone and then an image of the object in front of the background, and then taking the difference between the two images. However, this approach fails for very shiny objects. It also fails when any shadow is cast by the object onto the backdrop.
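The background-differencing approach just described can be sketched as follows; this is an illustrative example with an assumed difference threshold, not a reproduction of any particular prior system. It also makes the stated failure mode visible: a shadow cast by the object onto the backdrop changes those backdrop pixels and is wrongly included in the difference region.

```python
import numpy as np

def difference_mask(backdrop_img, object_img, tol=25.0):
    """Mark pixels that changed between the backdrop-only image and the
    image with the object present. The changed region approximates the
    object's silhouette, but also picks up any cast shadow."""
    diff = np.abs(object_img.astype(float) - backdrop_img.astype(float))
    return diff.max(axis=-1) > tol

bg = np.full((4, 4, 3), 240, dtype=np.uint8)   # backdrop-only image
fg = bg.copy()
fg[1:3, 1:3] = 40                              # dark object pixels
fg[3, 1:3] = 200                               # shadow darkens the backdrop
changed = difference_mask(bg, fg)
```

The two shadow pixels in the bottom row are detected as "changed" even though they are not part of the object, illustrating why this method fails when the object shadows the backdrop.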
The approach of using a large diffuse light source seeks to avoid the problem of the object possibly being the same color as the background. However, this technique also fails for shiny surfaces, light transmitting surfaces, and for surfaces in which self-interreflections transmit light from the backdrop onto the object. This approach also prevents the simultaneous acquisition of color images to be used as texture maps, since the bright background causes most of the object to appear very dark in the image. Having to acquire the color images separately extends the length of time required to obtain the numerical description of the object.
Both of the backdrop approaches allow only one silhouette to be obtained for each position of the object. For simple systems employing a device with one degree of freedom to provide accurate positioning, such as a turntable, one position of the object on the turntable may not be adequate to obtain a view of the entire object surface. The object is placed once, and a series of images is obtained for one rotation of the device. The object is then placed in a different position relative to the turntable, and another series is obtained. This process may need to be repeated many times, and the geometries recovered by each rotation must be registered to one another by an additional geometric processing step.
The methods that employ shadows have been in part motivated by the problem of segmentation from the backdrop when shiny objects are being scanned. However, for the shadow methods, with the camera in the same direction as the direction of incident light, the problem remains of separating the image of the object and the image of its shadow. Such segmentation is difficult for objects with a dark or partially dark surface, and is impossible for black objects. The shadow methods are also limited by the height field assumption for 3-D shape recovery. Objects with even moderately complex topologies, e.g., a coffee mug with a handle, cannot be measured with such techniques without substantial error.
The method described in U.S. Pat. No. 4,604,807 employs optics and geometry that require that the object being measured rest against the translucent panel, and that the object shape is almost flat. The apparatus can only measure 2-D areas, and cannot be used to capture silhouettes of objects of arbitrary shape for 3-D shape recovery.
The system described by Leibe et al. requires the object to be scanned to sit on a fixed translucent surface. Although the shape of some objects can be estimated from a sparse set of views spanning the full space of directions around the object, the system described by Leibe et al. is limited to shadows that can be cast from light sources above the translucent surface. The goal of the Leibe et al. system is to produce crude shape representations only, and the design does not permit the calibrated repositioning of an object, nor does it include a way to obtain additional information, such as shape from photometric data, to improve the estimate of shape and to include concavities. The system includes a side camera above the translucent surface, but obtaining silhouettes from this camera presents all of the problems of traditional silhouette extraction, and cannot, for example, be used for shiny objects.
It is a first object and advantage of this invention to provide an improved system and method to obtain 3-D shapes from one or more images.
It is a further object and advantage of this invention to provide a system and method for deriving the surface shape of an object from shadow images of the object obtained from behind a translucent panel that is interposed between an image capture device, referred to for convenience as a camera, and the object, where the object is interposed between the front of the translucent panel and one or more point light sources.
The foregoing and other problems are overcome and the foregoing objects and advantages are realized by methods and apparatus in accordance with embodiments of this invention.
Disclosed herein are embodiments of apparatus for obtaining the silhouette of an object in a form suitable for use by a shape from silhouette algorithm for obtaining a numerical description of the object's three dimensional shape. Also disclosed are methods for processing the output of the apparatus into a numerical description of the object that is suitable for interactive display on a computer graphics system.
More particularly, disclosed herein are methods and apparatus for obtaining the shape of an object by observing silhouettes of the object. At least one light source, preferably a point light source, is placed in front of the object, thereby casting a shadow of the object on a translucent panel that is placed behind the object. An imaging device, referred to for convenience as a camera, captures an image of the shadow from behind the translucent panel. The silhouette or shadow contour is obtained from the image of the shadow as the region of the shadow that is substantially darker than the region outside of the shadow. This is true for any opaque object regardless of its surface finish or shape. By using a point light source, rather than a large diffuse light source, the quantity of light reflected by the object in the direction of the translucent panel is orders of magnitude smaller than the light that impinges on the panel directly from the point source, thereby enhancing the contrast between the object's shadow and the illumination from the light source. A further benefit obtained by the use of the point light source is that the object need not be in contact with the translucent panel to obtain a shadow having sharp edges. The full object silhouette is obtained since nothing (including the object itself) is in the path between the camera and the translucent panel. The full silhouette obtained can be processed by any suitable shape from silhouette algorithm, and thus the objects to be imaged are not limited in topological type. Unlike systems with large diffuse lights as backgrounds, which make the object appear black, a color image of the object can optionally be obtained simultaneously with the shadow image by using another camera, such as a color camera, that is placed on the same side of the object as the light source.
Unlike conventional silhouette systems, multiple silhouettes can be captured for one object position, reducing the number of rotations needed on a turntable system, and reducing the post-processing needed to register geometries obtained from multiple different positions.
In accordance with the teachings herein, a system and method is disclosed for obtaining a three dimensional image of an object. The method includes the steps of (a) shining light from at least one light source on to the object from a first direction to create a first shadow cast by the object on a first surface of a translucent panel, where the object is disposed between a light source and the first surface of the translucent panel and has a first pose; (b) obtaining a first digital image of the first shadow from a second, opposite surface of the translucent panel; (c) changing the pose of the object and obtaining additional digital images of additional shadows cast by the object for different object poses; and (d) processing the first and the additional digital images to create a three dimensional image of the object. The step of processing preferably employs a space carving process. The step of processing operates to identify a boundary of the image of the shadow in each of the first and additional digital images, where the boundary is identified in a given one of the digital images by applying a pixel thresholding process to determine whether a given pixel is located within the image of the shadow or outside of the image of the shadow. The step of processing further defines a virtual volume as a list of volume elements, projects individual ones of the volume elements onto the plane of the image of the shadow, and retains only those volume elements in the list that lie within the image of the shadow or on the identified boundary. The step of processing then further applies an isosurface extraction algorithm to the list of surviving volume elements to obtain a numerical description of the shape of the surface of the object.
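The pixel thresholding and boundary identification described in step (d) above can be sketched as follows. This is a minimal illustration under assumed conditions (a fixed threshold, a single-channel shadow image), not the claimed implementation: pixels darker than the threshold are classified as inside the shadow, and the boundary consists of the inside pixels having at least one outside neighbor.

```python
import numpy as np

def shadow_silhouette(shadow_image, threshold=128):
    """Classify each pixel of the back-lit shadow image.

    Pixels substantially darker than the directly illuminated panel are
    inside the shadow. The threshold here is an assumed fixed value; in
    practice it could instead be chosen from the image histogram.
    """
    inside = shadow_image < threshold
    # Boundary pixels: inside pixels with at least one 4-connected
    # neighbor that lies outside the shadow.
    padded = np.pad(inside, 1, constant_values=False)
    nbr_outside = (~padded[:-2, 1:-1] | ~padded[2:, 1:-1] |
                   ~padded[1:-1, :-2] | ~padded[1:-1, 2:])
    boundary = inside & nbr_outside
    return inside, boundary

img = np.full((5, 5), 220, dtype=np.uint8)   # brightly lit panel
img[1:4, 1:4] = 20                           # dark 3x3 shadow region
inside, boundary = shadow_silhouette(img)
```

The resulting inside/boundary classification per image is what the space carving step consumes: a volume element is retained only if its projection lies within the inside region or on the boundary.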
The step of shining light on to the object can also be done from a second, or third, or fourth, etc., direction to create an additional shadow or shadows cast by the object on the first surface of the translucent panel. The resulting shadow image(s) are processed in the same manner as the first shadow. A plurality of light sources each having a different color can be used, as can an array of light sources that are operated in sequence. A single light source may be translated with respect to the object to shine light on the object from a plurality of different directions.
Further in accordance with these teachings the method may include additional steps of obtaining a digital image of the object for each object pose; processing the digital images of the object to derive surface normals and color maps; and applying the surface normals and color maps to the surface of the three dimensional image of the object.