The object of the invention is to generate stylized images from images acquired of real world scenes and objects. The stylized images can facilitate the viewer's comprehension of the shape of the objects depicted. Non-photorealistic rendering (NPR) techniques aim to outline shapes of objects, highlight moving parts to illustrate action, and reduce visual clutter such as shadows and texture details. Stylized images are useful for rendering low-contrast and geometrically complex scenes such as mechanical parts, plants or the internals of a patient undergoing examinations such as endoscopy.
When a rich 3D geometric model of the scene is available, rendering subsets of view-dependent contours of simple scenes is a relatively well-understood task, see Saito et al., “Comprehensible Rendering of 3-D Shapes,” Proceedings of SIGGRAPH'90, 1990, and Markosian et al., “Real-Time Non-photorealistic Rendering,” Proceedings of SIGGRAPH'97, pp. 415-420, 1997. However, extending that approach to real world scenes, such as a flowering plant, by first generating a 3D model of the plant is difficult, if not almost impossible.
It is desired to bypass the acquisition of the 3D scene geometry. Instead, the object is to generate stylized images of real world scenes directly from images acquired by a camera.
In the prior art, the majority of the available techniques process a single image to generate a stylized image. Typically, morphological operations, such as image segmentation, edge detection and color assignment are applied. Some techniques aim for stylized depiction, see DeCarlo et al., “Stylization and Abstraction of Photographs,” Proc. Siggraph 02, ACM Press, 2002, and Hertzmann, “Painterly Rendering with Curved Brush Strokes of Multiple Sizes,” ACM SIGGRAPH, pp. 453-460, 1998, while other techniques enhance legibility.
Interactive techniques for stylized rendering such as rotoscoping are also known. However, it is desired to automate the process of generating stylized images, instead of requiring meticulous manual operations.
In aerial imagery, shadows in the scene are identified by thresholding a single intensity image, assuming a flat ground and a uniform albedo, to infer landscape and building heights, see Huertas et al., “Detecting buildings in aerial images,” Computer Vision, Graphics and Image Processing 41, 2, pp. 131-152, 1988, Irvin et al., “Methods for exploiting the relationship between buildings and their shadows in aerial imagery,” IEEE Transactions on Systems, Man and Cybernetics 19, 6, pp. 1564-1575, 1989, and Lin et al., “Building detection and description from a single intensity image,” Computer Vision and Image Understanding: CVIU 72, 2, pp. 101-121, 1998.
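As an illustration only, the thresholding idea can be sketched in a few lines of Python. This is not the method of any of the cited references. The function names, the threshold value, the ground resolution and the sun elevation angle are all hypothetical, and the flat-ground relation (height equals shadow length times the tangent of the sun elevation) is a standard geometric assumption rather than something stated above.

```python
import math

def shadow_mask(image, threshold):
    """Mark every pixel darker than `threshold` as shadow (True)."""
    return [[pixel < threshold for pixel in row] for row in image]

def longest_shadow_run(mask_row):
    """Length, in pixels, of the longest consecutive run of shadow pixels."""
    best = run = 0
    for is_shadow in mask_row:
        run = run + 1 if is_shadow else 0
        best = max(best, run)
    return best

def height_from_shadow(shadow_length_m, sun_elevation_deg):
    """Flat-ground assumption: height = shadow length * tan(sun elevation)."""
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))

if __name__ == "__main__":
    # One scanline of a hypothetical aerial intensity image (0-255).
    scanline = [200, 40, 35, 30, 210]
    mask = shadow_mask([scanline], threshold=100)
    run_px = longest_shadow_run(mask[0])
    metres_per_pixel = 2.0  # hypothetical ground resolution
    print(height_from_shadow(run_px * metres_per_pixel, sun_elevation_deg=45.0))
```

The sketch makes the stated assumptions visible: a single global threshold only separates shadow from ground when the albedo is uniform, and the height relation only holds when the shadow falls on flat ground.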
Some techniques improve shadow capture with shadow extraction techniques to determine shadow mattes, Chuang et al., “Shadow matting and compositing,” ACM Trans. Graph. 22, 3, pp. 494-500, 2003, or to remove shadows altogether to improve scene segmentation, Toyama et al., “Wallflower: Principles and Practice of Background Maintenance,” ICCV, pp. 255-261, 1999. Using intrinsic images, other techniques remove shadows without explicitly detecting the shadows, Weiss, “Deriving intrinsic images from image sequences,” Proceedings of ICCV, vol. 2, pp. 68-75, 2001.
Stereo techniques with passive and active illumination are generally designed to determine depth values or surface orientation, rather than to detect depth edges. Depth edges or discontinuities present difficulties for traditional stereo techniques due to partial occlusions, which confuse the matching process, see Geiger et al., “Occlusions and binocular stereo,” European Conference on Computer Vision, pp. 425-433, 1992.
Some techniques try to model the discontinuities and occlusions directly, Intille et al., “Disparity-space images and large occlusion stereo,” ECCV (2), pp. 179-186, 1994, Birchfield et al., “Depth discontinuities by pixel-to-pixel stereo,” International Journal of Computer Vision 35, 3, pp. 269-293, 1999, and Scharstein et al., “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, vol. 47(1), pp. 7-42, 2002.
Active illumination methods, which generally give better results, have been used for depth extraction, shape from shading, shape-time stereo, and photometric stereo. Unfortunately, those methods are unstable around depth discontinuities, Sato et al., “Stability issues in recovering illumination distribution from brightness in shadows,” IEEE Conf. on CVPR, pp. 400-407, 2001.
One technique performs logical operations on detected intensity edges in images acquired under widely varying illumination to preserve shape boundaries, Shirai et al., “Extraction of the line drawing of 3-dimensional objects by sequential illumination from several directions,” Pattern Recognition, 4, 4, pp. 345-351, 1972. That technique is also limited to uniform albedo scenes.
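A minimal sketch of this kind of edge combination follows, in Python. It is an illustration under stated assumptions, not the procedure of Shirai et al.: the text above does not say which logical operations are used, so a pixel-wise AND is assumed here, and the edge detector (a thresholded forward difference), the function names and the threshold are hypothetical.

```python
def intensity_edges(image, threshold):
    """Binary edge map: True where the forward difference to the right or
    lower neighbour exceeds `threshold`."""
    h, w = len(image), len(image[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(image[y][x + 1] - image[y][x]) if x + 1 < w else 0
            dy = abs(image[y + 1][x] - image[y][x]) if y + 1 < h else 0
            edges[y][x] = max(dx, dy) > threshold
    return edges

def persistent_edges(images, threshold):
    """Pixel-wise AND of the edge maps of all images (assumed operation)."""
    maps = [intensity_edges(img, threshold) for img in images]
    h, w = len(maps[0]), len(maps[0][0])
    return [[all(m[y][x] for m in maps) for x in range(w)] for y in range(h)]

if __name__ == "__main__":
    # Same hypothetical scanline under two light directions. The step at
    # x=1 is an object boundary; the step at x=3 in the first image is a
    # cast-shadow edge that disappears when the light moves.
    lit_from_left = [[10, 10, 200, 200, 50]]
    lit_from_right = [[10, 10, 200, 200, 200]]
    print(persistent_edges([lit_from_left, lit_from_right], threshold=50))
```

Edges that move with the light, such as shadow boundaries, are dropped by the AND, while edges that stay fixed survive. Texture edges also stay fixed, so they survive as well, which is consistent with the limitation to uniform albedo scenes noted above.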
With photometric stereo techniques, it is possible to analyze intensity statistics to detect high curvature regions at occluding contours or folds, Huggins et al., “Finding Folds: On the Appearance and Identification of Occlusion,” IEEE Conf. on Computer Vision and Pattern Recognition, IEEE Computer Society, vol. 2, pp. 718-725, 2001. That technique detects regions near occluding contours, but not the contours themselves. That technique also assumes that the portion of the surface near the contour is locally smooth. Therefore, that technique fails for a flat foreground object like a leaf or a piece of paper, or for view-independent edges such as a corner of a cube.
Techniques that determine shape from shadow or darkness construct a continuous representation, e.g., a shadowgram, from a moving light source. Continuous depth estimates are possible from the shadowgram, Raviv et al., “Reconstruction of three-dimensional surfaces from two-dimensional binary images,” IEEE Transactions on Robotics and Automation, vol. 5(5), pp. 701-710, 1989, Langer et al., “Space occupancy using multiple shadowimages,” International Conference on Intelligent Robots and Systems, pp. 390-396, 1995, and Daum et al., “On 3-D surface reconstruction using shape from shadows,” CVPR, pp. 461-468, 1998. Those techniques require accurate detection of the start and end of shadows, which makes it difficult to estimate continuous heights.
General reviews of shadow-based shape analysis methods are described by Yang, “Shape from Darkness Under Error,” PhD thesis, Columbia University, 1996, Kriegman et al., “What shadows reveal about object structure,” Journal of the Optical Society of America, pp. 1804-1813, 2001, and Savarese et al., “Shadow Carving,” Proc. of the Int. Conf. on Computer Vision, 2001.
A common limitation of known active illumination methods is that the light sources need to surround the object, in order to give the image significant shading and shadow variation from estimated or known 3D light positions. This necessitates a fixed lighting rig, which limits the application of these techniques to studio or industrial settings.
It is desired to extract depth edges from images in a manner that is complementary to known methods for determining depth and 3D surface shape, because depth edges often violate the smoothness assumptions inherent in many of those methods.
If locations of depth discontinuities can be detected reliably and supplied as input, then the performance of many 3D surface reconstruction processes can be significantly enhanced.
It is desired to detect depth edges without solving a correspondence problem or analyzing pixel intensity statistics with moving lights. It is further desired that NPR images can be generated from complex real world scenes where surface normals change rapidly, such as a potted plant, or a scene with high depth complexity or low intensity changes, such as a car engine or a human bone undergoing medical examination.