The background of the present invention is briefly described as follows:
I. Hairstyle Modeling of Virtual Individuals
Although many software packages are available in industry practice to assist creators with manual virtual hairstyle modeling, such software is often complex and time-consuming to use, and requires proficient skills and complicated manual manipulation, thereby prolonging the production cycle and increasing costs (WARD, K., BERTAILS, F., KIM, T. -Y., MARSCHNER, S. R., CANI, M. -P., AND LIN, M. C. 2007. A survey on hair modeling: styling, simulation, and rendering. IEEE Transactions on Visualization and Computer Graphics 13, 2, 213-234.). On this basis, some prior methods capture images of real hairstyles in order to synthesize more realistic hairstyle models and reduce the workload. However, most such image-based modeling methods can obtain a sufficiently realistic 3D hairstyle structure only when a plurality of images are captured under different illumination, from different viewpoints, or at different focal distances (PARIS, S., BRICEÑO, H., AND SILLION, F. 2004. Capture of hair geometry from multiple images. ACM Trans. Graph. 23, 3, 712-719.; WEI, Y., OFEK, E., QUAN, L., AND SHUM, H. -Y. 2005. Modeling hair from multiple views. ACM Trans. Graph. 24, 3, 816-820.; PARIS, S., CHANG, W., KOZHUSHNYAN, O. I., JAROSZ, W., MATUSIK, W., ZWICKER, M., AND DURAND, F. 2008. Hair photobooth: geometric and photometric acquisition of real hairstyles. ACM Trans. Graph. 27, 3, 30:1-30:9.; JAKOB, W., MOON, J. T., AND MARSCHNER, S. 2009. Capturing hair assemblies fiber by fiber. ACM Trans. Graph. 28, 5, 164:1-164:9.).
Bonneel et al. have proposed a method for estimating hairstyle appearance, which obtains a statistically analogous 3D hairstyle model from a single image shot indoors with a flash (BONNEEL, N., PARIS, S., PANNE, M. V. D., DURAND, F., AND DRETTAKIS, G. 2009. Single photo estimation of hair appearance. Computer Graphics Forum 28, 1171-1180.). However, the hairstyle model synthesized by this method cannot fit the individual hairstyle in the original image at the pixel level, and is therefore not applicable to image hairstyle editing applications. Luo et al. have found that strand orientation features are usually more reliable than the pixel colors of the original images, and therefore use orientation information to estimate a more accurate hairstyle volume in multi-view hairstyle modeling (LUO, L., LI, H., PARIS, S., WEISE, T., PAULY, M., AND RUSINKIEWICZ, S. 2012. Multi-view hair capture using orientation fields. In Proc. CVPR 2012.). However, the hairstyle generated by this method is represented as polygon meshes, which does not meet the quality requirements for hairstyle models in the digital media industry. Beeler et al. have proposed a method for simultaneously capturing sparse facial hair and the occluded skin surface (BEELER, T., BICKEL, B., NORIS, G., BEARDSLEY, P., MARSCHNER, S., SUMNER, R. W., AND GROSS, M. 2012. Coupled 3D reconstruction of sparse facial hair and skin. ACM Trans. Graph. 31, 4.). The high-quality modeling results obtained by this method demonstrate the importance of hair in portraying virtual characters realistically; however, the method can only be applied to sparse and short facial hair, such as beards and eyebrows, and cannot be applied to hair in general.
Compared with static hairstyle modeling, dynamic hairstyle modeling from video is more complex and remains an unsolved problem in the art. Yamaguchi et al. have directly extended the multi-view hairstyle modeling method (WEI, Y., OFEK, E., QUAN, L., AND SHUM, H. -Y. 2005. Modeling hair from multiple views. ACM Trans. Graph. 24, 3, 816-820.) to process video (YAMAGUCHI, T., WILBURN, B., AND OFEK, E. 2009. Video based modeling of dynamic hair. In the 3rd Pacific Rim Symposium on Advances in Image and Video Technology, 585-596.). Zhang et al. have further used physics-based motion simulation to refine an initially reconstructed hairstyle model sequence. Because these methods combine multiple viewpoints without accurate strand correspondences, the resulting dynamic hairstyle models over-constrain the motion of the hairstyle, which leads to excessively smooth results that deviate severely from reality.
II. Image Editing under the Guidance of 3D Information
In digital image editing applications aimed at the personal entertainment industry, the user's input is usually a single arbitrary view. Reconstructing an accurate 3D model from a single view is an ill-posed problem owing to the lack of accurate 3D information; however, some prior methods have shown that adopting suitable 3D proxies greatly extends the range of possible image editing manipulations. Unlike conventional image-based modeling techniques, these 3D proxies are designed to better portray and represent the common structure of a certain type of object, but may not fit the shape of the real object accurately.
For general scene images, prior methods adopted in the industry include layered depth maps (OH, B. M., CHEN, M., DORSEY, J., AND DURAND, F. 2001. Image-based modeling and photo editing. In Proc. SIGGRAPH, 433-442.), image pop-up using planar billboard constructions (HOIEM, D., EFROS, A. A., AND HEBERT, M. 2005. Automatic photo pop-up. ACM Trans. Graph. 24, 3, 577-584.), synthetic object insertion with cuboid proxies (KARSCH, K., HEDAU, V., FORSYTH, D., AND HOIEM, D. 2011. Rendering synthetic objects into legacy photographs. ACM Trans. Graph. 30, 6, 157:1-12.), and semantic editing of existing objects in images (ZHENG, Y., CHEN, X., CHENG, M. -M., ZHOU, K., HU, S. -M., AND MITRA, N. J. 2012. Interactive images: Cuboid proxies for smart image manipulation. ACM Trans. Graph. 31, 4.). The 3D proxies in these methods, which roughly represent the general 3D information of the scene, are usually simple and have fairly limited effects; they are typically applied to film pop-up processing and the like.
More effective 3D proxy methods exist for specific human body objects in images. Blanz and Vetter have proposed a method for fitting a parameterized morphable head model to the human face regions in images (BLANZ, V., AND VETTER, T. 1999. A morphable model for the synthesis of 3D faces. In Proc. SIGGRAPH '99.). This 3D human face model, which has a low parametric dimension, may be applied to more realistic expression transfer (YANG, F., WANG, J., SHECHTMAN, E., BOURDEV, L., AND METAXAS, D. 2011. Expression flow for 3D-aware face component transfer. ACM Trans. Graph. 30, 4, 60:1-60:10.) and face swapping (BITOUK, D., KUMAR, N., DHILLON, S., BELHUMEUR, P. N., AND NAYAR, S. K. 2008. Face Swapping: Automatically replacing faces in photographs. ACM Trans. Graph. 27, 39:1-39:8.; DALE, K., SUNKAVALLI, K., JOHNSON, M. K., VLASIC, D., MATUSIK, W., AND PFISTER, H. 2011. Video face replacement. ACM Trans. Graph. 30, 6, 130:1-130:10.). Similarly, parametric reshaping of human bodies in images and videos can be realized by adopting 3D morphable models of full bodies (ZHOU, S., FU, H., LIU, L., COHEN-OR, D., AND HAN, X. 2010. Parametric reshaping of human bodies in images. ACM Trans. Graph. 29, 4, 126:1-126:10.; JAIN, A., THORMÄHLEN, T., SEIDEL, H. -P., AND THEOBALT, C. 2010. MovieReshape: Tracking and reshaping of humans in videos. ACM Trans. Graph. 29, 6, 148:1-148:10.). These methods may be applied to special effects production in films, image editing and processing, and the like. It should be noted, however, that human faces, bodies and other such parts usually share more apparent common features than hairstyles do, which makes their parametric fitting much easier.
Chai et al. have recently proposed a single-view 3D hairstyle reconstruction method (CHAI, M., WANG, L., YU, Y., WENG, Y., GUO, B., AND ZHOU, K. 2012. Single-view hair modeling for portrait manipulation. ACM Trans. Graph. 31, 4, 116:1-8.) aimed at 3D proxies of hairstyles. However, this method merely fits the hairstyle regions in the original image with numerous spatial strands and cannot generate a physically plausible 3D hairstyle model, so its results are difficult to apply directly to related applications in the industry.