Applications such as immersive telecommunication, games, telepresence, and those providing virtual three-dimensional (3D) environments with realistic motion parallax need a reasonably faithful reconstruction of the human subject. Multi-view stereo cameras have been widely used to reconstruct various real-world objects, including human subjects, for such applications. However, reconstruction of hair (e.g., human hair) remains a challenging task due to the many distinct characteristics of hair. For instance, omnipresent occlusions and complex strand geometry preclude the use of general surface-based smoothness priors for hair reconstruction. The highly specular nature of hair also violates the Lambertian surface assumption employed in most multi-view stereo methods.
As a result, many practical systems have either completely avoided hair reconstruction during facial capture, or relied on manual input to achieve plausible results. Attempts have been made to facilitate hair capture using specialized hardware, such as a fixed camera with moving light sources, a stage-mounted camera with a macro lens, thermal imaging, and so forth. However, these mechanisms are generally costly and require lengthy capture sessions, which limits their applicability to hair that remains static throughout the capture.
An alternative approach is to deploy dense camera arrays that have small baselines, e.g., with adjacent cameras separated by angles on the order of fifteen degrees, depending on the number of cameras. To capture complete full-head hairstyles in this way, it is typical to have twenty to thirty camera views. Because of the complex hardware setup, it is challenging to adopt this many cameras in real-world systems and applications.