Three-dimensional (3D) pose estimation determines the location (translation) and angular orientation of an object. Typically, pose estimation methods rely on several cues, such as 2D texture images and 3D range images. Texture-image-based methods assume that the texture is invariant to variations of the environment. However, this assumption does not hold under severe illumination changes or shadows. Specular objects cannot be handled by those methods.
Range-image-based methods can overcome these difficulties because they exploit 3D information that is independent of the appearance of objects. However, range acquisition equipment is more expensive than simple cameras.
For some objects, it is very difficult to reconstruct the 3D shape. For example, recovering the 3D shape of highly specular objects, such as mirror-like or shiny metallic objects, is known to be difficult and unreliable.
Reflection cues are more sensitive to pose changes than texture or range cues. Therefore, exploiting the reflection cues enables pose parameters to be estimated very accurately. However, it is not clear whether the reflection cues are applicable to global pose estimation, i.e., object detection, rather than pose refinement.
Prior art methods are generally based on appearance, which is affected by illumination, shadows, and scale. Therefore, it is difficult for those methods to overcome related problems such as partial occlusions, cluttered scenes, and large pose variations. To handle these difficulties, those methods use illumination-invariant features, such as points, lines, and silhouettes, or illumination-invariant cost functions, such as normalized cross-correlation (NCC). However, the object is required to be sufficiently textured. Severe illumination changes can still be problematic, especially for specular objects.
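As an illustration of the illumination-invariant cost function mentioned above, the following is a minimal sketch of zero-mean normalized cross-correlation between two equally sized image patches, given as flat sequences of intensities. The function name `ncc`, the synthetic patches, and the gain and offset values are assumptions chosen for illustration and are not taken from any particular prior art method.

```python
import math
import random


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length intensity sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Subtracting the mean removes any constant brightness offset.
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    # Dividing by the product of the norms removes any global gain.
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A textureless (constant) patch carries no signal to correlate.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


# Hypothetical data: a random "textured" patch, the same patch under a
# global gain and offset change, and an unrelated random patch.
random.seed(0)
template = [random.random() for _ in range(256)]
relit = [2.5 * v + 0.3 for v in template]
unrelated = [random.random() for _ in range(256)]

score_relit = ncc(template, relit)          # close to 1 despite the relighting
score_unrelated = ncc(template, unrelated)  # close to 0 for unrelated content
```

The score is unchanged only by a global gain and offset. Shadows and specular highlights alter the image locally and non-linearly, which is consistent with the limitation noted above, and the zero return value for a constant patch reflects the requirement that the object be sufficiently textured.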
A wide range of methods derive sparse local shape information from the identification and tracking of distorted reflections of light sources and special known features. Dense measurements can also be obtained using a general framework of light-path triangulation. However, those methods usually need to perform accurate calibration and control of the environment surrounding the object, and sometimes require many input images.
Some methods for specular object reconstruction do not require environment calibration. Those methods assume small environmental motion, which induces specular flow on the image plane. In those methods, the specular flow is exploited to simplify the inference of specular shapes in unknown complex lighting. However, a pair of linear partial differential equations has to be solved, and generally, that requires an initial condition, which is not easy to estimate in real-world applications.
One method for estimating the pose based on specular reflection uses a short image sequence and initial pose estimates computed by a standard template matching procedure. Lambertian and specular components are separated for each frame, and environment maps are derived from the estimated specular images. Then, the environment maps and the image textures are concurrently aligned to increase the accuracy of the pose refinement process.