Human head pose estimation, commonly referred to simply as head pose estimation, is the extraction of 3-dimensional (3D) information that estimates the location and orientation of a person's head from a sequence of digital images of the head taken from a number of angles. Head pose estimation is a step in several computer vision systems. It can also provide a natural user interface in many computer applications. By knowing the head orientation and position, a computer application can display information that responds to the gaze direction of a human operator. One example is a virtual reality application that generates a view of the virtual world by tracking the viewpoint of a user. Another example is the use of head pose as input for user interactions, such as selecting text or scrolling a document, which allows physically impaired users to control a computer without using a mouse or a keyboard.
The recent introduction of low-cost, commercially available depth-sensing cameras makes it possible to generate 3D head models for consumer applications. Depth cameras, as they are referred to herein, provide distance, or depth, images of objects in the field of view of the camera in real time. By taking multiple depth images of a human head from different directions and computing head pose data for each image, it is possible to combine the depth images to generate a 3D head model. Examples of commercially available depth cameras are KINECT by MICROSOFT, PRIMESENSE by APPLE COMPUTER, and the BELLUS3D FACE CAMERA.
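The combination of depth images by means of head pose data can be illustrated with a minimal sketch. It is not taken from any particular camera SDK and is offered only under stated assumptions: a pinhole camera model with hypothetical intrinsics fx, fy, cx, cy; depth values in meters with 0 meaning no reading; and a head pose for each image given as a rotation matrix R and translation t that map head-frame coordinates to camera-frame coordinates. All function names are illustrative.

```python
import math

def backproject(depth, fx, fy, cx, cy):
    """Convert a depth image (rows of depths, 0 = no reading) into
    camera-frame 3D points using an assumed pinhole camera model."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a depth reading
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def yaw_rotation(angle_rad):
    """Rotation matrix for a head turn about the vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))

def to_head_frame(points, R, t):
    """Undo the head pose. The pose is assumed to satisfy
    p_cam = R * p_head + t, so p_head = R^T * (p_cam - t)."""
    out = []
    for p in points:
        d = (p[0] - t[0], p[1] - t[1], p[2] - t[2])
        out.append(tuple(sum(R[k][i] * d[k] for k in range(3))
                         for i in range(3)))
    return out

def merge_views(views, fx, fy, cx, cy):
    """Combine (depth_image, R, t) views into one head-frame point cloud."""
    cloud = []
    for depth, R, t in views:
        cloud.extend(to_head_frame(backproject(depth, fx, fy, cx, cy), R, t))
    return cloud
```

In this sketch, a point on the head that is observed in two depth images taken at different head orientations maps to approximately the same head-frame coordinates once each image's pose is undone, which is what allows the images to be accumulated into a single model. A practical system would additionally need to estimate the poses themselves, filter sensor noise, and fuse overlapping points into a surface; none of that is shown here.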