Some home entertainment and gaming systems provide a natural user interface in which the system may be controlled using gestures or spoken commands. These systems may include a color camera (e.g., an RGB camera) paired with a depth camera for capturing images of a scene, such as a playspace, in order to sense motion and identify gestures. The depth camera may comprise an active illumination depth camera that uses time-of-flight (TOF) or structured light techniques to obtain depth information. The color camera may capture the scene as a color image, and the depth camera may capture the scene as a depth map. A depth map may comprise a two-dimensional image of an environment that includes depth information relating to the distances from a particular reference point, such as a point associated with the depth camera, to objects within the environment. Each pixel in the two-dimensional image may be associated with a depth value representing a linear distance from the particular reference point.
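To illustrate how a depth map relates to the geometry of the scene, the following is a minimal sketch that converts depth-map pixels into 3D points in the depth camera's coordinate frame, taking the camera's optical center as the reference point. It assumes an ideal pinhole camera with no lens distortion, and the intrinsic values `FX`, `FY`, `CX`, `CY` are hypothetical placeholders rather than parameters of any particular device. Because a "linear distance" may be stored either as the distance along the optical axis or as the straight-line distance to the surface, the hypothetical `radial` flag selects between the two interpretations.

```python
import math
from typing import List, Tuple

# Hypothetical pinhole intrinsics for the depth camera (focal lengths and
# principal point, in pixels). Real values would come from calibration.
FX, FY, CX, CY = 365.0, 365.0, 256.0, 212.0


def back_project(u: int, v: int, depth: float,
                 radial: bool = False) -> Tuple[float, float, float]:
    """Convert depth pixel (u, v) into a 3D point (x, y, z) in the depth
    camera frame, whose origin is the camera's optical center.

    radial=False: depth is the distance along the optical axis (z).
    radial=True:  depth is the straight-line distance from the optical
                  center to the surface point.
    """
    # Direction of the ray through the pixel, scaled so that its z is 1.
    rx = (u - CX) / FX
    ry = (v - CY) / FY
    if radial:
        # Divide by the ray length so the resulting point lies exactly
        # 'depth' units from the optical center.
        z = depth / math.sqrt(rx * rx + ry * ry + 1.0)
    else:
        z = depth
    return (rx * z, ry * z, z)


def depth_map_to_points(depth_map: List[List[float]],
                        radial: bool = False) -> List[Tuple[float, float, float]]:
    """Back-project every pixel of a row-major depth map that has a
    measurement; a value of 0 is assumed here to mean 'no depth'."""
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d > 0:
                points.append(back_project(u, v, d, radial))
    return points
```

For the pixel at the principal point, both interpretations agree: `back_project(256, 212, 2.0)` yields a point 2.0 units straight ahead of the camera. Away from the center, the radial interpretation places the point slightly closer along the optical axis, since part of the measured distance is spent moving sideways.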
Various computer vision techniques, including gesture recognition, object recognition, 3D scene reconstruction, and image-based rendering, may register (or align) color information from a color image with depth information from a depth map. Registering a color camera with a depth camera may include determining the relative pose between the two cameras using a planar checkerboard pattern, placed within an environment, as an optical target for aligning feature points. Because the depth camera may simultaneously produce an intensity image (e.g., an IR light intensity image) and a depth map, the registration process may include mapping feature points based on the checkerboard's color discontinuities as they appear within the intensity image.
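Once the relative pose between the two cameras is known, registration amounts to carrying each depth pixel into the color image. The following minimal sketch assumes the pose has already been estimated (for example, from the checkerboard procedure described above) and is given as a rotation `R` and translation `t` that take points from the depth camera frame into the color camera frame. It further assumes ideal pinhole models with no lens distortion and depth values measured along the optical axis; the function name, parameter layout, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), in pixels
Matrix3 = Sequence[Sequence[float]]


def project_depth_pixel_to_color(u: float, v: float, z: float,
                                 depth_intr: Intrinsics,
                                 color_intr: Intrinsics,
                                 R: Matrix3,
                                 t: Sequence[float]) -> Tuple[float, float]:
    """Map depth pixel (u, v) with depth z (along the optical axis) to
    pixel coordinates in the color image.

    The relative pose is applied as p_color = R * p_depth + t.
    """
    fx_d, fy_d, cx_d, cy_d = depth_intr
    # 1. Back-project the depth pixel into 3D in the depth camera frame.
    p = ((u - cx_d) / fx_d * z, (v - cy_d) / fy_d * z, z)
    # 2. Apply the rigid transform into the color camera frame.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        raise ValueError("point lies behind the color camera")
    # 3. Project onto the color image plane using the color intrinsics.
    fx_c, fy_c, cx_c, cy_c = color_intr
    return (fx_c * q[0] / q[2] + cx_c, fy_c * q[1] / q[2] + cy_c)
```

For example, with an identity rotation and a hypothetical 5 cm horizontal offset between the cameras, `project_depth_pixel_to_color(256, 212, 2.0, (365, 365, 256, 212), (1000, 1000, 960, 540), [[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.05, 0.0, 0.0))` lands at approximately (985, 540) in the color image, 25 pixels right of the color camera's principal point. The returned coordinates are generally fractional, so a practical pipeline would round or interpolate them before sampling the color value to associate with that depth measurement.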