3D model fitting is a field of engineering in which captured sensor data, such as depth images, stereo images and color images, is observed from a scene depicting one or more objects, and the observed data is fitted to 3D models of the objects. In this way a computer is able to compute a representation of the objects and/or scene which is succinct and yet extremely powerful, since it enables the computer to navigate in the scene (for example, in robotic control), reason about objects in the scene, overlay virtual objects onto the scene in a way which takes the objects into account, and control user interfaces in dependence on objects in the scene such as human hands and bodies.
As a result of fitting the observed data to the 3D model, values of parameters of the model are computed, such as one or more of: orientation, translation, shape and pose. Where the 3D model is articulated, the parameters of the model typically include a plurality of joint angles and positions of joints and/or end effectors such as finger tips.
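By way of illustration only, the kinds of parameters mentioned above can be sketched in Python for a greatly simplified articulated model. The class `ArticulatedModelParams`, the function `joint_positions`, the single `yaw` angle standing in for a full 3D orientation, and the planar finger chain are all hypothetical simplifications introduced for this example; they are not taken from any particular model fitter.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ArticulatedModelParams:
    """Hypothetical parameter set for a simplified articulated model (one finger)."""
    # Global translation of the root joint in scene coordinates.
    translation: tuple = (0.0, 0.0, 0.0)
    # Global orientation, simplified here to one rotation about the z axis (radians).
    yaw: float = 0.0
    # Shape parameters: the length of each bone in the chain.
    bone_lengths: list = field(default_factory=lambda: [1.0, 1.0, 1.0])
    # Pose parameters: the flexion angle at each joint (radians).
    joint_angles: list = field(default_factory=lambda: [0.0, 0.0, 0.0])


def joint_positions(params):
    """Return the 3D position of each joint; the last entry is the finger tip.

    Assumption: the chain bends only within the x-y plane, so each joint angle
    is accumulated onto a single heading angle.
    """
    x, y, z = params.translation
    positions = [(x, y, z)]
    heading = params.yaw
    for length, angle in zip(params.bone_lengths, params.joint_angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y, z))
    return positions


if __name__ == "__main__":
    # A straight finger of three unit-length bones, rooted at the origin.
    print(joint_positions(ArticulatedModelParams())[-1])
```

In this sketch the joint angles are the pose, the bone lengths are the shape, and the finger tip position is derived from them. A model fitter would search for values of such parameters that best explain the observed sensor data.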
Ground truth data, in the context of 3D model fitting, comprises observed sensor data and corresponding values of the 3D model parameters which are known to be highly accurate. Such ground truth data is extremely difficult and expensive to obtain, yet it is useful for a variety of applications, including evaluation of 3D model fitters, machine learning and applications in the film industry, such as avatar animation, 3D motion capture for green screening and others.
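As one illustration of how ground truth data may be used to evaluate a 3D model fitter, the following Python sketch computes the mean Euclidean distance between joint positions predicted by a fitter and the corresponding ground truth joint positions. The function name `mean_joint_error` and the choice of this particular error measure are assumptions made for the example; other measures may equally be used.

```python
import math


def mean_joint_error(predicted, ground_truth):
    """Mean Euclidean distance between predicted and ground truth joint positions.

    Both arguments are sequences of (x, y, z) tuples, listed in the same joint order.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same number of joints")
    if not predicted:
        raise ValueError("at least one joint position is required")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


if __name__ == "__main__":
    # Hypothetical values: the second predicted joint is off by one unit along z.
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    print(mean_joint_error(predicted, ground_truth))
```

A lower value indicates that the fitter's output lies closer to the ground truth, which is only meaningful where the ground truth values themselves are highly accurate.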
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known 3D model fitting systems, or known systems for obtaining ground truth data.