Recently, a type of depth camera has been developed for use as a natural input device to a computing device, such as a gaming console. The depth camera captures depth images of one or more users moving within an interaction space. The depth images are processed by the computing device, and a wireframe skeleton of each user is generated. As the user moves in the interaction space, real-time skeletal tracking of the user is performed based on a stream of depth images received at the computing device. The skeletal tracking may be used to display, on a display of the computing device, an animated figure, such as a player character or avatar, that moves as the user moves in the interaction space.
One drawback of such systems is that they are currently limited to displaying animated figures for which a polygonal mesh has already been stored. Thus, pre-stored player characters, avatars, etc., may be controlled by the natural input methods described above, but an animation of the user himself or herself, or of other objects in the interaction space, cannot be generated based on the depth images. Another challenge with such systems is that image data cannot be exchanged quickly enough over a computer network, such as the Internet, to enable remote players using different computing devices to interact with each other via animated figures of suitable quality displayed to each user in real time, in the same virtual space, on each of the computing devices.