Some video games capture an image of a user's body and markers with a camera and replace the relevant regions of the captured image with another image for display on a display device (refer to PTL 1, for example). Also known are user interface systems in which movements of the mouth and hands captured with a camera are interpreted as instructions to operate an application. Such technology, which captures the real world in order to display a virtual world that reacts to the imaged movements or to perform some kind of information processing on the images, is used in diverse fields at every scale, from mobile terminals to leisure facilities.