A video see-through head-mounted display (VST-HMD) is a known device for providing visual guidance of a physical motion. A VST-HMD is typically configured to display first-person perspective video on a display screen mounted on the wearer's head. When two persons, namely the wearer and another person serving as a model, share the view on the display screen and perform a cooperative physical motion, the first-person perspective videos of the other person and of the wearer are synthesized and presented simultaneously so that the physical motion and the cooperation thereof can be learned (Non-Patent Literature 1). Non-Patent Literature 2 discloses an image processing technique that uses skeletal animation as the guidance video on the expert side, and that converts this skeletal animation into a first-person perspective video sharing the view with the wearer's perspective image for synthesized display.