A technology has conventionally been proposed in which, during or after shooting of a moving image with sound, the sound issued from a desired subject is enhanced and output. The sound comprises a plurality of channels recorded simultaneously by a plurality of microphones. According to the conventional technology, when a user specifies a desired subject in a displayed image, a directional sound in which the sound issued from the specified subject is enhanced is generated and output. This technology requires that information on the focal length of the imaging apparatus at the time of shooting and information on the arrangement of the plurality of microphones (the microphone-to-microphone distance) be known in advance.
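The dependence on these two quantities can be illustrated with a minimal sketch. It is not taken from the conventional technology itself. It assumes a pinhole camera model, a far-field sound source, two microphones, and a simple delay-and-sum scheme. All function names, the sensor offset parameter, and the speed-of-sound constant are hypothetical and introduced only for illustration.

```python
import math

# Assumed speed of sound in air at about 20 degrees C (illustrative value).
SPEED_OF_SOUND_M_PER_S = 343.0


def subject_angle_rad(x_offset_mm: float, focal_length_mm: float) -> float:
    """Angle of the subject from the optical axis under a pinhole model.

    x_offset_mm is the subject's horizontal offset from the image center,
    measured on the sensor. Without the focal length this angle is unknown.
    """
    return math.atan2(x_offset_mm, focal_length_mm)


def inter_mic_delay_s(angle_rad: float, mic_distance_m: float) -> float:
    """Arrival-time difference between two microphones (far-field source).

    Without the microphone-to-microphone distance this delay is unknown.
    """
    return mic_distance_m * math.sin(angle_rad) / SPEED_OF_SOUND_M_PER_S


def delay_and_sum(left, right, delay_samples: int):
    """Shift the right channel by an integer delay and average with the left.

    Samples that fall outside the recording are treated as silence.
    """
    n = len(left)
    out = []
    for i in range(n):
        j = i + delay_samples
        r = right[j] if 0 <= j < n else 0.0
        out.append(0.5 * (left[i] + r))
    return out
```

Under these assumptions, the position specified in the image is converted to an angle using the focal length, the angle is converted to an inter-channel delay using the microphone-to-microphone distance, and the delay is used to align the channels so that sound from the specified direction adds coherently. If either quantity is missing, the chain cannot be evaluated.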
With the widespread adoption of imaging apparatuses, such as home movie cameras, that shoot moving images with stereo sound, huge amounts of moving-image data with sound shot by such apparatuses are available, and demand for replaying them continues to grow. For many of these moving images with sound, the information on the focal length of the imaging apparatus at the time of shooting and the information on the microphone-to-microphone distance are unknown.
The conventional technology requires that the information on the focal length of the imaging apparatus at the time of shooting and the information on the microphone-to-microphone distance be known in advance. Consequently, when replaying a moving image with sound for which the focal length at the time of shooting and the microphone-to-microphone distance are unknown, the sound issued from a desired subject cannot be enhanced and output.