As high-speed Internet access by means such as ADSL and cable modems becomes increasingly common, home users have more opportunities to download video images, such as television programs, over the Internet. In addition, interfaces between digital cameras and personal computers (PCs) have become more common, and users now routinely transfer video images they have shot themselves to their PCs for processing.
Further, as PC functions improve and hard disk capacity increases, large amounts of data can easily be stored on a hard disk, replayed, and displayed, and an environment in which a plurality of video images are displayed on a display device can be easily accommodated. To present two or more such items of video information, two or more windows can be arranged at arbitrary positions, and the items of video information can be displayed simultaneously on a display device such as a CRT or a liquid crystal display.
When a plurality of video images are displayed in this way, the audio data accompanying the video images is handled either by outputting none of the audio data or by synthesizing all the audio data and outputting it at the same volume. When no audio is output, the audio data cannot assist in apprehending the content of a video image, and a user is forced to judge the content relying solely on the images displayed on the display device. When all the audio data is synthesized at the same volume, the individual audio signals interfere with one another and are difficult to hear; it is also difficult to tell which audio belongs to which video image.
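The equal-volume synthesis described above can be illustrated with a minimal sketch. It assumes each window's audio is a list of sample-aligned floating-point samples in the range -1.0 to 1.0; the function name `mix_equal_volume` and this data layout are hypothetical and are used only for illustration.

```python
def mix_equal_volume(streams):
    """Sum sample-aligned audio streams with equal gain.

    `streams` is a list of sample lists, one per displayed video image
    (an assumed layout). The result is clipped to [-1.0, 1.0].
    """
    if not streams:
        return []
    # Mix only as many samples as the shortest stream provides.
    length = min(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams)
        # Clip the sum so it stays within the valid sample range.
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

Because every stream contributes with the same gain, the mixed output contains nothing that indicates which window a given sound came from, and loud passages in one stream can mask or clip the others.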
It is an object of the present invention to facilitate recognition of which audio data corresponds to which video image, and to enable a user to easily apprehend the content of the video images being displayed.