With advances in high-speed and broadband access lines to the Internet, attention has been paid to bidirectional multi-modal services based on the composition of audio and video media such as video communication services, e.g., video telephone/conference services, and collaboration services.
The Internet used for such services is a network that does not always guarantee communication quality. For this reason, when audio and video media are combined and communicated, a narrow operating band on the communication line connecting the user terminals, or congestion in the network, degrades the quality of the audio and video information that the user actually experiences on the receiving terminal.
More specifically, the occurrence of quality degradation in audio information is perceived as breaks, noise, and a response delay, and the occurrence of quality degradation in video information is perceived as phenomena such as defocus, blur, mosaic-shaped distortion, jerky effect, and a response delay.
In some cases, a user perceives a response delay that appears when the audio and video media are synchronized, i.e., a video communication service response delay, or a step-out between the audio and video media caused by an offset between their response delay times, or the like. These phenomena are due to the processing time taken for the transmission of audio and video media signals, a delay time in the network, and a delay time originating from the processing time taken for the reception of audio and video media signals. In this case, the processing time taken for the transmission of audio and video media signals includes the processing time taken to encode audio and video media, a transmission buffer time, and the like. Delay times in the network include the processing time taken by routers and other devices constituting the network, the time based on the physical distance between the networks used by the communicating parties, and the like. In addition, the processing time taken for the reception of audio and video media signals includes a reception buffer time, the decoding time for audio and video media, and the like.
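As an illustration only, the delay components listed above can be summed per medium to show how a service response delay and an audio/video step-out may arise. The following Python sketch is not part of the described technique: the names `MediaDelay`, `av_offset_ms`, and `synchronized_delay_ms`, the numeric values, and the assumption that a synchronizing receiver waits for the slower medium are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaDelay:
    """One-way delay components of a single medium, in milliseconds.

    The field names are hypothetical; they mirror the components named in
    the text (encoding, transmission buffer, network, reception buffer,
    decoding).
    """
    encode_ms: float
    send_buffer_ms: float
    network_ms: float
    receive_buffer_ms: float
    decode_ms: float

    def total_ms(self) -> float:
        # End-to-end delay is taken here as the plain sum of the components.
        return (self.encode_ms + self.send_buffer_ms + self.network_ms
                + self.receive_buffer_ms + self.decode_ms)


def av_offset_ms(audio: MediaDelay, video: MediaDelay) -> float:
    """Signed step-out: positive when video is presented later than audio."""
    return video.total_ms() - audio.total_ms()


def synchronized_delay_ms(audio: MediaDelay, video: MediaDelay) -> float:
    """Response delay if the receiver holds the faster medium back so that
    both are presented together (an assumption made for this sketch)."""
    return max(audio.total_ms(), video.total_ms())


# Illustrative values only; they are not measurements.
audio = MediaDelay(encode_ms=20.0, send_buffer_ms=10.0, network_ms=80.0,
                   receive_buffer_ms=40.0, decode_ms=5.0)
video = MediaDelay(encode_ms=60.0, send_buffer_ms=30.0, network_ms=80.0,
                   receive_buffer_ms=60.0, decode_ms=20.0)

print("audio delay [ms]:", audio.total_ms())
print("video delay [ms]:", video.total_ms())
print("step-out without synchronization [ms]:", av_offset_ms(audio, video))
print("response delay with synchronization [ms]:",
      synchronized_delay_ms(audio, video))
```

Under these assumed values, playing each medium as soon as it is ready yields a step-out between audio and video, whereas synchronizing the two removes the step-out at the cost of delaying the faster medium to match the slower one.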
In order to provide such services with high quality, importance is placed on quality design before service provision and quality management after the start of a service. It is therefore necessary to develop a simple and efficient quality evaluation technique capable of appropriately expressing video quality enjoyed by users.
As an audio quality estimation technique, ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) Recommendation P.862 defines an objective audio quality evaluation scale, PESQ (Perceptual Evaluation of Speech Quality). On the other hand, as a video quality estimation technique, an objective video quality evaluation scale is described in ITU-T Recommendation J.144 and the like. There are continuing discussions about this subject in VQEG (Video Quality Experts Group) and the like (see, for example, www.its.bldrdoc.gov/vqeg/).
These objective quality evaluation techniques make it possible to estimate subjective quality with an estimation error equivalent to the statistical ambiguity of subjective quality under a predetermined condition. Under the circumstances, the present inventors have proposed a technique of obtaining a comprehensive quality evaluation scale for bidirectional multi-modal services such as video telephone/conference services in consideration of the quality of an individual medium such as an audio medium or video medium, which is obtained by the above objective quality evaluation scale or a subjective quality evaluation experiment, and the influence of the transmission delay time of each medium (see, for example, Japanese Patent Laid-Open No. 2005-244321).