As Internet access lines grow in speed and bandwidth, video communication services, which transfer video media (video and audio) between terminals or between servers and terminals via the Internet, are expected to become more popular.
The Internet is a network that does not necessarily guarantee communication quality. When audio and video media are communicated, the bit rate drops if the bandwidth of the network line between user terminals is narrow, and packet loss or packet transfer delay occurs if the line is congested. This degrades the quality of audio and video media as perceived by users (QoE: Quality of Experience).
More specifically, when a video is encoded, a block of video signals within a frame may degrade, or the high-frequency component of a video signal may be lost, impairing the resolution of the entire video. When encoded video contents are packetized and transmitted from a provider, a packet loss or packet transfer delay may occur within the network or terminal equipment, and the video suffers an unintended degradation.
As a result, the user perceives a blur, smear, mosaic-shaped distortion, or freeze (state in which a video frame stops) or skip (state in which several frames are lost from video frames) of video frames.
To confirm that video communication services as mentioned above are provided at high quality, it is important to measure the QoE of a video and manage the quality of the video to be provided to the user while providing services.
Therefore, a video quality assessment technique capable of appropriately representing the QoE of a video is required.
As conventional methods for assessing the qualities of videos and audios, there are a subjective quality assessment method (non-patent literature 1) and an objective quality assessment method (non-patent literature 2).
In the subjective quality assessment method, a plurality of users actually view videos and listen to audios, and assess the QoE using a quality scale of five grades (excellent, good, fair, poor, and bad) (nine or 11 grades are also available) or an impairment scale (imperceptible, perceptible but not annoying, slightly annoying, annoying, and very annoying). Video or audio quality assessment values under each condition (e.g., packet loss rate of 0% and bit rate of 20 Mbps) are averaged by the total number of users. The average value is defined as an MOS (Mean Opinion Score) value or DMOS (Degradation Mean Opinion Score) value.
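The averaging step described above can be sketched as follows. This is a minimal illustration, not part of any cited method: the mapping of the five grades to the numbers 5 to 1, the function name `mean_opinion_score`, and the sample ratings are all assumptions made for the example.

```python
from statistics import mean

# Assumed numeric mapping of the five-grade quality scale (5 = best).
QUALITY_SCALE = {"excellent": 5, "good": 4, "fair": 3, "poor": 2, "bad": 1}


def mean_opinion_score(ratings):
    """Average the grades given by all viewers under one test condition."""
    return mean(QUALITY_SCALE[grade] for grade in ratings)


# Hypothetical grades from five viewers for one condition
# (e.g., packet loss rate of 0% and bit rate of 20 Mbps).
ratings = ["good", "excellent", "fair", "good", "good"]
mos = mean_opinion_score(ratings)
print(mos)
```

A DMOS value would be obtained the same way, with the impairment scale in place of the quality scale.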
However, the subjective quality assessment method requires special dedicated equipment (e.g., monitor) and an assessment facility capable of adjusting the assessment environment (e.g., room illuminance or room noise). In addition, many users need to actually assess videos or audios. Because it takes time for users to complete an actual assessment, the subjective quality assessment method is not suited to assessing quality in real time.
This creates a demand for an objective quality assessment method that outputs a video quality assessment value using a feature amount (e.g., bit rate, bit amount per frame, or packet loss information) which affects video quality.
One conventional objective quality assessment method detects quality degradation caused by encoding of a video, and estimates the individual video quality value or average video quality value of the video (non-patent literature 2).
The individual video quality value is the quality assessment value of each video content to be estimated, and is defined by a value of 1 to 5 (in some cases, defined by another range of, e.g., 1 to 9 or 0 to 100). The average video quality value is a value obtained by dividing the sum of the individual video quality values of respective video contents to be estimated by the total number of video contents to be estimated, and is defined by a value of 1 to 5 (in some cases, defined by another range of, for example, 1 to 9 or 0 to 100).
For example, when the number of videos (a plurality of transmitted videos will be called a “video subset”) transmitted under the same condition (packet loss rate of 0% and bit rate of 20 Mbps) in an arbitrary video content (video set) is eight, the quality assessment values of the eight respective videos contained in the video subset are individual video quality values, and a value obtained by dividing the sum of the individual video quality values of the video subset by eight, which is the number of videos contained in the video subset, is the average video quality value.
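As a small sketch of this definition, the snippet below averages eight individual video quality values. The values themselves and the function name `average_video_quality` are hypothetical and chosen only for illustration.

```python
def average_video_quality(individual_values):
    """Sum of the individual video quality values divided by the number of videos."""
    return sum(individual_values) / len(individual_values)


# Hypothetical individual video quality values (scale of 1 to 5) for the eight
# videos of a video subset transmitted under the same condition.
individual_vq = [3.2, 3.5, 3.8, 4.0, 4.1, 4.3, 4.4, 4.7]
vq_ave = average_video_quality(individual_vq)
print(round(vq_ave, 2))
```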
FIG. 8 is a view for conceptually explaining the relationship between the video set and the video subset. As shown in FIG. 8, the video subset means a specific video set used for video quality assessment that is cut out of a video set serving as a set containing an infinite number of videos, i.e., a set of arbitrary videos.
There is also known a conventional objective quality assessment method of detecting quality degradation caused by video encoding or packet loss degradation, and estimating the video quality assessment value of the video (non-patent literature 3 and patent literature 1). The video quality assessment value indicates the quality assessment value of each video content to be estimated, and is defined by a value of 1 to 5 (as described in the description of the subjective quality assessment method, 9- or 11-grade assessment may be adopted, and the quality assessment value may be designated by another range of, e.g., 1 to 9 or 0 to 100).
As described above, most conventional objective quality assessment methods estimate a video quality assessment value using packets or video signals (pixel values). Non-patent literature 2 and patent literature 1 describe techniques for estimating a video quality assessment value from only header information of packets. Non-patent literature 3 describes a technique for estimating a video quality assessment value from video signals.
The following explains the relationship between the video frame type and the GoP (Group of Pictures) structure of an encoded video when compressed video frames are transmitted, and the relationship between the video frame type and the quality assessment value of an encoded video.
<Video Frame Type>
Compressed video frames are classified into three types: I-frame (Intra-frame), P-frame (Predictive-frame), and B-frame (Bi-directional frame).
The I-frame is a frame which is encoded independently, using only information within the frame itself, regardless of preceding and succeeding frames. The P-frame is a frame which is predicted from a past frame within consecutive frames, i.e., encoded by forward prediction. The B-frame is a frame which is encoded by prediction from past and future frames in two directions.
<Relationship Between GoP Structure and Video Frame Type>
The GoP structure of an encoded video represents the interval at which video frames of the respective video frame types are arranged.
For example, FIG. 24 is a view for conceptually explaining a GoP structure represented by M=3 and N=15 (M is an interval corresponding to the number of frames in one-way prediction, and N is the interval between I-frames).
In an encoded video having the GoP structure as shown in FIG. 24, two B-frames are inserted between an I-frame and a P-frame and between P-frames, and the interval between I-frames is 15 frames.
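The arrangement described above can be sketched by generating the frame types of one GoP from M and N. This is a simplified illustration that assumes display order and a fixed, regular pattern; the function name `gop_frame_types` is hypothetical, and actual encoders may arrange frames differently.

```python
def gop_frame_types(m, n):
    """Return the frame types of one GoP as a string, in display order.

    Assumption: frame 0 is the I-frame, every m-th frame after it is a
    P-frame, and all remaining frames are B-frames.
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


pattern = gop_frame_types(3, 15)
print(pattern)  # IBBPBBPBBPBBPBB
```

Each GoP of this form contains one I-frame, four P-frames, and ten B-frames, so repeating it over 300 frames gives 20 I-frames, 80 P-frames, and 200 B-frames.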
<Bit Amounts of Respective Video Frame Types>
The bit amounts of compressed video frames of the respective video frame types will be explained.
The bit amounts of video frames of the respective video frame types are defined as the I-frame bit amount (BitsI), P-frame bit amount (BitsP), and B-frame bit amount (BitsB). The bit amounts of the respective video frame types are indices indicating bit amounts used for the respective video frame types (I-, B-, and P-frames) when, for example, encoding a 10-sec video content to be assessed.
More specifically, when a 10-sec video content is encoded at 30 fps (frames/second), the total number of video frames of an encoded video is 300, and 20 I-frames exist in all the 300 frames. Assuming that the bit amount necessary to encode the 20 I-frames is 10,000 bits, the I-frame bit amount is 500 bits/I-frame from 10,000 bits/20 I-frames.
Similarly, 80 P-frames exist in all the 300 frames. Assuming that the bit amount necessary to encode the 80 P-frames is 8,000 bits, the P-frame bit amount is 100 bits/P-frame from 8,000 bits/80 P-frames. Also, 200 B-frames exist in all the 300 frames. Assuming that the bit amount necessary to encode the 200 B-frames is 10,000 bits, the B-frame bit amount is 50 bits/B-frame (10,000 bits/200 B-frames).
In this case, a total of 28,000 bits (10,000 + 8,000 + 10,000 bits) is necessary to encode the 10-sec video content (300 frames in total), so the bit rate is 2,800 b/s (2.8 kbps) from 28,000 bits/10 seconds.
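The arithmetic of this example can be reproduced with the short sketch below. The frame counts and total bits are the figures given in the text; the variable names are chosen only for illustration.

```python
duration_s = 10  # length of the video content in seconds

# Number of frames and total bits used for each video frame type.
frame_counts = {"I": 20, "P": 80, "B": 200}
total_bits = {"I": 10_000, "P": 8_000, "B": 10_000}

# Bit amount per frame of each type (BitsI, BitsP, BitsB).
bits_per_frame = {t: total_bits[t] / frame_counts[t] for t in frame_counts}

# Bit rate of the whole encoded video in bits per second.
bit_rate = sum(total_bits.values()) / duration_s

print(bits_per_frame)  # {'I': 500.0, 'P': 100.0, 'B': 50.0}
print(bit_rate)        # 2800.0
```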
<Bit Amount Characteristics for Respective Video Frame Types>
The frame maximum bit amount, frame minimum bit amount, and frame average bit amount indicating bit amount characteristics for the respective video frame types will be defined and explained.
The maximum value of the frame bit amount is defined as the frame maximum bit amount, the minimum value is defined as the frame minimum bit amount, and the average value is defined as the frame average bit amount with respect to the bit rate (BR) or the number of lost video frames (DF) in a plurality of video contents (for example, a video set of eight video contents). In correspondence with the respective video frame types, these values are represented by the I-frame maximum bit amount (BitsImax), I-frame minimum bit amount (BitsImin), I-frame average bit amount (BitsIave), P-frame maximum bit amount (BitsPmax), P-frame minimum bit amount (BitsPmin), P-frame average bit amount (BitsPave), B-frame maximum bit amount (BitsBmax), B-frame minimum bit amount (BitsBmin), and B-frame average bit amount (BitsBave).
For example, the I-frame bit amounts of the eight video contents encoded at the same bit rate are “450 bits”, “460 bits”, “470 bits”, “480 bits”, “490 bits”, “500 bits”, “510 bits”, and “520 bits”, respectively. In this case, since the maximum value of the I-frame bit amount is “520 bits”, the I-frame maximum bit amount is “520”. Since the minimum value of the I-frame bit amount is “450 bits”, the I-frame minimum bit amount is “450”. Since the average value of the I-frame bit amount is “485 bits”, the I-frame average bit amount is “485”.
As for the frame maximum bit amounts, frame minimum bit amounts, and frame average bit amounts of B- and P-frames, the maximum values, minimum values, and average values of the frame bit amounts of the respective video frame types are defined as the frame maximum bit amounts, frame minimum bit amounts, and frame average bit amounts with respect to the bit rate (BR) or the number of lost video frames (DF) in a plurality of video contents.
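A minimal sketch of these definitions, using the I-frame bit amounts of the eight video contents from the example above. The helper name `bit_amount_characteristics` is hypothetical; the same function could be applied to P- and B-frame bit amounts.

```python
def bit_amount_characteristics(frame_bits):
    """Return (maximum, minimum, average) of the frame bit amounts of one
    video frame type across the video contents encoded at the same bit rate."""
    return max(frame_bits), min(frame_bits), sum(frame_bits) / len(frame_bits)


# I-frame bit amounts of the eight video contents (bits).
bits_i = [450, 460, 470, 480, 490, 500, 510, 520]
bits_i_max, bits_i_min, bits_i_ave = bit_amount_characteristics(bits_i)
print(bits_i_max, bits_i_min, bits_i_ave)  # 520 450 485.0
```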
<Bit Amounts of Respective Video Frame Types and Influence on Video Quality>
The influence of bit amounts assigned to the respective video frame types on video quality in video encoding will be explained with reference to the accompanying drawings.
FIGS. 9A to 9C are graphs in which the bit amounts of the respective video frame types (I-, P-, and B-frames) of a video to undergo video quality estimation are plotted along the abscissa, and the video quality values of the respective video contents are plotted along the ordinate, for video contents of a predetermined length encoded at the same bit rate (in this example, 10-sec video contents at 10 Mbps with 300 video frames).
As shown in FIGS. 9A to 9C, the relationship between the bit amounts of the respective video frame types and the video quality assessment value represents that, as a result of comparison at the same bit rate, a video content having a small I-frame bit amount exhibits a low video quality assessment value and a video content having a large I-frame bit amount exhibits a high video quality assessment value. The result of comparison at the same bit rate for the P- and B-frame bit amounts reveals that video contents having small P- and B-frame bit amounts exhibit high video quality assessment values and video contents having large P- and B-frame bit amounts exhibit low video quality assessment values.
Even in videos having the same bit rate, the bit amounts of the respective video frame types affect video quality.
<Relationship between Bit Amount Characteristics of Respective Video Frame Types and Video Quality>
FIGS. 10A and 10B are graphs conceptually showing the relationship between the bit rate of each video in a video subset and the frame bit amounts of the respective video frame types. The relationship between the bit rate and the B-frame bit amount is similar to that between the bit rate and the P-frame bit amount shown in FIG. 10B, and is therefore not illustrated.
As shown in FIGS. 10A and 10B, depending on the video, the frame bit amount has different characteristics even in videos having the same bit rate. More specifically, even if videos have the same bit rate, the relationship between the frame maximum bit amount, the frame minimum bit amount, and the frame average bit amount differs between the respective video frame types.
The relationship between the bit rate of a video and the frame bit amounts of the respective video frame types affects video quality. Video quality therefore differs even between videos having the same bit rate.
FIG. 11 is a graph for conceptually explaining the above-mentioned influence of the bit amounts of the respective video frame types on video quality.
FIG. 11 shows the relationship between the bit rate and the video quality value. In FIG. 11, circles, triangles, and squares respectively represent a maximum video quality value (Vqmax) which is maximum among the video quality values of videos having the same bit rate out of videos in a video subset, a minimum video quality value (Vqmin) which is minimum, and an average video quality value (Vqave) which is a value obtained by dividing the sum of video quality values by the number of videos.
As shown in FIG. 11, the video quality value varies between the maximum video quality value and the minimum video quality value even among videos having the same bit rate. That is, the video quality value of a video to be estimated does not always coincide with the average video quality value of videos having the same bit rate as that of the video to be estimated. The difference between the video quality value and the average video quality value depends on bit amounts assigned to the respective video frame types of the video to be estimated. This difference between the video quality value and the average video quality value is defined as a difference video quality value (dVq).
Hence, the difference video quality value (dVq) is generated in videos having the same bit rate depending on the relationship between the bit rate of a target video and the characteristics of bit amounts assigned to the respective video frame types.
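The quantities Vqmax, Vqmin, Vqave, and dVq can be illustrated with the sketch below. The video names and quality values are hypothetical, and the sign convention dVq = Vq - Vqave is an assumption, since the text only states that dVq is the difference between the two values.

```python
# Hypothetical video quality values of four videos in a video subset,
# all encoded at the same bit rate.
vq_same_bit_rate = {
    "video_a": 4.5,
    "video_b": 3.9,
    "video_c": 4.2,
    "video_d": 3.4,
}

values = list(vq_same_bit_rate.values())
vq_max = max(values)                 # maximum video quality value (Vqmax)
vq_min = min(values)                 # minimum video quality value (Vqmin)
vq_ave = sum(values) / len(values)   # average video quality value (Vqave)

# Difference video quality value of each video (assumed sign: Vq - Vqave).
dvq = {name: vq - vq_ave for name, vq in vq_same_bit_rate.items()}

for name, diff in dvq.items():
    print(f"{name}: dVq = {diff:+.2f}")
```

With these sample values, Vqave is 4.0, so a video above the average has a positive dVq and a video below it has a negative dVq.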