Video data transmitted by a conventional video transmission system is normally compression-coded, using a method such as H.261 or MPEG (Moving Picture Experts Group), to a bit rate at or below a fixed value so that it can be carried over a transmission path with a fixed bit rate. Once video data has been coded, its quality cannot be changed even if the transmission bit rate changes.
However, with the diversification of networks in recent years, fluctuations in transmission path bit rate have become large, and there is now a need for video data that can be transmitted at a quality commensurate with each of a plurality of bit rates. In response to this need, layered coding methods have been standardized that have a layered structure and can handle a plurality of bit rates. Among such layered coding methods, the recently standardized MPEG-4 FGS (Fine Granularity Scalability) in particular offers a high degree of freedom in bit rate selection. Video data coded by MPEG-4 FGS is composed of a base layer, which is a moving picture stream that can be decoded on its own, and at least one enhancement layer, which is a moving picture stream that improves the quality of the decoded base layer pictures. The base layer is low-quality video data at a low bit rate, and because the amount of enhancement layer data can be matched to the available bit rate, picture quality can be raised with a high degree of flexibility.
In MPEG-4 FGS, the total data size of the enhancement layer to be transmitted can be controlled, so the same coded data can be applied to a variety of bit rates and video can be transmitted at a quality that matches the bit rate.
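To make enhancement-layer size control concrete, here is a minimal Python sketch. It is not the MPEG-4 FGS bitstream syntax: it assumes, for illustration only, that each frame's base layer and enhancement layer are available as opaque byte strings, that the enhancement layer may be cut at any byte (a simplification of FGS bit-plane truncation), and that the channel bit rate is shared evenly by all frames. The names `FgsFrame` and `fit_to_bit_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FgsFrame:
    """One frame of layered video (hypothetical container, not the FGS bitstream)."""
    base: bytes         # base layer: always sent whole so the frame stays decodable
    enhancement: bytes  # enhancement layer: assumed truncatable at any byte


def fit_to_bit_rate(frame: FgsFrame, channel_bps: float, frame_rate: float) -> bytes:
    """Return the bytes to transmit for one frame so the stream fits channel_bps."""
    # Per-frame budget in whole bytes, assuming the rate is shared evenly by frames.
    budget_bytes = int(channel_bps / frame_rate) // 8
    if budget_bytes < len(frame.base):
        raise ValueError("bit rate too low to carry the base layer")
    # Spend whatever remains after the base layer on the enhancement layer.
    enh_bytes = min(budget_bytes - len(frame.base), len(frame.enhancement))
    return frame.base + frame.enhancement[:enh_bytes]


if __name__ == "__main__":
    frame = FgsFrame(base=b"B" * 300, enhancement=b"E" * 1000)
    for bps in (32_000, 64_000, 96_000, 256_000):
        payload = fit_to_bit_rate(frame, bps, frame_rate=10)
        print(bps, "bit/s ->", len(payload) - len(frame.base), "enhancement bytes")
```

With a 300-byte base layer and a 1,000-byte enhancement layer at 10 frames/s, a 64 kbit/s channel allows 800 bytes per frame, so 500 enhancement bytes are sent; at 256 kbit/s the whole enhancement layer fits. The base layer is never cut, mirroring the requirement that it remain decodable on its own.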
However, when video is received simultaneously by a plurality of terminals, the capabilities of the receiving terminals and the characteristics (reception area) of the network each terminal uses differ, so the quality that each terminal requires also differs. Here, “quality” includes, for example, image quality, frame rate (smoothness of motion), error resilience, spatial resolution, delay, and processing complexity.
With regard to this point, one known technology for transmitting video at a quality appropriate to the network characteristics of a terminal whose network status fluctuates because of congestion or the like divides a video stream into layers and transmits these layers on channels with different priorities (see, for example, page 1 and FIG. 1 of Unexamined Japanese Patent Publication No. HEI 4-100494).
FIG. 1 is a drawing showing an example of the configuration of a conventional video coding apparatus. In this video coding apparatus 10, a video coding circuit 12 codes an input video signal using motion compensation, DCT (discrete cosine transform), and quantization, and outputs the coded data to a layering circuit 14. Layering circuit 14 divides the coded data input from video coding circuit 12 into M areas in units of the N×N-pixel blocks used in the DCT (where N and M are both natural numbers), and outputs the M pieces of coded data to a packetizing buffer 16. When dividing the data, layering circuit 14 measures the image quality degradation that occurs when a particular one of the M areas is discarded, and controls the area sizes so that this degradation equals a permitted value set beforehand. Of the divided coded data input from layering circuit 14, packetizing buffer 16 transmits the area containing the low-frequency components on a high-priority channel.
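The area division performed by layering circuit 14 can be illustrated with a short sketch. The cited publication is described here only at the level above, so the following is an illustration under stated assumptions rather than its actual method: it uses M = 2 areas per block, scans coefficients in JPEG-style zigzag order, and measures degradation as the squared error caused by discarding the high-frequency area, which for an orthonormal DCT equals the sum of the squared discarded coefficients (Parseval's theorem). Because a split can only fall between coefficients, the sketch picks the smallest low-frequency area whose discard error does not exceed the permitted value. `zigzag_order` and `split_block` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in JPEG-style zigzag order."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Sort by anti-diagonal; odd diagonals run down-left, even ones run up-right.
    return sorted(cells, key=lambda c: (c[0] + c[1], c[0] if (c[0] + c[1]) % 2 else -c[0]))


def split_block(coeffs: Sequence[Sequence[float]], permitted_error: float):
    """Split an n x n block of orthonormal-DCT coefficients into two areas (M = 2).

    The low-frequency area is the shortest zigzag prefix such that discarding the
    remaining high-frequency area causes a squared error <= permitted_error.
    """
    scan = [coeffs[i][j] for i, j in zigzag_order(len(coeffs))]
    # For an orthonormal DCT, dropping coefficients costs exactly the sum of their
    # squares in the pixel domain (Parseval), so no inverse transform is needed.
    tail_error = sum(c * c for c in scan)
    for k in range(len(scan)):
        if tail_error <= permitted_error:
            return scan[:k], scan[k:]  # (high-priority area, low-priority area)
        tail_error -= scan[k] * scan[k]
    return scan, []  # every coefficient is needed to meet the permitted error


if __name__ == "__main__":
    block = [[52.0, 10.0, 1.0, 0.0],
             [8.0, 3.0, 0.5, 0.0],
             [2.0, 0.5, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]
    low, high = split_block(block, permitted_error=5.0)
    print(len(low), "coefficients for the high-priority channel,",
          len(high), "for the low-priority channel")
```

Raising `permitted_error` shrinks the low-frequency area and moves more data onto the low-priority channel, which is the size control the paragraph above describes.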
Thus, video coding apparatus 10 divides the DCT coefficients into areas and transmits the low-frequency components on the high-priority channel. When network congestion occurs, it is the coded data on the low-priority channel that is discarded, so video can still be received at an image quality matched to the available bit rate.
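The effect of priority channels under congestion can be sketched as a strict-priority forwarding rule. The sketch assumes, hypothetically, that each area of a block becomes one packet tagged with a priority, that a congested link can carry only `capacity` bytes per interval, and that lower-priority packets are always discarded before higher-priority ones. `Packet`, `HIGH`, `LOW`, and `forward_under_congestion` are illustrative names, not part of the cited apparatus.

```python
from typing import List, NamedTuple

HIGH, LOW = 0, 1  # hypothetical priority labels; smaller value = higher priority


class Packet(NamedTuple):
    priority: int
    size: int   # bytes
    area: str   # which DCT area the packet carries


def forward_under_congestion(packets: List[Packet], capacity: int) -> List[Packet]:
    """Forward packets within `capacity` bytes, highest priority first.

    Once a packet does not fit, it and every remaining packet (equal or lower
    priority) are discarded, so low-priority data is always dropped first.
    """
    sent: List[Packet] = []
    used = 0
    # sorted() is stable, so packets keep their original order within a priority class.
    for pkt in sorted(packets, key=lambda p: p.priority):
        if used + pkt.size > capacity:
            break
        sent.append(pkt)
        used += pkt.size
    return sent


if __name__ == "__main__":
    block = [Packet(HIGH, 400, "area 1 (low frequency)"),
             Packet(LOW, 300, "area 2"),
             Packet(LOW, 300, "area 3 (high frequency)")]
    for cap in (1000, 750, 400):
        print(cap, [p.area for p in forward_under_congestion(block, cap)])
```

With a 400-byte high-priority low-frequency area and two 300-byte low-priority areas, a 750-byte capacity delivers the low-frequency area and one further area, and the last area is discarded; the receiver still gets a coarse picture rather than nothing.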
However, with the above-described conventional technology, although video can be received at an image quality matched to the transmission bit rate, individual terminals cannot freely select which type of quality (for example, image quality, smoothness of motion, error resilience, spatial resolution, or processing complexity, as described above) is to be given priority.
For example, within a limited transmission bit rate, a terminal with a large display screen is better served by video that gives priority to high image quality in each picture, even if motion is not smooth, than by video with smooth motion but low image quality. Conversely, a terminal with a small display screen is better served by video of low image quality but with smooth motion than by video of high image quality that lacks smooth motion. Also, a terminal in a radio environment where the network error rate is high is better served by video with high error resilience, which can be played back even when errors occur, than by video with low error resilience, which can be played back only under error-free conditions.
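The image quality versus smoothness trade-off follows from simple arithmetic: at a fixed bit rate R and frame rate f, each coded picture receives on average R / f bits. The sketch below uses 384 kbit/s and frame rates of 7.5, 15, and 30 frames/s purely as illustrative values, not figures from the source; `per_picture_budget` is a hypothetical helper.

```python
from typing import Dict, Iterable


def per_picture_budget(bit_rate_bps: float, frame_rates: Iterable[float]) -> Dict[float, float]:
    """Average bits available per coded picture when bit_rate_bps is shared by
    frame_rate pictures per second (budget = R / f)."""
    return {fps: bit_rate_bps / fps for fps in frame_rates}


if __name__ == "__main__":
    # Illustrative values only: a 384 kbit/s channel at three candidate frame rates.
    for fps, bits in per_picture_budget(384_000, [7.5, 15, 30]).items():
        print(f"{fps:>4} frames/s -> {bits / 1000:.1f} kbit per picture")
```

Halving the frame rate doubles the bits available to each picture, which is why a large-screen terminal might prefer 7.5 frames/s at about 51.2 kbit per picture, while a small-screen terminal might prefer 30 frames/s at about 12.8 kbit per picture.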
Thus, when video is received simultaneously by a plurality of terminals whose characteristics (terminal performance or reception bit rate) differ, each terminal needs to be able to freely select the type and level of video quality and to receive video of a quality appropriate to its own characteristics and conditions.