A ‘video signal’ consists of a sequence of images. Each image is referred to as a ‘frame’. When a video signal is transmitted from one location to another, it is typically transmitted as a sequence of pictures. Each frame may be sent as a single picture; however, the system may need to send more than one picture to transmit all the information in one frame.
Increasingly, video signals are being transmitted over radio communication links. This transmission may be over a communication path of very limited bandwidth, for example over a communication channel between a portable or mobile radio device and a base station of a cellular communications system.
One method of reducing the bandwidth required for transmission of video is to code or compress the video signal prior to transmission. However, the quality of a video signal can be affected during coding or compression of the video signal. For this reason, methods have been developed to enhance the quality of the received signal following decoding and/or decompression.
It is known, for example, to include additional ‘layers’ of transmission, beyond simply the base layer in which pictures are transmitted. The basic video signal is transmitted in the base layer. The additional layers are termed ‘enhancement layers’. The enhancement layers contain sequences of pictures that are transmitted in addition to the basic set of pictures. These additional pictures are then used by a receiver to improve the quality of the video. The pictures transmitted in the enhancement layers may be based on the difference between the actual video signal and the video bit stream after it has been encoded by the transmitter.
The base layer of video transmission typically contains two types of picture. The first is an ‘Intracoded’ picture, which is often termed an I-picture. The important feature of an I-picture is that it contains all the information required for a receiver to display the current frame of the video sequence. When it receives an I-picture, the receiver can display the frame without using any data about the video sequence that it has received previously.
The second is a P-picture. A P-picture contains data about the differences between one frame of the video sequence and a previous frame. Thus a P-picture constitutes an ‘update’. When it receives a P-picture, a receiver displays a frame that is based on both the P-picture and data that it already holds about the video stream from previously received pictures.
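The relationship between I-pictures and P-pictures can be illustrated with a minimal sketch. It assumes, purely for illustration, that a frame is a flat list of sample values, that an I-picture carries those values directly, and that a P-picture carries one difference per sample. Real coders use motion compensation and transform coding; the function name and data layout here are hypothetical.

```python
def decode_base_layer(pictures):
    """Reconstruct frames from a list of (kind, data) base-layer pictures.

    Assumption: an "I" picture holds complete sample values and a "P"
    picture holds per-sample differences from the previous decoded frame.
    """
    frames = []
    reference = None  # last decoded frame, held by the receiver
    for kind, data in pictures:
        if kind == "I":
            # An I-picture needs no previously received data.
            frame = list(data)
        elif kind == "P":
            if reference is None:
                raise ValueError("P-picture received before any I-picture")
            # A P-picture is an update applied to what the receiver holds.
            frame = [r + d for r, d in zip(reference, data)]
        else:
            raise ValueError(f"unexpected picture type: {kind}")
        frames.append(frame)
        reference = frame
    return frames
```

For example, an I-picture `[10, 20, 30]` followed by the P-picture update `[1, 0, -2]` yields the second frame `[11, 20, 28]`.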
If a video system employs one or more enhancement layers, then it can send a variety of different types of picture in the enhancement layer. One of these types is a ‘B-picture’. A ‘B-picture’ differs from both I- and P-pictures. A ‘B-picture’ is predicted based on information from both a picture that precedes the B-picture in time in the video stream and one that follows it. The B-picture is said to be ‘bi-directionally predicted’.
A B-picture is predicted based on pictures from the layer below it. Thus a system with a base layer and a single enhancement layer will predict ‘B-pictures’ based on earlier and later pictures in the base layer, and transmit these B-pictures in the enhancement layer. A notable feature of B-pictures is that they are disposable—the receiver does not have to have them in order to display the video sequence. In this sense they differ from P-pictures, which are also predicted, but are necessary for the receiver to reconstruct the video sequence. A further difference lies in the fact that B-pictures cannot serve as the basis for predicting further pictures.
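Bi-directional prediction can be sketched as follows. The sketch assumes, for illustration only, that the prediction is the per-sample integer average of the preceding and following base-layer frames, and that the B-picture carries a residual correcting that prediction. The function name and the averaging rule are assumptions and are not taken from any particular coding standard.

```python
def decode_b_picture(previous_frame, next_frame, residual):
    """Decode a B-picture from the two base-layer frames that surround it.

    Assumption: the prediction is the integer average of the two
    reference frames, and the B-picture data is a per-sample residual.
    """
    return [
        (p + n) // 2 + r
        for p, n, r in zip(previous_frame, next_frame, residual)
    ]
```

Because the decoded B-picture is never used as a reference, a receiver that discards it still decodes every base-layer frame unchanged.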
The pictures transmitted in the enhancement layers are an optional enhancement, since the transmission scheme always allows a receiver to reconstruct the transmitted video stream using only the pictures contained in the base layer. However, any systems that have sufficient transmission bandwidth can be arranged to use these enhancement layers.
This hierarchy of base-layer pictures and enhancement pictures, partitioned into one or more layers, is referred to as a layered scalable video bit stream.
Consider as an example a temporal scalable video bit stream, with a base layer made up of an intra-coded picture (I picture) followed by inter-coded pictures (P pictures) predicted from the previous I or P picture. A temporal enhancement layer contains additional pictures inserted between the P pictures, to increase the overall frame rate of the sequence. Since it must be possible to decode the sequence at the base layer without these additional pictures, they must be coded as bi-directionally predicted pictures (B pictures), bi-directionally predicted from the previous and subsequent I or P pictures, so that they are disposable. The more bits that are allocated to each B picture, the higher the quality of each B picture will be, in terms of peak signal-to-noise ratio (PSNR). However, the more bits that are allocated to each B picture, the fewer of them can be encoded, because the layer has a fixed bandwidth, and hence the lower the frame rate of the sequence.
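The trade-off described above can be made concrete with a short sketch. The PSNR function uses the standard definition, 10·log10(peak² / MSE). The rate function simply divides a fixed layer bit rate by the bits spent on each B picture. The function names and the example figures are illustrative assumptions.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length frames."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


def b_pictures_per_second(layer_bit_rate, bits_per_b_picture):
    """Number of B pictures a fixed-bandwidth layer can carry per second."""
    return layer_bit_rate / bits_per_b_picture
```

Under these assumptions, an enhancement layer of 64000 bits per second carries 8 B pictures per second at 8000 bits each, but only 4 per second at 16000 bits each.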
An illustration of the picture prediction dependencies is shown in FIG. 1 hereinafter.
A prior art video transmission arrangement relevant to the above example is known from “H.263 Scalable Video Coding and Transmission at Very Low Bit Rates”, PhD Dissertation, Faisal Ishtiaq, Northwestern University, Illinois, U.S.A., December 1999. From this arrangement, it is known to use a rate control algorithm for the base layer to determine quantisation parameters (and hence the PSNR quality of the P pictures) and the number of source pictures to drop (and hence the frame rate). Similarly, the temporal enhancement layer uses a second video buffer and runs a similar rate control algorithm, with the first B picture being placed halfway between the I picture and the first P picture. The temporal placement of subsequent B pictures is determined by the enhancement layer rate control independently of the P pictures. Therefore the case can arise where the enhancement layer rate control wants to encode a B picture at the same temporal position as a P picture already encoded in the base layer. When this occurs, the B picture is simply encoded in the previous temporal position instead. No consideration is given to the regular distribution of the B pictures in time with respect to the P pictures. An example of temporal placement of I/P pictures in a base layer and B pictures in an enhancement layer according to the prior art method is shown in FIG. 2 hereinafter.
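The collision rule of the prior art arrangement can be sketched as below. Temporal positions are represented as source picture indices, and the positions requested by the enhancement layer rate control are taken as given inputs. The function name and this representation are assumptions. The sketch steps back exactly one position, as the text states, and does not model what happens if that earlier position is itself occupied, since the text does not say.

```python
def place_b_pictures(base_positions, requested_positions):
    """Apply the prior art collision rule to requested B picture positions.

    base_positions: temporal positions of I/P pictures in the base layer.
    requested_positions: positions chosen by the enhancement layer rate
    control, independently of the base layer (assumed given).
    """
    occupied = set(base_positions)
    placed = []
    for position in requested_positions:
        if position in occupied:
            # Same temporal position as a base-layer picture:
            # encode the B picture one position earlier instead.
            position -= 1
        placed.append(position)
    return placed
```

With base-layer pictures at positions 0, 8 and 16 and requested B positions 4, 8 and 13, the second B picture collides with a P picture and is moved to position 7.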
Since the enhancement layer rate control is run separately from the base layer rate control, this prior art method does not take into account the fact that a high bandwidth viewer sees both the base layer P pictures and the enhancement layer B pictures together as one sequence. These P pictures and B pictures may have significantly different PSNRs, thereby resulting in the viewer being presented with pictures of markedly different spatial quality. This makes the lower spatial quality P pictures particularly noticeable.
The limitations of this prior art approach are particularly apparent when the base layer has a low bit rate, so that P pictures are spaced far apart, and the temporal enhancement layer has a much higher bit rate. This means that there is an abundance of bits available to encode B pictures, so they are of much higher PSNR than the P pictures. A difference in spatial quality between the P pictures and the B pictures is then very apparent to the high bandwidth viewer when the whole sequence is played, as is shown in FIGS. 3 and 4 hereinafter.
Furthermore, as shown in FIG. 4, the pictures are not evenly distributed in time, as the enhancement layer rate control selects the position of the B pictures independently of the positioning of the P pictures, so sometimes the pictures are grouped close together in time, while at other times there could be a large time gap between pictures. This can lead to jerkiness of motion in the sequence.
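The uneven temporal distribution can be quantified by merging the two layers and measuring the gap between consecutive pictures. This is an illustrative sketch only: the function name and the example positions are assumptions and are not values taken from FIG. 4.

```python
def picture_gaps(base_positions, b_positions):
    """Gaps between consecutive pictures when both layers are played together."""
    merged = sorted(list(base_positions) + list(b_positions))
    return [later - earlier for earlier, later in zip(merged, merged[1:])]
```

For base-layer positions 0, 8 and 16 and B picture positions 4, 7 and 13, the gaps are 4, 3, 1, 5 and 3. The spread between the smallest and largest gap is the kind of irregularity that a high bandwidth viewer would perceive as jerky motion.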
The overall PSNR of all the pictures in the video sequence as a whole, and the temporal positions of all these pictures relative to each other, are important. The spatial quality and placement of each B picture, alone, are not the only important factors. Hence, a problem which the inventors have appreciated and which needs to be solved is how to allocate bits to B pictures and where to position the resulting B pictures temporally, to give the best subjective video quality, given certain base and temporal enhancement layer bit rates.