This invention relates to video coding.
One of the recent targets in mobile telecommunications has been to increase the speed of data transmission in order to enable multimedia services via radio networks. One of the key components of multimedia is digital video. Digital video offers a great many advantages over traditional analogue systems, supporting services such as video telephony and multimedia applications. However, a key problem of digital video when compared with analogue systems is the demand it places on communications and storage resources. For example, a bandwidth of approximately 160 Mbps is required in order to transmit broadcast quality video, which compares with a bandwidth of approximately 5 MHz for comparable quality analogue video. Thus, to make digital video practical, the quantity of data in the digital signal must be reduced.
Transmission of video comprises a continuous traffic of data representing moving pictures. As is generally known, the amount of data needed to transfer pictures is high compared to many other types of media, and so far usage of video in low bit-rate terminals has been negligible. However, significant progress has been achieved in the area of low bit-rate video compression. Acceptable video quality can be obtained at bit-rates around 20 kilobits per second.
As a result of this progressive reduction in bit-rate, it is expected that video is shortly going to become a viable service to offer limited bandwidth networks such as public switched telephone networks (PSTNs) and mobile telecommunications networks. In videophone applications using fixed networks, errors are typically overcome by re-transmitting data. However mobile telephony is prone to higher error rates than the PSTN and has longer round-trip delays. These longer delays make it impracticable to use retransmission with real-time mobile videophone applications. Retransmission is also ineffective in high error rate situations.
A video sequence consists of a series of still images or frames. Data reduction is achieved by using compression techniques to remove redundant data while still retaining sufficient information to allow the original image to be reproduced with an acceptable quality. There are two main types of redundancy in video signals: spatial and temporal. For the coding of images, techniques which exploit spatial redundancy only are termed intra-frame or I frames (i.e. they treat each frame separately), while those which exploit temporal redundancy are termed inter-frame or P frames (i.e. they exploit similarities between frames). The latter invariably also exploit spatial redundancy e.g. by generating motion compensation data which describes the motion (i.e. displacement) between similar areas of the current and a previous image. In the inter frame case, the predicted (motion-compensated) image is rarely precise enough and therefore a spatially compressed prediction error image is also associated with each inter frame.
However, sufficient compression cannot usually be achieved by just reducing the redundancy of the sequence. Thus, video encoders try to reduce the quality of those parts of the video sequence which are subjectively the least important. In addition, the redundancy of the encoded bitstream is reduced by means of efficient lossless coding of compression parameters and coefficients. The main technique is to use variable length codes in which each value is coded using a unique codeword. The shortest codewords are allocated to those values which statistically occur most often.
Several video coding techniques have been developed. These include run length coding, conditional replenishment, transform coding, Huffman coding and differential pulse code modulation (DPCM). Many of these are utilised in key standards such as JPEG, MPEG-1 and MPEG-2, and the ITU-T Recommendations H.261/H.263. JPEG defines the form of compressed data streams for still images; MPEG-1/MPEG-2 are for compression of moving pictures; H.261/H.263 have primarily been defined for video telephony applications employing low bit rate communications links (of the order of tens of kbit/s). Current video telephony systems have primarily been designed for use in PSTN or packet networks, and are governed by ITU-T Recommendation H.324, which covers low bit rate multimedia communication, H.245, which covers the control protocol, H.223, which relates to multiplexing, and H.323, which covers video conferencing over traditional shared media local area networks. The first mobile videophones will be based on H.324.
Compressed video is easily corrupted by transmission errors, mainly for two reasons. Firstly, due to utilisation of temporal predictive differential coding (inter frames), an error is propagated both spatially and temporally. In practice, this means that once an error occurs, it is easily visible to the human eye for a relatively long time. Especially susceptible are transmissions at low bit-rates where there are only a few intra-coded frames (the transmission of intra-coded frames would stop the temporal error propagation). Secondly, the use of variable length codes increases the susceptibility to errors. When a bit error alters the codeword, the decoder will lose codeword synchronisation and also decode subsequent error-free codewords (comprising several bits) incorrectly until the next synchronisation (or start) code. A synchronisation code is a bit pattern which cannot be generated from any legal combination of other codewords and such codes are added to the bit stream at intervals to enable re-synchronisation.
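The desynchronisation effect described above can be illustrated with a small sketch. The prefix code below is illustrative, not the H.263 code table; in a real bitstream a synchronisation pattern that no concatenation of codewords can produce (here, for example, four consecutive zeros never occur, since every codeword ends in a one) would let the decoder recover.

```python
# Hypothetical prefix code: frequent values get short codewords.
VLC = {0: "1", 1: "01", 2: "001", 3: "0001"}   # value -> codeword
INV = {cw: v for v, cw in VLC.items()}

def encode(values):
    """Concatenate the codeword for each value into one bit string."""
    return "".join(VLC[v] for v in values)

def decode(bits):
    """Greedy prefix decoding: accumulate bits until a codeword matches."""
    out, cw = [], ""
    for b in bits:
        cw += b
        if cw in INV:
            out.append(INV[cw])
            cw = ""
    return out
```

Encoding the sequence [2, 0, 0, 1] gives the bit string "0011101", which decodes correctly; flipping its third bit yields "0001101", which decodes to the entirely different sequence [3, 0, 1], even though only one bit changed and the remaining bits were error-free.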
Not every bit in a compressed video bitstream has equal importance to the decompressed images. Some bits belong to segments defining vital information such as picture type (e.g. intra or inter), quantiser value and optional coding modes that have been used. In H.263, the most vital information is gathered in the picture header. A transmission error in the picture header typically causes a total misinterpretation of the subsequent bits defining the picture content. Due to utilisation of temporal predictive differential coding (inter frames), the error is propagated both spatially and temporally. Thus, when a decoder detects a corrupted picture header, a typical approach is to freeze the previous picture on the screen, to send an intra picture request to the transmitting terminal and to wait for the requested intra frame. This causes an annoying pause in the received video.
Transmission errors have a different nature depending on the underlying network. In packet-switched networks, transmission errors are typically packet losses (due to congestion in network elements). In circuit-switched networks, transmission errors are typically bit errors where a '1' is corrupted to a '0' or vice versa and, in radio communications networks, errors may occur in bursts making the situation even more difficult.
To mitigate the image degradation introduced by transmission errors, retransmission can be used (as described above), error detection (e.g. Cyclic Redundancy Checking (CRC)) and/or error correction methods can be applied, and/or the effects of the received corrupted data can be concealed. In fixed networks retransmission provides a reasonable way to protect video data streams from errors since Bit Error Rates (BER) are typically in the region of 10^-6. However, large round-trip delays associated with low bit-rate radio transmission and moderate or high error rates (e.g. 10^-4 to 10^-3) make it impracticable to use retransmission, especially with real-time radio videophone applications. Error detection and correction methods usually require large overheads in terms of the data that must be transmitted and the memory/processing capability required. Consequently, for low bit-rate applications, error concealment may be considered the preferred way to protect and recover images from transmission errors. Video error concealment methods are typically applicable to transmission errors occurring through packet loss and bit corruption.
H.263 is an ITU-T recommendation for video coding for low bit-rate communication, which generally means data rates below 64 kbps. The recommendation specifies the bitstream syntax and the decoding of the bitstream. Currently, there are two versions of H.263. Version 1 consists of the core algorithm and four optional coding modes. H.263 version 2 is an extension of version 1 providing twelve new negotiable coding modes.
Pictures are coded as luminance (Y) and two colour difference (chrominance) components (CB and CR). The chrominance pictures are sampled at half the resolution of the luminance picture along both co-ordinate axes. Picture data is coded on a block-by-block basis, each block representing 8×8 pixels of luminance or chrominance.
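The sampling relationship stated above can be made concrete with a short sketch: since chrominance is sampled at half the luminance resolution along both axes, each chrominance picture holds a quarter of the luminance samples.

```python
def picture_samples(width, height):
    """Samples per picture for Y, CB and CR when chrominance is
    subsampled 2:1 along both co-ordinate axes (width and height
    assumed even)."""
    y = width * height
    c = (width // 2) * (height // 2)   # half resolution in each axis
    return {"Y": y, "CB": c, "CR": c}
```

For a QCIF picture of 176×144 pixels, this gives 25344 luminance samples and 6336 samples for each chrominance component.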
Each coded picture, as well as the corresponding coded bitstream, is arranged in a hierarchical structure with four layers, which are from bottom to top: block layer, macroblock layer, picture segment layer and picture layer. The picture segment layer can either be arranged as a group of blocks (GOB) or a slice.
Block layer data consists of uniformly quantised discrete cosine transform coefficients, which are scanned in zigzag order, processed with a run-length encoder and coded with variable length codes.
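The block-layer processing described above can be sketched as follows. The zigzag order and the (run, level) pairing follow the usual DCT-block convention; the final variable length coding of the pairs is omitted.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag
    order: diagonals of increasing row+col, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length(block):
    """Scan a square coefficient block in zigzag order and emit a
    (zero_run, level) pair for each non-zero coefficient; the trailing
    zeros are implied by the end of the block."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        v = block[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs
```

Because quantised DCT blocks typically contain a few non-zero coefficients clustered at low frequencies, the zigzag scan groups the zeros into long runs that the run-length step compresses efficiently.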
Each macroblock relates to 16×16 pixels of luminance and the spatially corresponding 8×8 pixels of chrominance components. In other words, a macroblock consists of four 8×8 luminance blocks and the two spatially corresponding 8×8 colour difference blocks. Each inter macroblock is associated with motion compensation data comprising a motion vector which defines the position of a corresponding area in the reference frame which most closely resembles the pixels of the inter macroblock. The inter macroblock data comprises coded prediction error data for the pixels of the macroblock.
Usually, each picture is divided into groups of blocks (GOBs). A group of blocks (GOB) typically comprises a row of macroblocks. Data for each GOB consists of an optional GOB header followed by data for the macroblocks within the GOB.
If the optional slice structured mode is used, each picture is divided into slices instead of GOBs. A slice contains a number of consecutive macroblocks in scan-order. Data for each slice consists of a slice header followed by data for the macroblocks of the slice.
The picture layer data contain parameters affecting the whole picture area and the decoding of the picture data. The coded parameter data is arranged in a so-called picture header.
Picture and GOB (or slice) headers begin with a synchronisation code. No other code word or a legal combination of code words can form the same bit pattern as the synchronisation codes. Thus, the synchronisation codes can be used for bitstream error detection and for resynchronisation after bit errors. The more synchronisation codes that are added to the bitstream, the more error-robust the system becomes.
Most known error concealment techniques are based on spatial and temporal interpolation schemes. Spatial interpolation is used in intra frames and intra-coded areas of inter frames. Spatial interpolation means that lost areas are interpolated from spatially neighbouring areas. This can be done for example using the distance weighted average of the boundary pixels.
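A minimal sketch of the distance-weighted boundary interpolation mentioned above is given below, assuming the four boundary pixel rows/columns around the lost block are available; the inverse-distance weighting is one common choice, not a mandated scheme.

```python
def conceal_block(top, bottom, left, right, n):
    """Spatially conceal an n x n lost block: each pixel is the
    distance-weighted average of the four adjacent boundary pixels
    (row above, row below, column left, column right of the block)."""
    block = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            # distance to each boundary (1 = immediately adjacent)
            dt, db, dl, dr = r + 1, n - r, c + 1, n - c
            # weights inversely proportional to distance
            wt, wb, wl, wr = 1 / dt, 1 / db, 1 / dl, 1 / dr
            s = wt + wb + wl + wr
            block[r][c] = (wt * top[c] + wb * bottom[c] +
                           wl * left[r] + wr * right[r]) / s
    return block
```

A pixel near the top edge of the lost block is thus dominated by the row above it, giving a smooth (if blurred) fill of the missing area.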
Error concealment using temporal interpolation is more often used in low bit-rate video coding, since the number of intra frames is usually rather low. A very basic temporal interpolation scheme copies the lost areas from the same positions of the previous frame, i.e., it treats the lost blocks as "not coded" blocks. In more advanced schemes, motion compensation is performed using either the median or the average of the motion vectors of spatially neighbouring blocks. There have also been some proposals to use boundary pixel matching to find the best motion vectors for the lost block.
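The more advanced temporal scheme described above can be sketched as follows; the frame layout and helper names are illustrative, and no bounds clamping is shown.

```python
def median_mv(neighbour_mvs):
    """Component-wise median of a list of (dx, dy) motion vectors
    taken from spatially neighbouring blocks."""
    def med(vals):
        s = sorted(vals)
        m = len(s) // 2
        return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2
    return (med([v[0] for v in neighbour_mvs]),
            med([v[1] for v in neighbour_mvs]))

def conceal_from_previous(prev, x, y, n, mv):
    """Conceal an n x n lost block at (x, y) by copying the
    motion-compensated area from the previous frame (a 2D pixel list
    indexed [row][col]), displaced by the estimated motion vector."""
    dx, dy = mv
    return [[prev[y + r + dy][x + c + dx] for c in range(n)]
            for r in range(n)]
```

With a zero motion vector this degenerates to the basic scheme of copying the co-located area of the previous frame.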
In low resolutions and at low bit-rates, the correlation between spatially neighbouring blocks is often rather low. Thus interpolated motion vectors based on spatially neighbouring pixel data may be far from the original values. This means that uni-directional concealment schemes often fail to reconstruct the original blocks. Furthermore, if only motion vectors are used for concealment without trying to recover the prediction error blocks, the picture becomes blurred, since a great amount of detail will be lost. In practice, using current concealment schemes, errors or incorrectly concealed blocks are visible for a relatively long time.
The term codec refers to the ability to both encode and decode. Video coding parameters of the algorithm controlling encoding in the video codec are normally pre-selected on the basis of the environment in which they are designed to operate. This is particularly beneficial for transmission over channels which are prone to error. In such conditions the coding parameters are selected so as to attempt to minimise the effect of transmission errors on the picture quality. Where errors occur in transmission, the resulting decoded video normally exhibits additional blockiness, annoying green and pink squares, temporal jerkiness and sometimes chequered patterns.
In existing systems, two parameters which are typically pre-set in the encoding algorithm are the amount of intra-refresh information and the frequency of start codes. In PSTN networks, the video codec starts the coding with a full intra-frame. Intra-frame pictures are coded without reference to other pictures, which means that they contain all the information necessary for their reconstruction by the decoder and for this reason they are an essential entry point for access to a video sequence. Because the information content of intra-frames is high, the compression rate is relatively low and therefore a full intra-frame requires a significant number of data bits to define the picture. As a result, the transmission of a full intra-frame via low bandwidth channels, even using small buffers to minimise delays, takes a long time. This usually results in the decoder freezing the previous picture on the screen for a while, in effect to allow the following picture to catch up. Thus, as an alternative approach, in succeeding pictures portions of the picture are updated (or refreshed) in intra-mode, rather than updating the whole picture at once. Hence the picture is said to be intra-refreshed. Typically this occurs on a macroblock-by-macroblock basis of 16×16 pixels. If the rate at which the macroblocks are refreshed is slow, transmission error artefacts on the image can be perceived for a long time and will vanish only when the erroneous macroblock is intra-refreshed. In error prone networks, it is therefore necessary to increase the number of intra-refresh macroblocks in each frame, or the rate at which full intra frames are sent.
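The macroblock-level intra refresh described above can be sketched as a simple cyclic scheduler; the per-frame refresh count is exactly the tunable parameter discussed, and the class and method names are illustrative.

```python
class IntraRefresher:
    """Cyclic intra refresh: a fixed number of macroblocks per frame
    is coded in intra mode, so that the whole picture is refreshed
    over several successive frames instead of in one full intra frame."""

    def __init__(self, num_macroblocks, per_frame):
        self.num = num_macroblocks   # macroblocks in the picture
        self.per_frame = per_frame   # intra macroblocks per frame
        self.next_mb = 0             # current position in the cycle

    def macroblocks_for_frame(self):
        """Return the macroblock indices to intra-code in the next
        frame, then advance the cyclic position."""
        mbs = [(self.next_mb + i) % self.num for i in range(self.per_frame)]
        self.next_mb = (self.next_mb + self.per_frame) % self.num
        return mbs
```

For example, a QCIF picture has 99 macroblocks; refreshing 11 per frame covers the whole picture every 9 frames, and raising the per-frame count shortens the time during which an erroneous macroblock can persist.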
Another technique used to minimise the impact of transmission errors is to reduce the size of the affected areas. Since the coded bit stream contains variable length coding (VLC) code words, an error in the bit stream in most cases causes the decoder to lose synchronisation with the VLC code words. The decoder can only continue decoding correctly after receiving a start code. Typically, start codes are found at the beginning of coded picture frames; however most video coding standards also allow start codes to be inserted elsewhere in a picture, for instance at the beginning of each row of macroblocks or even more often. Thus, to reduce the size of the areas affected by transmission errors, start codes can be introduced in the picture at more frequent locations. The density of these start codes is a compromise between reduced picture quality owing to an increased number of start codes, and the size of the area which is affected by transmission errors. In error prone environments it is advantageous to sacrifice some visual image quality in order to reduce the image area affected by transmission errors.
The overall current approach is to pre-program the intra-refresh information and start code parameters into the algorithm controlling the video encoder depending on the anticipated level of transmission errors. Intra-refresh data and start codes are reasonably effective for mitigating the effects of predictable transmissions errors, but these approaches have certain shortcomings. Principally, these shortcomings stem from the fact that actual transmission errors are not always predictable, and in situations where there is a wide margin between the predicted transmission error and the actual transmission error, the intra-refresh and start code parameters will not be consistent with the required level for these encoding parameters. For example, if the transmission errors are less than anticipated then the level of intra-refresh data or start code repetition will be in excess of that required, and the excess will thus be redundant. On the other hand, if the transmission errors are much worse than those predicted, then the intra-refresh and start code information will be insufficient, and spread so widely both temporally and spatially in the decoded pictures that the result will be poor image quality.
In known H.324 multimedia terminals, it is possible to send various commands from a receiving decoder to a transmitting encoder via the H.245 control protocol. One command is videoFastUpdatePicture, by which the decoder requests the encoder to enter the fast update mode at its earliest convenience and update the entire picture. The protocol does not define how the encoder is to carry out the updating. The protocol also defines the command videoFastUpdateGOB, which requests the encoder to enter the fast update mode at its earliest convenience and update the entire GOB; and videoFastUpdateMB, which requests the encoder to enter the fast update mode at its earliest convenience and update the entire macroblock. Thus the decoder can identify the picture, GOB or macroblock to be updated.
In a basic implementation, a multimedia terminal that conforms to H.324 will send a videoFastUpdatePicture command every time an error is detected. Similarly, a videoFastUpdateGOB command is sent for every corrupted GOB.
According to H.245, the requesting terminal requires an acknowledgement from the encoder for each message sent and, if the acknowledgement is not received in time, the decoder will transmit the message again. This means that the control channel can become very congested with repeated commands.
In H.263 Appendix I, a feedback mechanism called Error Tracking is introduced in which the H.245 protocol is used to transmit indications as to which macroblocks of a frame were received corrupted. The video encoder keeps track of prediction errors in the past Z frames. Based on the H.245 indications and the prediction error counters, the video encoder can decide which macroblocks it will update in intra-mode.
A third known feedback video transmission system is defined in H.263 Annex N. This is known as Reference Picture Selection mode. This defines ways for a video decoder to indicate which picture segments (GOBs or slices) were corrupted and, based on this information, the video encoder can encode the corrupted parts using older frames instead of the latest one. The decoder sends a negative acknowledgement (NACK) for every corrupted picture segment. Thus, if a whole picture is corrupted, a NACK will be sent for every segment. Alternatively, the decoder can send positive feedback, i.e. indicating which segments have been received uncorrupted, and the encoder can use only those uncorrupted areas for prediction.
According to a first aspect of the invention there is provided a method of video decoding comprising receiving encoded video data by a video decoder, decoding said video data to form decoded video data representing successive pictures of a video sequence, determining if the decoded video data contains an error and, when it is determined that an error is present, sending a message to a transmitting video encoder requesting an update of at least the portion of the video data containing the error wherein said update message is only sent if a pre-determined period has elapsed since a previous update message for a corresponding portion of the video data was sent.
The update message may request the update of an entire picture of the video sequence. Additionally or alternatively, the update message may request the update of a segment of a picture of the video sequence.
Preferably the predetermined period is proportional to the round-trip delay between the video decoder and the video encoder. Advantageously, the pre-determined period is twice the round-trip delay.
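The throttling behaviour defined in the first aspect can be sketched as follows; timestamps are passed in explicitly so the logic is easy to follow, and the class and method names are illustrative rather than part of any standard.

```python
class UpdateRequester:
    """Send an update message for a picture portion only if a
    pre-determined period (here twice the round-trip delay, per the
    preferred embodiment) has elapsed since the last update message
    for that same portion. This avoids flooding the control channel
    with repeated requests while an earlier update is still in flight."""

    def __init__(self, round_trip_delay):
        self.period = 2 * round_trip_delay   # preferred waiting period
        self.last_sent = {}                  # portion id -> last send time

    def maybe_request(self, portion, now):
        """Return True (and record the send) if a request for this
        portion may be sent at time `now`; False otherwise."""
        last = self.last_sent.get(portion)
        if last is not None and now - last < self.period:
            return False                     # earlier update still pending
        self.last_sent[portion] = now
        return True
```

The per-portion bookkeeping means a fresh error in a different GOB can still be reported immediately, while repeated errors in the same GOB generate at most one request per waiting period.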
According to a second aspect of the invention, a step of determining if the video contains an error comprises determining the amount of change in the video data as compared with previous picture data for a corresponding area of a video image.
According to a third aspect of the invention, a step of determining if the video contains an error comprises determining whether the amount of motion in a previous picture exceeds a pre-determined threshold.
According to a fourth aspect of the invention, a method of video decoding comprises receiving encoded video data by a video decoder, decoding said video data to form decoded video data representing successive pictures of a video sequence, determining if the decoded video data contains an error and, when it is determined that an error is present, sending a message to a transmitting video encoder requesting an update of at least the portion of the video data containing the error wherein the step of determining if the decoded video contains an error comprises determining for an area of a first picture the amount of change in the video data as compared with video data for a corresponding area of a previous picture, the update message being sent if the amount of change exceeds a pre-determined threshold.
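The change-based error check of the fourth aspect can be sketched as a simple sum-of-absolute-differences comparison; the SAD metric and the threshold value are illustrative assumptions, as the aspect does not mandate a particular measure of change.

```python
def area_changed_too_much(current, previous, threshold):
    """Compare a decoded picture area with the co-located area of the
    previous picture. `current` and `previous` are equally sized 2D
    pixel lists; returns True if the sum of absolute differences
    (SAD) exceeds the threshold, signalling a likely error."""
    sad = sum(abs(c - p)
              for row_c, row_p in zip(current, previous)
              for c, p in zip(row_c, row_p))
    return sad > threshold
```

When this returns True for an area, the decoder would send the update message for that portion, subject to any throttling of repeated requests.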
According to a further aspect of the invention, a method of video decoding comprises receiving encoded video data by a video decoder, decoding said video data to form decoded video data representing successive pictures of a video sequence, determining if the decoded video data contains an error and, if so, concealing an area of the picture containing the error, and, when it is determined that an error is present, generating a message to a transmitting video encoder requesting an update of at least the portion of the video data containing the error wherein the update message is sent if the number of areas of the picture that have been concealed is less than a pre-determined threshold.
According to a further aspect of the invention, a method of video decoding comprises receiving encoded video data by a video decoder, decoding said video data to form decoded video data representing successive pictures of a video sequence, determining if the decoded video data contains an error and, if so, concealing an area of the picture containing the error, and, when it is determined that an error is present such that a picture cannot be decoded, labelling all parts of the picture as concealed and decoding the next picture.
The invention also relates to a method of encoding a video signal comprising receiving a video signal to be encoded; encoding the video signal to form encoded video data; and transmitting the encoded video data to a remote video decoder, wherein the encoding is responsive to an update control signal received from the remote video decoder to update requested encoded video data in a progressive manner over a plurality of pictures.
Preferably the updating is carried out on a macroblock-by-macroblock basis, the updated macroblocks being updated over sequential pictures of the video signal.
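The progressive, macroblock-by-macroblock updating described above can be sketched as a simple scheduler that spreads a requested update over sequential pictures; the per-picture budget and function name are illustrative assumptions.

```python
def schedule_progressive_update(requested_mbs, per_picture):
    """Split a set of requested macroblock indices into batches, one
    batch per sequential picture, so the encoder updates the requested
    area progressively instead of coding all macroblocks in intra mode
    in a single (and therefore very large) picture."""
    mbs = sorted(requested_mbs)
    return [mbs[i:i + per_picture] for i in range(0, len(mbs), per_picture)]
```

Spreading the intra-coded macroblocks in this way keeps each coded picture small, avoiding the long transmission delay that a full intra frame would cause on a low bandwidth channel.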
According to a further aspect of the invention, video decoding apparatus comprises means for receiving encoded video data; means for decoding said video data to form decoded video data; means for determining if the decoded video data contains an error; and means for sending a message to a transmitting video encoder requesting an update of at least the portion of the video data containing the error, wherein the apparatus is arranged to send the update message only if a pre-determined period has elapsed since a previous update message was sent for a corresponding portion of the video data.
Preferably the means for determining if the decoded video contains an error comprises means for determining the amount of change in the video data as compared with previous picture data for a corresponding area of a video image, the update message being sent if the amount of change exceeds a pre-determined threshold.
Advantageously an update message is generated if the amount of motion in a previous picture exceeds a pre-determined threshold.
According to a further aspect of the invention, video signal encoding apparatus comprises means for receiving a video signal to be encoded; means for encoding the video signal to form encoded video data; and means for transmitting the encoded video data to a remote video decoder, wherein the encoding means is responsive to an update control signal received from the remote video decoder to update requested encoded video data in a progressive manner over a plurality of pictures.
Preferably the updating is carried out on a macroblock-by-macroblock basis, the updated macroblocks being updated over sequential pictures of the video signal.
The invention also extends to a mobile radio device.