1. Field of the Invention
This invention relates to systems, apparatuses, methods, and computer program products for establishing and maintaining high quality videoconferences between multiple nodes in which packet loss may be an issue.
2. Discussion of the Background
Video-conferencing is a ubiquitous form of information exchange in the modern era. A video-conference includes at least two stations exchanging video, audio, and other data in support of a virtual meeting. The video is a stream of data made up of frames that include pictures and sound. Video signals are typically analog but may also be digital. The digital information may or may not be compressed.
As recognized by the present inventors, a limitation with conventional systems is that by employing data compression in the video signal, and subsequently transmitting the compressed data over a communication link, compressed data may be lost or corrupted and cause noticeable losses in video quality. Moreover, a conventional technique begins by digitizing an input video signal into a format in which it is represented by a number of frames per second and a number of pixels per frame. The digitized video signal is then compressed to reduce the required throughput demand on the communication link. The compression method may be based on dividing each frame into blocks of pixels. Each block in a conventional system is compressed in one of the two following ways:
1) As an INTRA block, i.e., the block is compressed independently of pixel values outside the current frame.
2) As an INTER block, i.e., the block is compressed by forming the residual data (difference) between the original pixel values and the corresponding pixel values in a predicted block, and then compressing the residual data. The predicted block is determined from one or more previously coded frames. The spatial displacement of a block between the current frame and a previous frame is described by a motion vector associated with that block. The benefit of using INTER blocks is the small number of bits required to represent the reduced dynamic range of the difference signal. Thus, the compressed data for each INTER block contains two types of information: 1) motion vectors and 2) residual data. (This is valid for many video coding standards including H.261, H.263, and H.264.)
To determine the predicted block, motion compensation is used. For each block in the original frame, a motion vector is used to describe the spatial displacement between the original pixel values and the corresponding pixel values in one of the previously coded frames. The predicted block is constructed from the displaced blocks in the previously coded frames. The motion vectors are determined for each block at the transmitter side by a motion estimation procedure, and transmitted to the receiver side as side information.
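The motion estimation procedure described above can be illustrated with a minimal sketch. It assumes an exhaustive (full-search) block match that minimizes the sum of absolute differences (SAD); the source does not specify a search strategy or cost measure, and the names `sad` and `estimate_motion`, the block size, and the search range are hypothetical choices made only for this illustration.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` at (bx, by)
    and the block in `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion(cur, ref, bx, by, size=2, search=1):
    """Return ((dx, dy), cost) for the displacement within +/-search
    pixels that minimizes SAD. Frames are lists of rows of integers."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if bx + dx < 0 or bx + dx + size > width:
                continue
            if by + dy < 0 or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost


# A 2x2 patch moves one pixel to the right between `ref` and `cur`.
ref = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 6, 0],
       [0, 0, 0, 0]]
cur = [[0, 0, 0, 0],
       [0, 0, 9, 8],
       [0, 0, 7, 6],
       [0, 0, 0, 0]]
mv, cost = estimate_motion(cur, ref, bx=2, by=1)
```

In this toy case the block at column 2, row 1 of the current frame is matched exactly by the reference block one pixel to its left, so the estimated motion vector points back to that position. Practical encoders typically use faster search patterns and sub-pixel accuracy, which this sketch omits.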
The compressed data corresponding to each INTRA/INTER block are then collected into transmission packets containing a certain number of bytes. The data from one video frame are normally transmitted as several packets. At the receiver side, the INTRA blocks are decoded directly. The INTER blocks are decoded by first decoding the residual, and then adding the predicted block using the corresponding motion vectors.
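The INTER block encode and decode steps can be sketched as follows. This is a simplified illustration, not any standard's actual bitstream process: it assumes the residual is carried losslessly (real codecs transform and quantize it), and the helper names `predict_block`, `encode_inter`, and `decode_inter` are hypothetical.

```python
def predict_block(ref, bx, by, mv, size):
    """Fetch the predicted block from a previously coded frame `ref`,
    displaced from position (bx, by) by the motion vector mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]


def encode_inter(cur, ref, bx, by, mv, size):
    """Transmitter side: form the residual (original minus prediction).
    The motion vector and the residual together describe the block."""
    pred = predict_block(ref, bx, by, mv, size)
    residual = [[cur[by + y][bx + x] - pred[y][x] for x in range(size)]
                for y in range(size)]
    return mv, residual


def decode_inter(ref, bx, by, mv, residual, size):
    """Receiver side: rebuild the prediction with the motion vector,
    then add the decoded residual."""
    pred = predict_block(ref, bx, by, mv, size)
    return [[pred[y][x] + residual[y][x] for x in range(size)]
            for y in range(size)]


ref = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 6, 0],
       [0, 0, 0, 0]]
cur = [[0, 0, 0, 0],
       [0, 0, 10, 8],
       [0, 0, 7, 5],
       [0, 0, 0, 0]]
mv, residual = encode_inter(cur, ref, bx=2, by=1, mv=(-1, 0), size=2)
rebuilt = decode_inter(ref, 2, 1, mv, residual, size=2)
```

Here the residual values are small compared with the pixel values themselves, which is the reduced dynamic range that makes INTER blocks cheap to represent. The sketch also makes the dependency explicit: the decoder cannot rebuild the block without a correct copy of `ref`.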
All communication links have some risk of generating transmission errors. In the video frame context, using an IP network for example, the transmission errors will manifest themselves as “packet losses,” as illustrated in FIG. 1. As shown, a camera 711 produces video frames 701 and applies the video frames 701 to a video encoder 713. The video encoder formats the data in the frames into packets 702, which are then transmitted over a network (such as IP network 715). In the network 715, packets are inadvertently lost or corrupted, such that the received packet stream may include a lost packet 703. A video decoder 717 is then faced with having to reproduce the video signal despite the fact that one or more of the packets have been lost. Consequently, the output of the video decoder 717 contains some corrupted video frames 704, which appear as imperfect video images when produced on monitor 719. Because conventional methods do not typically employ packet retransmission techniques, the decoder is faced with having to handle lost packets in a way that minimizes the visual distortion for the end user.
In some situations, there can be two communication links separated by either an MCU or a Gateway as illustrated in FIG. 11 and FIG. 12. The main purpose of a Gateway is to reformat the compressed data between an IP network and a non-IP network, typically an ISDN network. The purpose of an MCU is to allow several users to participate in the same conference. This implies that an MCU will have several upstream encoders and downstream decoders connected. If an upstream encoder and downstream decoder pair is connected through different networks, the MCU performs reformatting in the same way as the Gateway. However, if the picture resolution needs to be changed between the upstream encoder and the downstream decoder, the MCU also needs to perform transcoding, which includes decompression of the incoming data, followed by scaling of the decompressed video to the new resolution, followed by re-compression of the video at the new resolution. In order to perform transcoding, the MCU will then be equipped with an internal encoder and decoder pair for each upstream/downstream connection.
If the encoder has information about a serious packet loss situation, it could modify its encoding format by adding more redundancy in the transmitted signal so as to avoid video distortion when the video images are reproduced at the destination terminal. As recognized by the present inventors, when there are no packet losses, INTER blocks are much more efficient than INTRA blocks since they normally use fewer bits. INTRA blocks are normally preferred only in case of scene changes and in areas with complex motion where a good approximation cannot be found in the previous frame. However, in the case of a packet loss during transmission, INTER blocks are particularly vulnerable. This is because a lost packet carrying INTER block data affects not only that particular frame, but also creates a propagation of errors in subsequently decoded frames. Thus, the error progressively contaminates multiple frames.
Assuming that the lost data cannot be retransmitted, the only way to terminate the propagation of errors is to send an INTRA block, preferably as soon as possible after a packet loss has occurred. As illustrated in FIG. 2, INTRA blocks 807 are interleaved at predetermined intervals between series of INTER blocks 809. If a packet loss 805 occurs in one of the INTER blocks 809, then all subsequent frames 803 are corrupted until the next INTRA block is transmitted. Thus, as recognized by the present inventors, there is a trade-off between compression efficiency using mostly INTER blocks, and robustness to packet loss using mostly INTRA blocks.
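The error propagation behavior described above can be illustrated with a small simulation. As a simplifying assumption made only for this sketch, each frame is treated as a single unit coded entirely as INTRA or INTER, whereas the text describes the choice per block. The function name `corrupted_frames` is hypothetical.

```python
def corrupted_frames(frame_types, lost):
    """Return the indices of frames that are displayed with errors.

    frame_types: list of "INTRA" / "INTER" labels, one per frame.
    lost: set of frame indices whose data was lost in transmission.
    """
    corrupted = []
    prev_bad = False
    for i, kind in enumerate(frame_types):
        if i in lost:
            bad = True        # the frame's own data is missing
        elif kind == "INTRA":
            bad = False       # decoded independently, so the error stops
        else:
            bad = prev_bad    # INTER inherits errors from its reference
        if bad:
            corrupted.append(i)
        prev_bad = bad
    return corrupted


# INTRA every 5 frames versus INTRA only at the start, loss at frame 2.
frequent = ["INTRA" if i % 5 == 0 else "INTER" for i in range(10)]
rare = ["INTRA"] + ["INTER"] * 9
short_run = corrupted_frames(frequent, {2})   # frames 2, 3, 4
long_run = corrupted_frames(rare, {2})        # frames 2 through 9
```

With frequent INTRA refresh the damage is confined to three frames, at the cost of spending more bits on INTRA data; with rare refresh the same single loss corrupts the rest of the sequence.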
One method for intelligently managing the insertion of INTER and INTRA blocks is taught by Thomas Stockhammer and Thomas Wiegand in “H26L Simulation Results for Common Test Conditions for RTP/IP over 3GPP/3GPP2”, VCEG-N38, ITU-Telecommunication Standardization Sector, STUDY GROUP 16 Question 6, Video Coding Experts Group (VCEG), Fourteenth Meeting: Santa Barbara, Calif., USA, 21-24 Sept. 2001, the entire contents of which are incorporated by reference. Among the problems with this method, however, is the computational complexity of the encoder required to compensate for a sustained packet loss situation.
Ideally, the transmitter should get instant feedback from the receiver and be able to react instantly by transmitting the INTRA blocks in the areas being affected by the loss. However, in many cases, the transmitter receives limited, stale and/or inaccurate information about packet losses detected at the receiver side. In particular, the transmitter might receive only delayed information about packet losses without any reference to the particular frame, and to the particular blocks. Typically, the decoder sends a notification to the encoder each time a packet loss occurs. However, in order not to overload the system, a filter may be used to limit the frequency of these notifications to a maximum of M notifications per second.
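A filter of the kind mentioned above, limiting notifications to at most M per second, might be sketched as follows. The source does not say how the filter works; this sketch assumes a sliding one-second window, and the class name `NotificationFilter` and its `allow` method are hypothetical. Timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import deque


class NotificationFilter:
    """Allow at most `max_per_second` notifications in any one-second window."""

    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self._sent = deque()  # timestamps of notifications already sent

    def allow(self, now):
        """Return True if a notification may be sent at time `now` (seconds)."""
        # Forget notifications that are at least one second old.
        while self._sent and now - self._sent[0] >= 1.0:
            self._sent.popleft()
        if len(self._sent) < self.max_per_second:
            self._sent.append(now)
            return True
        return False  # suppressed: the encoder never hears about this loss


# Six losses detected within about one second, with M = 2.
limiter = NotificationFilter(max_per_second=2)
decisions = [limiter.allow(t) for t in (0.0, 0.1, 0.2, 0.9, 1.0, 1.05)]
```

Only three of the six losses produce a notification in this run, which shows why the transmitter's view of the loss situation is incomplete: suppressed notifications carry no information to the encoder at all.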
What is required to address this limitation, as recognized by the present inventors, is an apparatus, system, method, and computer program product that, based on the limited, delayed, and inaccurate notifications, detects that the receiver is dealing with a serious and sustained packet loss situation. Further, what is required is an encoder that, if this detection is made, can switch to a “robust mode” characterized by the intelligent insertion of INTRA blocks, instead of INTER blocks, to achieve the best possible trade-off between efficient compression and limited error propagation with minimal system complexity.
In case of a packet loss, the task of the decoder is to conceal the lost data as well as it possibly can. If a packet loss occurs, one or more blocks in that particular frame need to be replaced by generated blocks that make the visible artifacts of the packet loss as small as possible. One such process is to replace the missing blocks by the corresponding blocks in the previous frame as described in Telenor Research, “Definition of an error concealment model (TCON)”, LBC-95-186, ITU-TS SG15 Working Party 15/1, Expert's Group For Very Low Bitrate Visual Telephony, Boston (20-23) Jun. 1995, the entire contents of which are incorporated herein by reference. This process is illustrated in FIG. 3, where lines 3001 indicate that block data from a previous frame, without motion vector information, is used to estimate a lost block. Because this approach uses block data without considering motion vector information, it works reasonably well only when there is a high degree of similarity between consecutive frames (i.e., no motion or almost no motion). However, this approach does not work well when there is a low degree of similarity between consecutive frames (i.e., a large amount of motion). More advanced conventional systems, therefore, have been developed to take motion vector information into account.
As shown in FIG. 4, concealment based on blocks from the same position in the previous frame can be further enhanced by combining the motion vector from the lost block with video (e.g., pixel) data from the previous block. This is possible in situations where the block data is lost but the motion vector data is not lost. Thus, concealment of a lost block in block position (k,l) in frame n is performed using the lost block's own motion vector, Vn(k,l).
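The two concealment approaches described above can be sketched with one function. With the default zero motion vector it copies the block at the same position in the previous frame (the FIG. 3 approach); with the lost block's own motion vector it copies the displaced block (the FIG. 4 approach). The function name `conceal_block` and the clamping of coordinates at the frame border are assumptions made for this illustration and are not taken from the source.

```python
def conceal_block(prev, bx, by, size, mv=(0, 0)):
    """Build a replacement for a lost block at (bx, by) from frame `prev`.

    mv = (dx, dy) is the displacement applied when reading from `prev`;
    (0, 0) means "same position in the previous frame".
    """
    height, width = len(prev), len(prev[0])
    dx, dy = mv
    block = []
    for y in range(size):
        row = []
        for x in range(size):
            # Clamp so that displaced reads stay inside the frame
            # (an assumption of this sketch).
            sy = min(max(by + dy + y, 0), height - 1)
            sx = min(max(bx + dx + x, 0), width - 1)
            row.append(prev[sy][sx])
        block.append(row)
    return block


prev = [[0, 0, 0, 0],
        [0, 9, 8, 0],
        [0, 7, 6, 0],
        [0, 0, 0, 0]]
# The true content of the lost block is the patch [[9, 8], [7, 6]],
# which has moved one pixel to the right since `prev`.
same_position = conceal_block(prev, bx=2, by=1, size=2)
motion_comp = conceal_block(prev, bx=2, by=1, size=2, mv=(-1, 0))
```

In this example the same-position copy yields a visibly wrong block because the content has moved, while the motion-compensated copy recovers the patch exactly. This only works when the motion vector survived the loss.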
However, typically a motion vector associated with a block is lost with the rest of the data for that block. Thus, the challenge is to find a good estimate V′n(k,l) of the lost motion vector for the lost block, particularly when there is a low degree of similarity between consecutive frames. One method for estimating the motion vectors of lost blocks is taught by Stephan Wenger and Michael Horowitz in “Scattered Slices: A New Error Resilience Tool for H.26L”, JVT-B027, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6) 2nd Meeting, Geneva, CH, Jan. 29-Feb. 1, 2002, the entire contents of which are incorporated by reference. In this method, the motion vector of the lost block is estimated from motion vectors of neighbor blocks within the same frame as the lost block with the following equation:
V′n(k,l) = ƒ(Vn(k−1,l−1), Vn(k−1,l), Vn(k−1,l+1), Vn(k,l−1), Vn(k,l+1), Vn(k+1,l−1), Vn(k+1,l), Vn(k+1,l+1))
where Vn(i,j) is the motion vector for block position (i,j) in frame n, and ƒ( ) is some function. This situation is illustrated in FIG. 5. However, a problem with this method is that the method assumes that neighbor blocks are not lost, which is not always true.
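The neighbor-based estimate can be illustrated with a short sketch. The source leaves ƒ( ) unspecified; this example assumes a component-wise median over whichever of the eight neighbor motion vectors are available, which is only one possible choice and is not attributed to the cited reference. The function name `estimate_lost_mv` and the dictionary representation of the motion vector field are hypothetical.

```python
from statistics import median_low


def estimate_lost_mv(mvs, k, l):
    """Estimate the motion vector of the lost block at position (k, l).

    mvs maps block positions (i, j) to motion vectors (vx, vy) for the
    blocks of frame n that were received. Lost blocks are simply absent.
    Returns None when no neighbor motion vector is available.
    """
    neighbors = []
    for i in (k - 1, k, k + 1):
        for j in (l - 1, l, l + 1):
            if (i, j) == (k, l):
                continue  # the lost block itself has no vector
            mv = mvs.get((i, j))
            if mv is not None:
                neighbors.append(mv)
    if not neighbors:
        return None  # the case this method cannot handle
    # Assumed f(): component-wise median, robust to a single outlier.
    return (median_low([v[0] for v in neighbors]),
            median_low([v[1] for v in neighbors]))


# Seven neighbors move by (2, 0); one neighbor is an outlier.
field = {(i, j): (2, 0) for i in range(3) for j in range(3) if (i, j) != (1, 1)}
field[(0, 0)] = (9, 9)
estimate = estimate_lost_mv(field, 1, 1)
no_neighbors = estimate_lost_mv({}, 1, 1)
```

The second call shows the limitation noted above: when the neighbor blocks are lost along with the block itself, as can happen when one packet carries a run of adjacent blocks, there is nothing for ƒ( ) to operate on.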
What is required, as discovered by the inventors, is an improved method of estimating the contents of a lost block by finding an estimate of the motion vector for the lost block that does not rely upon the availability of neighbor block motion vector information.