1. Field of the Invention
The present invention relates generally to video communication and more particularly to a method of detecting and concealing errors in a video bitstream. The invention is described in the context of, and has particular applicability to, videoconferencing, although the concepts herein are generally applicable to any digitally encoded video stream.
2. Description of Related Art
Digitization of video images has become increasingly important. In addition to its use in global communication (e.g., videoconferencing), digitization of video images for digital video recording has also become increasingly common. In each of these applications, video information is transmitted across telecommunication links such as telephone lines, ISDN, DSL, and radio frequencies, or stored on various media such as DVDs and SVCDs. In many cases, the transmission and/or storage is susceptible to introducing errors into the video bitstream.
Efficient transmission, reception, and/or storage of video data typically requires encoding and compressing the video data. Several approaches and standards for encoding and compressing source video signals exist. Some standards are designed for a particular application, such as ITU-T Recommendations H.261, H.263, and H.264, which are used extensively in videoconferencing applications. Additionally, standards promulgated by the Moving Picture Experts Group (MPEG-1, MPEG-2, and MPEG-4) have found widespread application in consumer electronics and other applications. Each of these standards is incorporated by reference in its entirety.
In any case, a digital image comprises a rectangular array of individual pixels. Typically, the whole image is not processed at one time, but is divided into blocks that are individually processed. Each block comprises a rectangular grid of a predetermined number of luminance or luma pixels (which generally specify the brightness of a pixel) and a predetermined number of chrominance or chroma pixels (which generally specify the color of a pixel). A predetermined number of blocks are combined into a macroblock, which forms the basic unit of processing in most compression methods. Although some aspects of this hierarchy of processing units are discussed below, methods and techniques for block-based processing of images are generally known to those skilled in the art, and thus are not repeated here in detail.
The macroblocks of image data may be encoded in a variation of one of two basic techniques. For example, “intra” coding may be used, in which the original macroblock is encoded without reference to historical data, such as a corresponding macroblock from a previous frame. Alternatively, “inter” coding may be used, in which the macroblock of image data is encoded in terms of the differences between the macroblock and a reference macroblock of data, such as a corresponding macroblock from a previous frame. Many variations on these two basic schemes are known to those skilled in the art, and thus are not discussed here in detail. It is generally desirable to select the encoding technique that requires the fewest bits to describe the macroblock of data. Intra coding typically requires many more bits to represent the macroblock, and therefore inter coding is generally preferred.
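The selection between the two techniques can be illustrated by the following minimal sketch. It is not taken from any of the standards mentioned above. It assumes, purely for illustration, that the number of bits needed by each technique can be approximated by a sum of absolute values: for intra coding, the deviation of each sample from the macroblock mean, and for inter coding, the difference between each sample and the co-located reference sample. The names intra_cost, inter_cost, and choose_mode are hypothetical.

```python
def intra_cost(block):
    """Proxy for intra bit cost (assumption): total deviation from the block mean."""
    samples = [v for row in block for v in row]
    mean = sum(samples) / len(samples)
    return sum(abs(v - mean) for v in samples)


def inter_cost(block, reference):
    """Proxy for inter bit cost (assumption): total absolute difference
    between the block and the co-located reference block."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(block, reference)
        for c, r in zip(cur_row, ref_row)
    )


def choose_mode(block, reference):
    """Pick the technique whose proxy cost is lower; ties favor inter coding."""
    if inter_cost(block, reference) <= intra_cost(block):
        return "inter"
    return "intra"


# A block that closely matches its reference is cheaper to code as a difference.
similar = choose_mode([[10, 200], [30, 120]], [[11, 198], [30, 121]])
# A flat block with an unrelated reference is cheaper to code on its own.
unrelated = choose_mode([[50, 50], [50, 50]], [[0, 255], [255, 0]])
```

A practical encoder would measure or estimate the actual coded size, including motion vectors and headers, rather than rely on a proxy of this kind.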
Videoconferencing “calls” are typically placed using one of two technologies. Traditionally, circuit switched networks (e.g., ISDN telephone lines) have been used. Typically these calls are placed according to International Telecommunication Union (ITU) Recommendation H.320, “Narrow-Band Visual Telephone Systems And Terminal Equipment.” More recently, packet switched networks such as the Internet have become more widely used for videoconferencing. A variety of packet switched multimedia communication protocols exist, one example of which is ITU Recommendation H.323, “Packet-based Multimedia Communications Systems.” Each of these recommendations is hereby incorporated by reference in its entirety. Although the description herein is in the context of one of these two protocols, it is noted that the invention is not limited to only these protocols.
Frequently a video bitstream is transported across both network technologies in tandem. For instance, H.320/H.323 gateways can be used to allow video calls to span two local area networks. Another common example is a multipoint call, where some connections to the multipoint control unit (MCU) might use H.323, while others use H.320. A video transcoder is optionally deployed in the H.323/H.320 protocol conversion. However, video transcoders are expensive and add delay to the call. It is therefore advantageous to convert the protocol without such transcoding.
Video calls transmitted using either technology are in many cases subject to errors in transmission. When packet switched networks are used for transmission, a transmission error results in one or more lost packets. Packet switched network protocols (such as RTP) allow the receiver to detect that one or more packets have been lost. When circuit switched networks such as ISDN are used for transmission of H.320, a transmission error results in one or more erroneous bits in the bitstream. BCH codes are used to provide some error correction; however, error bursts frequently contain too many errors to be corrected.
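By way of illustration, a receiver can detect lost packets from gaps in the 16-bit sequence number that RTP carries in each packet header. The following minimal sketch assumes that packets arrive in order and without duplicates, which a practical receiver cannot rely on. The names lost_between and find_gaps are hypothetical.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


def lost_between(prev_seq, seq):
    """Number of packets missing between two consecutively received packets.

    Assumes in-order delivery with no duplicates (simplification)."""
    return (seq - prev_seq - 1) % SEQ_MOD


def find_gaps(seqs):
    """Return (previous, current, missing) for every gap in the received sequence."""
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        missing = lost_between(prev, cur)
        if missing:
            gaps.append((prev, cur, missing))
    return gaps


# Packets 102 and 103 were lost, as were 65535 and 0 across the wrap-around.
gaps = find_gaps([100, 101, 104, 65534, 1])
```

The modulo arithmetic handles the wrap from 65535 to 0, so a gap that straddles the wrap is counted correctly. The jump from 104 to 65534 in the example is also reported as a gap, which shows why reordering and stream restarts need separate handling.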
Frequently the transmission error causes the decoder to lose synchronization. When this occurs, bits immediately following the error (though received correctly) must be discarded until synchronization is re-established. For instance, if a packet is lost with an H.263 video bitstream packetized using RFC 2429, subsequent received packets are discarded until a GOB (group of blocks) header or a PSC (picture start code) is found. Similarly, when H.263 video is transmitted on an H.320 (circuit switched) connection, bits received after the transmission error and before the next GOB header or PSC are discarded.
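The resynchronization described above can be sketched as a search for the next start code. The sketch below assumes the H.263 convention in which a start code begins with sixteen zero bits followed by a one bit, and the next five bits are zero for a PSC and otherwise carry a group number. It treats every nonzero value as a GOB header, which is a simplification, and it represents the bitstream as a character string for readability rather than efficiency. The names to_bits and resync are hypothetical.

```python
START_PREFIX = "0" * 16 + "1"  # assumed 17-bit start-code prefix


def to_bits(data):
    """Expand bytes into a string of '0' and '1' characters (illustrative only)."""
    return "".join(f"{byte:08b}" for byte in data)


def resync(data, error_bit):
    """Find the first start code at or after error_bit.

    Returns (bit_offset, kind, group_number), or None if no complete
    start code remains. Bits before bit_offset would be discarded."""
    bits = to_bits(data)
    pos = bits.find(START_PREFIX, error_bit)
    if pos < 0 or pos + 22 > len(bits):
        return None
    group_number = int(bits[pos + 17:pos + 22], 2)
    kind = "PSC" if group_number == 0 else "GOB"
    return pos, kind, group_number


# Two bytes of payload, then a start code with group number 1.
gob = resync(bytes([0xFF, 0xAB, 0x00, 0x00, 0x84]), 0)
# One byte of payload, then a start code with group number 0.
psc = resync(bytes([0xFF, 0x00, 0x00, 0x80, 0x00]), 0)
```

Because the search is performed at the bit level, it does not depend on the start code being byte aligned.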
Traditionally, when the decoder detects bitstream errors, it has two options. The first option is to freeze the display and request a fast update from the transmitter. The transmitter sends an intra frame upon such a request. The receiver's display remains frozen until the intra frame is received (or until a timeout period expires). These seconds of frozen video compromise the user experience. The second option is to request a fast update but continue displaying the frames that had errors. Until the requested intra frame arrives, the display shows artifacts such as bright color blocks, black blocks, or scrambled images. Such artifacts are typically more disruptive to the user experience than a frozen display, so common practice in the videoconferencing arts has been to hide the errors by choosing option one, i.e., freezing the display.
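The first option can be sketched as a small state machine at the receiver. The sketch below is illustrative only. The class name FreezeOnErrorDisplay, the method on_frame, and the default timeout value are hypothetical, and the sending of a fast update request is represented by a simple counter rather than by any actual signaling.

```python
class FreezeOnErrorDisplay:
    """Freeze the display on an error until an intra frame or a timeout."""

    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s   # assumed timeout, not taken from any standard
        self.frozen_since = None     # time at which the display froze, if frozen
        self.requests_sent = 0       # stands in for fast update requests sent

    def on_frame(self, now, is_intra, has_error):
        """Return True if the decoded frame should be shown."""
        if has_error:
            if self.frozen_since is None:
                self.frozen_since = now
                self.requests_sent += 1  # request a fast update once per freeze
            return False
        if self.frozen_since is not None:
            timed_out = now - self.frozen_since >= self.timeout_s
            if is_intra or timed_out:
                self.frozen_since = None  # unfreeze and resume display
                return True
            return False  # still waiting for the intra frame
        return True


display = FreezeOnErrorDisplay(timeout_s=2.0)
shown = [
    display.on_frame(0.0, is_intra=False, has_error=False),  # normal frame
    display.on_frame(0.1, is_intra=False, has_error=True),   # error: freeze
    display.on_frame(0.2, is_intra=False, has_error=False),  # still frozen
    display.on_frame(0.3, is_intra=True, has_error=False),   # intra: unfreeze
]
```

In this sketch, every frame between the error and the intra frame is withheld, which corresponds to the period of frozen video described above.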
However, neither of the two options recited above is desirable. Therefore, what is needed in the art is an alternative technique of concealing errors in a video transmission that is less disruptive of the user experience. A preferred technique would be for the decoder to reconstruct the data that is missing or corrupted by the transmission losses (e.g., packet loss). To facilitate this, it is helpful if either certain information is redundantly transmitted or the blocks are arranged so that missing data can be reconstructed by interpolation. The present invention is directed to such a system. Although described in terms of videoconferencing systems, the concepts described herein are equally adaptable to any video coding and decoding application.