1. Field of the Invention
This invention relates to the transmission of compressed video signals and, more particularly, to the optimal determination of a coded bit rate in a system where more than one coded video stream is multiplexed over a single, bandwidth-limited data link.
2. Background of the Invention
Methods and systems that try to achieve the best possible perceived quality of a reconstructed video image under real-time constraints are known as Rate Control algorithms. Traditional Rate Control algorithms operate within a single video encoding process and optimize only a single coded video stream. In contrast, this invention is concerned with algorithms that optimize multiple coded video streams simultaneously. Algorithms of this class are called Multichannel Rate Control algorithms (MCRC algorithms).
Two major classes of single-channel rate control algorithms are known. The first class is denoted constant bit rate (CBR) control algorithms. CBR algorithms try to assign a well-defined number of bits to each captured image during an encoding process. The second class is known as variable bit rate (VBR) control algorithms. VBR algorithms keep the bit rate constant on average, and within a well-defined variance, over several pictures. However, they allow for sometimes significant changes of the rate for individual pictures. This allows an encoder to react to big changes in the image characteristics by spending more bits on such a change than on smaller changes and, hence, often leads to a better perceived picture quality. A very early example of a rate control algorithm can be found in the publication Huang, Schultheiss, “Block Quantization of Correlated Gaussian Random Variables,” IEEE Transactions on Comm. Systems, vol. 3, pp. 26-40, 1963. More recent examples of CBR algorithms are described in W. Ding and B. Liu, “Rate Control Of MPEG Video Coding And Recording By Rate-Quantization Modeling,” IEEE Trans. Circuits and Systems for Video Tech. 6(1) (February 1996) pp. 12-20, and examples of VBR algorithms are described in ISO-IEC/JTC1/SC29/WG11, MPEG2 Test model 5 Draft (April 1993).
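The difference between the two classes can be illustrated with a minimal sketch. The function names, the bit rate, the frame rate, and the per-picture complexity weights below are assumptions chosen for illustration only; they are not taken from the cited publications, and real rate control algorithms are considerably more elaborate.

```python
def cbr_budgets(bit_rate, frame_rate, num_pictures):
    """CBR-style budget: every picture receives the same number of bits."""
    per_picture = bit_rate // frame_rate
    return [per_picture] * num_pictures


def vbr_budgets(bit_rate, frame_rate, complexities):
    """VBR-style budget: the total over the window equals the CBR total,
    but each picture's share follows its (hypothetical) complexity weight."""
    total = bit_rate * len(complexities) // frame_rate
    weight_sum = sum(complexities)
    return [total * c // weight_sum for c in complexities]


# Hypothetical window of five pictures at 10 kbit/s and 10 pictures/s;
# the third picture contains a large change in the image.
complexities = [1, 1, 6, 1, 1]
cbr = cbr_budgets(10_000, 10, len(complexities))
vbr = vbr_budgets(10_000, 10, complexities)
print("CBR bits per picture:", cbr)
print("VBR bits per picture:", vbr)
```

In this sketch both allocations spend the same number of bits over the window, but the VBR-style allocation concentrates them on the picture with the largest change.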
CBR algorithms are generally preferred for low-delay operations over fixed bandwidth links. The common way to implement CBR algorithms in hybrid encoders is to adjust the quantization step size, commonly known as the QP value. This numerical value directly influences the compression factor by reducing the precision with which the transform coefficients are rounded during the compression process. In most video compression systems (at least in those conforming to one of the popular video compression standards), the QP value is a property of a macroblock and typically has an integer value within a small range, such as one to thirty-two. In this regard, the higher the QP value, the lower the quality of the picture, while the lower the QP value, the higher the quality of the picture. Thus, the QP value is generally inversely related to the picture quality.
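The effect of the QP value on coefficient precision can be sketched as follows. This is a simplified uniform quantizer that assumes a step size of twice the QP value and truncation toward zero; it is not the exact quantization or reconstruction rule of any particular standard, and the coefficient values are hypothetical.

```python
def quantize(coeffs, qp):
    """Map transform coefficients to integer levels (truncating toward zero)."""
    step = 2 * qp  # assumed relation between QP and step size
    return [int(c / step) for c in coeffs]


def dequantize(levels, qp):
    """Reconstruct approximate coefficients from the quantized levels."""
    step = 2 * qp
    return [level * step for level in levels]


def max_abs_error(coeffs, qp):
    """Largest reconstruction error introduced by quantizing at this QP."""
    recon = dequantize(quantize(coeffs, qp), qp)
    return max(abs(c - r) for c, r in zip(coeffs, recon))


# Hypothetical transform coefficients of one block.
coeffs = [312, -47, 18, -6, 3, 0, -1, 1]
for qp in (1, 10, 30):
    levels = quantize(coeffs, qp)
    nonzero = sum(1 for level in levels if level != 0)
    print(f"QP={qp:2d} levels={levels} nonzero={nonzero} "
          f"max error={max_abs_error(coeffs, qp)}")
```

A higher QP value leaves fewer non-zero levels to be coded, which lowers the bit count, at the price of a larger reconstruction error.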
It is well known that the human visual system reacts unfavorably to abrupt changes in the picture quality and that such changes are perceived as very annoying. Hence, most rate control algorithms try to employ an equal QP value for the whole picture, or allow only for slight variations of the QP value, thereby leveling the picture quality and, hence, preventing abrupt quality changes. More sophisticated rate control algorithms sometimes take psycho-optical considerations into account and distinguish between “flat” and “active” sectors of the picture. They then attempt to code flat sectors at a lower quality than active sectors. A typical example of such an algorithm can be found in the European Patent Reference EP 1 250 012 A2.
Another technique that is somewhat related to the invention is known as load balancing. In general, load balancing techniques try to allocate multiple requests to multiple servers in such a way that the response time to each request is optimized. They are most commonly used in data transmission environments, for example, to distribute the load of the requests to a popular website among a multitude of web servers. Load balancing algorithms commonly use linear optimization to optimize the transmission of data among a plurality of web servers, but these linear optimization techniques do not provide rate control for a plurality of streams from a videoconference environment.
FIG. 11 depicts a typical prior art four-screen videoconference system and environment known as a TeleSuite® room maintained by TeleSuite Corporation of Englewood, Ohio, and of the type shown and described in U.S. Pat. Nos. 5,572,248, 5,751,337, 6,160,573, and 6,445,405, which are incorporated herein by reference and made a part hereof.
A wide-band scene A, with an aspect ratio of 16:3, consists of four spatially adjacent sub-scenes A1, A2, A3, and A4. Many prior systems utilize video compression algorithms that generally conform to one of the generally accepted video compression standards, such as International Telecommunication Union (ITU) standards H.261 or H.263. For example, the H.261 standard was designed for data rates which are multiples of 64 kilobits/second. Such standards often have established data rates and preferred picture formats, and although they may also support other formats, the widely deployed encoders/decoders (codecs) support only those standard formats. Hence, it is necessary to combine several cameras and several codecs to capture a wide-band scene and to encode the wide-band scene by splitting it spatially into several sub-scenes, each of which has the size of one of the commonly supported picture formats of the video codecs.
Referring to FIG. 11, note that each sub-scene is captured by the associated camera, C1, C2, C3, and C4. The sub-scenes in the depicted example are described as follows: A1 shows a single sitting person's upper body, A2 shows two sitting persons' upper bodies, A3 shows two sitting persons' upper bodies, one of which is in the process of getting up and gesticulating, and A4 shows a static background.
The video outputs of the cameras C1 to C4, each carrying the analog representation of one of the sub-scenes A1 to A4, are converted by the video encoders E1 to E4 into compressed, digital video bit streams B1 to B4, respectively, preferably conforming to one of the ITU video compression standards, such as H.261 or H.263. In one environment, all encoders E1 to E4 (labeled F-1 in FIG. 11) are configured to utilize the same bit rate, namely, 10 kbit/s in the example depicted in FIG. 11. Hence, the resulting bit rate used in transmission over a local or wide area network (WAN) is 4×10 kbit/s=40 kbit/s. Since the sub-scenes vary in their activity, but the encoder bit rates are constant, the quality of the coded sub-scenes, as indicated by the QP value, also varies. Encoder E1, which encodes a moderately active sub-scene, operates at a good quality level with a QP value of 10. Encoder E2, with a slightly more active sub-scene than E1, cannot achieve the same quality within the bit rate constraints and operates at a QP value of 12. Encoder E3, coding the extremely active sub-scene A3, operates at a QP value of 30 and produces a coded image of very low quality. Encoder E4, which codes the static background sub-scene A4, operates at the best possible quality level with a QP value of 1.
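The figures of this example can be tabulated with a short sketch. The bit rate and QP values are those stated for FIG. 11; the variable names and the use of the absolute QP difference between adjacent encoders as a rough indicator of a visible quality change are assumptions made for illustration.

```python
# QP value at which each encoder operates in the FIG. 11 example.
qp_by_encoder = {"E1": 10, "E2": 12, "E3": 30, "E4": 1}
BIT_RATE_PER_ENCODER_KBIT_S = 10  # identical fixed rate for every encoder

# Aggregate rate carried by the multiplexed link.
total_kbit_s = BIT_RATE_PER_ENCODER_KBIT_S * len(qp_by_encoder)

# QP difference between spatially adjacent sub-scenes (assumed indicator
# of how abrupt the quality change between neighbors is).
names = list(qp_by_encoder)
qp_jumps = {
    f"{left}/{right}": abs(qp_by_encoder[left] - qp_by_encoder[right])
    for left, right in zip(names, names[1:])
}

print("total bit rate:", total_kbit_s, "kbit/s")
for pair, jump in qp_jumps.items():
    print(f"QP difference {pair}: {jump}")
```

With equal fixed rates, the largest QP difference occurs between the neighboring encoders E3 and E4, that is, between the most active sub-scene and the static background.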
All streams are multiplexed together in a multiplex unit J to form an outgoing data stream. The data stream is conveyed over a local or wide area network (WAN) K to the receiving room. Here, the received multiplexed data stream is de-multiplexed by a demultiplexer L to reconstruct the original four compressed, digital video bit streams. The bit streams are conveyed to the decoders D1 to D4, each of which reconstructs a video sub-image. These sub-images are made visible using the attached displays or data projectors P1 to P4. The projector beam directions of all projectors P1 to P4 are arranged in such a way that the four displayed sub-images I1 to I4 spatially compose a full image I that geometrically resembles the captured scene A.
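The multiplexing and de-multiplexing steps can be sketched as follows. The round-robin interleaving, the tagging of each packet with a channel identifier, and the function names are assumptions made for illustration; the packet format actually used by the multiplex unit J and the demultiplexer L is not specified here.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave packets of several streams round-robin, tagging each
    packet with its channel identifier (hypothetical packet format)."""
    tagged = [[(channel, packet) for packet in packets]
              for channel, packets in streams.items()]
    muxed = []
    for row in zip_longest(*tagged):
        muxed.extend(item for item in row if item is not None)
    return muxed


def demultiplex(muxed):
    """Rebuild the per-channel packet lists from the tagged packets."""
    streams = {}
    for channel, packet in muxed:
        streams.setdefault(channel, []).append(packet)
    return streams


# Hypothetical payloads for the four bit streams B1 to B4.
streams = {
    "B1": [b"b1-0", b"b1-1"],
    "B2": [b"b2-0", b"b2-1", b"b2-2"],
    "B3": [b"b3-0"],
    "B4": [b"b4-0", b"b4-1"],
}
link = multiplex(streams)
restored = demultiplex(link)
print(len(link), "packets on the link; round trip ok:", restored == streams)
```

Because every packet carries its channel identifier, the receiving side can restore each bit stream in its original order regardless of how the packets were interleaved on the link.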
Each encoder E1 to E4 has a set of defined and fixed parameters and generates a bit stream in compliance with these parameters. The most prominent of these parameters is the target bit rate. Typically, each encoder E1 to E4 uses a CBR algorithm to achieve the best possible picture quality when coding the captured scene. When using multiple encoders, each encoder operates at a certain predetermined bit rate. Normally, all encoders are configured to use the same bit rate, because at the configuration time the characteristics of the sub-scenes to be captured are not yet known.
Since the bit rate for each sub-scene is fixed, the quality level of the coded sub-picture varies with the activity captured by the camera. A static background, for example that of sub-scene A4, is coded at a very high quality in order to utilize the configured bit rate. A highly active sub-scene, for example sub-scene A3, yields an unfavorably low picture quality. After transmission and reconstruction, the complete wide-band image I suffers not only from an unpleasantly low quality sub-image I3, but also from an annoying quality change between the sub-images I3 and I4.
When displaying a wide-band image comprised of a plurality of sub-images displayed side-by-side, it is desirable for the displayed sub-images to be of the same quality so that they do not annoy the human visual system through abrupt quality changes. However, when a multitude of images are transmitted and displayed in a room, and if all transmitted images use the same transmission bandwidth (as is common in the prior art), it is not uncommon that one or more of the displayed sub-images will be coded at a different quality level than a neighboring sub-image (that is, with a different average QP value).
What is needed, therefore, is a system and method which adjusts the image quality across a plurality of sub-pictures simultaneously and in real-time in order to achieve a high perceived image quality across the entire composite or wide-band image comprised of the multiple sub-images.
What is further needed is a system and method which adjusts the picture quality for each of a plurality of images that comprise a picture and that distributes or balances the transmission of the plurality of images in order to optimize the overall picture quality in a video transmission system.