The present invention relates generally to digital video signal processing, and more particularly, to video communication systems employing transcoding for rate adaptation of video bridging over heterogeneous networks such as multipoint video conferencing, remote collaboration, remote surveillance, video-on-demand, video multicast over heterogeneous networks, and streaming video.
Video telephony is an efficient way for business persons, engineers, scientists, etc., to exchange their information at remote locations. With the rapid growth of video telephony, the need for multipoint video conferencing is also growing. A multipoint video conference involves three or more conference participants. In continuous presence video conferencing, each conferee can see all of the other conferees in the same window simultaneously. In such systems, it is necessary to employ a video-bridge to combine the coded video signals from the multiple participants into a single video for display.
FIG. 1 depicts an application scenario in which multiple persons participate in a multipoint videoconference through a centralized server. In this scenario, the conferees are connected to a central server, referred to as a Multipoint Control Unit (MCU), which coordinates and distributes video and audio streams among the participants according to the channel bandwidth requirement of each conferee. A video transcoder is included in the MCU to combine the incoming encoded digital video streams from the various conferees into a single coded video bit-stream and to send the re-encoded bit-stream back to each participant over the same channel, at the required bit-rate and format, for decoding and presentation. In the case of a multipoint video conference over the PSTN (Public Switched Telephone Network), e.g. POTS (Plain Old Telephone Service) or ISDN (Integrated Services Digital Network), the channel bandwidth is symmetric. Assume the conferees all have the same channel bandwidth, B Kbps; that is, the MCU receives video from each conferee at B Kbps. The MCU combines the video and re-encodes the combined video at B Kbps so as to meet the channel bandwidth requirement for sending the video back to the conferees. Bit-rate conversion/reduction must therefore be performed at the video transcoder. Bit-rate conversion from a high bit-rate to a low bit-rate will, however, degrade video quality. The visual quality, the computational load, and the bit-rate used must be traded off in video transcoding to find a feasible solution.
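As a worked illustration of the required bit-rate reduction, assume N conferees whose N sub-windows all share the outgoing stream (N and the numbers below are illustrative assumptions, not values from the text):

$$ R_{in} = N \cdot B, \qquad R_{out} = B, \qquad \frac{R_{in}}{R_{out}} = N $$

For example, with N = 4 and B = 128 Kbps, the MCU receives 512 Kbps in total and must re-encode the combined picture at 128 Kbps, which leaves on average 32 Kbps per sub-window.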
The simplest approach for implementing the transcoder is open-loop transcoding, in which the incoming bit-rate is down-scaled by truncating the DCT coefficients, by performing a re-quantization process, or by selecting an arbitrary number of DCT coefficients. Since the transcoding is done in the coded domain, a simple and fast transcoder is possible. However, the open-loop transcoder produces an increasing distortion caused by the “drift” problem, which arises because the reconstructed pictures in the encoder and the decoder are mismatched. The drift error can be eliminated by cascading a decoder and an encoder. In the cascaded transcoder, the decoder decompresses the incoming bit-stream, which was encoded at a bit-rate R1, and the encoder then re-encodes the reconstructed video at a lower bit-rate R2. Although the drift error can be eliminated by using the cascaded transcoder, the computational complexity is very high; thus, direct use of a cascaded transcoder is not practical in real-time applications. The cascaded transcoder's complexity, however, can be significantly reduced by reusing some information extracted from the incoming bit-stream, such as motion information and coding mode information.
Keesman et al. (see reference 6 below) introduced simplified pixel-domain and DCT-domain video transcoders based on a motion vector reuse approach to reduce both the computation cost and the memory cost in a cascaded transcoder; however, in such a system, the visual quality is degraded due to the non-optimal motion vectors resulting from reusing the incoming motion vectors.
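The re-quantization variant of open-loop transcoding can be sketched as follows. This is a minimal illustration that assumes a plain uniform scalar quantizer; the function name and the step sizes `q1` and `q2` are hypothetical, and real codecs use more elaborate quantizers.

```python
def requantize(levels, q1, q2):
    """Re-quantize DCT levels coded with step q1 to a coarser step q2.

    Assumes a uniform scalar quantizer (an illustrative simplification).
    """
    out = []
    for level in levels:
        coeff = level * q1                   # inverse quantization
        out.append(int(round(coeff / q2)))   # coarser re-quantization
    return out


print(requantize([10, -4, 1, 0], q1=4, q2=10))
```

Because no decoding loop is run, the reference pictures at the end decoder no longer match those used by the original encoder, which is the source of the drift described above.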
Youn et al. (see references 14 and 15 below) proposed efficient motion vector estimation and refinement schemes which can achieve a visual quality close to that of a cascaded transcoder with full-scale motion estimation with relatively small extra computational cost. Several quantization schemes and efficient transcoding architectures have been proposed in references 6-9 below.
Each of the following background references is incorporated by reference herein.
[1] ITU-T Recommendation H.261, “Video codec for audiovisual services at p×64 kbits/s,” March 1993.
[2] ITU-T Recommendation H.263, “Video coding for low bit-rate communication,” May 1997.
[3] M. D. Polomski, “Video conferencing system with digital transcoding,” U.S. Pat. No. 5,600,646.
[4] D. G. Morrison, M. E. Nilsson, and M. Ghanbari, “Reduction of the bit-rate of compressed video while in its coded form,” in Proc. Sixth Int. Workshop Packet Video, Portland, Oreg., September 1994.
[5] Eleftheriadis and D. Anastassiou, “Constrained and general dynamic rate shaping of compressed digital video,” in Proc. IEEE Int. Conf. Image Processing, Washington, D.C., October 1995.
[6] G. Keesman, et al., “Transcoding of MPEG Bitstreams,” Signal Processing Image Comm., vol. 8, pp. 481-500, 1996.
[7] G. Keesman, “Method and device for transcoding a sequence of coded digital signals,” U.S. Pat. No. 5,729,293.
[8] H. Sun, W. Kwok, and J. W. Zdepski, “Architectures for MPEG compressed bitstream scaling,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 191-199, April 1996.
[9] P. Assuncao and M. Ghanbari, “A frequency-domain video transcoder for dynamic bit-rate reduction of MPEG-2 bit streams,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 8, pp. 953-967, 1998.
[10] M. V. Eyuboglu et al., “Efficient transcoding device and method,” U.S. Pat. No. 5,537,440.
[11] Y. Nakajima et al., “Method and apparatus of rate conversion for coded video data,” U.S. Pat. No. 5,657,015.
[12] J. Koppelmans et al., “Transcoding device,” U.S. Pat. No. 5,544,266.
[13] M.-T. Sun, T.-D. Wu, and J.-N. Hwang, “Dynamic bit allocation in video combining for multipoint video conferencing,” IEEE Trans. Circuits and Systems, vol. 45, no. 5, pp. 644-648, May 1998.
[14] J. Youn, M.-T. Sun, and C.-W. Lin, “Motion estimation for high-performance transcoders,” IEEE Trans. Consumer Electronics, vol. 44, pp. 649-658, August 1998.
[15] J. Youn, M.-T. Sun, and C.-W. Lin, “Adaptive motion vector refinement for high performance transcoding,” IEEE Trans. Multimedia, vol. 1, no. 1, pp. 30-40, March 1999.
[16] B. Shen, I. K. Sethi, and V. Bhaskaran, “Adaptive motion-vector resampling for compressed video downscaling,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 6, pp. 929-936, September 1999.
[17] ITU-T/SG15, “Video codec test model, TMN8,” Portland, June 1997.
[18] Image Processing Lab, University of British Columbia, “H.263+ encoder/decoder,” TMN(H.263) codec, February 1998.
The present invention provides a multipoint video communication system employing a transcoder with a dynamic sub-window skipping technique to enhance the visual quality of the participants of interest. The system first identifies the active conferees from the multiple incoming video streams by calculating the temporal and spatial activities of the conferee sub-windows. The sub-windows of inactive participants are dropped, and the bits saved by the skipping operation are reallocated to the active sub-windows. When frames are dropped in video transcoding because of the limited bit-rates or frame-rates of the user clients, their motion vectors become unavailable; several motion vector composition schemes can be used to compose them. The present invention employs a novel pre-filtered activity-based dominant vector selection (PA-FDVS) scheme because it can provide an accurate approximation of the motion vectors with the lowest computational cost and memory requirement. Simulation results show that the visual quality of the active sub-windows is significantly improved at the cost of degrading the temporal resolution of the inactive sub-windows, a degradation that is relatively invisible to human perception.
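The reallocation of saved bits can be illustrated with a minimal sketch. It assumes an equal split of a per-frame budget among the active sub-windows and ignores the small overhead of signalling a skipped sub-window; the function name, the equal-split rule, and the numbers are illustrative assumptions, not the allocation rule of the invention.

```python
def allocate_bits(frame_budget, active_flags):
    """Split a frame's bit budget among active sub-windows only.

    Skipped (static) sub-windows receive no bits; the equal split is an
    illustrative assumption.
    """
    n_active = sum(active_flags)
    if n_active == 0:
        return [0] * len(active_flags)
    share = frame_budget // n_active
    return [share if active else 0 for active in active_flags]


print(allocate_bits(8000, [True, False, True, False]))
```

With four sub-windows and two of them skipped, each active sub-window in this sketch receives twice the bits it would get under a uniform four-way split.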
According to a first aspect of the present invention, there is provided a video communication system, comprising:
(a) transcoding means for (i) receiving multiple incoming encoded digital video signals respectively sent over plural transmission paths from a plurality of video devices, (ii) processing the received video signals and (iii) combining the processed video signals into an output video signal comprising a single coded video bit stream, respective portions of the output video signal corresponding to the video signals sent from the plurality of video devices constituting sub-windows of the output video signal; and
(b) means for transmitting the output video signal through the transmission paths to the plurality of video devices, respective portions of the output video signal corresponding to the video signals sent from the plurality of video devices constituting sub-windows of the output video signal;
wherein the transcoding means comprises:
means for classifying the sub-windows into active sub-windows and static sub-windows; and
means for generating the output video signal by
(1) transcoding frames of the active sub-windows while skipping transcoding of frames of the static sub-windows and substituting a latest corresponding encoded sub-window for a skipped sub-window to approximate the skipped sub-window, and
(2) obtaining outgoing motion vectors of the output video signal from incoming motion vectors of the active sub-windows and the static sub-windows by summing motion vectors of the skipped static sub-windows and by obtaining a motion vector of a non-aligned macroblock which is not aligned with segmented macroblock boundaries in the sub-windows by a dominant vector selection operation comprising pre-filtering out unreliable neighboring motion vectors of the segmented macroblock boundaries and selecting the one of the segmented macroblock boundaries having the largest overlapping activity as the dominant block, and selecting the motion vector of the dominant block as the motion vector of the non-aligned macroblock.
The prefiltering operation may comprise determining whether a strongly overlapping dominant block exists among the segmented macroblocks. The operation of determining whether a strongly overlapping dominant block exists among the segmented macroblocks may comprise calculating the largest overlapping area of each of the segmented macroblocks with the non-aligned macroblock, and if the largest overlapping area of one of the segmented macroblocks is greater than a predetermined threshold, then selecting the motion vector of the one of the segmented macroblocks with the largest overlapping area as the dominant vector, and if the largest overlapping area is not greater than the predetermined threshold, then: setting an initial candidate list as the four neighboring motion vectors $\{IV_1, IV_2, IV_3, IV_4\}$ of the four segmented macroblocks, calculating the mean and the standard deviation of the four neighboring motion vectors in accordance with the relation:
$$ IV_{mean} = \frac{1}{4}\sum_{i=1}^{4} IV_i, \qquad IV_{std} = \sqrt{\frac{1}{4}\sum_{i=1}^{4}\left(IV_i - IV_{mean}\right)^2} $$
for i = 1 to 4,
if $|IV_i - IV_{mean}| > k_{std} \cdot IV_{std}$, removing $IV_i$ from the candidate list as unreliable, and if not, keeping $IV_i$ in the candidate list as reliable.
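The strong-overlap test and the pre-filtering rule can be sketched as follows. This is a minimal illustration that treats each motion vector as an (x, y) pair and measures deviation by Euclidean distance from the mean vector; that interpretation, the function names, and the default value of `k_std` are assumptions made for the example.

```python
import math


def strong_overlap_index(areas, threshold):
    """Return the index of the block whose overlap exceeds the threshold, else None."""
    i = max(range(len(areas)), key=lambda j: areas[j])
    return i if areas[i] > threshold else None


def prefilter(vectors, k_std=1.0):
    """Return indices of neighboring motion vectors kept as reliable.

    k_std = 1.0 is an illustrative default, not a value from the text.
    """
    n = len(vectors)
    mean_x = sum(v[0] for v in vectors) / n
    mean_y = sum(v[1] for v in vectors) / n
    # Distance of each vector from the mean vector.
    dists = [math.hypot(v[0] - mean_x, v[1] - mean_y) for v in vectors]
    std = math.sqrt(sum(d * d for d in dists) / n)
    # Keep a vector unless it deviates by more than k_std standard deviations.
    return [i for i, d in enumerate(dists) if d <= k_std * std]


print(prefilter([(1, 0), (1, 0), (1, 0), (9, 0)]))
```

In this example the fourth vector lies far from the other three and is dropped from the candidate list, so it cannot be chosen as the dominant vector even if its block has a large overlap.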
The largest overlapping activity determining operation may comprise, for each motion vector on the candidate list, calculating an area-activity product $A_i \cdot ACT_i$, i = 1, 2, 3, 4, where $A_i$ is the overlapping area with the segmented macroblock and $ACT_i$ is the activity measure, and
selecting the dominant vector as the motion vector of the one of the segmented macroblocks with the largest area-activity product.
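The selection by area-activity product can be sketched as below. The activity measure is left abstract here (the values passed in are hypothetical), and `candidates` is assumed to be the list of indices that survived pre-filtering.

```python
def select_dominant(vectors, areas, activities, candidates):
    """Pick the motion vector whose block has the largest area-activity product.

    Only indices listed in `candidates` are considered.
    """
    best = max(candidates, key=lambda i: areas[i] * activities[i])
    return vectors[best]


vectors = [(1, 0), (2, 1), (0, 3), (5, 5)]
areas = [100, 60, 60, 36]        # overlaps with a 16x16 macroblock (sum = 256)
activities = [2, 5, 1, 50]       # hypothetical activity values
print(select_dominant(vectors, areas, activities, candidates=[0, 1, 2]))
```

Here block 1 wins with a product of 300, even though block 0 overlaps more, because the overlap is weighted by activity; block 3 is ignored because it is not on the candidate list.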
The means for classifying may operate such that it calculates a sum for each of the sub-windows of the magnitude of its motion vectors, compares the sum with a threshold, and classifies the sub-window as active or static in accordance with a comparison result.
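A minimal sketch of this classification follows. It assumes the magnitude of a motion vector is taken as |dx| + |dy|; the choice of norm, the function name, and the threshold values are illustrative assumptions.

```python
def classify_subwindow(motion_vectors, threshold):
    """Classify a sub-window as 'active' or 'static' from its motion vectors.

    Magnitude is taken as |dx| + |dy| (an assumption made for this sketch).
    """
    total = sum(abs(dx) + abs(dy) for dx, dy in motion_vectors)
    return "active" if total > threshold else "static"


print(classify_subwindow([(3, -4), (2, 2)], threshold=5))
print(classify_subwindow([(0, 0), (1, -1), (0, 0)], threshold=5))
```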
The means for processing may operate such that it obtains the outgoing motion vectors by, after a frame of a static sub-window is skipped, composing motion vectors of each skipped and non-skipped sub-window relative to its corresponding latest encoded sub-window.
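The composition of motion vectors across skipped frames can be sketched as a running sum. This simplified example assumes that the vector to use at each skipped frame has already been resolved (for instance by dominant vector selection when the referenced block is not aligned), so that only the summation remains; the function name and data layout are hypothetical.

```python
def compose_motion_vector(chain):
    """Sum per-frame motion vectors back to the latest encoded sub-window.

    `chain` lists the vector of the current frame followed by the vectors
    resolved in each skipped frame (an illustrative data layout).
    """
    x = sum(v[0] for v in chain)
    y = sum(v[1] for v in chain)
    return (x, y)


# Current frame vector plus two skipped frames.
print(compose_motion_vector([(1, 0), (2, -1), (0, 3)]))
```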
According to another aspect of the present invention there is provided a video communication method comprising:
receiving multiple incoming encoded digital video signals respectively sent over plural transmission paths from a plurality of video devices;
processing the received video signals;
combining the processed video signals into an output video signal comprising a single coded video bit stream; and
transmitting the output video signal through the transmission paths to the plurality of video devices, respective portions of the output video signal corresponding to the video signals sent from the plurality of video devices constituting sub-windows of the output video signal;
wherein the processing step comprises:
classifying the sub-windows into active sub-windows and static sub-windows; and
generating the output video signal by
(1) transcoding frames of the active sub-windows while skipping transcoding of frames of the static sub-windows and substituting a latest corresponding encoded sub-window for a skipped sub-window to approximate the skipped sub-window, and
(2) obtaining outgoing motion vectors of the output video signal from incoming motion vectors of the active sub-windows and the static sub-windows by summing motion vectors of the skipped static sub-windows and by obtaining a motion vector of a non-aligned macroblock which is not aligned with segmented macroblock boundaries in the sub-windows by a dominant vector selection operation comprising pre-filtering out unreliable neighboring motion vectors of the segmented macroblock boundaries and selecting the one of the segmented macroblock boundaries having the largest overlapping activity as the dominant block, and selecting the motion vector of the dominant block as the motion vector of the non-aligned macroblock.
The prefiltering step may comprise determining whether a strongly overlapping dominant block exists among the segmented macroblocks, and the step of determining whether a strongly overlapping dominant block exists among the segmented macroblocks may comprise calculating the largest overlapping area of each of the segmented macroblocks with the non-aligned macroblock, and if the largest overlapping area of one of the segmented macroblocks is greater than a predetermined threshold, then selecting the motion vector of the one of the segmented macroblocks with the largest overlapping area as the dominant vector, and if the largest overlapping area is not greater than the predetermined threshold, then: setting an initial candidate list as the four neighboring motion vectors $\{IV_1, IV_2, IV_3, IV_4\}$ of the four segmented macroblocks, calculating the mean and the standard deviation of the four neighboring motion vectors in accordance with the relation:
$$ IV_{mean} = \frac{1}{4}\sum_{i=1}^{4} IV_i, \qquad IV_{std} = \sqrt{\frac{1}{4}\sum_{i=1}^{4}\left(IV_i - IV_{mean}\right)^2} $$
for i = 1 to 4,
if $|IV_i - IV_{mean}| > k_{std} \cdot IV_{std}$, removing $IV_i$ from the candidate list as unreliable, and if not, keeping $IV_i$ in the candidate list as reliable, and the step of determining the largest overlapping activity may comprise, for each motion vector on the candidate list, calculating an area-activity product $A_i \cdot ACT_i$, i = 1, 2, 3, 4, where $A_i$ is the overlapping area with the neighboring block (i) and $ACT_i$ is the activity measure, and the method may further comprise selecting the dominant vector as the motion vector of the one of the segmented macroblocks with the largest area-activity product.
The classifying step may comprise calculating a sum for each of the sub-windows of the magnitude of its motion vectors, comparing the sum with a threshold, and classifying the sub-window as active or static in accordance with a comparison result.
The processing step may comprise obtaining the outgoing motion vectors by, after a frame of a static sub-window is skipped, composing motion vectors of each skipped and non-skipped sub-window relative to its corresponding latest encoded sub-window.