In a traditional video conferencing architecture, a transcoder-based multi-point control unit (MCU) communicates with a plurality of endpoints (also called clients). Each sending client sends one stream to the MCU. The MCU receives the incoming streams from the clients, decodes them, re-composites the video, and encodes a new video stream for each receiving client. The cost of an MCU is relatively high, and the end user at an endpoint can receive only what the MCU composes for that user.
Recently, switch-based conferencing systems with scalable video coding have become the trend. In such a system, a client sends a video bitstream with embedded scalability to a server that acts as a switch. The server need only forward the bitstream, or part of the bitstream, to each receiving client according to that client's requested service level. Compared with an MCU, the cost of the server is significantly reduced, and the end user is free to choose the video layout at his or her endpoint. However, network conditions can fluctuate in both the uplink direction (from the endpoint to the server) and the downlink direction (from the server to the endpoint). Unlike the transcoder-based solution, in which the MCU can adapt to network conditions when transcoding, the server can rely only on the existing scalable bitstream for adaptation.
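The forwarding behavior described above can be illustrated with a minimal sketch. It assumes that each packet of the scalable bitstream carries a layer index (0 for the base layer, higher values for enhancement layers) and that a client's requested service level maps to a maximum layer index. The names `Packet`, `forward`, and `max_layer` are hypothetical and are not taken from any particular codec or system.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet of a scalable bitstream."""
    frame: int      # index of the frame this packet belongs to
    layer: int      # 0 = base layer, higher = enhancement layers (assumed)
    payload: bytes  # encoded data, never decoded by the switch


def forward(packets: Iterable[Packet], max_layer: int) -> List[Packet]:
    """Select the packets a switch would forward to one receiving client.

    The switch neither decodes nor re-encodes; it only drops packets
    whose layer exceeds the client's requested level.
    """
    return [p for p in packets if p.layer <= max_layer]


# Two frames, each encoded as three layers by the sending client.
stream = [Packet(frame=f, layer=l, payload=b"") for f in range(2) for l in range(3)]

base_only = forward(stream, max_layer=0)   # client requesting the lowest level
all_layers = forward(stream, max_layer=2)  # client requesting the full bitstream
```

Because the switch can only drop layers that the sender already encoded, the adaptation granularity in this sketch is fixed by the layer structure of the incoming bitstream, which reflects the limitation noted above.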