Centralized video conferencing uses a (sometimes degenerate) star topology: endpoints connect to a central switching device, often known as a “Multipoint Control Unit” or “MCU”. Traditional MCUs handle the “routing” of the incoming video streams; transcoding to an appropriate frame rate, resolution, video coding standard, or other properties; audio mixing; and so on. They further handle call control issues, not only between the endpoints and the MCU, but also related to conference-global control properties. As such, MCUs are not only expensive and form a single point of failure, but also (due to the often required transcoding) add delay, video quality degradation, and other undesirable side effects.
Multipoint video conferences can also be established using a full mesh topology, but that has the disadvantage of requiring substantially more network resources for larger conferences, as well as substantially higher computational demands in the case of a heterogeneous endpoint population.
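The network cost of a full mesh can be illustrated with a short calculation. The following Python sketch counts how many copies of its own video each endpoint has to send upstream. It assumes, purely for illustration, that every endpoint sends video to every other endpoint and that no multicast is available; the function names are hypothetical.

```python
def uplink_streams_per_endpoint(n_endpoints: int, topology: str) -> int:
    """Copies of its own video one endpoint must send upstream.

    Illustrative assumption: all endpoints send to all others, unicast only.
    """
    if n_endpoints < 2:
        return 0  # nobody to send to
    if topology == "mesh":
        # One copy for each of the other endpoints.
        return n_endpoints - 1
    if topology == "star":
        # A single copy to the central device, which fans it out.
        return 1
    raise ValueError(f"unknown topology: {topology!r}")


def total_uplink_streams(n_endpoints: int, topology: str) -> int:
    """Sum of upstream video streams over all endpoints."""
    return n_endpoints * uplink_streams_per_endpoint(n_endpoints, topology)


if __name__ == "__main__":
    for n in (3, 10, 20):
        print(n, total_uplink_streams(n, "mesh"), total_uplink_streams(n, "star"))
```

Under these assumptions the upstream load per endpoint grows linearly with conference size in a mesh, whereas it stays constant in a star.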
Multipoint video conferencing systems that avoid the transcoding MCU, but still allow for the network resource savings only a star topology can offer, are known. Specifically, in one architecture, the MCU is replaced by a device, known as a Scalable Video Conferencing Switch (SVCS), that manipulates the incoming compressed video bitstreams in the compressed domain before sending them to the respective endpoints. This is enabled by the use of a layered coding technology known as “Scalable Video Coding”, for which the bitstream syntax and decoding process are formally specified in ITU-T Rec. H.264 Annex G. ITU-T Rec. H.264 and its Annexes can be obtained from the International Telecommunication Union, Place des Nations, 1211 Geneva 20, Switzerland, or www.itu.int.
A layered video bitstream, as received by the SVCS, includes a base layer and may include one or more temporal, spatial, or SNR enhancement layers. All layers stand in a well-defined use relationship with each other. The SVCS can discard certain layers that it has received before sending the thinned layered bitstream on to the endpoint. Thinning can be caused by transmission errors, decoder capabilities, connectivity issues (which may be reported through RTCP receiver reports), and other factors, as described, for example, in U.S. Pat. No. 7,593,032.
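The thinning step can be sketched in a few lines of Python. This is a minimal illustration, not the method of the cited patent: it assumes each packet carries the three layer identifiers found in the H.264 Annex G NAL unit header extension (temporal_id, dependency_id, quality_id), and it models the receiver's limit as a simple per-dimension maximum. The `Packet`, `LayerLimit`, and `thin` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    temporal_id: int    # temporal layer (0 = base frame rate)
    dependency_id: int  # spatial layer (0 = base resolution)
    quality_id: int     # SNR layer (0 = base quality)
    payload: bytes = b""


@dataclass(frozen=True)
class LayerLimit:
    """Highest layer in each dimension that one receiver should get (assumed model)."""
    max_temporal_id: int
    max_dependency_id: int
    max_quality_id: int


def thin(packets: Iterable[Packet], limit: LayerLimit) -> List[Packet]:
    """Drop every packet above the receiver's limit; no decoding or re-encoding."""
    return [
        p for p in packets
        if p.temporal_id <= limit.max_temporal_id
        and p.dependency_id <= limit.max_dependency_id
        and p.quality_id <= limit.max_quality_id
    ]


if __name__ == "__main__":
    stream = [Packet(0, 0, 0), Packet(1, 0, 0), Packet(0, 1, 0), Packet(0, 1, 1)]
    base_only = thin(stream, LayerLimit(0, 0, 0))
    print(len(base_only))
```

Because the base layer has all three identifiers equal to zero, it passes any limit with non-negative maxima, so every receiver obtains at least a decodable base layer in this simplified model.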
While the SVCS efficiently manages the video traffic of its outgoing ports, in a traditional system setup, each endpoint sends to the SVCS the “best” video content it can produce and transmit. Two main factors determine what “best” means. The first is the endpoint's computational power and other hardware-based resource constraints. For example, an endpoint running on slow laptop hardware may not be able to encode 720p60 video streams. The second is the available bandwidth: an endpoint connected over a slow (e.g., 384 kbit/s) link cannot transmit 720p60 video in useful quality, even if it were capable of doing so from a computational resources viewpoint.
A similar situation exists in traditional MCU-based systems: the capabilities and operation points of the (today: single-layer) video codecs in the endpoint are determined by the endpoint's and MCU port's capabilities, and the available bandwidth. The MCU hides these properties from the other endpoints connected to it.
This setup has advantages from an architectural viewpoint: endpoints do not need to consider the capabilities of other endpoints, of which there could be many in a single conference. However, it also has the disadvantage of unnecessarily using both CPU and network resources in the sending endpoints in many scenarios. CPU resources translate to power consumption, which is critical in mobile applications but also increasingly important for non-mobile endpoints in today's ecologically conscious world. Use of fewer network resources translates into money savings in many cases, directly (when the link is charged per unit of traffic) or indirectly (more capacity available for competing, non-video-conference traffic results in higher productivity and/or less demand for connectivity upgrades).
Accordingly, it is advantageous to instruct encoders in endpoints to tailor their outgoing bitstreams based not only on their own capabilities and those of the MCU or SVCS, but also on the needs of the receiving endpoint population of the conference.
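One way to picture such tailoring is as a cap on the sender's “best” operation point. The Python sketch below is only an illustration under stated assumptions: an operation point is reduced to three numbers (picture height, frame rate, bit rate), and the sender is instructed to produce no more than the most demanding receiver needs in each dimension. The `VideoMode` type and the `tailor_encoder_target` function are hypothetical names, not part of any described system.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VideoMode:
    height: int  # picture height in lines, e.g. 720
    fps: int     # frames per second
    kbps: int    # bit rate in kbit/s


def tailor_encoder_target(sender_best: VideoMode,
                          receiver_needs: Iterable[VideoMode]) -> VideoMode:
    """Cap the sender's best mode at the largest need among all receivers.

    Assumption: with no receivers reported, the sender keeps its own best mode.
    """
    needs = list(receiver_needs)
    if not needs:
        return sender_best
    return VideoMode(
        height=min(sender_best.height, max(n.height for n in needs)),
        fps=min(sender_best.fps, max(n.fps for n in needs)),
        kbps=min(sender_best.kbps, max(n.kbps for n in needs)),
    )


if __name__ == "__main__":
    sender = VideoMode(height=720, fps=60, kbps=1500)
    receivers = [VideoMode(360, 30, 384), VideoMode(180, 15, 128)]
    print(tailor_encoder_target(sender, receivers))
```

In this example no receiver can use more than 360 lines at 30 frames per second, so the sender would encode at that operation point instead of 720p60, saving both CPU cycles and uplink bandwidth.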