Video conferencing, and the associated hardware, falls broadly into two camps. In the first camp, “conferencing” occurs between only two participants, and the participants are connected directly to one another through some form of data network. In this form of network, only two endpoints are involved, and true conferencing occurs only if multiple participants are present at one of the two endpoint sites. Examples of this type of conferencing are, at the low technology end, PC-enabled endpoints interconnecting using software such as NetMeeting® or Skype® and, at the higher end, equipment using dedicated endpoint hardware interconnected, for example, via ISDN or IP (Internet Protocol) links.
In the second camp, video conferencing allows more than two endpoints to interact with one another. This is achieved by providing at least one centralized coordinating point, a so-called “multipoint control unit (MCU)”, which receives video and audio streams from the endpoints, combines these in a desired way, and re-transmits the composite video/audio stream to the participants. Often the conference view transmitted to the endpoints is the same for each endpoint. The composition may change over time but is the same for all the participants.
The provision of only a single composition is a significant problem because each participant must therefore receive a conference stream tailored so that it is acceptable to the least capable endpoint in the conference. In this situation, many endpoints are not used to their full capacity, and their participants may experience degraded images and audio as a result.
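The least-capable-endpoint constraint may be illustrated by a minimal sketch. The endpoint names and capability figures below are hypothetical and are not taken from any particular system; the sketch merely assumes that each endpoint advertises a maximum picture height and a maximum receive bit rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    max_height: int  # maximum decodable picture height, in lines (assumed)
    max_kbps: int    # maximum receive bit rate, in kbit/s (assumed)


def single_composition_format(endpoints):
    """Return the (height, kbps) that one shared composition must not exceed.

    With a single composition, every limit is set by the weakest endpoint.
    """
    return (
        min(e.max_height for e in endpoints),
        min(e.max_kbps for e in endpoints),
    )


# Hypothetical conference with three endpoints of differing capability.
endpoints = [
    Endpoint("room_system", 1080, 4000),
    Endpoint("desktop_pc", 720, 1500),
    Endpoint("isdn_unit", 288, 384),
]

shared_format = single_composition_format(endpoints)
print(shared_format)
```

In this hypothetical conference, the room system and the desktop PC would both receive a stream limited by the ISDN unit, even though each could decode a considerably better one.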
More recently, modern MCUs have been designed to allow a unique view to be created for each participant. This allows the full capabilities of each endpoint to be utilized and also allows different compositions for different participants so that, for example, a particular participant in the conference may be given a different emphasis in the views sent to different users. However, the processing of video data in real time is a highly processor-intensive task. It also involves the movement of large quantities of data. This is particularly so once the data has been decompressed in order to perform high quality processing. Thus, processing power and bandwidth constraints are a significant bottleneck in the creation of high quality video conferencing MCUs which allow multiple views of the conference to be produced.
FIG. 1 shows a conventional MCU architecture. The exemplary architecture has a plurality of digital signal processors 2, such as the Texas Instruments TMS series, which are interconnected via a Time Division Multiplexed (TDM) bus 4. A controller and network interface 6 is also connected to the TDM bus. Each DSP 2 is allocated one or more time slots on the TDM bus. It will be appreciated that the TDM bus is a significant bottleneck. Whilst increased processing power for the MCU may be achieved by adding more powerful DSPs or additional DSPs, all the data flowing between DSPs and between the network 8 and the DSPs must fit into a finite number of time slots on the TDM bus 4. Thus, this form of architecture generally scales poorly and cannot accommodate the processing requirements of per-participant compositions.
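The effect of a finite number of time slots may be illustrated by the following sketch. All figures (the slot count, the per-slot capacity, the stream bit rate and the number of participants) are assumptions chosen purely for illustration and do not describe any particular TDM bus. The sketch also counts only compressed streams entering and leaving the MCU; uncompressed traffic between DSPs, which is ignored here, would make the shortfall larger.

```python
def slots_needed(stream_kbps, slot_kbps):
    """Whole time slots required to carry one stream (ceiling division)."""
    return -(-stream_kbps // slot_kbps)


def fits_on_bus(streams_kbps, total_slots, slot_kbps):
    """Return (fits, slots_required) for a list of stream bit rates."""
    required = sum(slots_needed(s, slot_kbps) for s in streams_kbps)
    return required <= total_slots, required


# Assumed, illustrative figures only.
TOTAL_SLOTS = 512    # time slots available on the bus
SLOT_KBPS = 64       # capacity of one time slot, in kbit/s
STREAM_KBPS = 2000   # bit rate of one compressed stream, in kbit/s
PARTICIPANTS = 10

# Single shared composition: one inbound stream per participant, one outbound.
shared_view = [STREAM_KBPS] * (PARTICIPANTS + 1)
# Per-participant compositions: one inbound and one outbound per participant.
per_participant = [STREAM_KBPS] * (PARTICIPANTS * 2)

shared_ok, shared_slots = fits_on_bus(shared_view, TOTAL_SLOTS, SLOT_KBPS)
unique_ok, unique_slots = fits_on_bus(per_participant, TOTAL_SLOTS, SLOT_KBPS)
print(shared_ok, shared_slots, unique_ok, unique_slots)
```

Under these assumed figures the single shared composition fits within the available slots, whereas the per-participant compositions do not, regardless of how many DSPs are added.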
FIG. 2 shows another conventional configuration. In this example, a plurality of DSPs 2-1 are each connected to a Peripheral Component Interconnect (PCI) bus 10-1. Similarly, a plurality of DSPs 2-2, 2-3 and 2-4 are connected to respective PCI buses 10-2, 10-3 and 10-4. The PCI buses 10-2, 10-3 and 10-4 are in turn connected via buffers 12 to a further PCI bus 14. A significant advantage of this architecture over that shown in FIG. 1 is that the DSPs in group 2-1 may communicate amongst one another with the only bottleneck being the PCI bus 10-1. This is true also for the groups 2-2, 2-3 and 2-4. However, should a DSP in group 2-1 wish to communicate with a DSP in, for example, group 2-3, the PCI bus 14 must be utilized. Thus, although this architecture is a significant improvement on that shown in FIG. 1 in terms of scalability and the ability to effectively use a plurality of DSPs, the PCI bus 14 must still be used for certain combinations of inter-DSP communication and thus may become a performance limiting factor for the MCU architecture.
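The way in which cross-group traffic concentrates on the PCI bus 14 may be sketched as follows. The sketch assumes, for illustration only, that a transfer between groups traverses the local bus of the source group, the bus 14, and the local bus of the destination group; the transfer rates are hypothetical.

```python
from collections import Counter

SHARED_BUS = "14"


def buses_used(src_group, dst_group):
    """List the buses a transfer crosses between DSP groups 2-<src> and 2-<dst>."""
    if src_group == dst_group:
        # Transfers within a group stay on that group's local bus.
        return [f"10-{src_group}"]
    # Assumed path for a transfer between groups: local, shared, local.
    return [f"10-{src_group}", SHARED_BUS, f"10-{dst_group}"]


def bus_load(transfers):
    """Sum the load per bus for (src_group, dst_group, mbps) transfers."""
    load = Counter()
    for src, dst, mbps in transfers:
        for bus in buses_used(src, dst):
            load[bus] += mbps
    return load


# Hypothetical traffic pattern, in Mbit/s.
load = bus_load([
    (1, 1, 100),  # stays on bus 10-1
    (1, 3, 200),  # crosses bus 14
    (2, 4, 300),  # crosses bus 14
])
print(dict(load))
```

Every transfer between groups adds to the load on the bus 14, so that bus carries the sum of all cross-group traffic however the DSPs are distributed among the groups.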
Attempts have been made to offload processing from DSPs. For example, IDT (Integrated Device Technology) produces a “Pre-processing switch (PPS),” under part number IDT 70K2000, for use with DSPs. The PPS carries out predetermined functions on packets before delivering them to a processor such as a DSP or FPGA. The processing applied is determined by the address range on the switch to which the packets are sent. The chip is designed, for example, for use in 3G mobile telephony, where it offloads from DSPs basic tasks which would normally be carried out inefficiently by the DSP.
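The general idea of selecting processing by destination address range may be sketched as follows. This is an illustration of the principle only and does not reproduce the interface or functions of the IDT part; the class name, the address ranges and the two handler functions are all hypothetical.

```python
import bisect


class AddressRangeDispatcher:
    """Apply a pre-processing function chosen by a packet's destination address."""

    def __init__(self):
        self._starts = []   # sorted range start addresses
        self._entries = []  # (start, end, handler), in the same order

    def register(self, start, end, handler):
        """Map addresses in [start, end) to handler; ranges must not overlap."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, handler))

    def process(self, address, payload):
        """Return the pre-processed payload, or the payload unchanged."""
        i = bisect.bisect_right(self._starts, address) - 1
        if i >= 0:
            start, end, handler = self._entries[i]
            if start <= address < end:
                return handler(payload)
        return payload  # no range matched: pass through


def reverse_bytes(payload):
    """Hypothetical pre-processing step: reverse the byte order."""
    return payload[::-1]


def strip_header(payload):
    """Hypothetical pre-processing step: drop an assumed 4-byte header."""
    return payload[4:]


dispatcher = AddressRangeDispatcher()
dispatcher.register(0x1000, 0x2000, reverse_bytes)
dispatcher.register(0x2000, 0x3000, strip_header)

print(dispatcher.process(0x1800, b"abcd"))
print(dispatcher.process(0x2000, b"HDR!data"))
```

In such a scheme the sender selects the pre-processing simply by choosing the address to which a packet is written, so that the receiving processor is handed data on which the basic work has already been done.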
A third MCU architecture providing a highly scalable and very powerful processing platform is disclosed in U.S. 20080158338 and U.S. 20090213126, both of which are hereby incorporated by reference in their entirety. FIG. 3 shows a motherboard 20, the motherboard carrying a field programmable gate array (FPGA) 24 and multiple daughterboards 22. The FPGA 24 routes data between a controller (not shown), a network interface (not shown) and the plurality of daughterboards 22. The links 26 connecting the motherboard 20 with the first layer of a daughterboard may have a bandwidth of, for example, 3 Gb/sec or higher. Each daughterboard has a plurality of processors, i.e. digital signal processors (DSPs), interconnected via a daughterboard switch. Each daughterboard switch is configured to switch data between the plurality of DSPs and between the motherboard, the daughterboard and other daughterboards. In one example, and with reference to FIG. 4, each daughterboard 22 has four DSPs 28, each with associated memory 30. Each daughterboard also has an FPGA 32 which incorporates a switch 34. The FPGA 32 also includes processors 36 and two high bandwidth links 38. Although this architecture is a great improvement on the alternative conventional techniques, as board-to-board communication is greatly reduced, board-to-board communication is still dependent on a processor filtering packets and redistributing media packets to the DSPs using a full network stack. This creates an unnecessary burden on the processor, slowing down the system. Although particularly mentioned in relation to the architecture of FIGS. 3 and 4, this is an even larger problem for PCI bus based MCU architectures such as that shown in FIG. 2.