WebRTC is a software library that contains a collection of protocol and codec implementations and that enables real-time communication between peers on the Internet. WebRTC also defines the APIs provided by some software applications, such as browsers, to give access to these communication capabilities. WebRTC enables sending media data (audio and video) peer to peer, as well as arbitrary data through data channels. The present invention focuses on the former.
Peer-to-Peer Mesh Architecture:
WebRTC by default enables peer-to-peer communication. In a peer-to-peer mesh architecture, peers can communicate directly if they can find the public IP addresses of the other peers, or through a TURN (Traversal Using Relays around NAT) server otherwise. However, this model has scalability issues: each peer has to send as many streams as there are peers it is communicating with, and it receives the same number of streams. The cost at each peer therefore grows with the number of participants, which is why other architectures, such as one based on an SFU (Selective Forwarding Unit), have gained popularity in multiparty and broadcast use cases.
SFU-Based Architecture:
An SFU (Selective Forwarding Unit) is a routing device placed between peers that want to communicate with each other, which selectively forwards the streams it receives to the peers interested in them. With this model, each client needs to send its stream only once, regardless of how many peers receive it. It is therefore possible to develop an SFU that handles the communication between peers, selecting which streams are received by each client. An SFU-based architecture is more suitable than the peer-to-peer mesh architecture for use cases that require larger scale.
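The difference in scale between the two architectures can be illustrated with a small calculation. The following is a minimal sketch, assuming a session in which every client publishes one stream and receives the streams of all other clients; the function names are hypothetical and chosen only for this illustration.

```python
def mesh_streams(peers: int) -> dict:
    """Stream counts in a full peer-to-peer mesh of `peers` clients."""
    others = max(peers - 1, 0)
    return {
        "sent_per_client": others,        # one copy of the stream per remote peer
        "received_per_client": others,    # one stream from each remote peer
        "total_uplink_streams": peers * others,
    }


def sfu_streams(peers: int) -> dict:
    """Stream counts when each client publishes once to an SFU."""
    others = max(peers - 1, 0)
    return {
        "sent_per_client": 1 if peers > 0 else 0,  # single upload to the SFU
        "received_per_client": others,             # SFU forwards the other streams
        "total_uplink_streams": peers,
    }
```

Under these assumptions, a ten-client session requires nine uploads per client in a mesh but only one per client with an SFU, while the number of received streams is the same in both cases.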
Peer Connection:
In WebRTC, a client that wants to send a stream of data to another client, or that wants to receive a stream from another client, creates a Peer Connection. A Peer Connection in WebRTC is an object that allocates all the resources and handles the logic for clients to send streams to each other. A client can potentially send and receive multiple streams and, due to the lack of support for Peer Connection renegotiation in some implementations of WebRTC, it used to be more convenient to use multiple Peer Connections in a client, one for each stream that has to be received or sent. However, the scenario has changed for the case in which a client receives multiple streams of data from a single endpoint: more recent implementations allow a single Peer Connection to be used to send and receive multiple streams. Using a single Peer Connection has some advantages: clients need to allocate fewer resources, use fewer threads and open fewer sockets, and clients either successfully receive all the streams or none of them, which means fewer inconsistencies in the end user experience. A single Peer Connection also means that there is a single point of failure for each client: either the client is able to send and receive all the necessary streams or, in case of failure, it sends and receives no stream at all. This is positive from a user experience perspective, since no awkward middle-ground scenario can occur in which, for instance, a client receives only part of the streams and can hear and see some of the participants of the conversation but not others. For these reasons, the concept of Single Peer Connection, in which clients use a single Peer Connection to send streams to or receive streams from an SFU and renegotiate when a change in state is necessary, is becoming popular.
Stream Quality:
When talking about clients, we say that a client sends a stream or receives a stream. A stream is a stream of data, generally carried over UDP, that is sent or received by the client. Audio bitrate, video bitrate, packet loss, latency and jitter are some of the stream quality metrics that help to determine statistically whether the experience of the end user is good. Good stream quality is thus characterized by high audio and video bitrate, low packet loss and low jitter.
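As an illustration of how such metrics might be computed and combined, the following is a minimal sketch. The jitter update follows the interarrival jitter estimate defined in RFC 3550; the `StreamStats` type, the helper names and all threshold values are assumptions introduced here for illustration and are not taken from any particular platform.

```python
from dataclasses import dataclass


@dataclass
class StreamStats:
    audio_bitrate_kbps: float
    video_bitrate_kbps: float
    packet_loss: float  # fraction of packets lost, 0.0 to 1.0
    jitter_ms: float


def packet_loss_fraction(expected: int, received: int) -> float:
    """Fraction of the expected packets that never arrived."""
    if expected <= 0:
        return 0.0
    return max(expected - received, 0) / expected


def update_jitter(jitter: float, transit_prev: float, transit_now: float) -> float:
    """One step of the RFC 3550 interarrival jitter estimate (gain of 1/16)."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0


# Hypothetical thresholds, chosen only for illustration.
MIN_AUDIO_KBPS = 20.0
MIN_VIDEO_KBPS = 300.0
MAX_PACKET_LOSS = 0.02
MAX_JITTER_MS = 30.0


def looks_good(stats: StreamStats) -> bool:
    """High bitrates, low packet loss and low jitter indicate a good stream."""
    return (
        stats.audio_bitrate_kbps >= MIN_AUDIO_KBPS
        and stats.video_bitrate_kbps >= MIN_VIDEO_KBPS
        and stats.packet_loss <= MAX_PACKET_LOSS
        and stats.jitter_ms <= MAX_JITTER_MS
    )
```

In practice the thresholds would depend on the codec, the resolution and the use case, so they are best treated as tunable parameters.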
The following describes a state-of-the-art video conferencing platform that uses WebRTC and WebSockets as underlying technologies. First, the concept of a session is defined as an isolated group of clients that are part of a logical unit. All the clients in a session are expected to be able to interact with each other, which includes sending and receiving signaling and sending and receiving media. The concept of a stream of data is also defined as a stream of bytes that is sent from one client and potentially received by other clients. When using an SFU, the SFU is responsible for receiving the streams of different clients and forwarding them to the clients that have expressed interest in those streams. FIG. 1 shows a graph with the relevant components and the protocols used to interact amongst them.
First of all, there are the clients. The clients are in general either a web application running on a browser that has WebRTC enabled, or another device that can run a WebRTC engine compiled for that particular platform. This is usually achieved by means of SDKs that can run on multiple platforms such as iOS, Android, Windows, macOS and Linux, amongst others. In practice, these endpoints need to have the following capabilities: they can create WebSocket connections to a public Internet endpoint, and they have a WebRTC engine that supports SDP negotiation, the ICE workflow, encoding and decoding of audio and video streams, and the use of RTP (or its secure version SRTP) for sending or receiving media data and RTCP (or its secure version SRTCP) to control the RTP/SRTP media flow.
Then there are all the platform components. An SFU is a Selective Forwarding Unit that supports WebRTC and can receive media streams sent by clients and selectively forward these same streams to other clients that may be interested in them. The SFU also supports the ICE protocol in order to establish the connection with the client, even in those cases where the client does not have a public IP address or lies behind a firewall. More importantly, the SFU needs to implement the RTP/RTCP (and SRTP/SRTCP) protocols to be able to forward the streams between clients.
A further component is the messaging server. Since WebRTC does not define a standard mechanism to handle SDP negotiations, it is necessary to have a component that handles signaling between clients. When using WebRTC, clients cannot make assumptions about their network conditions. For two clients, any of the following may hold: both have public addresses; one of them has a public address and the other does not; neither of them has a public address; either or both of them are behind a proxy. This means that a WebRTC platform cannot assume anything about the reachability of the clients and must provide a mechanism so that they can exchange SDPs as necessary. The messaging server is a component with a public IP address that allows clients to connect using WebSockets. This means that any client, regardless of the conditions of the network it finds itself in, should be able to create a TCP connection over TLS to a messaging server. The messaging server can then be used as a router for messages between clients that want to signal each other.
Also, it is important to bear in mind that before a WebRTC session can start, clients need to exchange SDPs to be able to agree on the parameters under which the session will take place. The messaging server allows this signaling exchange to happen between clients or between a client and the SFU. In the latter case, the messaging server sends the messages aimed at the SFU and receives the messages coming from it, for instance through a connection based on TCP. As there is no specific standard protocol to be used in this case, an ad-hoc proprietary protocol can be used.
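The routing role of the messaging server can be sketched as follows. This is a minimal in-memory illustration in which a callback stands in for each WebSocket (or TCP) connection; the class, method names and message layout are hypothetical and do not correspond to any specific product or protocol.

```python
from typing import Callable, Dict


class MessagingServer:
    """Routes signaling messages between connected endpoints (clients or an SFU)."""

    def __init__(self) -> None:
        # Maps an endpoint identifier to the function that delivers a message
        # over that endpoint's connection (a stand-in for a WebSocket here).
        self._connections: Dict[str, Callable[[dict], None]] = {}

    def connect(self, endpoint_id: str, deliver: Callable[[dict], None]) -> None:
        self._connections[endpoint_id] = deliver

    def disconnect(self, endpoint_id: str) -> None:
        self._connections.pop(endpoint_id, None)

    def route(self, sender: str, recipient: str, payload: dict) -> bool:
        """Forward `payload` to `recipient`; return False if it is not connected."""
        deliver = self._connections.get(recipient)
        if deliver is None:
            return False
        deliver({"from": sender, "payload": payload})
        return True
```

For example, a client and an SFU that are both connected can exchange an SDP offer and answer through `route` without either of them needing to be directly reachable by the other.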
The messaging server may act as an intermediary in the communication between a client and the load balancer or between the load balancer and the SFU, although this may not be strictly necessary, as these elements may be directly reachable. As there is no specific standard protocol to be used in the communication between the messaging server and the load balancer, an ad-hoc proprietary protocol can be used, for instance one based on HTTP.
The load balancer is the component that is aware of which components are available in the platform (e.g. different SFUs), where they can be found and what their state is. When a resource is needed by another component, the load balancer is the one that provides the resource that best satisfies the needs of the request.
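A simple selection policy for such a load balancer can be sketched as follows. The sketch assumes that the load balancer tracks, for each SFU, its address, whether it is available and how many streams it currently handles, and that the "best" resource is the one with the lowest relative load; the `SfuInfo` type, its fields and this policy are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SfuInfo:
    sfu_id: str
    address: str         # where the SFU can be found
    available: bool      # state known to the load balancer
    active_streams: int
    max_streams: int


def pick_sfu(sfus: List[SfuInfo]) -> Optional[SfuInfo]:
    """Return the available SFU with the lowest relative load, or None."""
    candidates = [
        s for s in sfus if s.available and s.active_streams < s.max_streams
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.active_streams / s.max_streams)
```

Other policies, such as preferring SFUs that are geographically close to the requesting client, would fit the same interface.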
When a client wants to send a stream of data to the SFU, the flow is the following:
  The client sends a request to the messaging server to send a stream to the session.
  The messaging server accepts the request and sends a request to the client to generate an SDP offer.
  The client sends an SDP offer to the messaging server, which in turn forwards it to the SFU.
  The SFU replies with an SDP answer that the messaging server forwards to the client.
  At this point, the client and the SFU both know about the existence of each other and have each other's IP addresses, so through an ICE workflow the client can reach the SFU and send the stream to the SFU.
When a client wants to receive a stream of data from the SFU, the flow is the following:
  The client sends a request to the messaging server to receive a stream from the session.
  The messaging server accepts the request and sends a request to the SFU to generate an SDP offer.
  The SFU sends the generated SDP offer to the messaging server, which in turn forwards it to the client.
  The client replies with an SDP answer to the messaging server, which in turn forwards it to the SFU.
  At this point, the client and the SFU both know about the existence of each other and have each other's IP addresses, so through an ICE workflow the SFU can reach the client and send the stream to the client.
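The send and receive flows differ only in which side generates the SDP offer: the side that will send the media. The following minimal sketch makes this explicit by listing the signaling messages of each flow as (sender, receiver, message) tuples; the endpoint labels and message strings are hypothetical placeholders, not an actual wire protocol.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (sender, receiver, message)


def negotiation_steps(direction: str) -> List[Step]:
    """Signaling steps for a client that wants to 'send' or 'receive' a stream."""
    if direction == "send":
        offerer, answerer = "client", "sfu"
    elif direction == "receive":
        offerer, answerer = "sfu", "client"
    else:
        raise ValueError("direction must be 'send' or 'receive'")

    return [
        ("client", "messaging", f"request to {direction} a stream"),
        # The messaging server asks the future media sender for an SDP offer.
        ("messaging", offerer, "generate SDP offer"),
        (offerer, "messaging", "SDP offer"),
        ("messaging", answerer, "SDP offer"),
        (answerer, "messaging", "SDP answer"),
        ("messaging", offerer, "SDP answer"),
    ]


def media_path(direction: str) -> Tuple[str, str]:
    """After ICE completes, media flows from the offerer to the answerer."""
    steps = negotiation_steps(direction)
    offerer = steps[2][0]
    answerer = steps[4][0]
    return (offerer, answerer)
```

Calling `negotiation_steps("send")` and `negotiation_steps("receive")` yields the two sequences described above, with the messaging server acting as the relay in every step.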
These are the basic flows implemented in a state-of-the-art platform such as TokBox. As can be appreciated, with these flows the messaging server does not have to decide which SFU must be used: once an SFU is assigned to a session, all the streams of that session are allocated there. This means that an SFU can become overloaded or unreachable while streams are still being assigned to it.
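The limitation of this per-session assignment can be illustrated with a small sketch. It assumes that the SFU is chosen when the first stream of a session arrives (here, the least loaded one at that moment) and is never revisited afterwards; the class name, the capacity figures and the initial selection rule are hypothetical.

```python
from typing import Dict, List


class StickyAllocator:
    """Assigns every stream of a session to the SFU chosen for its first stream."""

    def __init__(self, sfu_capacity: Dict[str, int]) -> None:
        self.capacity = dict(sfu_capacity)
        self.load = {sfu: 0 for sfu in sfu_capacity}
        self.session_to_sfu: Dict[str, str] = {}

    def allocate_stream(self, session_id: str) -> str:
        if session_id not in self.session_to_sfu:
            # Decision taken once per session, based on the load at that time.
            self.session_to_sfu[session_id] = min(self.load, key=self.load.get)
        sfu = self.session_to_sfu[session_id]
        # Later streams follow the session, regardless of the current load.
        self.load[sfu] += 1
        return sfu

    def overloaded(self) -> List[str]:
        return [s for s, n in self.load.items() if n > self.capacity[s]]
```

With two SFUs of capacity two each, three streams of the same session all land on the same SFU and overload it, even though the other SFU remains idle.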
Apart from that, some patents and patent applications are known in this field. For instance, patent application WO-A1-2002060126 describes the problem of conference calls that include a relatively large number of participants, more than a single MCU (multipoint control unit) is capable of handling. In this case, a potential solution is cascading two or more MCUs to increase the number of endpoints in a multipoint conference. WO-A1-2002060126 proposes a method to automatically configure the MCU cascading with the aim of optimizing network bandwidth. An example provided is the case where multiple participants from a first campus want to join a conference managed by an MCU located in a second campus. In this case it is more efficient to have them connect to the MCU in the first campus and to connect this MCU with the MCU in the second campus. Then just one connection is needed between both campuses, namely the connection between both MCUs, instead of multiple connections between the conference call participants located in the first campus and the MCU in the second campus.
U.S. Pat. No. 7,800,642 defines a cascading conference system in which there is a master MCU and one or more slave MCUs. The master MCU has a multimedia connection to each one of the slave MCUs. The multimedia connection between the master MCU and each one of the slave MCUs is similar to a multimedia connection with an endpoint of a conferee. The cascading conference session architecture is defined before the beginning of the conference.
As can be appreciated from the cited prior art references, a setup comprising multiple MCUs collaborating to provide a single multi-party conference is well known. However, none of the previously cited prior art references describes any procedure by which participants are dynamically allocated to different SFUs according to their respective load, so that conferences can be managed according to efficiency principles that exploit the capacity of the different SFUs without overloading them whilst limiting the interactions with a global load balancer. Therefore, there is a need for methods and systems that provide the above.