In a traditional centralized cloud environment, all computing is executed in one large centralized data center. In contrast, in a distributed cloud environment, there is no single central data center. Instead, the distributed cloud consists of a potentially high number of geographically dispersed data centers. These data centers have heterogeneous capabilities; some of the data centers may be relatively small and be located at the edge of a network comprising the distributed cloud environment, whereas others may be located at the core of the network and be provided with a very high capacity.
Traditionally, Unified Communications (UC) services such as multiparty audio and video conferencing have been provided using dedicated server hardware and Digital Signal Processors (DSPs). Today, there is an increasing trend to migrate hardware-based UC solutions to a fully software-based, virtualized cloud environment. The first step in this migration is to provide software-based UC services in a centralized cloud environment. The next foreseen step is to provide them in a distributed cloud environment.
FIG. 1 shows a simple example of media processing in a distributed cloud environment, in the following also referred to as network 1. In the figure, the distributed cloud 2 is providing a video conferencing service for four users, Users A, B, C, and D. Media processing is distributed in the cloud 2 in such a way that there are local Media Server (MS) instances 3A, 3B, 3D located close to the users at the edge of the network 1. Further, the audio mixing and switching for the conference is handled by a Media Server 3 in a large data center at the core of the network 1. Each Media Server instance runs in one Virtual Machine (VM) within a data center 4A, 4B, 4D, 4.
A reason why media processing needs to be distributed to several virtual machines is that, typically, the capacity of a single virtual machine is not sufficient for handling the media processing for all the users in a conference. This is particularly the case in, for instance, a high-definition video conference where users may be using different codecs and thus transcoding is required to translate between the different media formats. A reason for distributing the media processing to virtual machines in different data centers is that when media processing occurs as close to the conference participants as possible, latencies can be minimized for users located close to the local media server. Further, responsiveness can be maximized. Latencies need to be minimized to improve the quality of experience for the users of the service. An example of maximized responsiveness is the ability to dynamically adapt the bitrates of the video streams being sent towards the user using feedback from the local Radio Access Network (RAN). Yet another benefit of connecting users to the network-wise closest data center is that the connectivity between the data centers is typically managed, whereas the connectivity of the users might be best effort. Thus, it makes sense to minimize the distance that the media streams travel over the best-effort connections.
The need to distribute media processing to several virtual machines located in different data centers can result in a highly complex interconnection topology for a media session, e.g. a multimedia conference session. Due to the possibly high number of virtual machines and data centers involved, the resulting topology is typically significantly more complex than media processing topologies utilized in the case of hardware-based media servers or in the case of media servers running in a non-distributed cloud environment. Therefore, choosing an optimal media processing topology for a multimedia session is a non-trivial problem.
One approach could be to implement support for only one or a few topologies and to use the same initially selected topology throughout a multimedia session. For example, every new user could be connected to the geographically closest media server, and every media server could be connected to the same central server located in the middle of a star topology. A drawback of this approach is that the initial topology choice may not remain optimal throughout the lifetime of the multimedia session. Further, this approach may not always result in optimal quality of experience for the users, and it scales very poorly to complex geographically distributed cloud environments. As an example, a full mesh topology may quickly run into scalability problems as the number of media servers participating in the topology grows. As another example, a central server located in the middle of a star topology may become a bottleneck as the number of other media servers connecting to it grows.
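The scaling concerns for the two example topologies can be made concrete with a simple count of inter-server connections. The following is a minimal illustrative sketch only, assuming that each pair of directly connected media servers counts as one bidirectional link; the function names are hypothetical and are not part of any system described here.

```python
def full_mesh_links(num_servers: int) -> int:
    """Links needed when every media server connects to every other one."""
    return num_servers * (num_servers - 1) // 2


def star_links(num_servers: int) -> int:
    """Links needed when every non-central server connects to one central server."""
    return max(num_servers - 1, 0)


def star_hub_connections(num_servers: int) -> int:
    """Connections terminating at the central server of a star (assumed single hub)."""
    return max(num_servers - 1, 0)


if __name__ == "__main__":
    # Compare how the link counts grow with the number of media servers.
    for n in (2, 5, 10, 50):
        print(n, full_mesh_links(n), star_links(n), star_hub_connections(n))
```

Under these assumptions, the number of links in a full mesh grows quadratically with the number of media servers (45 links for 10 servers), whereas a star needs only 9 links for 10 servers but concentrates all of them on the single central server.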
Using an inefficient or inappropriate media processing topology can result in increased latency, jitter, and packet loss and thus deteriorated quality for the media session. These problems are emphasized in global geographically distributed multimedia sessions since latency, jitter, and packet loss tend to increase with increased complexity and network distance between the communicating endpoints.