A three-dimensional video conferencing application, hereafter referred to as immersive video conferencing (IVC), combines the positive attributes of video conferencing systems and distributed virtual environments. As in a virtual environment, each participant is represented by an avatar and can roam freely in a three-dimensional environment. In IVC, however, each participant's avatar displays that participant's real-time video.
Typically, the video of each participant is a 2D video shown on a flat surface of the avatar. Although the video is 2D, the avatar is free to move and rotate within the three-dimensional virtual environment, so the video is presented (rendered) at different three-dimensional orientations and distances relative to the viewer.
A key challenge for scalable delivery of IVC is to minimize the network bit rate required to support many participants in the same session. Both the upload and the download capacity of a client are of concern. In the case of peer-to-peer (P2P) delivery of IVC, each client must transmit a copy of its video to all others and receive all other videos. Therefore, the capacity required of each client grows linearly with the number of participants (and for the whole application, the required network capacity grows as the square of this number). A video conference server (often called a conference bridge) solves the upload capacity problem, because each client needs to send only one copy of its video to the server. But the download bottleneck remains, since the clients still need to download all videos from the server.
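The scaling argument above can be made concrete by counting video streams. The following is a minimal sketch, assuming every participant sends one video of equal bit rate and wants to receive every other participant's video; the function names and the returned field names are illustrative only and do not come from any IVC implementation.

```python
def p2p_streams(n):
    """Stream counts for full-mesh P2P delivery among n participants."""
    return {
        "upload_per_client": n - 1,    # one copy sent to every other client
        "download_per_client": n - 1,  # one video received from every other client
        "total_in_network": n * (n - 1),  # grows as the square of n
    }


def server_streams(n):
    """Stream counts when a conference server relays every video."""
    return {
        "upload_per_client": 1,        # a single copy goes to the server
        "download_per_client": n - 1,  # still every other participant's video
        "server_outbound": n * (n - 1),  # the server forwards all of them
    }


if __name__ == "__main__":
    for n in (4, 10, 50):
        print(n, p2p_streams(n), server_streams(n))
```

With ten participants, the server reduces each client's upload from nine streams to one, while each client's download remains at nine streams in both models.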
An example of a method to reduce the required network capacity is discussed in applicants' international patent application WO 2013/003914. In this method, the system dynamically evaluates which of the avatars are within the visual range of the viewer (referred to as the viewer's area of interest, or AOI). Only those videos that are relevant to the AOI are downloaded, which results in a significant reduction in overall network capacity consumption.
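To illustrate the idea of restricting downloads to the area of interest, the following is a minimal sketch of one possible visibility test. It is not the method of WO 2013/003914; it assumes, purely for illustration, a 2D floor plan, a viewer with a heading angle, and an AOI shaped as a view cone with a maximum range. The names `Avatar`, `in_aoi`, `max_dist` and `half_fov_deg` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Avatar:
    name: str
    x: float
    y: float


def in_aoi(viewer_pos, viewer_heading_deg, avatar,
           max_dist=20.0, half_fov_deg=45.0):
    """Return True if the avatar lies inside the viewer's assumed view cone."""
    dx = avatar.x - viewer_pos[0]
    dy = avatar.y - viewer_pos[1]
    dist = math.hypot(dx, dy)
    # Assumption: an avatar at the viewer's own position or beyond the
    # maximum range is treated as outside the AOI.
    if dist == 0 or dist > max_dist:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular difference wrapped into [-180, 180).
    diff = (bearing - viewer_heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_fov_deg


def videos_to_download(viewer_pos, viewer_heading_deg, avatars):
    """Names of the avatars whose video the viewer would request."""
    return [a.name for a in avatars
            if in_aoi(viewer_pos, viewer_heading_deg, a)]


if __name__ == "__main__":
    others = [Avatar("front", 5, 0), Avatar("behind", -5, 0),
              Avatar("far", 30, 0), Avatar("side", 3, 2)]
    print(videos_to_download((0.0, 0.0), 0.0, others))
```

In this sketch a viewer at the origin facing along the positive x axis would request only the videos of avatars that are both in front of it and within range; avatars behind the viewer or too far away are skipped entirely.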
A current model for video quality differentiation is to use hierarchical video coding (HVC) or multiple description coding (MDC). In both models, the video stream is split into a number of sub-streams, called layers in HVC and descriptions in MDC. A user who receives all the sub-streams is able to decode the video at the maximum possible quality. If some sub-streams are not sent by the source, or are dropped by the server or network, the receiver decodes a lower quality video. The primary difference between HVC and MDC is that in HVC the layers are dependent on each other. The user must receive a base layer to be able to decode the video at all. The other layers are enhancement layers, which improve the decoded video but cannot be decoded on their own without the base layer. In contrast, the multiple descriptions of MDC are independent of each other and can be decoded in isolation. This flexibility, however, comes at the cost of a higher bit rate for the sub-streams.
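The dependency difference between the two schemes can be sketched as a small decodability check. This is a simplified model, not a codec: it assumes sub-streams are numbered from 0, that HVC layer 0 is the base layer, and (a common arrangement in layered coding, assumed here rather than stated above) that each enhancement layer also needs every layer below it. The function names are illustrative.

```python
def hvc_usable_layers(received):
    """Count the HVC layers that can actually be decoded.

    Layer 0 is the base layer. Under the assumed cumulative dependency,
    layer k is usable only if layers 0..k-1 were also received.
    """
    usable = 0
    while usable in received:
        usable += 1
    return usable


def mdc_usable_descriptions(received):
    """Count the MDC descriptions that can be decoded.

    Descriptions are independent, so every received one contributes.
    """
    return len(received)


if __name__ == "__main__":
    for received in ({0, 1, 2}, {0, 2}, {1, 2}):
        print(sorted(received),
              "HVC usable:", hvc_usable_layers(received),
              "MDC usable:", mdc_usable_descriptions(received))
```

Losing sub-stream 0 leaves an HVC receiver with nothing to decode, whereas an MDC receiver still decodes a reduced-quality video from the two descriptions that arrived.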
HVC and MDC were designed for a video distribution (multicast) scenario where there is a heterogeneous population of recipients with different video quality requirements (e.g. a mobile phone versus a TV screen) or bandwidth capacity (wireless versus high-speed landline). Unfortunately, these techniques do not address the needs of IVC video quality differentiation, for the following reasons:
    1. Each enhancement layer or description sub-stream is concerned with improving the overall video quality, equally in every part of the video. In IVC, the quality differentiation may not be uniformly distributed over the spatial extent of a video. For example, the video may be partially occluded, or the avatar may be rotated so that some parts of the video require lower quality due to perspective variations.
    2. The number of layers or descriptions is usually quite small, as it is computationally expensive for the source to produce and manage many different layers. This means that the resulting quality differentiation is rather coarse. In IVC, however, finer granularity is required for quality adjustment based on numerous factors, such as virtual distance, angular orientation, etc.
Embodiments of the invention provide techniques to allow individual video streams to be “pruned” before transmission to the client, to reduce the bit rate of the video stream while maintaining the perceptual quality of the video image to the viewer (the pruning can take place at the origin in a P2P model or in the server for a server-based IVC). Different participants will have different perspectives, so a particular video stream may be required at many different quality levels, depending on the number and relative positions of the other participants who are looking at this individual at a given time. Hence, unlike a point-to-point video telephony scenario, it is not possible for the source to simply adjust its video coding parameters based on a single receiver's requirement.