Audio and video communication systems can involve two or more endpoints that are either connected directly to each other through a network, or through one or more servers. The latter configuration can be used for multi-point connections. The endpoints are loci for encoding and decoding audio and video, as well as encoding and decoding computer-generated imagery, referred to as “content”.
An example of such content is the window of a PowerPoint presentation. In such endpoints, the encoding and decoding processes, for both video and audio, can be performed on the same system. An example endpoint would be a desktop computer where encoding and decoding are performed using software. Other endpoints may use custom or other dedicated hardware to perform the corresponding encoding and decoding operations.
The computational demands of video and audio communication can lead to the use of custom-made systems. The increasing computing power of general-purpose computers has allowed them to perform the necessary functions using software implementations of the same algorithms on general-purpose CPUs. It is common to run videoconferencing applications on Windows PCs or MacOS systems.
Portable devices such as mobile phones and tablets can be equipped with built-in video and audio decoding chips and can be capable of performing such decoding with reduced power requirements. These devices, however, often perform encoding operations using software. In the absence of dedicated encoding hardware, the encoding runs on the general-purpose CPU of these devices and consumes additional power. Encoding performance can also be limited by the overall speed of the particular microprocessor.
Certain video and audio coding algorithms are asymmetric, with the encoding process more complex than the decoding process. This can make the task of video encoding challenging. As described in commonly assigned International Patent Application No. PCT/US14/036409, “Systems and Methods for Using Split Endpoints in Video Communication Systems,” incorporated herein by reference in its entirety, it can be advantageous to offer one or more separate system units that would perform video encoding, and operate in conjunction with the main unit that performs decoding and display. More than one unit can be used if more than one video stream needs to be encoded. By taking the encoding operation outside the main unit, more resources can be devoted to encoding, which can lead to improved video quality.
There are other examples where such split system operation can be desirable. Consider the case where one conducts a videoconference with his or her phone, and wants to display the video signal(s) on a nearby TV set. Connecting the phone to the TV with a cable is possible, but can be cumbersome. It can be useful to have a second unit that would attach to the TV and would perform decoding and display, while encoding is performed on the phone itself. In this example, one may prefer decoding to be performed on the “satellite” system rather than the main unit.
An example where it is desirable to have encoding be done on the “satellite” system is thin clients, commonly referred to as “virtual desktop infrastructure” (VDI) clients. VDI environments can involve two components: a server component and a client component. The server component involves a server that can run multiple virtual machine (VM) instances. Each such VM runs an operating system of choice, together with any desired application software that is installed on it. The VM is the actual computer that the end user accesses. The client component of the VDI environment can utilize software that provides remote access to the remote VM running on the server. The client software is typically very small and efficient, and can thus run on much simpler (and cheaper) hardware than the hardware that runs the VM. Some VDI solutions bundle the client software together with hardware to run it.
VDI systems can allow the partitioning of computational capability so that certain requirements fall on the server side rather than the client side. This can provide an advantage since the server component can be shared by many users and can be more easily managed. In addition, the client component can run on a wide array of different devices, including desktops, laptops, iPads, smartphones, etc., giving users tremendous flexibility in terms of ways to access their data and applications. FIG. 1 shows an example architecture of a VMware View system.
One important component of the architecture is the communication between the server and the client component. This is because the quality of the experience that the user enjoys has to do with the responsiveness of the system, as experienced on the client device. If, for example, it takes a considerable amount of time from the instant a user clicks on a button until the button is shown to transition to its clicked state, the experience may be very frustrating. This transition can depend on the amount of time it takes for the click event to be transmitted from the client to the server, the time it takes for the server to respond to the event, and the time it takes for the screen update to propagate from the server to the client. This last component can be subject to the highest delay, since it involves the transmission of non-trivial amounts of data from the server to the client.
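The three delay components described above can be illustrated with a minimal sketch. The class name, field names, and millisecond values below are hypothetical and chosen only for illustration; they are not measurements of any particular VDI system.

```python
from dataclasses import dataclass


@dataclass
class ClickToUpdateDelay:
    """Hypothetical delay budget for a single click, in milliseconds."""
    click_uplink_ms: float      # click event travels from client to server
    server_response_ms: float   # server responds to the event
    screen_downlink_ms: float   # screen update travels from server to client

    def total_ms(self) -> float:
        # The user-perceived delay is the sum of the three components.
        return (self.click_uplink_ms
                + self.server_response_ms
                + self.screen_downlink_ms)

    def largest_component(self) -> str:
        # Name of the component that contributes the most delay.
        parts = {
            "click_uplink_ms": self.click_uplink_ms,
            "server_response_ms": self.server_response_ms,
            "screen_downlink_ms": self.screen_downlink_ms,
        }
        return max(parts, key=parts.get)


# Illustrative values only (assumed, not measured).
delay = ClickToUpdateDelay(click_uplink_ms=10.0,
                           server_response_ms=5.0,
                           screen_downlink_ms=40.0)
print(delay.total_ms())
print(delay.largest_component())
```

With these assumed values the screen update dominates the total, which is the situation the custom VDI protocols discussed next are meant to improve.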
VDI environments can employ custom protocols to improve the communication of data from, at least, the server to the client and thus minimize both the bitrate needed and the delay. The VMware View environment can use the proprietary PCoIP protocol, discussed in “VMware View 5, Performance and Best Practices,” published by VMware and available on the web site http://www.wmware.com.
The physical separation of the server component and the client component may be challenging for real-time multimedia applications, such as streaming video and videoconferencing. This is because these applications typically are designed so that the media decoder runs on the same computer or system on which the display takes place. There can be a high speed data path available for the decoded data to be sent from the decoder to the display. In a VDI environment, the decoding can take place in the server and the display on the client. This can necessitate the transmission of uncompressed, high-volume data such as video from the server to the client. Particularly for applications such as videoconferencing, where both delay and bitrate constraints are strict, this can represent a challenge.
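The scale of the problem can be seen with a short calculation. The sketch below assumes 720p video at 30 frames per second in 4:2:0 format (12 bits per pixel) and a hypothetical compressed bitrate of 1.5 Mbit/s; none of these figures comes from the systems discussed here.

```python
def raw_video_bitrate_bps(width: int, height: int, fps: int,
                          bits_per_pixel: int = 12) -> int:
    """Bitrate of uncompressed video, in bits per second.

    bits_per_pixel defaults to 12, which corresponds to 8-bit 4:2:0 video.
    """
    return width * height * fps * bits_per_pixel


# Assumed example: 1280x720 at 30 frames per second.
raw_bps = raw_video_bitrate_bps(1280, 720, 30)

# Hypothetical bitrate of the same video after compression.
compressed_bps = 1_500_000

print(f"uncompressed: {raw_bps / 1e6:.1f} Mbit/s")
print(f"ratio to compressed: {raw_bps / compressed_bps:.0f}x")
```

Under these assumptions, sending decoded video from the server to the client requires roughly two orders of magnitude more bandwidth than sending the compressed stream.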
Techniques to address this include the incorporation of the codec within the VDI client system. Considering that the client may not be designed to be a particularly powerful device, for these systems it may be advantageous to be able to offer a second satellite system component that would perform the encoding operation, with the main client unit performing the decoding operation.
The ability to have distinct system components perform encoding and decoding can also be relevant for multi-camera, multi-monitor systems. These configurations can be used in telepresence systems, among others. Commonly assigned International Patent Application No. PCT/US11/038003, “System and method for scalable communication using multiple cameras and multiple monitors,” incorporated herein by reference in its entirety, describes systems and methods for designing systems with multiple cameras and/or multiple monitors. An example of such a system is shown in FIG. 2. The multi-camera/multi-screen endpoint includes a Control Unit 270 to which several Node Units (230, 240, 250) are attached. Three Node Units are shown by way of example; more or fewer can be used. The Node Units 230, 240, and 250 can perform encoding and/or decoding as desired. The configuration is similar to one with a main system and one or more satellite systems: the main unit could be a Control Unit with a Node Unit that performs either encoding or decoding, and the satellite unit could be a second Node Unit that performs the other operation (decoding or encoding, respectively). The connection between the individual systems (main and satellite system) can be by network (wired or wireless), by USB attachment (the satellite is a USB device that is attached to the main unit), or by some other suitable communication means.
In the above examples, whenever audio capture and playback occur in different system components, it can be beneficial to address the issue of echo cancellation. Echo cancellation addresses the problem that the audio played back by a system's speaker(s) is picked up by the system's microphone(s) and is thus sent back to its origin, where it will be heard as an echo. Echo cancellation can be implemented on a device that performs both acquisition and playback, or on a device that is connected to such an audio playback and recording device. For example, in telephony, echo cancellation may be implemented on a central office switch, although the actual audio is captured and played back at a telephone located on the user's premises.
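As an illustration of the general idea, the following is a minimal sketch of one common approach to echo cancellation, a normalized least-mean-squares (NLMS) adaptive filter. It is offered only as a generic example and is not the technique of any system described here; the function name, the parameters, and the synthetic echo path are assumptions made for illustration. The sketch makes visible that the canceller needs both the played-back (far-end) signal and the captured (microphone) signal.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo from the mic signal.

    far_end: samples sent to the loudspeaker (the reference signal)
    mic:     samples captured by the microphone (contains the echo)
    Returns the residual signal after echo removal.
    """
    w = [0.0] * taps        # adaptive estimate of the echo path
    x_buf = [0.0] * taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = d - echo_estimate
        # Normalized update: step size scaled by reference signal energy.
        norm = sum(xi * xi for xi in x_buf) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual


def power(signal):
    """Mean squared value of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)


# Synthetic demonstration: the microphone hears only a delayed, attenuated
# copy of the far-end signal (hypothetical echo path, no local talker).
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.0, 0.6, 0.3, -0.1]
mic = [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far_end))]

residual = nlms_echo_cancel(far_end, mic)
print(power(mic[-500:]), power(residual[-500:]))
```

Once the filter has adapted, the residual power is far below the microphone power, meaning the echo has been largely removed. Without access to `far_end`, the filter would have no reference from which to estimate the echo.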
In certain audio and video communication systems, where the endpoint has the computational capacity or hardware to perform echo cancellation, it can be applied on the endpoint itself. This can be performed in software running on the host CPU, or it can run in specialized hardware that is included in the audio equipment used in the endpoint. Several commercially available USB speakerphones, for example, feature built-in echo cancellation.
In these systems, the device that performs the echo cancellation can have access to both the audio that is played back and the audio that is being captured. In split systems, however, where these functions may be performed by distinct system components, this assumption may no longer apply. There exists a need for an improved technique that enables the use of echo cancellation in such split endpoint systems.