Typical video communication systems involve two or more endpoints that are connected to each other either directly through a network or through one or more servers. The latter configuration can be used for multi-point connections. The endpoints are loci for encoding and decoding audio and video, as well as for encoding and decoding computer-generated imagery, referred to as “content”. An example of content is a window from a PowerPoint presentation. In such endpoints, the encoding and decoding processes, for both video and audio, can be performed on the same system. An example endpoint would be a desktop computer where encoding and decoding are performed using software. Another example is the VidyoRoom HD-220 system offered by Vidyo, Inc. This system can be connected to a digital video camera, an audio speakerphone, and up to two displays to provide videoconferencing service to a conference room. Encoding and decoding can be performed on the main unit of the device (in this example, again using software). Other endpoints may use custom or other encoding and decoding hardware to perform the corresponding operations.
The computational demands of video and audio communication have, in the past, led to custom-made systems. The increasing computing power of general-purpose computers has allowed them to perform the necessary functions using software implementations of the same algorithms running on general-purpose CPUs. It is now common to run videoconferencing applications on Windows PCs or MacOS systems.
Portable devices such as mobile phones and tablets are now equipped with built-in video and audio decoding chips and can perform such decoding with significantly lower power requirements. These devices, however, typically perform encoding in software. In the absence of dedicated encoding hardware, encoding runs on the general-purpose CPU of the device, consuming considerable power, and it can also be limited by the overall speed of the particular microprocessor. Many video coding algorithms are asymmetric, with the encoding process being more complex than the decoding process, which can make video encoding on these devices challenging. It would be advantageous to offer one or more separate system units that perform video encoding and operate in conjunction with the main unit that performs decoding and display. More than one such unit could be used if more than one video stream needs to be encoded. Moving the encoding operation out of the main unit can make more resources available for it and lead to improved video quality.
There are other examples where such split system operation can be desirable. Consider the case where a user conducts a videoconference with his or her phone and wants to display the video signal(s) on a nearby TV set. Connecting the phone to the TV with a cable is possible but can be cumbersome. It would be useful to have a second unit that attaches to the TV and performs decoding and display, while encoding is performed on the phone itself. In this example, decoding would preferably be performed on the “satellite” system rather than on the main unit.
Thin clients, commonly referred to as “virtual desktop infrastructure” (VDI) clients, are an example where it is desirable to have encoding performed on the “satellite” system. VDI environments can involve two components: a server component and a client component. The server component involves a server that can run multiple virtual machine (VM) instances. Each such VM runs an operating system of choice, together with any desired application software installed on it. The VM is the actual computer that the end user accesses. The client component of the VDI environment can utilize software that provides remote access to the VM running on the server. The client software is typically very small and efficient, and can thus run on much simpler (and cheaper) hardware than the hardware that runs the VM. Some VDI solutions bundle the client software together with hardware to run it.
VDI systems can allow computational capability to be partitioned so that certain requirements fall on the server side rather than the client side. This can be advantageous because the server component can be shared by many users and can be more easily managed. In addition, the client component can run on a wide array of devices, including desktops, laptops, iPads, smartphones, etc., giving users considerable flexibility in how they access their data and applications. FIG. 1 shows an example architecture of a commercially available VMware View system.
An important component of the architecture is the communication between the server and client components, because the quality of the user's experience depends on the responsiveness of the system as experienced on the client device. If, for example, a considerable amount of time elapses from the instant a user clicks on a button until the button is shown transitioning to its clicked state, the delay can be very frustrating. This transition time can depend on the time it takes for the click event to be transmitted from the client to the server, the time it takes for the server to respond to the event, and, most significantly, the time it takes for the screen update to propagate from the server to the client. This last component is typically subject to the highest delay, since it involves transmitting non-trivial amounts of data from the server to the client.
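The three latency components can be illustrated with a small calculation. The following is a minimal sketch, not part of any VDI product: the function name `click_to_update_latency`, the simple model (one-way delay plus serialization time in each direction), and all numeric values are hypothetical assumptions chosen only to show why the server-to-client screen update tends to dominate.

```python
def click_to_update_latency(
    one_way_delay_s: float,
    event_bits: float,
    uplink_bps: float,
    server_time_s: float,
    update_bits: float,
    downlink_bps: float,
) -> dict:
    """Break a click-to-screen-update delay into its three parts (simplified model)."""
    # Click event travels client -> server: propagation plus a small serialization time.
    uplink = one_way_delay_s + event_bits / uplink_bps
    # Server reacts to the event and renders the new screen state.
    server = server_time_s
    # Screen update travels server -> client; serializing a large update dominates.
    downlink = one_way_delay_s + update_bits / downlink_bps
    return {
        "uplink_event": uplink,
        "server_processing": server,
        "downlink_update": downlink,
        "total": uplink + server + downlink,
    }


# Hypothetical figures: 20 ms one-way delay, a 1 kbit click event on a 1 Mbit/s uplink,
# 5 ms of server work, and a 2 Mbit screen update over a 10 Mbit/s downlink.
breakdown = click_to_update_latency(0.020, 1_000, 1e6, 0.005, 2_000_000, 10e6)
for part, seconds in breakdown.items():
    print(f"{part:>18}: {seconds * 1000:6.1f} ms")
```

With these assumed numbers, the screen update accounts for about 220 ms of a roughly 246 ms total, which is why reducing the amount of data sent from server to client is the main lever for responsiveness.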
VDI environments employ custom protocols to optimize the communication of data, at least from the server to the client, and thus minimize both the required bitrate and the delay. The VMware View environment can use the proprietary PCoIP protocol, discussed in “VMware View 5, Performance and Best Practices,” published by VMware and available on the web site http://www.vmware.com.
The physical separation of the server component and the client component can be challenging for real-time multimedia applications, such as streaming video and videoconferencing. This is because these applications are typically designed so that the media decoder runs on the same computer or system on which the display takes place, where a high-speed data path can be available to carry the decoded data from the decoder to the display. In a VDI environment, the decoding would typically take place on the server and the display on the client. This can necessitate transmitting uncompressed, high-volume data, such as decoded video, from the server to the client. Particularly for applications such as videoconferencing, where both delay and bitrate constraints are strict, this can represent a challenge. It is therefore useful to design systems that enable video communication in VDI environments.
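To give a sense of what “high-volume” means for decoded video, the bitrate of uncompressed frames can be computed directly. This is a minimal sketch; the helper name `raw_video_bitrate` is hypothetical, and the default of 12 bits per pixel assumes 8-bit YUV 4:2:0 sampling (8 bits of luma per pixel plus two chroma planes at a quarter of the resolution each).

```python
def raw_video_bitrate(width: int, height: int, fps: int, bits_per_pixel: int = 12) -> int:
    """Bits per second needed to send decoded (uncompressed) video frames.

    Assumes a fixed number of bits per pixel; 12 corresponds to 8-bit YUV 4:2:0
    (8 bits luma + 2 chroma planes * 8 bits / 4 subsampled pixels).
    """
    return width * height * bits_per_pixel * fps


for label, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080)}.items():
    bps = raw_video_bitrate(w, h, fps=30)
    print(f"{label} @ 30 fps: {bps / 1e6:.1f} Mbit/s uncompressed")
```

Under these assumptions, 720p video at 30 fps needs about 331.8 Mbit/s and 1080p about 746.5 Mbit/s before compression. Compared with a compressed stream at, say, a hypothetical 1.5 Mbit/s, the uncompressed 720p signal is over 200 times larger, which illustrates why shipping decoded video from a VDI server to its client is problematic.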
The ability to have distinct system components perform encoding and decoding is also relevant for multi-camera, multi-monitor systems. These configurations can be used in telepresence systems, among others. Commonly assigned International Patent Application No. PCT/US11/038003, “System and method for scalable communication using multiple cameras and multiple monitors,” incorporated herein by reference in its entirety, describes systems and methods for designing systems with multiple cameras and/or multiple monitors. An example of such a system is shown in FIG. 2. The multi-camera/multi-screen endpoint includes a Control Unit 270 to which several Node Units (230, 240, 250) are attached. Three Node Units are shown by way of example; more or fewer can be used. The Node Units 230, 240, and 250 can perform encoding and/or decoding as desired. The configuration is similar to one with a main system and one or more satellite systems: the main unit would be a Control Unit with a Node Unit that performs either encoding or decoding, and the satellite unit would be a second Node Unit that performs the other operation (decoding or encoding, respectively). The connection between the individual systems (main and satellite) can be a network (wired or wireless), a USB attachment (the satellite being a USB device attached to the main unit), or some other suitable communication means.
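The main/satellite split described above can be modeled as a simple data structure in which a Control Unit tracks which operations its attached Node Units cover. This is a minimal, hypothetical sketch: the class names, the `Attachment` enumeration, and the `complementary` helper are illustrative assumptions and do not reflect any actual product interface or the method of the referenced application.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Media operation a Node Unit performs (hypothetical model)."""
    ENCODE = "encode"
    DECODE = "decode"


class Attachment(Enum):
    """How a unit is connected to the Control Unit (illustrative subset)."""
    NETWORK = "network"  # wired or wireless
    USB = "usb"          # unit attached as a USB device


@dataclass
class NodeUnit:
    name: str
    roles: frozenset
    attachment: Attachment


@dataclass
class ControlUnit:
    nodes: list = field(default_factory=list)

    def attach(self, node: NodeUnit) -> None:
        self.nodes.append(node)

    def covered_roles(self) -> set:
        # Union of the operations performed by all attached Node Units.
        covered = set()
        for node in self.nodes:
            covered |= node.roles
        return covered

    def is_complete_endpoint(self) -> bool:
        # A usable endpoint needs at least one encoder and one decoder somewhere.
        return {Role.ENCODE, Role.DECODE} <= self.covered_roles()


def complementary(role: Role) -> Role:
    """Role the satellite takes, given the role of the main unit's Node Unit."""
    return Role.DECODE if role is Role.ENCODE else Role.ENCODE


# Main unit: Control Unit plus a Node Unit that decodes and displays.
control = ControlUnit()
control.attach(NodeUnit("main-node", frozenset({Role.DECODE}), Attachment.NETWORK))
print(control.is_complete_endpoint())  # False: nothing encodes yet

# Satellite: a second Node Unit performing the complementary operation over USB.
control.attach(NodeUnit("satellite", frozenset({complementary(Role.DECODE)}), Attachment.USB))
print(control.is_complete_endpoint())  # True
```

Keeping roles as sets lets a single Node Unit perform both operations, matching the statement that Node Units can perform encoding and/or decoding as desired.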