Hosted virtual desktop is a technology in which the desktop compute and storage functions are centralized, and often virtualized, on a server farm in a data center. This technology offers a number of benefits to enterprises as well as end users, and is rapidly gaining popularity. However, the peripheral and human-interaction devices, such as the keyboard, mouse, monitor, headset, and web camera, still need to be present with the user. These devices communicate with the compute function or instance over a network using a display protocol.
Since the peripheral devices (henceforth referred to as the “client”) are stateless, a user can use any available set of such devices to access his or her own compute/desktop environment (henceforth referred to as the “host”), which resides, for example, in a data center. Another entity, called a “connection broker,” negotiates the association between the client and the host based on the user credentials. For this purpose, the client communicates with the connection broker and presents the credentials of the user for accessing the host associated with that user. The connection broker validates the credentials, notifies the host of the client identity, and passes the host identity to the client. This enables the client to communicate with the host, while the host can ensure that the client communicating with it is indeed a legitimate client with an authorized user.
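The brokering exchange described above can be illustrated with a minimal sketch. The class names, method names, identifiers, and the in-memory credential table below are hypothetical and chosen only for illustration; they do not correspond to any particular connection broker product or display protocol, and a real deployment would use a proper authentication service rather than plain-text passwords.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical hosted desktop; remembers which clients the broker announced."""
    host_id: str
    expected_clients: set = field(default_factory=set)

    def notify_client(self, client_id: str) -> None:
        # Called by the broker after it has validated the user's credentials.
        self.expected_clients.add(client_id)

    def accept(self, client_id: str) -> bool:
        # The host only talks to clients the broker has vouched for.
        return client_id in self.expected_clients


class ConnectionBroker:
    """Hypothetical broker mapping validated users to their assigned hosts."""

    def __init__(self, credentials: dict, assignments: dict):
        self._credentials = credentials  # user -> password (illustration only)
        self._assignments = assignments  # user -> Host

    def connect(self, client_id: str, user: str, password: str):
        stored = self._credentials.get(user)
        # Constant-time comparison of the presented and stored secrets.
        if stored is None or not secrets.compare_digest(
            stored.encode(), password.encode()
        ):
            return None
        host = self._assignments.get(user)
        if host is None:
            return None
        host.notify_client(client_id)  # tell the host which client to expect
        return host.host_id            # tell the client which host to contact


desktop = Host("desktop-42")
broker = ConnectionBroker({"alice": "s3cret"}, {"alice": desktop})
assigned = broker.connect("thin-client-7", "alice", "s3cret")
```

In this sketch the client learns the host identity only from the return value of `connect`, and the host accepts a client only after the broker has announced it, which mirrors the two-sided notification described in the text.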
While the hosted virtual desktop model has a number of advantages over the conventional “thick client” model, it has its own set of challenges. One challenge is that multimedia traffic is “hair-pinned” over the network and the media is delivered to the display at the user device (client) in a display-protocol-specific format. Network caching schemes can reduce the bandwidth consumed by native rich media but cannot be used for multimedia delivered to virtual desktop clients. In addition, voice traffic is subject to latency and jitter introduced by the hypervisor scheduler. Hypervisors are designed for compute-intensive environments like servers and are not designed to handle real-time tasks, such as processing voice traffic. Furthermore, the communication between the compute instance and the client is encrypted, and most display protocols use a single reliable session for all communication. This limits the ability of network devices to prioritize traffic based on its type or to apply security policies in the network infrastructure.