Although computers were once isolated and had little interaction with other computers, today's computers interact with a wide variety of other computers through communications networks, such as Local Area Networks (LANs) and Wide Area Networks (WANs). With the widespread growth of the INTERNET™, connectivity between computers has become more important and has opened up many new applications and technologies. The growth of large-scale networks, and the widespread availability of low-cost personal computers, has fundamentally changed the way that many people work, interact, communicate, and play.
One increasingly popular form of networking may generally be referred to as virtual computing systems, which can use protocols such as Remote Desktop Protocol (RDP), Independent Computing Architecture (ICA), and others to share a desktop and other applications with a remote client. Such computing systems typically transmit keyboard presses and mouse clicks or selections from the client to a server, and relay screen updates back in the other direction over a network connection (e.g., the INTERNET). As a result, the user experiences their machine as if it were operating as part of a LAN, when in reality the client device is only sent screenshots of the applications as they appear on the server side.
Two common techniques for sending graphics data to a client are (1) sending graphics primitives and other operations, which tell a sub-routine on the client side what to draw and how to draw it, and (2) sending a bitmap image for the client to display. When a sequence of primitives is too complex, it may make more sense to send a bitmap representation that can simply be displayed, rather than a potentially long sequence of more complicated primitive operations. However, continually sending full bitmap representations of the screen may be too expensive, because of the limitations of most bit stream compressors as well as limited network bandwidth.
To alleviate these issues, a frame being sent to a client (such as an application window) may be subdivided into tiles. Those tiles are then cached on the client side, and when a tile is repeated between two bitmaps, the server sends an instruction for the client to display the cached tile rather than re-sending the tile itself. This may greatly reduce the bandwidth cost of an RDP session, especially where tiles are frequently repeated. However, processing resources must then be devoted to caching tiles.
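The caching scheme described above can be sketched as a small server-side table that mirrors which tiles the client already holds. This is a minimal sketch under stated assumptions, not RDP's actual cache: the names `TileCache`, `tile_hash`, and `cache_lookup_or_insert`, the 64-byte tile payload, the 256-slot direct-mapped layout, and the use of FNV-1a as the tile hash are all hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters chosen only for illustration. */
#define TILE_BYTES 64     /* size of one tile's pixel payload, in bytes */
#define CACHE_SLOTS 256   /* capacity of the direct-mapped cache */

/* Server-side mirror of the client's tile cache (hypothetical layout). */
typedef struct {
    uint32_t keys[CACHE_SLOTS];  /* hash of the tile held in each slot */
    int used[CACHE_SLOTS];       /* nonzero if the slot is occupied */
} TileCache;

/* FNV-1a, used here only as a stand-in tile hash. */
static uint32_t tile_hash(const uint8_t *tile, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= tile[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns the slot index on a hit (server sends "draw cached tile N"),
   or -1 on a miss after storing the tile's hash (server sends the tile).
   A real cache would also need to guard against hash collisions, e.g. by
   comparing tile contents or using a longer hash. */
static int cache_lookup_or_insert(TileCache *c, const uint8_t *tile) {
    assert(c != NULL && tile != NULL);
    uint32_t h = tile_hash(tile, TILE_BYTES);
    int slot = (int)(h % CACHE_SLOTS);
    if (c->used[slot] && c->keys[slot] == h)
        return slot;             /* cache hit: tile already on the client */
    c->keys[slot] = h;           /* miss: insert, evicting any older entry */
    c->used[slot] = 1;
    return -1;
}
```

The first occurrence of a tile costs a full transmission; every later occurrence that still maps to a live slot costs only a slot index.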
Further, the tiling algorithm is often implemented so as to maximize the chance of a cache hit. A smaller tile is more likely to be used twice (either within the same frame or in a future frame). There is often a minimum useful tile size as well, because when a tile is too small, a cache hit on it saves only a small amount of data.
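The tile-size trade-off can be made concrete by counting how many tiles cover one frame, since each tile needs its own hash and cache lookup. The sketch below assumes square tiles, counts partial edge tiles as whole tiles, and uses a 1920x1080 frame purely as an example; `tiles_per_frame` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Number of square tiles (side `tile` pixels) needed to cover a
   width x height frame; partial tiles at the right and bottom edges
   count as whole tiles. */
static size_t tiles_per_frame(size_t width, size_t height, size_t tile) {
    assert(tile > 0);
    size_t cols = (width + tile - 1) / tile;   /* ceil(width / tile) */
    size_t rows = (height + tile - 1) / tile;  /* ceil(height / tile) */
    return cols * rows;
}
```

For a 1920x1080 frame this gives 510 tiles at a 64-pixel tile size and 8160 tiles at a 16-pixel tile size. Quartering the tile side therefore multiplies the per-frame hashing and lookup work by sixteen, while each hit saves only one sixteenth as much pixel data.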
These RDP bitmap caching algorithms, together with detecting the differences between tiles (“tile differencing”), are critically important to reducing the bandwidth of an RDP display stream to levels that are acceptable for transmission over a LAN, WAN, or wireless local area network (WLAN). These caching algorithms typically trade processing time on a server (frequently central processing unit (CPU) time) for a reduction in the bandwidth the server needs to transmit the information to a client across a network.
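Tile differencing, as mentioned above, can be illustrated by comparing the tile at a given position in the previous frame with the same tile in the current frame, so that only changed tiles need to be hashed or sent. This is a minimal sketch, assuming 32-bit pixels stored row by row with no padding between rows; `tile_changed` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the tile whose top-left pixel is (tx, ty) differs between
   the previous and current frame, 0 otherwise. Both frames are width x
   height arrays of 32-bit pixels; the tile is clipped at the frame's
   right and bottom edges. */
static int tile_changed(const uint32_t *prev, const uint32_t *cur,
                        size_t width, size_t height,
                        size_t tx, size_t ty, size_t tile) {
    assert(tile > 0 && tx < width && ty < height);
    size_t w = (tx + tile > width) ? width - tx : tile;   /* clip right */
    size_t h = (ty + tile > height) ? height - ty : tile; /* clip bottom */
    for (size_t y = ty; y < ty + h; y++) {
        const uint32_t *a = prev + y * width + tx;
        const uint32_t *b = cur + y * width + tx;
        if (memcmp(a, b, w * sizeof *a) != 0)
            return 1;  /* at least one pixel in this tile row changed */
    }
    return 0;
}
```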
One of the major processing costs of RDP bitmap caching is the hash algorithm used: an algorithm that transforms the larger image data into a smaller value that may be used as an index into a sorted data structure, such as an array or a tree. Some hashing algorithms implement a cipher block chaining (CBC) algorithm, or a variation on one. However, the processing time spent on the hashing algorithm can inhibit the scalability of the server, since all available processing resources may be consumed by RDP sessions before any other resource (such as the server's network bandwidth) is exhausted. This processing time also increases the time required to encode an image frame, which limits the rate at which frames may be produced and sent to a client (the frame rate, measured in frames per second (FPS)).
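The shape of a CBC-style hash can be sketched as a chain in which each input word is combined with the running state and then passed through a mixing step, after which the short result is reduced to a table index. This is an illustrative sketch only, not the hash RDP uses: an arbitrary invertible 32-bit mixing function (`mix32`) stands in for the block cipher, and `chained_hash`, `table_index`, and the initial value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the block cipher in a CBC-style chain: an invertible
   32-bit mixing step (xor-shifts and odd multipliers). */
static uint32_t mix32(uint32_t x) {
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* CBC-style chained hash over n 32-bit words. Each step needs the state
   produced by the previous step, so step i cannot begin until step i-1
   has finished. */
static uint32_t chained_hash(const uint32_t *words, size_t n) {
    assert(words != NULL || n == 0);
    uint32_t state = 0x9e3779b9U;          /* arbitrary initial value */
    for (size_t i = 0; i < n; i++)
        state = mix32(state ^ words[i]);   /* serial dependency on state */
    return state;
}

/* The short hash is then reduced to an index into a fixed-size table. */
static size_t table_index(uint32_t hash, size_t table_size) {
    assert(table_size > 0);
    return hash % table_size;
}
```

Because every step is a bijection of the state for a fixed input word, a change in an early word propagates all the way to the final hash, which is also why the loop cannot simply be split across processing units.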
Increasing the speed of the hashing algorithm on current parallel processors is difficult, because the CBC hash algorithm is typically serial: each step depends on the result of the previous one. This does not lend itself well to parallel processing, such as on a single instruction, multiple data (SIMD) processor.
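To illustrate why the serial dependency matters, the sketch below contrasts the single chain with a generic lane-interleaved variant in which word i feeds lane i % 4. Each lane is still its own serial chain, but the four lane updates within one pass do not depend on one another, so they could in principle be issued as one SIMD operation. This is a generic illustration, not taken from the source: `lane_hash`, the lane count, the per-lane initial values, and the final fold are hypothetical, and the result is a different hash from the single-chain version, not a faster way to compute the same value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* hypothetical: one lane per 32-bit element of a 128-bit register */

/* Same invertible stand-in mixing step as in a single-chain hash. */
static uint32_t mix32(uint32_t x) {
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Lane-interleaved hash: word i is chained into lane i % LANES. */
static uint32_t lane_hash(const uint32_t *words, size_t n) {
    assert(words != NULL || n == 0);
    uint32_t lane[LANES];
    for (size_t l = 0; l < LANES; l++)
        lane[l] = 0x9e3779b9U + (uint32_t)l;   /* distinct per-lane starts */
    size_t i = 0;
    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; l++)     /* no cross-lane dependency */
            lane[l] = mix32(lane[l] ^ words[i + l]);
    for (size_t l = 0; i < n; i++, l++)        /* leftover words, if any */
        lane[l] = mix32(lane[l] ^ words[i]);
    /* Fold the lanes into one value: serial, but only LANES steps long. */
    uint32_t h = (uint32_t)n;
    for (size_t l = 0; l < LANES; l++)
        h = mix32(h ^ lane[l]);
    return h;
}
```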
There exists a class of processors, known as vector processors, that have SIMD instructions in their instruction set architecture (ISA). Streaming SIMD Extensions (SSE), such as the SSE 4.2 instructions in some INTEL™ x86 ISA processors like the NEHALEM™ processor, are a form of these SIMD instructions. These processors can speed up processing of certain types of data because they can operate on a large chunk of data at once. For instance, when an image is being processed, a SIMD processor may operate on several pixels in parallel with a single instruction instead of operating on one pixel at a time. Not only does this improve the performance of processing the instruction itself, but it may also decrease the time spent fetching data from memory.
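The idea of operating on several pixels with one instruction can be sketched with the GCC/Clang vector extension, which packs four 32-bit pixels into one 128-bit value. This is an illustrative sketch, assuming a GCC- or Clang-compatible compiler (the `vector_size` attribute is a compiler extension, not standard C); `pixels_differ` is hypothetical, and whether the vector operations become single SSE instructions depends on the target and compiler flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit pixels in one 128-bit value (GCC/Clang extension). */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Returns 1 if any of the n pixels differ between a and b, 0 otherwise.
   The main loop handles four pixels per step instead of one. */
static int pixels_differ(const uint32_t *a, const uint32_t *b, size_t n) {
    assert((a != NULL && b != NULL) || n == 0);
    v4u32 acc = {0, 0, 0, 0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4u32 va, vb;
        memcpy(&va, a + i, sizeof va);   /* load without alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        acc |= va ^ vb;                  /* four XORs and four ORs per step */
    }
    uint32_t rest = 0;
    for (; i < n; i++)                   /* scalar tail for leftover pixels */
        rest |= a[i] ^ b[i];
    return (acc[0] | acc[1] | acc[2] | acc[3] | rest) != 0;
}
```

The memory traffic benefit mentioned above shows up here as well: each load brings in four pixels at once rather than one.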
While SIMD instructions offer opportunities for improving the performance of some types of processes, such as processing image data for compression, the algorithms and techniques required to implement those processes are considerably more difficult to design than those for a non-vector processor. Special attention must be paid to data flow, and to organizing data so that it may be operated on in parallel. To that end, there would be a benefit from new techniques that increase the parallelism in hashing operations on RDP tiles.