Although computers were once isolated and had minimal or little interaction with other computers, today's computers interact with a wide variety of other computers through Local Area Networks (LANs), Wide Area Networks (WANs), dial-up connections, and so forth. With the wide-spread growth of the Internet, connectivity between computers is becoming more important and has opened up many new applications and technologies. The growth of large-scale networks, and the wide-spread availability of low-cost personal computers, has fundamentally changed the way that many people work, interact, communicate, and play.
One increasing popular form of networking may generally be referred to as virtual computing systems, which can use protocols such as Remote Desktop Protocol (RDP), Independent Computing Architecture (ICA), and others to share a desktop and other applications with a remote client. Such computing systems typically transmit the keyboard presses and mouse clicks or selections from the client to a server, relaying the screen updates back in the other direction over a network connection (e.g., the Internet). As such, the user has the experience as if their machine is operating as part of a LAN, when in reality the client device is only sent screenshots of the applications as they appear on the server side.
Because bitmaps are expensive in terms of bandwidth consumption when transmitted over a network connection (e.g., Internet), rather than sending the entire bitmaps most virtual systems nowadays send graphic primitives and other operations, which tell a sub-routine on the client side what and how to draw something. For example, a client may be told to draw a rectangle along with information about where it should be drawn, what size, color, etc. For instance, a rectangle may be used to draw a button for a user interface, a border around a document, or any other purpose for which a rectangular shape may be useful. Of course, there are many other shapes and operations that can be used as primitives that may be more sophisticated and require more processing that must be done to transfer and perform the operation on the remote client.
As applications continue to get more sophisticated graphical user interfaces, the more processing intensive the use of the above primitives becomes. For example, bitmap images have been expanded to include an Alpha channel, which essentially indicates a desired level of transparency associated with each pixel. This transparency level instructs the client how to blend each pixel of the bitmap with the color that was already present where the bitmap is being displayed. An even better example of the expense associated with processing sophisticated primitive updates may be animation objects or elements, where a sequence of commands must instruct the client on how to draw the animation at each stage of the animation. When sequences of primitives are too complex, it may sometimes make more sense to send a bitmap representation that can more simply be displayed, rather than the potentially long sequence of other more complicated primitive operations. As previously mentioned, however, it may be too expensive to continually send full bitmap representations of the screen because of the limitations of most bit stream compressors together with limited network bandwidth.
There exist a class of processors known as vector processors that have single instruction, multiple data (SIMD) instructions in their instruction set architecture (ISA). Streaming SIMD extensions (SSE) such as the SSE 4.2 instructions in some INTEL™ x86 ISA processors, like the NEHALEM™ processor are a form of these SIMD instructions. These processors are able to speed up processing of certain types of data because they can operate on a large chunk of data at once. For instance, where an image is being processed, instead of operating on a single pixel at a time, a SIMD processor may operate on several pixels in parallel with a single instruction. Not only does this improve the performance of processing the instruction itself, but it may decrease the time spent fetching data from memory.
While SIMD instructions offer opportunities for improving the performance of some types of processes, such as processing image data for compression, the algorithms and techniques required to implement the process are considerably more difficult than with a non-vector processor. Special attention must be paid to data flow, and to organizing data in such a manner that it may be operated on in parallel. To that end, there would be a benefit from new techniques to increase the parallelism in operations on image data.