By creating a desktop, this sort of architecture enables services to be offered to a remote terminal from a PC server acting as a media centre (a hardware system whose job is to read multimedia files: video, sound, image). This sort of desktop comprises an on-screen presentation forming a user interface for the client terminal, with a menu enabling a user to execute a command. The desktop is managed by the server, but receives remote commands from the client terminal across the network. This type of architecture makes it possible to operate with terminals that do not necessarily have significant computing resources (thereby resulting in lower-cost terminals), since the majority of the applications are hosted and run by the server, which transmits the processed data to the terminal.
A media centre comprises a control unit and an operating unit that acts on the commands. Typically, the control unit may include an on-screen display, for example a desktop display, with control buttons. The control unit also includes a device, such as a remote control, for activating the displayed control buttons. The media centre's operating unit manages the actions generated by actuation of the displayed buttons, such as turning up the sound or moving from one video sequence to another via a change of state.
This sort of media centre may, for example, be displayed on a television screen or on another display means producing a user interface. The user may interact with the displayed data, for example, using the remote control.
User management of a media centre is achieved at client terminal level. A user interface can be defined as a tree of possible user commands. The user thus interacts with this user interface by giving execution orders, using the remote control for example, chosen from among the possible options displayed by the user interface. These orders are received by the client terminal and lead to the creation of user interactions by the client terminal. From now on the terms “user interaction” and “user event” will be used interchangeably.
Following the creation of a user event (by pressing a button on the remote control, for example), the client terminal sends a request to the server, in order to initiate processing of the aforementioned event. It is the server which, in processing the request sent by the client terminal, processes the user order. Once the request has been processed, the server sends a response to this request to the client terminal. The server response is produced by processing the user event and, particularly, by encoding the video data to be broadcast by the client terminal following the user event. This response is received and decoded by the client terminal, which displays the processing result on the user interface.
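The exchange described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the class names `UserEvent`, `Server` and `ClientTerminal`, the `volume_up` button and the one-byte-per-pixel frame are all hypothetical, a direct method call stands in for the network, and `zlib` stands in for the video encoder and decoder.

```python
import zlib
from dataclasses import dataclass


@dataclass
class UserEvent:
    """A user event created by the client terminal (hypothetical shape)."""
    button: str


class Server:
    """Holds the desktop state, processes events and encodes the display."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.volume = 5

    def render(self) -> bytes:
        # One byte per pixel: a bar whose length reflects the volume.
        row = bytes([255] * self.volume + [0] * (self.width - self.volume))
        return row * self.height

    def handle(self, event: UserEvent) -> bytes:
        # The server, not the terminal, processes the user order.
        if event.button == "volume_up":
            self.volume += 1
        # zlib is only a stand-in for a real video encoder.
        return zlib.compress(self.render())


class ClientTerminal:
    """Turns a key press into a request and displays the decoded response."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.screen = b""

    def press(self, button: str) -> bytes:
        response = self.server.handle(UserEvent(button))  # request/response
        self.screen = zlib.decompress(response)           # decode, then display
        return self.screen


server = Server(width=16, height=4)
client = ClientTerminal(server)
screen = client.press("volume_up")
print(server.volume, len(screen))
```

The terminal holds no application state in this sketch: it only forwards the event and decodes what comes back, which is what allows it to remain a low-cost device.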
In this sort of system, the server encodes, in other words compresses, what it broadcasts before sending it to the terminal. If the server only had to display the images on its own screen, it would not need to compress them, since the server's internal bus tolerates a high data rate. In order to compress, the server captures its own display, encodes it and sends it to the client terminal, for example to an IP address of the client terminal on an Ethernet network. Encoding is therefore carried out from an image defined sequentially pixel by pixel, in other words in bitmap format. This sort of image defined sequentially pixel by pixel is well suited to display on a monitor.
The encoding carried out by the server is of the spatio-temporal type (according to standard H264, for example); spatio-temporal encoding fully encodes only part of the images being transmitted, in order to recreate a video. Standard H264 is a video coding standard jointly developed by the VCEG (Video Coding Experts Group) and the MPEG (Moving Picture Experts Group). This standard makes it possible to encode video streams at a bit rate less than half that obtained with the MPEG2 standard for the same quality, and to transmit data at high speed over a simplified link, such as HDMI. During encoding, an image is broken down into macro-block units and each macro-block is encoded. Standard H264 includes the types of images known and defined in standard MPEG2, specifically:
I (Intra) images, in which the coding does not depend on any other image,
P (Predictive) images, in which the coding depends on images received previously,
B (Bi-predictive) images, which depend on images received previously and/or subsequently.
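The dependencies between the three image types can be illustrated with a short sketch. It assumes a simplified MPEG2-style model in which each P image refers to the nearest earlier I or P image and each B image refers to the nearest I or P image on either side; standard H264 allows more flexible referencing than this. The function name `frame_dependencies` and the pattern string are hypothetical.

```python
from typing import Dict, List


def frame_dependencies(pattern: str) -> Dict[int, List[int]]:
    """Map each image index (display order) to the images it is predicted from.

    Simplified model: only I and P images serve as references.
    """
    references = [i for i, kind in enumerate(pattern) if kind in "IP"]
    deps: Dict[int, List[int]] = {}
    for i, kind in enumerate(pattern):
        earlier = [r for r in references if r < i]
        later = [r for r in references if r > i]
        if kind == "I":
            deps[i] = []                        # coded on its own
        elif kind == "P":
            deps[i] = earlier[-1:]              # nearest earlier reference
        elif kind == "B":
            deps[i] = earlier[-1:] + later[:1]  # nearest reference on each side
        else:
            raise ValueError(f"unknown image type: {kind!r}")
    return deps


deps = frame_dependencies("IBBPBBP")
for index, refs in deps.items():
    print(index, "IBBPBBP"[index], refs)
```

In this model only the I image at index 0 is fully encoded; every other image in the sequence is described relative to its references.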
On receipt, the client terminal must decode the video data sent by the server. The decoding of this data by the client is generally carried out by a dedicated electronic circuit of a graphics card in the terminal. Once the data has been decoded, the terminal broadcasts it via its broadcasting means onto the screen.
However, the encoding of data by the server requires significant power. Furthermore, encoding generally takes too long to allow encoding, transmission and decoding in real time. Typically, encoding requires five times more power of the server device than decoding requires of the client device.
Today's media centres contain a large number of animations. These animations are, for example, the result of a user click, an animation on a button, a background moving periodically or quite simply the movement of a window. A great number of these animations take place following a user event. These animations are in fact short video sequences that the server must encode and transmit to the client device, in order for them to be displayed via the user interface. However, following such user events, only part of what is displayed by the server device changes. For example, for a menu that drops down following a user click on a tab of this menu, only the part where the menu drops down changes; the rest of the image remains fixed. Video protocols currently only encode full images; in other words, both what has changed following the user event and what has not. Even if, after coding according to standard H264, only those sections that have changed are finally inserted in the video stream, the effort involved in determining which parts have changed and which have not greatly increases the encoding time of the video data being transferred.
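The drop-down menu example can be made concrete with a small sketch that compares two bitmap images and reports the rectangle that changed. It is an illustration only, assuming images stored as rows of integer pixel values; the function name `changed_bounding_box` and the 8 by 6 screen are hypothetical, and this is not the motion detection performed by an H264 encoder.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # rows of pixel values


def changed_bounding_box(before: Frame, after: Frame) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom), inclusive, of the differing pixels.

    Returns None when the two images are identical.
    """
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("images must have the same dimensions")
    changed = [
        (x, y)
        for y, (row_before, row_after) in enumerate(zip(before, after))
        for x, (p_before, p_after) in enumerate(zip(row_before, row_after))
        if p_before != p_after
    ]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical 8x6 screen on which a 3x2 menu drops down.
width, height = 8, 6
before = [[0] * width for _ in range(height)]
after = [row[:] for row in before]
for y in range(1, 3):
    for x in range(2, 5):
        after[y][x] = 1

box = changed_bounding_box(before, after)
changed_pixels = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
fraction = changed_pixels / (width * height)
print(box, fraction)
```

Note that the comparison has to visit every pixel of both images even though only a small fraction of them differs, which is the kind of per-image effort referred to above.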
During testing, which was conclusive with regard to the main display and remote management functions, the display time proved to be excessively long. This display time was in the order of a few seconds for a single high-definition image, which does not allow this function to be used as it stands.
This excessively long display time is explained by two factors. These are firstly the transmission time across the IP link and secondly the processing time for request messages. The decoding of images at client device level is all the longer when the data being decoded are compressed. In the same way, encoding is all the longer when the compression format is complex. Currently, in order to carry out graphic decoding with this type of application, the decoder must include a graphics library enabling compressed video data to be decompressed.
One solution known to anyone skilled in the art for resolving the problems associated with the data transmission time across a network involves reducing the size of the data moving through the network device. By compressing the data according to known compression standards, information is obtained that occupies less space. This compressed information therefore moves more quickly across the network. However, this sort of solution makes the compression of video data more complex still and therefore increases the server encoding time. This complexity in compression also increases the time required at the client device to decode the data received. Furthermore, this solution makes it necessary to integrate the corresponding library in the client device. This solution therefore has the advantage of reducing the data transfer time across the network, but it considerably increases the data processing time at server and client level.
One solution for reducing the encoding and decoding time is to simplify the information being transmitted. By using simple encoding, a short encoding and decoding time is achieved. However, one problem generally linked to transmission with simple encoding is the data transfer time. As a general rule, the simpler the encoding, the greater the space occupied by the data. The time gained at encoding and decoding level is thereby lost in the time taken to transfer the information across the network.
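The trade-off between the two approaches can be sketched with a simple latency model: total time equals encoding time plus payload size divided by link rate plus decoding time. Every figure below is an assumption chosen for illustration (a 10 Mbit/s link, a 50:1 compression ratio, a 24-bit 1280 by 720 image and the timing values); none of them is a measurement from the system described. The function name `latency_breakdown` is hypothetical.

```python
from typing import Dict


def latency_breakdown(encode_s: float, payload_bytes: int,
                      link_bytes_per_s: float, decode_s: float) -> Dict[str, float]:
    """Split the end-to-end time for one image into processing and transfer."""
    transfer_s = payload_bytes / link_bytes_per_s
    return {
        "processing_s": encode_s + decode_s,
        "transfer_s": transfer_s,
        "total_s": encode_s + transfer_s + decode_s,
    }


RAW_IMAGE_BYTES = 1280 * 720 * 3      # assumed 24-bit bitmap, 2,764,800 bytes
LINK_BYTES_PER_S = 10_000_000 / 8     # assumed 10 Mbit/s link

# Simple encoding: almost no processing, but the full bitmap crosses the link.
simple = latency_breakdown(0.01, RAW_IMAGE_BYTES, LINK_BYTES_PER_S, 0.01)

# Complex encoding: assumed 50:1 compression, 1 s to encode, 0.2 s to decode.
complex_ = latency_breakdown(1.0, RAW_IMAGE_BYTES // 50, LINK_BYTES_PER_S, 0.2)

for name, result in (("simple", simple), ("complex", complex_)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these assumed figures, transfer dominates the total for simple encoding and processing dominates it for complex encoding, so each approach moves the delay from one stage to another rather than removing it.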
By combining a VNC (Virtual Network Computing) client/server application with the H264 protocol at screen capture level, for example, the problem of the full encoding of entire images can be resolved. For example, TightVNC is an application enabling a server computer to be accessed remotely from any client computer connected to the Internet. This means that all movement detection and image calculation functions, etc. are calculated by the H264 library. A full screen image in the video is then sent including only the changes, this image having a high compression rate. Finally, the VNC server only manages events and no longer performs image analysis.
However, this sort of method requires the architecture of the TightVNC server application code to be completely changed. Moreover, there is a risk that the encoding duration will be relatively long at server device level. With a server device having a 2.8 GHz dual-core processor, encoding with minimum options takes more than a tenth of a second per image at a resolution of 352 by 288, for a bit rate of 150 kilobytes per second at a frame rate of 30.0 Hz. This sort of method would therefore take something in the order of a second to encode an image with a resolution of 1280 by 720 to be read from the VNC video.
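The order of magnitude given above can be checked with a few lines of arithmetic. The sketch assumes that encoding time scales linearly with the number of pixels and that one kilobyte is 1000 bytes; both are assumptions made for this check, and the variable names are illustrative.

```python
# Figures taken from the text.
small_pixels = 352 * 288           # 101,376 pixels
large_pixels = 1280 * 720          # 921,600 pixels
seconds_per_small_image = 0.1      # "more than a tenth of a second", so a lower bound

# Assumption: encoding time grows in proportion to the pixel count.
pixel_ratio = large_pixels / small_pixels
estimated_seconds = seconds_per_small_image * pixel_ratio

# Average encoded size per image at 150 kilobytes per second and 30 images per second.
bytes_per_image = 150_000 / 30

print(f"pixel ratio: {pixel_ratio:.2f}")
print(f"estimated encoding time: {estimated_seconds:.2f} s")
print(f"average encoded image size: {bytes_per_image:.0f} bytes")
```

Under these assumptions the larger image has roughly nine times as many pixels, which gives an estimate a little under one second per image, consistent with the order of magnitude stated above.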
None of these solutions therefore seems to effectively solve the problem of managing the transfer of video data at an acceptable speed in a network device.