A media center is a device comprising a control unit and an action unit for handling commands. Typically, the control unit includes a display of control buttons on a screen, for example on the desktop. This control unit includes a device, for example a remote control, for activating the displayed buttons. The action unit of the media center handles the actions generated by activating the displayed buttons, for example turning up the sound or switching from one video sequence to another.
Such a desktop or media center can, for example, be displayed by the client on a living room television screen or on another display means forming a user interface. User interface software displays the data. The user can interact with the displayed data using a control device such as a remote control. Typically, the control unit of a media center as defined above is also part of the user interface.
The user controls a media center at the client level. A user interface can be defined as a tree of commands available to the user. The user interacts with this user interface by giving execution orders, for example with a remote control, chosen among the available choices displayed by the user interface. These orders are received by the client, which creates corresponding user interactions.
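The "tree of commands" can be pictured as a simple recursive data structure. The sketch below is illustrative only: the `CommandNode` class, its methods and the menu labels are hypothetical names, not part of the described system. Each node's children are the choices displayed to the user, and an execution order selects one of them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CommandNode:
    """One entry of a hypothetical command tree shown by the user interface."""
    label: str
    children: list[CommandNode] = field(default_factory=list)

    def choices(self) -> list[str]:
        # The choices currently displayed are the labels of the child nodes.
        return [child.label for child in self.children]

    def select(self, label: str) -> CommandNode:
        # An execution order (e.g. a remote-control button press) picks one child.
        for child in self.children:
            if child.label == label:
                return child
        raise KeyError(f"{label!r} is not an available choice")


# Hypothetical menu layout, for illustration only.
root = CommandNode("Home", [
    CommandNode("Video", [CommandNode("Play"), CommandNode("Next sequence")]),
    CommandNode("Sound", [CommandNode("Volume up"), CommandNode("Volume down")]),
])
```

For example, `root.choices()` gives `["Video", "Sound"]`, and `root.select("Sound").choices()` gives the two volume commands. Each such selection is what the client turns into a user interaction.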
After the creation of a user interaction, the client sends a request message to the server in order to have said user interaction processed. It is the server that, by processing the request message sent by the client, processes the order from the user. Once this request message is processed, the server sends the client a response to this request message. The response from the server is produced by the processing of the user interaction and particularly by the encoding of video and audio data to be delivered by the client as a result of this user interaction. This response is received and decoded by the client, which displays the result of the operation on the user interface.
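The request/response exchange described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Request` and `Response` message formats and the `server_handle` / `client_round_trip` functions are hypothetical, and `zlib` only stands in for the real audio/video encoder (such as H264) so that the example stays self-contained.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Request:
    """Request message sent by the client for one user interaction (hypothetical format)."""
    interaction: str


@dataclass
class Response:
    """Response message carrying the encoded data the client must decode and display."""
    payload: bytes


def server_handle(request: Request) -> Response:
    # The server processes the user interaction, renders the result and encodes it.
    # zlib stands in for the real audio/video encoder purely for illustration.
    rendered = f"frame after {request.interaction}".encode("utf-8")
    return Response(payload=zlib.compress(rendered))


def client_round_trip(order: str) -> str:
    # 1. The client turns the user's order into a user interaction and sends a request.
    request = Request(interaction=order)
    # 2. The server processes the request and returns a response.
    response = server_handle(request)
    # 3. The client decodes the response and returns what it would display.
    return zlib.decompress(response.payload).decode("utf-8")
```

Calling `client_round_trip("volume up")` returns `"frame after volume up"`. In a real deployment, step 2 crosses the network, and the encoding in `server_handle` is where the cost discussed below arises.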
In such a system, the server encodes, i.e. compresses, the data it delivers before sending them to the client. If the server displayed the images on its own screen, it would not need to compress them, because the transfer units in the internal bus of the server machine support a high transfer rate. To compress its output, the server typically captures its own display, encodes it and sends it over the network to the client, for example to the client's IP address on an Ethernet network. The encoding is therefore performed on an image defined sequentially point by point, in the so-called bitmap format. Such an image is well suited to being displayed on a monitor.
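A rough calculation shows why a captured bitmap cannot simply be sent uncompressed over the network. The sketch below assumes 24-bit RGB pixels (3 bytes each), 30 images per second and no protocol overhead; these parameters are assumptions chosen for illustration, and `raw_bitrate_bps` is a hypothetical helper.

```python
def raw_bitrate_bps(width: int, height: int, bytes_per_pixel: int = 3, fps: int = 30) -> int:
    """Bit rate needed to stream uncompressed bitmap frames (assumes no overhead)."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * 8  # bytes per second -> bits per second


rate = raw_bitrate_bps(1280, 720)
print(f"{rate / 1e6:.1f} Mbit/s")  # 663.6 Mbit/s
```

Under these assumptions a 1280×720 display needs about 664 Mbit/s, well beyond a 100 Mbit/s Ethernet link, whereas the server's internal bus can carry such a rate to its own screen.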
The encoding done by the server is space-time encoding, which means that the compressed data include both audio/video data and signaling data for delivering these data correctly. Such encoding can, for example, be based on the H264 standard. Such compression encoding makes it possible to transmit data at high speed through a simplified connection, for example an HDMI connection. The H264 standard makes it possible to encode video streams at less than half the bit rate required by the MPEG2 standard for the same quality. H264 is primarily a lossy compression standard. During encoding, each image is divided into individual macroblocks, and each macroblock is encoded.
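In H264, a macroblock covers 16×16 luma samples. The sketch below, with hypothetical helper names, shows how an image is tiled into macroblocks. It counts partial blocks at the right and bottom edges as whole blocks, since the encoder pads the image to a multiple of 16.

```python
import math


def macroblock_grid(width: int, height: int, size: int = 16) -> tuple[int, int]:
    """Number of macroblock columns and rows needed to cover a width x height image."""
    return math.ceil(width / size), math.ceil(height / size)


def macroblocks(width: int, height: int, size: int = 16):
    """Yield the (x, y, w, h) rectangle of each macroblock, row by row."""
    cols, rows = macroblock_grid(width, height, size)
    for row in range(rows):
        for col in range(cols):
            x, y = col * size, row * size
            # Edge blocks are clipped to the visible image area.
            yield x, y, min(size, width - x), min(size, height - y)
```

A 352×288 image gives a 22×18 grid, i.e. 396 macroblocks, while a 1280×720 image gives 80×45, i.e. 3,600 macroblocks, each of which must be encoded.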
Upon reception, the client must decode the audio/video data sent by the server. The client generally performs this decoding with a dedicated electronic circuit on a graphics/sound card. Once the data have been decoded, the client displays them on its own screen.
However, either encoding the data requires a lot of power from the server, or it requires a processing time that makes real-time encoding impossible. In practice, encoding requires five times as much power from the server as decoding requires from the client. Typical household servers are not capable of real-time encoding.
Current media centers contain a large number of animations. These animations include, for example, an animated button or icon, a wallpaper in recurring motion or the scrolling of a menu. These animations are short video sequences. To transmit them to the client requesting them, the server must encode them and send them so they can be delivered via the user interface. Such video sequences are defined by a series of images delivered fast enough to make the video fluid. Many of these animations appear in response to user interactions. However, such user interactions change only part of what is displayed on the client's screen. For example, when a menu scrolls after the user clicks one of its buttons, only the area in which the menu scrolls changes, while the rest of the image remains fixed.
Currently, audio/video protocols encode only entire images. The encoding therefore covers both the parts of the image that have changed as a result of the user interaction and the parts that have not. Encoding the whole image substantially increases the encoding time of the audio/video data to be transferred.
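The gap between what changes and what is encoded can be quantified by comparing two successive frames macroblock by macroblock. The sketch below is illustrative: `changed_macroblocks` is a hypothetical helper, frames are plain lists of grayscale pixel rows, and the 64×96 pixel scrolling-menu area is an assumed example.

```python
def changed_macroblocks(prev, curr, size=16):
    """Return (col, row) indices of the size x size blocks that differ between two frames.

    Both frames are lists of rows of pixel values with identical dimensions.
    """
    height, width = len(curr), len(curr[0])
    changed = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            rows = range(top, min(top + size, height))
            cols = range(left, min(left + size, width))
            if any(prev[y][x] != curr[y][x] for y in rows for x in cols):
                changed.append((left // size, top // size))
    return changed


W, H = 352, 288
before = [[0] * W for _ in range(H)]
after = [row[:] for row in before]
# Hypothetical scrolling menu occupying a 64x96 pixel area at (32, 48).
for y in range(48, 48 + 96):
    for x in range(32, 32 + 64):
        after[y][x] = 255

blocks = changed_macroblocks(before, after)
print(len(blocks), "of", 22 * 18, "macroblocks changed")  # 24 of 396 macroblocks changed
```

In this example only about 6% of the macroblocks differ, yet encoding the entire image processes all 396 of them.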
In tests of the main display and remote control functions, the time required for the client to display the audio/video data proved to be too long: on the order of several seconds for a single image. Such a display time makes it impossible to use this function as is.
The excessive display time is explained by two factors: first, the transmission time over the IP connection, and second, the processing time of the request messages. The object of the invention is to reduce both of these times. Encoding and decoding audio/video data take even longer when the data are highly compressed and therefore complex. Currently, to decode graphics in this type of application, the decoder must include a graphics library capable of decompressing the compressed audio/video data.
A solution known to the person skilled in the art for reducing the transmission time of data over a network is to reduce the volume of data traveling through the network. By compressing the data as much as possible in accordance with known compression standards, the resulting audio/video data are less voluminous and therefore travel through the network faster. However, this solution makes the compression of audio/video data even more complex. This complexity increases the encoding time in the server and the time the client needs to decode the received data. Moreover, this solution requires the client to include a library corresponding to the compression format used. Thus, although this solution reduces the transfer time of the data through the network, it considerably increases the processing time of the audio/video data in both the server and the client.
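The trade-off can be expressed as a simple additive model: display time = server encoding time + transfer time + client decoding time. The function name and every figure below are hypothetical, chosen only to show how stronger compression can shorten the transfer yet lengthen the total. They are not measurements.

```python
def display_time(encoded_bytes: int, link_bps: float, encode_s: float, decode_s: float) -> float:
    """Seconds to deliver one image: server encoding + network transfer + client decoding."""
    transfer_s = encoded_bytes * 8 / link_bps  # bytes -> bits, divided by link speed
    return encode_s + transfer_s + decode_s


LINK_BPS = 100e6  # hypothetical 100 Mbit/s Ethernet link

# Hypothetical figures, chosen only to illustrate the trade-off:
light = display_time(encoded_bytes=1_000_000, link_bps=LINK_BPS, encode_s=0.02, decode_s=0.01)
heavy = display_time(encoded_bytes=100_000, link_bps=LINK_BPS, encode_s=0.5, decode_s=0.1)
print(f"light compression: {light:.3f} s")  # light compression: 0.110 s
print(f"heavy compression: {heavy:.3f} s")  # heavy compression: 0.608 s
```

With these assumed values, heavier compression cuts the transfer from 80 ms to 8 ms but raises the total display time, because the added encoding and decoding work outweighs the saving on the network.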
By combining the TightVNC application with the H264 standard for screen capture, for example, the problem of completely encoding entire images can be solved. All of the functions for detecting movement, computing images, etc., are then handled by the H264 library. An image of the entire screen containing only the changes is then sent in the video stream, with a high compression rate. Moreover, the encoding time of an image varies little.
However, such a method requires completely replicating the architecture of the TightVNC server code. Moreover, the encoding by the server is likely to take a relatively long time. On a server with a 2.8 GHz dual-core processor, encoding with minimal options takes more than a tenth of a second per image at a resolution of 352×288, at 30.0 Hz and a bit rate of 150 kilobytes per second. Such a method would therefore take approximately one second to encode an image with a resolution of 1280×720.
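The "approximately one second" figure follows from scaling the measured per-image time by the ratio of pixel counts. This assumes that encoding time grows roughly linearly with the number of pixels, which the measurement itself does not state. Because the measured time was "more than" a tenth of a second, the result is a lower bound:

```latex
\frac{1280 \times 720}{352 \times 288} = \frac{921\,600}{101\,376} = \frac{100}{11} \approx 9.09
\qquad\Longrightarrow\qquad
t_{1280 \times 720} \gtrsim 9.09 \times 0.1\ \text{s} \approx 0.91\ \text{s}
```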
Thus, none of these solutions seems to effectively solve the problem of handling the transfer of audio/video data at an acceptable speed in a network.