Computer animation refers to a computer-generated image sequence in which a set of pre-defined three-dimensional (3-D) objects, which together constitute a 3-D world scene, is projected onto the image plane of a virtual camera at a fixed sampling interval, resulting in a camera view of the 3-D scene as a sequence of projected frames.
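The projection described above can be illustrated with a minimal sketch. It assumes a simple pinhole camera model looking along the positive z axis; the function names, the focal length and the moving test point are hypothetical and are chosen only for illustration.

```python
def project_point(point, focal_length=1.0):
    """Project a 3-D point, given in camera coordinates, onto the image plane.

    Assumes a pinhole camera looking along +z (an illustrative model only).
    """
    x, y, z = point
    if z <= 0:
        return None  # the point is behind the camera and is not visible
    return (focal_length * x / z, focal_length * y / z)


def render_frames(position_at, frame_rate, duration):
    """Sample the object position every 1/frame_rate seconds and project it."""
    frame_count = int(round(duration * frame_rate))
    frames = []
    for n in range(frame_count):
        t = n / frame_rate  # fixed sampling interval between frames
        frames.append(project_point(position_at(t)))
    return frames


def moving_point(t):
    # Hypothetical object: moves along x at 2 units/s at a constant depth of 4.
    return (2.0 * t, 1.0, 4.0)


frames = render_frames(moving_point, frame_rate=4, duration=1.0)
for index, frame in enumerate(frames):
    print(index, frame)
```

Each entry of `frames` is the projected image position of the object at one sampling instant; a full renderer would project every vertex of every object in the same way.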
Examples of computer animation products are video cartoons in which the cartoon objects and their behaviors (e.g. position, motion, interaction etc.) in the scenes are defined by a fixed, deterministic program. More sophisticated applications are computer games in which some parts of the objects' behaviors are controlled by users through hardware game controllers, and some other parts are controlled by software programs which simulate the behavior and interaction of these objects.
In computer-generated image sequences, complete information about the state of each object in an image scene, such as its position, motion and texture, is known. Furthermore, the state of each object is controllable, enabling full control over the projected image of each object. Examples of computer graphics methods that affect the visual detail of the projected image are vertex level of detail (LOD) at the 3-D representation level, and mip-mapping and anisotropic filtering at the texture level.
In a multi-user game, the objects in 3-D game scenes are controlled by more than one player. Each user controls a specific set of objects in the scene and has a separate viewing camera. A multi-user game system can be based on a centralized game server which computes the state of a 3-D scene based on the inputs of the users and their clients. Each user client computes and renders the specific user's state based on any scene updates sent by the centralized server and on the user's input control and viewing angle.
A centralized game server can also serve single game users, where for each game user the state of the game scene can be computed on the server, computed on the user's client, or arbitrarily partitioned between the server and the client.
The streaming methods employed between the centralized game server and the clients, and the type of computations employed on them, depend on the underlying communication channel and on the client type. Web games use broadband communication channels, such as DSL, and PC clients. They are mainly based on Shockwave and Java applications that run on the PC client side. For example, a particular game streaming method downloads chunks of a game application to the client PC in real time, while the PC runs the application.
A related method is based on a centralized game server which executes game applications and streams the compressed audio/visual information to the client side. This method uses a “dumb” thin client which only decodes and presents the compressed audio/visual data.
In some systems, the compression methods used are based on the MPEG2 or H.264 visual compression standards and on the MPEG2, AAC or AC3 audio compression standards. The streaming is based on the MPEG2 system and/or on IETF IP packetisation. The type of compression and streaming method chosen depends on the types of the clients and of the communication channels.
The MPEG2 and H.264 visual compression standards are based on entropy coding of motion-compensated image blocks of fixed size, named macroblocks, into which the target image is partitioned. The motion compensation can be done relative to previous and/or upcoming reference pictures.
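As an illustration of block-based motion compensation, the following minimal sketch finds, for one block of a target frame, the displacement into a reference frame that minimizes the sum of absolute differences (SAD). Exhaustive search and the SAD criterion are assumptions made for this example; the standards define only the bitstream and leave the search strategy to the encoder. All names and the small 8x8 test frames are hypothetical.

```python
def block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def best_motion_vector(target, reference, top, left, size, search):
    """Exhaustively search +/- search pixels for the best matching block."""
    height, width = len(reference), len(reference[0])
    current = block(target, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(current, block(reference, r, c, size))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1], best[0]


# Hypothetical 8x8 frames: the target is the reference shifted right by one pixel.
reference = [[r * 8 + c for c in range(8)] for r in range(8)]
target = [[reference[r][c - 1] if c > 0 else 0 for c in range(8)] for r in range(8)]

vector, cost = best_motion_vector(target, reference, top=2, left=2, size=4, search=2)
print(vector, cost)
```

The encoder would then code the motion vector together with the residual between the target block and the matched reference block; when the match is exact, as in this example, the residual is zero.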
In the past, some systems have been presented in which the rendering and encoding modules are combined into a hybrid rendering-encoding module, wherein the rendering output is fed directly to the encoding module. This reduces the processing latency and the redundant computation repeated at the encoding side. Such systems employ MPEG2 as the visual compression standard, and present a method for motion estimation of a set of partitioned image regions that is based on averaging the geometric optical flow between target and reference image frames.
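The idea of deriving a motion vector for each image region by averaging a known flow field can be sketched as follows. This is only an illustration under stated assumptions: the per-pixel flow is taken as already available from the renderer, regions are square blocks, and a plain arithmetic mean is used. The cited systems are not described in enough detail here to claim that they work in exactly this way, and all names are hypothetical.

```python
def average_block_flow(flow, block_size):
    """Average a per-pixel flow field over square blocks.

    flow is a 2-D list of (dy, dx) displacements, one per pixel.
    Returns a dict mapping (block_row, block_col) to (mean_dy, mean_dx).
    """
    height, width = len(flow), len(flow[0])
    vectors = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            pixels = [flow[r][c]
                      for r in range(top, min(top + block_size, height))
                      for c in range(left, min(left + block_size, width))]
            mean_dy = sum(p[0] for p in pixels) / len(pixels)
            mean_dx = sum(p[1] for p in pixels) / len(pixels)
            vectors[(top // block_size, left // block_size)] = (mean_dy, mean_dx)
    return vectors


# Hypothetical 4x4 flow field: column 0 moves 1 pixel right, the rest 3 pixels.
flow = [[(0, 1) if c == 0 else (0, 3) for c in range(4)] for r in range(4)]

vectors = average_block_flow(flow, block_size=2)
print(vectors)
```

Because the flow is computed from the known 3-D scene rather than searched for in the images, no block-matching search is needed for the regions covered by the flow field.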
There is another system which includes instruction interception functions for intercepting the rendering commands sent to the main graphics processor. This system generates a second set of rendering commands for a sub-graphics processor. The graphics data generated by the sub-graphics processor is used to generate the motion information for the video compression stage of a video sequence generated by the main graphics processor. The system aims to provide faster compression computation, thus reducing the overall system latency.
Overall, there are systems that disclose methods which reduce computation overhead at the encoder side and provide accurate motion information that is derived directly or indirectly from the original 3-D scene. However, these systems do not deal with the major problem of the streaming server, namely the optimization of the visual quality of the streaming video and of the end-to-end system delay for encoding, streaming and decoding.
The quality of compressed video streaming can be constrained by two parameters: 1) system bandwidth that is measured in bits-per-second, and 2) end-to-end system delay that is measured in seconds. Both restrictions imply constraints on the size of the compressed frames and hence on the resultant compressed video quality.
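The way these two parameters bound the size of compressed frames can be shown with a short calculation. The sketch below assumes a constant-rate channel and assumes that a given part of the delay budget is available purely for transmission; the bandwidth, frame rate and delay values are hypothetical examples and are not figures taken from any particular system.

```python
def average_frame_bits(bandwidth_bps, frame_rate):
    """Average number of bits available per frame on a constant-rate channel."""
    return bandwidth_bps / frame_rate


def max_frame_bits(bandwidth_bps, transmission_delay_ms):
    """Largest frame that can be sent within the transmission delay budget."""
    return bandwidth_bps * transmission_delay_ms // 1000


# Hypothetical operating point: 2 Mbit/s channel, 25 frames/s, 100 ms for transmission.
bandwidth_bps = 2_000_000
average_bits = average_frame_bits(bandwidth_bps, frame_rate=25)
largest_bits = max_frame_bits(bandwidth_bps, transmission_delay_ms=100)

print(average_bits)                 # bits per frame on average
print(largest_bits)                 # upper bound on the size of any single frame
print(largest_bits / average_bits)  # how many average frames the largest frame may span
```

Under these assumptions, halving the delay budget halves the largest frame that can be sent in time, which in turn limits how much a single complex frame may exceed the average frame size.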
In natural video, the size of compressed frames can be controlled by visual encoders through pre-filtering of the input frames, through sophisticated motion estimation techniques (which try to estimate and model the exact motion of different objects in the scene, thus minimizing the motion-compensated error of the encoded blocks), and through increasing the quantization level of the transformed motion-compensated blocks.
Encoders of natural video which are based on existing compression standards (such as MPEG2 or MPEG4) employ various techniques to accommodate the above constraints on compressed frames. However, pre-filtering or quantization which does not match the visual content (e.g. does not distinguish between different objects, or between regions of interest and other parts of the frame that are of less interest) will result in poor image quality. The motion estimation may fail in complex visual scenarios where the actual motion cannot be accurately estimated using the limited computational resources of encoders. This may produce poor motion compensation and, accordingly, poor encoding quality.
Professional video encoders for cable TV and for satellite broadcast, which are based on the above MPEG compression standards, are designed with an end-to-end system delay of 1-2 seconds. They enable multi-pass encoding of the video sources, and accommodate the large variations in the size of compressed pictures that may be present due to the unpredictable nature of natural visual scenes, through the introduction of a large system delay. Large delays are unacceptable in streaming systems for interactive applications, which require a response time of less than 200 milliseconds. In fast-motion gaming, such as in First Person Shooter (FPS) games, the delay requirements may be even tighter.