1. Field of the Invention
Mobile communications is currently one of the fastest growing markets and is expected to continue to expand. Today the functionalities of mobile communications are quite limited. Only speech or data can be handled. It is expected that image information, especially real-time video information, will greatly add to the value of mobile communications. Low cost mobile video transmission is highly sought for many practical applications, e.g., mobile visual communications, live TV news reports, mobile surveillance, telemedicine, virtual reality, computer games, personal travel assistance, underwater visual communications, space communications, etc. However, numerous difficulties are encountered when live video information is included in mobile communications. Unlike speech information, video information normally needs greater bandwidth and processing performance. In contrast, the mobile terminals of today suffer from certain limitations, e.g., (1) mobile terminals usually have limited power, with typical transmission output power levels of 10 microwatts to 1 watt; and (2) the terminals have limited capability to transmit data wirelessly, typically 1 kb/s to 10 kb/s. Based on these facts, real-time mobile video transmission can only be achieved when a highly efficient compression algorithm with a very low implementation complexity can be implemented.
2. Description of the Prior Art
To compress motion pictures, a simple solution is to compress the pictures on a frame-by-frame basis, for example, by means of the JPEG algorithm. This results in a rather high bit rate, although at low complexity. To achieve high compression efficiency, advanced video compression algorithms are needed. Up to now, different types of video compression algorithms have been developed. Typical examples include H.263-type block-based, 3D model-based, and segmentation-based coding algorithms. Although based on different coding principles, these algorithms adopt a similar coding structure, namely a closed-loop coding structure. Such a unified coding structure is shown in FIG. 1. In this structure, the important blocks are image analysis, image synthesis, spatial encoder/decoder and modeling. Video compression algorithms differ in the image analysis and synthesis parts. Amongst the blocks, the most power-consuming one is image analysis. The following are typical image analysis tasks performed by different algorithms:
H.263 coding algorithm. The main function of the image analysis part in the H.263 algorithm is to perform motion estimation. For motion estimation, a block matching scheme is commonly used. Unfortunately, motion estimation is very time- and power-consuming, although the motion model used is very simple. For example, if a simple correlation measure is used, the sum of absolute differences (SAD) is computed. For a full search over a ±7 displacement range at QCIF (176×144 pixels) resolution and 10 frames/s, the SAD computation would have to be performed over 57 million times each second. With an increased search range and a finer resolution of motion vectors, the computational load increases greatly.
Model-based coding algorithm. The task of image analysis here is to extract animation parameters based on models. The models can be given a priori or can be built during coding processing. Since the animation parameters are related to 3-dimensional information, e.g., 3D motion and shape, while only 2D image information is observable, a good strategy is needed. A powerful tool based on the analysis by synthesis (ABS) principle is shown in FIG. 2. Obviously, this is the second closed loop appearing in the encoder. The purpose of using this loop is to aid or verify image analysis. This is done by comparing the output of the image synthesis block with the reference image. As can be seen, image analysis is rather complicated. In contrast, image synthesis is much simpler. In addition, at the beginning of the session, the analysis part needs to perform initial computations, such as object location or identification, feature point localization, the fitting of a generic model to the specific object appearing in the scenes, and initial pose estimation. This type of work also gives rise to a very heavy computational load.
Segmentation-based coding algorithm. The task of image analysis here is to segment images into objects according to an assumed definition, to track the objects and to estimate their motion or shape parameters. The commonly used models are 2D planar objects with affine motion, and flexible 3D objects with 3D motion. Obviously, the estimation of the motion and shape parameters associated with these models requires sophisticated operations. In addition, another complex task carried out by the image analysis is to maintain, update and choose the models.
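The full-search cost quoted above for H.263-type block matching can be checked with a short sketch. The following Python code is illustrative only and is not taken from any H.263 implementation: the function names (`sad`, `full_search`, `abs_diff_ops_per_second`) are hypothetical, images are assumed to be plain lists of rows of integer pixel values, and the count assumes one absolute-difference operation per pixel per candidate displacement.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def full_search(cur, ref, top, left, size, search_range):
    """Exhaustively search `ref` for the displacement (dy, dx) that best
    matches the size x size block of `cur` whose corner is (top, left)."""
    height, width = len(ref), len(ref[0])
    block = [row[left:left + size] for row in cur[top:top + size]]
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference image.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            candidate = [row[x:x + size] for row in ref[y:y + size]]
            cost = sad(block, candidate)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


def abs_diff_ops_per_second(width=176, height=144, search_range=7, fps=10):
    """Per-pixel absolute-difference operations per second for a full
    search (assumption: every pixel is compared at every displacement)."""
    positions = (2 * search_range + 1) ** 2  # 15 x 15 = 225 for a range of 7
    return width * height * positions * fps


print(abs_diff_ops_per_second())  # 57024000
```

Under these assumptions, 176 × 144 pixels × 225 candidate displacements × 10 frames/s gives 57,024,000 operations per second, consistent with the figure of over 57 million stated above.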
In summary, three important observations about the involvement of image analysis in advanced video compression algorithms are that: (1) the complexity of the encoder is much greater than that of the decoder; (2) computational loads at the beginning are usually heavier than those during the coding process, requiring that the encoder have a strong peak computational ability; and (3) the decoder must operate exactly as the encoder does, that is, the decoder is passively controlled by the encoder. These points make advanced video algorithms difficult to implement in low power terminals to achieve live video communication.
To overcome these problems, this invention makes it possible to achieve video communication with very low power terminals using video compression algorithms with the following features: (1) low complexity of the encoder and decoder; (2) high compression efficiency; and (3) remote control of the operation of the encoder. This invention will make it possible to use current or future video compression algorithms to achieve video communication with low power terminals. Another object of this invention is to move the image analysis part to the receiver or an intermediate point in the network. This is illustrated in FIG. 3. In this invention, we distinguish two different concepts: the transmitter/receiver and the encoder/decoder, which are mixed together in conventional video communication systems. In this invention, the encoder does not necessarily sit entirely in the transmitter. Instead, a main part of the encoder remains in the transmitter while the computation-intensive part, image analysis, is put in the receiver. In this way, image analysis is performed in the receiver instead of the transmitter. The generated animation and model data are then sent back to the transmitter to run normal encoding processing. Therefore, the encoder is still a virtually complete system but is physically distributed across both the transmitter and receiver. It is understood that to enable such communication the receiver must have sufficient power to perform complex image analysis, and the latency between the receiver and transmitter should be low. Alternatively, low latency may not be necessary if model-based or object-oriented coding schemes are used. The key is having sufficient power. For example, a high performance computer or server can be used to communicate with these low power terminals. In principle, this invention makes supercomputer-level performance available at a very low cost.
Furthermore, this invention is also supported by communication flexibility. According to the Shannon information theory, the channel capacity is determined by

$$C = B \log\left(1 + \frac{S}{N}\right) \qquad (1)$$
An increase in the signal power S increases the channel capacity C. Since sufficient power is available on the receiver side, transmitting model and animation data to the transmitter poses no problem.
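The dependence of capacity on signal power can be illustrated with a small numeric sketch. It assumes the conventional reading of the Shannon formula, which the text does not spell out: B is the channel bandwidth in hertz, S/N is the ratio of signal power to noise power, and the logarithm is taken to base 2 so that C is in bits per second. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def channel_capacity(bandwidth_hz, signal_power, noise_power):
    """Shannon capacity C = B * log2(1 + S/N) in bits per second.
    Assumption: base-2 logarithm; powers are in the same linear unit."""
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)


# Hypothetical 10 kHz channel with unit noise power.
low_power = channel_capacity(10_000, signal_power=1.0, noise_power=1.0)
high_power = channel_capacity(10_000, signal_power=3.0, noise_power=1.0)
print(low_power, high_power)  # 10000.0 20000.0
```

In this example, raising the signal-to-noise ratio from 1 to 3 doubles the capacity of the link carrying data back to the transmitter. Capacity grows only logarithmically with signal power, so further gains require progressively larger power increases.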
In the case where the transmitter has certain computational ability and the latency is low, a second configuration scheme can be employed, which is illustrated in FIG. 4. In this scheme two analysis blocks are employed. These two analysis blocks are either hierarchical, e.g., the image analysis part in the receiver performs rough estimation of animation and model data while the one in the transmitter refines the results obtained from the receiver, or separate, e.g., the image analysis part in the transmitter is in charge of tracking while the one in the receiver is in charge of initial estimation.
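The hierarchical arrangement can be pictured with a minimal sketch in which block displacement stands in for the animation and model data. This stand-in is an assumption made purely for illustration and is not stated in the text. The function names, the coarse grid, and the refinement radius are all hypothetical. The receiver evaluates a sparse grid of candidate displacements, and the transmitter evaluates only the immediate neighbors of the rough result.

```python
def sad_at(cur, ref, top, left, size, dy, dx):
    """SAD between the block of `cur` at (top, left) and the block of
    `ref` displaced by (dy, dx)."""
    return sum(
        abs(cur[top + i][left + j] - ref[top + dy + i][left + dx + j])
        for i in range(size)
        for j in range(size)
    )


def best_displacement(cur, ref, top, left, size, candidates):
    """Return the best in-bounds candidate and the number of SAD evaluations."""
    height, width = len(ref), len(ref[0])
    best, evaluations = None, 0
    for dy, dx in candidates:
        y, x = top + dy, left + dx
        if y < 0 or x < 0 or y + size > height or x + size > width:
            continue
        cost = sad_at(cur, ref, top, left, size, dy, dx)
        evaluations += 1
        if best is None or cost < best[0]:
            best = (cost, (dy, dx))
    return best[1], evaluations


def receiver_rough_estimate(cur, ref, top, left, size,
                            coarse_offsets=(-6, -3, 0, 3, 6)):
    """Receiver side: rough estimate on a sparse grid (hypothetical grid)."""
    candidates = [(dy, dx) for dy in coarse_offsets for dx in coarse_offsets]
    return best_displacement(cur, ref, top, left, size, candidates)


def transmitter_refine(cur, ref, top, left, size, rough, radius=1):
    """Transmitter side: test only the neighbors of the rough estimate."""
    candidates = [
        (rough[0] + dy, rough[1] + dx)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    ]
    return best_displacement(cur, ref, top, left, size, candidates)


# Synthetic test images: `cur` is `ref` displaced by (3, -2).
ref = [[y * 100 + x for x in range(32)] for y in range(32)]
cur = [[ref[y + 3][x - 2] if 0 <= y + 3 < 32 and 0 <= x - 2 < 32 else 0
        for x in range(32)] for y in range(32)]
rough, receiver_cost = receiver_rough_estimate(cur, ref, 12, 12, 8)
refined, transmitter_cost = transmitter_refine(cur, ref, 12, 12, 8, rough)
print(rough, receiver_cost, refined, transmitter_cost)
```

In this sketch the transmitter performs 9 SAD evaluations per block instead of the 225 needed for a full ±7 search, while the receiver carries the 25 coarse evaluations. A coarse-to-fine search of this kind is not guaranteed to reach the full-search optimum, so the refined result is only as good as the rough estimate it starts from.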