Digital communication and storage of image data are difficult tasks due to the sheer volume of digital data required to accurately describe a single frame of an image. In video, the amount of data quickly becomes very large. Image coding seeks to make the communication and/or storage of image data manageable by compressing image data, i.e., reducing the amount of data necessary to represent an image. Communication resources, for example, have limited bandwidth. This is especially true in wireless communication media. There are tradeoffs in image data coding. Reducing the size of the data should not, for example, degrade the image quality beyond an acceptable level. Also, the computational cost and speed must be managed, especially in devices where computational resources and power resources are to be conserved. Modern examples of video encoding/compression approaches include MPEG-4 and H.264. In particular, the latter has been specifically designed for video transmission over packet networks.
Many devices have access to more than one communication medium. A device such as a laptop, personal digital assistant (PDA), workstation, or video conferencing system may have access to multiple networks. For example, one device may have access to several different types of wired and wireless networks.
Many video compression algorithms today make use of motion compensation to achieve substantial compression. The basic idea of motion compensation is as follows. A macroblock denotes a block of image data, e.g., a square region of 16 by 16 pixels in an image. A macroblock in the current frame to be encoded is compared against some set of macroblocks in a reference frame to find the one that is the most similar. The reference frame is typically the previous frame in the video sequence. Similarity is usually measured by the sum of the absolute values of the pixel differences, or by the sum of the squared differences between pixels. The location of this best match block can be specified by giving an offset vector, called a motion vector, which describes the horizontal and vertical positional difference between the current macroblock to be encoded and the best match macroblock in the reference frame. The current macroblock to be encoded can perhaps be represented using only this motion vector. The decoder, upon receiving this motion vector, can take the referenced block from the reference frame and paste it into place to represent the current block. If the referenced block and the current block are similar enough, this direct substitution might provide adequate quality. If they are not close enough, the encoder can optionally send along some additional information which describes how to modify the referenced block so as to make it more similar to the current block.
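The block matching described above can be sketched as follows. This is a minimal illustration only, assuming frames are given as lists of rows of integer pixel values and that an exhaustive (full) search is made over a small window. The function names `sad` and `find_motion_vector` and the `search_range` parameter are hypothetical and are not taken from any standard or codec.

```python
def sad(current, reference, bx, by, rx, ry, size):
    """Sum of absolute pixel differences between the block at (bx, by) in
    the current frame and the block at (rx, ry) in the reference frame."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(current[by + j][bx + i] - reference[ry + j][rx + i])
    return total


def find_motion_vector(current, reference, bx, by, size=16, search_range=7):
    """Exhaustively search a window of +/- search_range pixels around (bx, by).

    Returns ((dx, dy), cost), where (dx, dy) is the offset from the current
    block to the best match block in the reference frame and cost is its SAD.
    """
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, reference, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

A cost of zero means the referenced block can be substituted directly. A larger cost suggests that additional correction information, or a different coding mode, may be needed.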
In either case, this is referred to as INTER coding. When the encoder finds no good match to the current macroblock, it might choose to encode the current macroblock all by itself, without reference to any other past block. This is referred to as INTRA coding. Choosing between INTER and INTRA coding is the basic approach found in the video coding standards MPEG, MPEG-2, MPEG-4 [T. Sikora, “The MPEG-4 Video Standard Verification Model,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 19-31, February 1997.], H.263 [G. Cote, B. Erol, M. Gallant and F. Kossentini, “H.263+: Video Coding at Low Bit Rates,” IEEE Trans. Circ. and Systems for Video Techn., vol. 8, no. 7, pp. 849-865, November 1998.] and the more recent, state-of-the-art H.264.
INTER coding tends to require fewer bits than INTRA coding, but can propagate errors. INTRA coding, since it does not make reference to a previous frame, cannot propagate an error present in a previous frame. Choosing between INTER and INTRA coding involves meeting competing goals of using fewer bits and being robust to errors.
Making an intelligent choice between INTER and INTRA coding is an issue in the art. One paper dealing with this subject is R. Zhang, S. L. Regunathan, and K. Rose, “Video Coding with Optimal Inter/Intra-Mode Switching for Packet Loss Resilience,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 966-76, June 2000. This paper provides a distortion estimation method called ROPE (recursive optimal per pixel estimate) which accounts for two factors in estimating the distortion: the channel error probability and the concealability of the block being encoded. The choice between INTER and INTRA coding for a given block (such as a macroblock) is then made by balancing the competing goals of reducing the distortion (as estimated by ROPE) and using only a small number of bits for encoding (in particular, staying within the target rate constraint). Rate constraints favor INTER coding, error constraints favor INTRA coding, and the ability to conceal errors favors INTER coding.
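The balance between distortion and bit count can be illustrated with a simple cost comparison. The sketch below does not implement ROPE. It takes the estimated distortion of each mode as an input and, as an assumption made only for illustration, combines distortion and rate with a Lagrangian weight `lam`. The function name `choose_mode` and all of its parameters are hypothetical.

```python
def choose_mode(d_inter, r_inter, d_intra, r_intra, lam):
    """Pick the coding mode with the lower combined cost D + lam * R.

    d_* are estimated distortions (e.g., as a distortion estimator might
    supply), r_* are bit counts, and lam weights rate against distortion.
    All of these inputs are assumed to be provided by the caller.
    """
    cost_inter = d_inter + lam * r_inter
    cost_intra = d_intra + lam * r_intra
    return "INTER" if cost_inter <= cost_intra else "INTRA"
```

Under this assumed cost, a larger `lam` (a tighter rate constraint) pushes the decision toward INTER coding, while a higher estimated INTER distortion (for example, on a lossy channel) pushes it toward INTRA coding.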
When a connection used to transmit video data suffers a change in quality, the resulting video decoding may produce very poor results. When the reference frame provides a poor quality reference, the decoding result declines rapidly. One proposed technique to address this is to retain multiple reference frames. However, this can make the encoding burden and complexity very high.
There are many examples of the multiple reference frame approach. [See, e.g., N. Vasconcelos and A. Lippman, “Library-based Image Coding,” IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. v, pp. V/489-V/492, 1994; T. Wiegand, X. Zhang, and B. Girod, “Long term Memory Motion-Compensated Prediction,” IEEE Trans. Circ. and Systems for Video Techn., vol. 9, no. 1, pp. 70-84, February 1999.]. In one example, when coding a block in frame N of a video, the encoder might look for the best possible matching block in frames N−1, N−2, N−3, and N−4. That is, the 4 immediate past frames could be searched for a match. The encoder could then tell the decoder which reference frame provided the best match. For example, 2 bits can be assigned to describe which of the 4 frames provides the best match and then the usual motion vector is provided to give the offset between the current block to be encoded and the location of the best match block in the specified reference frame.
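The four-frame example can be sketched as follows. This is an illustration only, assuming frames are lists of rows of integer pixel values, that `refs[0]` holds frame N−1 and `refs[3]` holds frame N−4, and that a fixed-length 2-bit code identifies the chosen frame. The names `block_sad`, `search_references`, and `encode_ref_index` are hypothetical.

```python
def block_sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def search_references(cur, refs, bx, by, size=16, search_range=7):
    """Search every reference frame for the best match to the block at (bx, by).

    Returns (ref_index, (dx, dy), cost) for the lowest-cost candidate.
    """
    best = None
    for ref_index, ref in enumerate(refs):
        height, width = len(ref), len(ref[0])
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                rx, ry = bx + dx, by + dy
                # Ignore candidates that extend past the frame boundary.
                if not (0 <= rx <= width - size and 0 <= ry <= height - size):
                    continue
                cost = block_sad(cur, ref, bx, by, rx, ry, size)
                if best is None or cost < best[2]:
                    best = (ref_index, (dx, dy), cost)
    return best


def encode_ref_index(ref_index):
    """Fixed-length 2-bit code naming one of 4 reference frames."""
    if not 0 <= ref_index < 4:
        raise ValueError("only 4 reference frames are addressable with 2 bits")
    return format(ref_index, "02b")
```

The search cost grows with the number of reference frames, since the inner window search is repeated once per frame.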
In T. Fukuhara, K. Asai, and T. Murakami, “Very Low Bit-Rate Video Coding with Block Partitioning and Adaptive Selection of Two Time-Differential Frame Memories,” IEEE Trans. Circ. and Systems for Video Techn., vol. 7, no. 1, pp. 212-220, February 1997, only two time-differential frames were used, thus requiring a relatively modest increase in computational complexity. This dual frame buffer is a special case of multiple frame buffers in which there are only two reference frames. For example, there could be one short term reference frame (the immediate past frame) and one long term reference frame (a frame from the more distant past). In Fukuhara et al., one frame was the previous one, as in many hybrid codecs, and the second one contained a reference frame from the more distant past that was periodically updated according to a predefined rule. It has been shown that multiple reference frames can yield a significant gain in reconstructed PSNR (Peak Signal-to-Noise Ratio), at the expense of increased computational burden and memory complexity. Motion estimation is the main performance bottleneck in a hybrid video coding system, and can account for 80-90% or more of the total encoding time. Thus, adding even one additional frame buffer can nearly double the encoding time. The same is true of memory requirements, where the increase is also linear and thus becomes prohibitive as the number of reference frames grows large.
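The effect of additional reference frames on encoding time can be estimated with simple arithmetic. The sketch below assumes, for illustration only, that motion estimation time scales linearly with the number of reference frames searched and that the remaining encoding work is unchanged. The function name `relative_encoding_time` is hypothetical.

```python
def relative_encoding_time(me_fraction, num_refs):
    """Encoding time relative to the single-reference case.

    me_fraction is the share of single-reference encoding time spent on
    motion estimation (e.g., 0.8 to 0.9). Motion estimation is assumed to
    be repeated once per reference frame; everything else is held fixed.
    """
    return (1.0 - me_fraction) + me_fraction * num_refs
```

Under these assumptions, with motion estimation at 90% of the encoding time, two reference frames give a relative time of 0.1 + 0.9 × 2 = 1.9, i.e., close to double.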
An always best connected (ABC) approach is used where a device has access to multiple connections. A device such as a laptop or a PDA might be capable of accessing several different types of wireless or wired networks which operate at different rates. For example, the device may be able to communicate using an Ethernet connection (10 Mbps), Wireless LAN (11 Mbps), HDR (400-500 kbps), 1×RTT (64 kbps), and GPRS (16 kbps). At any given time, the device would operate using the best connection it can access at that particular time, provided the user has not chosen another network in the user profile. The best connection would often be the one with the highest data rate, but other factors are involved as well (e.g., error rate, delay, etc.). If the best connection becomes unavailable, or it deteriorates to the point where it is no longer the best connection, the device would be expected to switch seamlessly to some other connection, the new best one. Often, this would be a connection at a lower rate. The device would also be expected to probe all connections periodically, to see which ones are available. If a high rate connection becomes unavailable and then becomes available again, the device would be expected to discover the availability, and switch back to using that network.
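The selection rule described above can be sketched as follows. This is a simplified illustration that ranks connections by data rate alone, although error rate and delay would also matter in practice. The `Connection` class, the `best_connection` function, and the `preferred` parameter (standing in for a user profile choice) are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    name: str
    rate_kbps: float
    available: bool


def best_connection(connections, preferred=None):
    """Return the connection to use, or None if none is available.

    A user-preferred network wins if it is available; otherwise the
    available connection with the highest data rate is chosen.
    """
    usable = [c for c in connections if c.available]
    if not usable:
        return None
    if preferred is not None:
        for c in usable:
            if c.name == preferred:
                return c
    return max(usable, key=lambda c: c.rate_kbps)
```

Calling `best_connection` again after each periodic probe, with the `available` flags refreshed, would model both the switch to a lower rate connection and the switch back when a high rate connection returns.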