1. Field of the Invention
The invention relates to encoding methods and apparatus for encoding framed data, such as video encoding methods, and more particularly to rate control in low-cost and large-scale implementations of video coding systems.
2. Description of the Related Technology
Data is often transmitted through data networks in the form of a sequence or stream of packets or frames, i.e., a sequence of discretised data bundles, each bundle having a specific data content. One well-known example of such data streams is a sequence of video frames. These data streams may be encoded, especially to reduce the data rate of the stream-as-transmitted. The reduction of data rate is often necessary in order to reduce the bandwidth of the transmission channel used to transmit the data. Generally, these streams must be encoded and decoded in real-time. This places limitations on the amount of memory and on the processing capacity of the devices used to encode and decode.
A video information stream comprises a time sequence of video frames. Said time sequence of video frames can be recorded, for instance, by a video camera/recorder, or may be sent from memory or created artificially or synthetically. Each of said video frames can be considered as a still image. Said video frames are represented in a digital system as an array of pixels. Each pixel may be defined by a set of characteristics of the data in the pixel, e.g. each pixel may comprise luminance or light intensity and chrominance or color information. For a recent review of luminance and chrominance see “Colour Image Processing” by Sanwine, Electronics & Communication Journal, vol. 12, No. 5, October 2000, pages 211 to 219.
The information associated with each pixel is stored in a memory of said digital system. For each pixel some bits are reserved. From a programming point of view each video frame can be considered as a two-dimensional data type, although said video frames are not necessarily rectangular. Note that fields from an interlaced video time sequence can also be considered as video frames.
In principle when said video information stream must be transmitted between two digital systems, this can be realized by sending the video frames sequentially in time, for instance by sending the pixels of said video frames and thus the bits representing said pixels sequentially in time over a transmission channel.
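To give a sense of scale for such unencoded transmission, the raw bit rate can be estimated from the frame dimensions, the number of bits reserved per pixel and the frame rate. The following is a minimal sketch only. The function name and the example figures (a frame of 176 by 144 pixels, 12 bits per pixel, 30 frames per second) are illustrative assumptions and are not taken from any particular system.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Bits per second needed to send every pixel of every frame unencoded."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * frames_per_second


# Assumed example: 176 x 144 pixels, 12 bits per pixel, 30 frames per second.
rate = raw_bit_rate(176, 144, 12, 30)
print(rate)  # 9123840 bits per second, i.e. roughly 9.1 Mbit/s
```

Even for such a small frame the unencoded stream needs several megabits per second, which illustrates why the data rate usually has to be reduced before transmission.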
There exist, however, more elaborate transmission schemes enabling faster and more reliable communication between two digital systems. Said transmission schemes are based on encoding said video information stream in the transmitting digital system, transmitting said encoded video information stream over a transmission channel and decoding the encoded video information stream in the receiving digital system. Note that the same principles can be exploited for the transmission and storage of data, e.g. to memory or bulk or permanent storage. There is no limit on the types of transmission channel, that is, it can comprise a transmission channel of a Local Area Network, either wired or wireless, a Wide Area Network such as the Internet, the air interface of a cellular telephone system, etc.
During encoding, the original video information stream is transformed into another digital representation. Said digital representation is then transmitted. During decoding, the original video information stream is reconstructed from said digital representation.
For example, the MPEG-4 standard defines such an efficient encoded digital representation of a video information stream suitable for transmission and/or storage. The test model TMN8 of H.263 and the verification model of MPEG-4 show that they rely at least on a measure of the prediction error activity of a whole frame. Embodiments exist wherein these models exploit measures of local macroblock (MB) activity.
Encoding requires operations on the video information stream. Said operations are performed on a digital system (for instance in said transmitting digital system). Such processing is often called Digital Signal Processing (DSP). Each operation performed by a digital system consumes power. The way in which said operations for encoding are performed is called a method. Said methods have some characteristics such as encoding speed and the overall power consumption needed for encoding.
Said digital system can be implemented in a variety of ways, e.g. as application-specific hardware such as an accelerator board for insertion in a personal computer, or as a programmable processor architecture. It is well known that most power consumption in said digital systems, while performing real-time multi-dimensional signal processing such as video stream encoding, is due to the memory units in said digital systems and the communication path between said memory units. More precisely, individual read and write operations from and to memory units by processors and/or datapaths, and between memories, consume more power when said memory units are larger, and the access time or latency from the busses increases likewise. Naturally, the number of read and write operations also determines the overall power consumption and the bus loading. The larger the communication path, the larger the power consumption for a data transfer operation. Communication here means the communication between memory units and the processors and datapaths found in said digital system, and between the memories themselves. There is also a difference between on- and off-chip memories. Note that the same considerations are valid when considering speed as a performance criterion.
As the power consumption of said digital system is dominated by read and write operations, thus manipulations on data types and data structures, such as video frames, said methods are considered to be data-dominated.
As the algorithm specification, the algorithm choice and its implementation determine the number of operations and the required memory sizes, it is clear that these have a big impact on the overall power consumption and on other performance criteria such as speed and bus loading.
A method for encoding a video information stream, resulting in a minimal power consumption of the digital system on which the method is implemented, and exhibiting excellent performance, e.g. being fast, must be based on optimized data storage, related to memory sizes, and data transfer, related to the amount of read and write operations.
The channel between said transmitting and said receiving device always has a certain, and usually a limited, bandwidth. The number of bits that can be transmitted per time unit is upper-bounded by the bandwidth available for the transmission. This available bandwidth may be time dependent, depending upon network loads. An encoding method which is inefficient or which is not adaptable may result in data being lost or discarded or, at best, delayed. An encoding method should be capable of dealing with such channel limitations by adapting its encoding performance in some way, such that fewer bits are transmitted when channel limitations are enforced. Said encoding method adaptation capabilities should again be efficient in power consumption and speed. Encoding steps which become useless, and thus unnecessary, due to channel bandwidth adaptations or other limitations should be avoided. Note that said encoding method adaptation capabilities should be such that the quality of the transmitted data is preserved as much as possible. Minimum Quality of Service (QoS) requirements should be maintained.
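The upper bound imposed by the channel can be expressed as an average bit budget per frame. The sketch below is illustrative only. The function names and the example figures (a 64 kbit/s channel that drops to 48 kbit/s, 10 frames per second, a 7000-bit encoded frame) are assumptions introduced for this example.

```python
def frame_bit_budget(channel_bits_per_second, frames_per_second):
    """Average number of bits the channel can carry per frame."""
    return channel_bits_per_second / frames_per_second


def excess_bits(frame_bits, channel_bits_per_second, frames_per_second):
    """Bits of an encoded frame that exceed the average per-frame budget."""
    budget = frame_bit_budget(channel_bits_per_second, frames_per_second)
    return max(0, frame_bits - budget)


# Assumed example: the available bandwidth drops while the frame rate stays fixed.
print(frame_bit_budget(64000, 10))   # 6400.0 bits per frame
print(frame_bit_budget(48000, 10))   # 4800.0 bits per frame
print(excess_bits(7000, 48000, 10))  # 2200.0 bits cannot be sent in time
```

When the available bandwidth falls, the per-frame budget falls with it, so an encoder that cannot adapt produces excess bits that must be buffered, delayed or discarded.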
Naturally when such a power consumption and speed optimal encoding method exists it can be implemented on a digital system, adapted for said method. This adaptation can be done by an efficient programming of programmable (application specific) processor architectures or by designing and fabricating an application-specific or domain-specific processor with the appropriate memory units. This can be a stand-alone unit or may be included within a larger processing structure such as a computer.
Prior art encoding methods with adaptation capabilities take into account channel bandwidth limitations by adapting some encoding parameters based on predictions of the bit rate needed, said predictions being based on historic data of said bit rate only. Said bit rate predictions do not take into account a characterization of the current video frame to be encoded. Said prior art encoding methods do not use a relation, also denoted a model, relating said bit rate, characteristics of the to-be-encoded video frame and said encoding parameters [Tihao Chiang and Ya-Qin Zhang, “A New Rate Control Scheme Using Quadratic Rate Distortion Model”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, No. 1, pp. 246-250, February 1997.], [Wei Ding, and Bede Liu, “Rate Control of MPEG Video Coding and Recording by Rate-Quantization Modeling”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, No. 1, pp. 12-20, February 1996.].
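The limitation of a history-only prediction can be illustrated with a small sketch. It is not the scheme of either cited reference. The class name, the moving-average predictor, the one-step quantizer adjustment and the quantizer range of 1 to 31 are all assumptions chosen for illustration.

```python
from collections import deque


class HistoryOnlyRateControl:
    """Adapts a quantizer using only the bit counts of previously encoded frames."""

    def __init__(self, target_bits, window=3, q_min=1, q_max=31, q_start=10):
        self.target_bits = target_bits
        self.history = deque(maxlen=window)  # bit counts of the most recent frames
        self.q_min = q_min
        self.q_max = q_max
        self.q = q_start

    def predict_bits(self):
        """Predict the next frame's bits as the mean of the recent history."""
        if not self.history:
            return self.target_bits
        return sum(self.history) / len(self.history)

    def next_quantizer(self):
        """Coarsen the quantizer if the prediction exceeds the target, refine it if below."""
        predicted = self.predict_bits()
        if predicted > self.target_bits:
            self.q = min(self.q + 1, self.q_max)
        elif predicted < self.target_bits:
            self.q = max(self.q - 1, self.q_min)
        return self.q

    def record(self, actual_bits):
        """Store the number of bits actually produced for the frame just encoded."""
        self.history.append(actual_bits)


rc = HistoryOnlyRateControl(target_bits=6400)
rc.record(8000)
rc.record(7000)
print(rc.next_quantizer())  # 11: the mean of the history (7500) exceeds the target
```

Nothing in this sketch inspects the frame that is about to be encoded, so a sudden change in content, such as a scene cut, is only noticed after the resulting bits have already been produced.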
Prior art encoding methods with good quality preserving properties having adaptation capabilities, taking into account channel bandwidth limitations by adapting some encoding parameters, e.g. by taking into account a characterization of the video frame to be encoded, have severe drawbacks from the implementational point of view, [Jiann-Jone-Chen, and Hsueh-Ming-Hang, “Source model for transform video coder and its application. II. Variable frame coding.”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, No. 2, pp. 299-311, April 1997.], [Jordi Ribas-Corbera, and Shawmin Lei, “Rate Control in DCT Video Coding for Low-Delay Communications”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, No. 1, pp. 172-185, February 1999], [Anthony Vetro, Huifang Sun, and Yao Wang, “MPEG-4 Rate Control for Multiple Video Objects”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, No. 1, pp. 920-924, February 1999.].
Where the adaptation scheme does not work correctly, e.g. it generates a data rate which cannot be transmitted, the system generally has only two options: discard the excess data or stop the processing. The latter is often impossible or undesirable as real-time transmission is required. The former solves the problem at the cost of data loss, which has to be compensated by other techniques, e.g. regeneration of data by interpolation between frames.
FIG. 8A shows a schematic representation of a prior art encoding scheme with a first encoding step (10) and a second encoding step (20) for encoding a video frame (320) on a time axis (300) with respect to a reference video frame (310). Said encoded current video frame (320) is transmitted via a bandwidth limited channel (60), preceded by some buffering means (30). Potentially some video frame discarding means (50) are present in between said encoding steps (10) and (20) or before said first encoding step (70). Said first encoding step is executed in a block-based way. (220) represents the block loop, meaning that essentially all blocks are first sub-encoded before said first sub-encoding step is finished with said current video frame and the method moves on to a second sub-encoding step. Said second sub-encoding step can be executed in a similar fashion but with a different loop. Said prior-art method adapts the bit rate, taking into account possible buffer information (100) and information about the complexity of the first sub-encoded video frame (140), by either adapting parameters of said second sub-encoding (120) or by discarding said current video frame (150). A decision circuit (40) takes this adaptation decision. Said first sub-encoding step possibly comprises transformation (motion) estimation and transformation (motion) compensation steps (11) and (12). Note that discarding (70) based only on buffer (30) fullness information (170), before first sub-encoding, is also often used. In that case no information on video frame complexity is used. An encoder of the above type is known from U.S. Pat. No. 5,969,764, which is incorporated herein by reference.
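The kind of decision taken by such a decision circuit (40) can be sketched as follows. This is a hypothetical illustration and not the algorithm of U.S. Pat. No. 5,969,764. The function name, the skip threshold, the base quantizer, the quantizer range and the use of a relative complexity value (1.0 meaning an average frame) are all assumptions.

```python
def decide(buffer_bits, buffer_capacity, frame_complexity,
           skip_threshold=0.8, base_q=8, q_min=1, q_max=31):
    """Return ("discard", None) or ("encode", quantizer) for the current frame.

    buffer_bits / buffer_capacity stands in for the buffer information, and
    frame_complexity for the complexity of the first sub-encoded frame.
    """
    fullness = buffer_bits / buffer_capacity
    if fullness >= skip_threshold:
        # The buffer is nearly full: drop the frame instead of encoding it further.
        return ("discard", None)
    # Use a coarser quantizer as the buffer fills and as the frame gets more complex.
    q = round(base_q * (1.0 + fullness) * frame_complexity)
    return ("encode", max(q_min, min(q, q_max)))


print(decide(9000, 10000, 1.0))  # ('discard', None)
print(decide(2500, 10000, 1.0))  # ('encode', 10)
print(decide(5000, 10000, 2.0))  # ('encode', 24)
```

In this arrangement the discard decision can only be taken after the first sub-encoding step has been carried out on the whole frame, so the work spent on a discarded frame is wasted.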
There still remains a requirement to improve the efficiency of encoding methods and apparatus for streams of framed data such as video frames. In particular there is a need for improved adaptive encoding methods and apparatus for framed data sequences.