One important objective of a video encoder is usually to achieve the best picture quality for a target bit-rate and/or the greatest data compression while retaining acceptable picture quality and frame rate. Unencoded (non-compressed) video data comprises data representing every picture element (known as a pixel) in every frame. Video encoders generally utilise blocks of pixels or coefficients, known as macroblocks, that typically comprise 16×16 or 8×8 pixels (but may be smaller, for example 4×4 or 2×2, or rectangular, for example 16×8 or 4×8).
Multi-frame video sequences almost always include data redundancy where much data is repeated or correlated; where pictures, or parts of pictures, remain the same or similar from one frame to another (i.e. where images remain stationary); where there is linear motion; and/or where there are low frequency planar colour areas for example. Rather than send the repeated data, video encoders use a variety of temporal and spatial compression algorithms to compress data that is subsequently sent to video decoders for decompression. Where a picture moves (e.g. via a pan or zoom) or where objects within a picture move (e.g. a moving vehicle), then for each non-stationary macroblock, techniques are used to estimate where the macroblock was positioned in previous frames. Such a technique is known as motion estimation. Video encoders can achieve data compression of moving image portions by, rather than sending the whole pixel data for a macroblock or sub-macroblock, sending a small amount of data that describes an estimated previous location of a macroblock or sub-macroblock (usually known as a motion vector), and a small amount of data representing the difference between the current macroblock and the macroblock at its previous location (the estimated previous location is found using a search algorithm). Persons skilled in the art will know that this data representing the difference is derived from encoding tools. Example encoding tools include inter prediction modes (temporal compression between frames) and intra prediction modes (spatial compression within the current frame).
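The motion estimation described above can be illustrated with a minimal sketch. It assumes an exhaustive ("full") search over a small window and uses the sum of absolute differences as the matching criterion; the text only states that a search algorithm is used, so both choices are assumptions. The function names, the block size and the search range are hypothetical and chosen for illustration only. Frames are plain lists of rows of integer pixel values.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, size=4, search_range=2):
    """Test every offset within +/- search_range (assumed search strategy).

    Returns (motion_vector, residual). The motion vector (mvx, mvy) points
    from the current block to its best match in the reference frame; the
    residual is the per-pixel difference between the two blocks.
    """
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (mvx, mvy)
    mvx, mvy = best_mv
    residual = [
        [cur[cy + dy][cx + dx] - ref[cy + mvy + dy][cx + mvx + dx]
         for dx in range(size)]
        for dy in range(size)
    ]
    return best_mv, residual
```

When the picture content has simply shifted between frames, the residual is all zeros, so only the motion vector needs to be sent for that macroblock.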
A video encoder will typically have a number of system resources available to it. Example system resources include: data storage access bandwidth (the rate at which the video encoder can access, for example, frames of unencoded video data, reference frames, instructions, and general data); operating clock frequency; processor time (available processing time); power availability; number of processors (if the video encoder forms part of a multi-processor environment); and encoded data storage space availability.
One or more system resources may vary at any time during the encoding process, for example: unencoded data storage access bandwidth may reduce when other processes and/or other processors access the data store for purposes other than as part of the video encode process; clock frequency may be reduced if a battery supplying power starts to become exhausted; processor time may reduce if the video encoder processor also has to perform other tasks (such as audio encoding); power availability may reduce when, for example, a battery supplying power starts to become exhausted; the number of processors (if the video encoder forms part of a multi-processor environment) may reduce when other processes need to run for purposes other than as part of the video encode process; and encoded data storage space may start to become restricted when, for example, a storage device approaches capacity.
A video encoder will also often have to cater for variations in one or more picture characteristics. Example picture characteristics include: picture content complexity, picture content movement, picture size (number of pixels per frame), data rate requested, and/or frame rate (number of frames per second).
To achieve the best encoded picture quality for each encoded macroblock, some video encoders calculate the cost of each of many intra prediction modes and each of many inter prediction modes. The encoder then picks the mode with the lowest cost to derive the data representing the difference between the current macroblock and the macroblock at a previous location (for inter modes) or an adjacent location (for intra modes). The cost calculation is usually derived from a function of differences of the pixel values, which may be the sum of absolute differences (SAD), mean squared error (MSE) or another function, together with a function of the bit cost. Encoding tools often used in addition include in-loop deblocking filters and sub-pixel motion estimation accuracy. However, using every tool for every macroblock can be time consuming, requiring high processing speeds and consuming large amounts of power; this inflexible use of all tools all the time can lead to unnecessary power consumption and hardware cost.
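The mode decision described above can be sketched as follows. This is a minimal illustration, not a description of any particular encoder: it assumes the cost is the distortion plus a weight multiplied by an estimated bit count, which is only one possible "function of the bit cost". The names choose_mode and lam, the mode labels, and the bit estimates are all hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def mse(block_a, block_b):
    """Mean squared error between two equally sized blocks."""
    count = sum(len(row) for row in block_a)
    squared = sum(
        (a - b) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )
    return squared / count


def choose_mode(current, candidates, lam=4.0, distortion=sad):
    """Pick the candidate prediction mode with the lowest cost.

    candidates maps a mode name to (predicted_block, estimated_bits).
    The weighting cost = distortion + lam * bits is an assumed form.
    """
    best_mode, best_cost = None, float("inf")
    for mode, (prediction, bits) in candidates.items():
        cost = distortion(current, prediction) + lam * bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

Every entry in candidates represents a prediction that has to be computed and compared, which is why evaluating all modes for every macroblock is expensive in processing time and power.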
Existing real-time software-based video encoders that execute on a CPU-based (central processor) platform utilize system resources and account for certain variations in picture characteristics. In addition, these encoders have a fixed amount of time per macroblock to encode each new frame. Furthermore, existing real-time hardware-based state machine video encoders that operate a sequence of predetermined encoding operations also utilize system resources in the same manner.
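The fixed amount of time per macroblock can be made concrete with a short worked calculation. The sketch below assumes the frame period is divided evenly across all macroblocks of a frame, and that partial macroblocks at the frame edges count as whole ones; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def macroblocks_per_frame(width, height, mb_size=16):
    """Number of mb_size x mb_size macroblocks needed to cover a frame."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


def time_budget_per_macroblock(width, height, frames_per_second, mb_size=16):
    """Seconds available per macroblock if the frame period is split evenly."""
    frame_period = 1.0 / frames_per_second
    return frame_period / macroblocks_per_frame(width, height, mb_size)
```

For example, a 1280×720 picture at 30 frames per second contains 3600 macroblocks of 16×16 pixels, leaving roughly 9.26 microseconds for each under these assumptions. A larger picture size or a higher frame rate shrinks that budget.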
Existing video encoders require encoding operations to be designed to account for a set of worst-case system resources and picture characteristics for a selected usage. Many of the existing encoders are designed to account for either specific changes in system resources or specific changes in picture quality. The consequences are that improved encoded picture quality is unobtainable when system resources increase, and that if system resources decrease beyond the level designed for, the video encoder will run out of time to encode frames and will have to drop frames, leading to visible motion judder or jitter. Similar consequences occur as a result of changes in picture characteristics.
One problem caused by the inflexibility of existing video encoders is that each variation in system resources or picture characteristics requires a different video encoder operating mode. Providing these modes requires significant additional overhead within the system, leading to increased complexity, cost and/or power consumption.