Over the past decade, a variety of systems have been developed for encoding and decoding audio/video data for transmission over wireline and/or wireless communication systems. Most systems in this category employ standard compression/transmission techniques such as, for example, the ITU-T Rec. H.264 (also referred to as H.264) and ISO/IEC 14496-10 AVC (also referred to as MPEG-4 Part 10) standards. However, due to their inherent generality, these techniques lack the specific qualities needed for seamless implementation on low-power, low-complexity systems (such as hand-held devices including, but not restricted to, personal digital assistants and smart phones) over noisy, low-bit-rate wireless channels.
Under the likely business models rapidly emerging in the wireless market, the cost incurred by the consumer is directly proportional to the actual volume of transmitted data. For this reason, and because of the limited bandwidth, processing capability, storage capacity and battery power available, efficiency and speed in compressing the audio/video data to be transmitted are major factors in the eventual success of any such multimedia content delivery system. Most systems in use today are retrofitted versions of the same systems used on higher-end desktop workstations. On desktop systems, error control is not a critical issue because of the inherent reliability of cabled LAN/WAN data transmission, and bandwidth may be assumed to be almost unlimited. Transmission over limited-capacity wireless networks, by contrast, requires systems that integrate suitable processing and error-control technologies to achieve the level of fidelity expected of a commercially viable multimedia compression and transmission system.
Conventional video compression engines, or codecs, can be broadly classified into two categories. The first class of coding strategies, known as the download-and-play (D&P) profile, requires the entire file to be downloaded into local memory before playback. This leads to a large latency (depending on the available bandwidth and the actual file size) and also makes stringent demands on the amount of buffer memory that must be available for the downloaded payload. Even with the second, more sophisticated streaming profile, the physical-layer limitations of current-generation transmission equipment force service providers to incorporate a pseudo-streaming capability, which requires an initial period of latency at the beginning of transmission and continuous buffering thereafter, imposing a strain on the limited processing capabilities of the hand-held processor. Most commercial compression solutions on the market today do not possess a progressive transmission capability, which means that transmission is possible only up to the last integral frame, packet or bit before the bandwidth drops below the minimum threshold. In the case of video codecs, if the connection breaks before transmission of the current frame is complete, that frame is lost.
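As a rough illustration of the latency figures discussed above, the following sketch estimates the start-up delay of a download-and-play transfer and of a pseudo-streaming session. The function names, the pre-buffer duration and the example bit rates are assumptions chosen for illustration only; they are not taken from any particular system.

```python
def download_and_play_latency_s(file_size_bytes, bandwidth_bps):
    """Seconds before playback can start when the whole file must arrive first."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return file_size_bytes * 8 / bandwidth_bps


def pseudo_streaming_startup_s(prebuffer_s, media_bitrate_bps, bandwidth_bps):
    """Seconds needed to fill a buffer holding `prebuffer_s` seconds of media."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return prebuffer_s * media_bitrate_bps / bandwidth_bps


# Hypothetical example: a 2,000,000-byte clip over a 128 kbit/s channel.
dnp_delay = download_and_play_latency_s(2_000_000, 128_000)

# Hypothetical example: pre-buffering 5 s of 96 kbit/s media on the same channel.
stream_delay = pseudo_streaming_startup_s(5, 96_000, 128_000)
```

Under these assumed figures the download-and-play case waits 125 seconds and must hold the full 2,000,000 bytes in memory, whereas the pseudo-streaming case starts after 3.75 seconds but must keep refilling its buffer for the rest of the session.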
Another drawback of conventional video codecs is the introduction of blocking artifacts by the block-based coding schemes used in most codecs. Apart from the degradation in subjective visual quality, such systems suffer from poor performance due to bottlenecks introduced by the additional de-blocking filters. Yet another drawback is that, due to limitations in the word size of the computing platform, the coded coefficients are truncated to approximate values. The effect is especially prominent along object boundaries, where the Gibbs phenomenon gives rise to a visual artifact known as mosquito noise. As a result, the blurring along the object boundaries becomes more prominent, leading to degradation in overall frame quality.
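The ringing that appears near a sharp edge when transform coefficients are only approximated can be reproduced with a small experiment. The sketch below is a simplified model resting on several assumptions: it uses a one-dimensional, 8-sample block and an orthonormal DCT, and it models the approximation as discarding the high-frequency coefficients, which is the textbook way to exhibit the Gibbs phenomenon. It is not a description of any specific codec.

```python
import math


def dct(block):
    """Orthonormal DCT-II of a 1-D block."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(block)))
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n) * c
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs)))
    return out


def keep_low_frequencies(coeffs, keep):
    """Assumed approximation model: zero every coefficient with index >= keep."""
    return [c if k < keep else 0.0 for k, c in enumerate(coeffs)]


# A sharp object boundary: dark samples followed by bright samples.
edge = [0, 0, 0, 0, 255, 255, 255, 255]

# Reconstruct from only the four lowest-frequency coefficients.
approx = idct(keep_low_frequencies(dct(edge), keep=4))
```

In this model the reconstructed samples undershoot below 0 on the dark side of the edge and overshoot above 255 on the bright side, so the flat regions next to the boundary are no longer flat. That oscillation around the edge is the kind of artifact the paragraph above attributes to the Gibbs phenomenon.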
Additionally, the local nature of motion prediction in some codecs introduces motion-induced artifacts, which cannot easily be smoothed by a simple filtering operation. Such problems arise especially with fast-motion clips and in systems where the frame rate is below that of natural video (e.g., 25 or 30 fps non-interlaced video). In either case, the temporal redundancy between two consecutive frames is extremely low (since much of the motion is lost between the frames themselves), leading to poorer tracking of the motion across frames. This effect is cumulative in nature, especially over a longer group of frames (GoF).
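One way to see why local motion prediction degrades at low frame rates is a one-dimensional block-matching experiment. The sketch below is illustrative only: the search window of 7 samples, the object speed of 120 samples per second, the synthetic texture and all function names are assumptions introduced here, and sum of absolute differences (SAD) is used as one common matching cost.

```python
def shift_frame(frame, displacement):
    """Return the frame content moved right by `displacement` samples (wrapping)."""
    n = len(frame)
    return [frame[(i - displacement) % n] for i in range(n)]


def best_match(reference, current, position, block_size, search_range):
    """Return (offset, SAD) of the best reference block within +/- search_range."""
    block = current[position:position + block_size]
    best = None
    for offset in range(-search_range, search_range + 1):
        start = position + offset
        if start < 0 or start + block_size > len(reference):
            continue  # candidate block would fall outside the frame
        candidate = reference[start:start + block_size]
        sad = sum(abs(a - b) for a, b in zip(block, candidate))
        if best is None or sad < best[1]:
            best = (offset, sad)
    return best


def displacement_per_frame(speed_per_s, frame_rate_fps):
    """Samples travelled between two consecutive frames."""
    return round(speed_per_s / frame_rate_fps)


# Synthetic, non-repeating texture used as the reference frame.
reference = [(i * 37) % 101 for i in range(64)]

# Same motion observed at 30 fps and at 10 fps (assumed speed: 120 samples/s).
at_30fps = best_match(reference, shift_frame(reference, displacement_per_frame(120, 30)), 24, 8, 7)
at_10fps = best_match(reference, shift_frame(reference, displacement_per_frame(120, 10)), 24, 8, 7)
```

At the higher frame rate the per-frame displacement (4 samples) falls inside the assumed search window and the block is matched exactly, with zero residual. At the lower frame rate the displacement (12 samples) exceeds the window, so the best candidate found is a poor match and the prediction residual is non-zero.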
Furthermore, mobile end-user devices are constrained by low processing power and storage capacity. Due to limitations on the silicon footprint, most mobile and hand-held systems on the market must time-share the resources of the central processing unit (a microcontroller or RISC/CISC processor) to perform all of their DSP, control and communication tasks, with little or no provision for a dedicated processor to take the video/audio processing load off the central processor. Moreover, most general-purpose central processors lack the specialized architecture needed for optimal DSP performance. Therefore, a mobile video codec design must have minimal client-end complexity while maintaining efficiency and robustness.