The term “codec” is variously expanded as “compressor/decompressor”, “coder/decoder”, or “compression/decompression algorithm”, and describes a device, algorithm, or specialized computer program capable of performing transformations on a data stream or signal. Codecs encode a data stream or signal for transmission, storage or encryption and decode it for viewing or editing. For example, a digital video camera converts analog signals into digital signals, which are then passed through a video compressor for digital transmission or storage. A receiving device then decompresses the received signal via a video decompressor, and the decompressed digital signal is converted to an analog signal for display. A similar process can be performed on audio signals. There are numerous standard codec schemes. Some are used mainly to minimize file transfer time, and are employed on the Internet. Others are intended to maximize the data that can be stored in a given amount of disk space, or on a CD-ROM. Each codec scheme may be handled by different programs, processes, or hardware.
A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels. Typically, pixels are stored in computer memory as a raster image or raster map, which is a two-dimensional array of integers. These values are often transmitted or stored in a compressed form.
Digital images can be created by a variety of input devices and techniques, such as digital cameras and camcorders, scanners, coordinate-measuring machines, seismographic profiling, airborne radar, and more. They can also be synthesized from arbitrary non-image data, such as mathematical functions or three-dimensional geometric models; the latter being a major sub-area of computer graphics. The field of digital image processing is the study or use of algorithms to perform image processing on digital images. Image codecs include such algorithms to perform digital image processing.
Different image codecs are used to view an image depending on the image format. GIF, JPEG and PNG images can be viewed directly in a web browser because they are the standard internet image formats. The SVG format is now widely used on the web and is a standard W3C format. Other programs offer a slideshow utility to display the images automatically, one after the other, in a certain order.
Still images have different characteristics than video. For example, the aspect ratios and the colors are different. As such, still images are processed differently than video, thereby requiring a still image codec for still images and a video codec, different from the still image codec, for video.
A video codec is a device or software module that enables the use of data compression techniques for digital video data. A video sequence consists of a number of pictures (digital images), usually called frames. Subsequent frames are very similar, thus containing a lot of redundancy from one frame to the next. Before being efficiently transmitted over a channel or stored in memory, video data is compressed to conserve both bandwidth and memory. The goal of video compression is to remove the redundancy between frames to gain better compression ratios. There is a complex balance between the video quality, the quantity of the data needed to represent it (also known as the bit rate), the complexity of the encoding and decoding algorithms, their robustness to data losses and errors, ease of editing, random access, end-to-end delay, and a number of other factors.
A typical digital video codec design starts with the conversion of input video from an RGB color format to a YCbCr color format, often followed by chroma sub-sampling to produce a sampling grid pattern. Conversion to the YCbCr color format improves compressibility by de-correlating the color signals and by separating the perceptually more important luma signal from the perceptually less important chroma signals, which can be represented at lower resolution.
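The color conversion and chroma sub-sampling steps can be illustrated with a minimal sketch. The coefficients below are the full-range BT.601 values used by JPEG/JFIF, and the 2x2 averaging corresponds to a 4:2:0-style grid; both are assumptions chosen for illustration, since the text does not name a specific matrix or sampling pattern. The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (assumed: full-range BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks.

    Assumes the plane has even width and height.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]
```

In this sketch only the Cb and Cr planes would be passed to `subsample_420`; the luma plane is kept at full resolution because it carries the perceptually more important detail.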
Some amount of spatial and temporal down-sampling may also be used to reduce the raw data rate before the basic encoding process. Down-sampling is the process of reducing the sampling rate of a signal. This is usually done to reduce the data rate or the size of the data. The down-sampling factor is typically an integer or a rational fraction greater than unity. This data is then transformed using a frequency transform to further de-correlate the spatial data. One such transform is a discrete cosine transform (DCT). The output of the transform is then quantized and entropy encoding is applied to the quantized values. Some encoders can compress the video in a multiple step process called n-pass encoding, for example 2-pass, which is generally a slower process, but potentially provides better quality compression.
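As a rough sketch of the transform and quantization stages, the following applies an orthonormal two-dimensional DCT-II to a square block and then quantizes the coefficients with a single uniform step size. The uniform step and the function names are assumptions for illustration; practical codecs typically use per-coefficient quantization tables and follow this stage with entropy coding, which is not shown here.

```python
import math


def dct_1d(samples):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(samples)
    out = []
    for k in range(n):
        total = sum(
            samples[i] * math.cos(math.pi * (i + 0.5) * k / n)
            for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed [vertical][horizontal].
    return [list(r) for r in zip(*cols)]


def quantize_block(coeffs, step):
    """Uniform quantization (assumed): divide by the step and round."""
    return [[round(c / step) for c in row] for row in coeffs]
```

For a flat block, all of the energy lands in the single DC coefficient and every other quantized value is zero, which is what makes the subsequent entropy coding effective.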
The decoding process consists of essentially performing an inversion of each stage of the encoding process. The one stage that cannot be exactly inverted is the quantization stage. There, a best-effort approximation of inversion is performed. This part of the process is often called “inverse quantization” or “dequantization”, although quantization is an inherently non-invertible process.
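The reason quantization cannot be exactly inverted can be seen in a small sketch, assuming a simple uniform scalar quantizer (the function names are hypothetical). Several distinct input values map to the same quantized level, so the decoder can only return one representative value for all of them.

```python
def quantize_scalar(value, step):
    """Map a coefficient to an integer level (assumed uniform quantizer)."""
    return round(value / step)


def dequantize_scalar(level, step):
    """Best-effort inverse: return the representative value for a level."""
    return level * step
```

With a step of 10, the inputs 37 and 43 both quantize to level 4 and both reconstruct as 40. The original values are lost, and for this quantizer the reconstruction error is bounded by half the step size.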
A variety of codecs can be easily implemented on PCs and in consumer electronics equipment. Multiple codecs are often available in the same product, avoiding the need to choose a single dominant codec for compatibility reasons.
Some widely used video codecs include, but are not limited to, H.261, MPEG-1 Part 2, MPEG-2 Part 2, H.263, MPEG-4 Part 2, MPEG-4 Part 10/AVC, DivX, XviD, 3ivx, Sorenson 3, and Windows Media Video (WMV).
MPEG codecs are used for the generic coding of moving pictures and associated audio. MPEG video codecs create a compressed video bit-stream traditionally made up of a series of three types of encoded data frames. The three types of data frames are referred to as an intra frame (called an I-frame or I-picture), a bi-directional predicted frame (called a B-frame or B-picture), and a forward predicted frame (called a P-frame or P-picture). These three types of frames can be arranged in a specified order called the GOP (Group Of Pictures) structure. I-frames contain all the information needed to reconstruct a picture. The I-frame is encoded as a normal image without motion compensation. On the other hand, P-frames use information from previous frames and B-frames use information from previous frames, a subsequent frame, or both to reconstruct a picture. Specifically, P-frames are predicted from a preceding I-frame or the immediately preceding P-frame.
Frames can also be predicted from the immediately subsequent frame. In order for the subsequent frame to be utilized in this way, the subsequent frame must be encoded before the predicted frame. Thus, the encoding order does not necessarily match the real frame display order. Such frames are usually predicted from two directions, for example from the I- or P-frame that immediately precedes or the P-frame that immediately follows the predicted frame. These bidirectionally predicted frames are called B-frames.
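The difference between display order and encoding order can be sketched as follows. The function name is hypothetical, and the handling of trailing B-frames is a simplification: in this sketch they are simply emitted last, whereas in a real stream they would wait for a reference frame from the following GOP.

```python
def coding_order(display_types):
    """Return display indices in the order the frames would be encoded.

    display_types is a string such as "IBBPBBP" listing frame types in
    display order. Each B-frame is held back until the next I- or
    P-frame (its later reference) has been emitted.
    """
    order = []
    pending_b = []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)       # reference frame goes first
            order.extend(pending_b)   # then the B-frames that depend on it
            pending_b = []
    order.extend(pending_b)           # simplification for trailing B-frames
    return order
```

For the display sequence "IBBPBBP", this yields the encoding order 0, 3, 1, 2, 6, 4, 5: each P-frame is encoded ahead of the two B-frames that precede it in display order.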
B-frames and P-frames require fewer bits to store picture data, as they generally contain difference bits for the difference between the current frame and a previous frame, subsequent frame, or both. B-frames and P-frames are thus used to reduce the redundant information contained across frames. A decoder in operation receives an encoded B-frame or encoded P-frame and uses a previous or subsequent frame to reconstruct the original frame. This process is much easier than reconstructing each original frame independently and produces smoother scene transitions when sequential frames are substantially similar, since the difference in the frames is small.
Each video image is separated into one luminance (Y) and two chrominance channels (also called color difference signals Cb and Cr). Blocks of the luminance and chrominance arrays are organized into “macroblocks,” which are the basic unit of coding within a frame.
In the case of I-frames, the actual image data is passed through an encoding process. However, P-frames and B-frames are first subjected to a process of “motion compensation.” Motion compensation is a way of describing the difference between consecutive frames in terms of where each macroblock of the former frame has moved. Such a technique is often employed to reduce temporal redundancy of a video sequence for video compression. Each macroblock in the P-frame or B-frame is associated with an area in the previous or next image that it is well-correlated with, as selected by the encoder using a “motion vector” that is obtained by a process termed “Motion Estimation.” The motion vector that maps the current macroblock to its correlated area in the reference frame is encoded, and then the difference between the two areas is passed through the encoding process.
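Motion estimation and the resulting difference signal can be sketched with an exhaustive block-matching search. The sum of absolute differences (SAD) cost, the full search over a square window, and all function names are assumptions for illustration; the text does not specify a matching criterion or search strategy, and practical encoders typically use faster searches.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of the current
    frame at (bx, by) and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def estimate_motion(cur, ref, bx, by, n, search):
    """Full search within +/- search pixels; returns ((dx, dy), cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), best[0]


def residual(cur, ref, bx, by, mv, n):
    """Difference between the current block and its motion-compensated
    prediction; this is what would go on to the transform stage."""
    dx, dy = mv
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]
```

If the current frame is the reference frame with its content shifted one pixel to the right, the search for an interior block returns the motion vector (-1, 0) with zero cost, and the residual is all zeros, so only the vector needs to be coded.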
Conventional video codecs use motion compensated prediction to efficiently encode a raw input video stream. The macroblock in the current frame is predicted from a displaced macroblock in the previous frame. The difference between the original macroblock and its prediction is compressed and transmitted along with the displacement (motion) vectors. This technique is referred to as inter-coding, which is the approach used in the MPEG standards.
Many conventional imaging devices are configured to capture high resolution still images in addition to video. Such devices are also known as hybrid camera/camcorders. In such devices, a still image is captured when the photographer presses a capture button on the device. Each captured still image corresponds to a specific video frame captured at the same instant in time as the still image, if the video was also captured in parallel.
When using imaging devices, such as cameras and camcorders, a frequent problem that a user encounters is known as “Shutter Lag”. This term is generally defined as a delay that occurs between the photographer pressing the capture button and the shutter actually opening to capture the desired event of interest. Shutter lag can also be caused by mechanical delay in the imaging device itself. This is a very common problem, especially in the photography of fast-moving objects. To overcome this problem, users are required to anticipate the event and press the capture button before it actually takes place. This delay is highly variable and depends on many factors, such as device type, amount of motion in the scene, and camera settings. Due to this delay, the user in most cases tends to miss the actual scene of interest, referred to herein as a “Moment of Interest” (MOI).
In hybrid devices including both a still image and a video capture function, a conventional approach to the shutter lag problem is to up-sample the base-layer video frame corresponding to the missed moment of interest. A positive aspect of this technique is its simplicity. An inherent drawback lies in the up-sampling artifacts that are very clearly visible in the up-sampled high-resolution picture. As such, this approach may be suitable for preview and similar uses, but is not a good technique for creating high-quality pictures for printing or other applications.
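The simplicity of the up-sampling approach, and the source of its artifacts, can be seen in a minimal sketch. Nearest-neighbor replication is assumed here purely for illustration; the text does not state which interpolation method conventional devices use, and the function name is hypothetical.

```python
def upsample_nearest(frame, factor):
    """Enlarge a 2-D frame by an integer factor by repeating each pixel.

    No new detail is created: every source pixel becomes a
    factor x factor block of identical values.
    """
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in frame
        for _ in range(factor)
    ]
```

Because each output block merely repeats one source pixel, the enlarged picture has the pixel count of a high-resolution still but only the detail of the low-resolution video frame. Smoother interpolation methods trade this blockiness for blur but likewise cannot recover detail that was never captured.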