The present application relates to digital signal processing, more particularly to video compression, encoding, decoding, filtering and optimal decoding and denoising, and the related devices and computer software programs thereof.
Note that the points discussed below may reflect the hindsight gained from the disclosed inventions, and are not necessarily admitted to be prior art.
Digital data processing is increasingly involved in people's daily lives. Because of space and bandwidth limitations in many applications, raw multimedia data often needs to be encoded and compressed for digital transmission or storage while preserving good quality.
For example, a video camera's analogue-to-digital converter (ADC) converts its analogue signals into digital signals, which are then passed through a video compressor for digital transmission or storage. A receiving device runs the signal through a video decompressor, then a digital-to-analogue converter (DAC) for analogue display. As an illustration, MPEG-2 standard encoding/decoding can compress approximately 2 hours of video data by a factor of 15 to 30 while still producing a picture quality that is generally considered high for standard-definition video.
Most encoding processes are lossy, so that the compressed files are small enough to be readily transmitted across networks and stored on relatively expensive media, although lossless processes that preserve quality are also available.
A common measure of objective visual quality is to calculate the peak signal-to-noise ratio (PSNR), defined as:
\[
\mathrm{PSNR} = 20 \log_{10} \frac{255}{\left[ \dfrac{1}{MNT} \displaystyle\sum_{t=1}^{T} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ f(i,j,t) - \hat{f}(i,j,t) \right]^{2} \right]^{1/2}}
\]
where $f(i,j,t)$ is the pixel at location (i,j) in frame t of the original video sequence, and $\hat{f}(i,j,t)$ is the co-located pixel in the decoded video sequence (at location (i,j) in frame t). M and N are the frame width and height (in pixels), respectively, and T is the total number of frames in the video sequence. Typically, the higher the PSNR, the higher the visual quality.
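The PSNR definition above can be illustrated with a minimal sketch. The function name `psnr`, the nested-list layout indexed as `[t][i][j]`, and the choice to return infinity for identical sequences are assumptions made for illustration only; they are not part of the application.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Return the PSNR in dB between two video sequences.

    Both sequences are nested lists indexed as [t][i][j] (hypothetical layout)
    and are assumed to have identical dimensions.
    """
    total_sq_err = 0.0
    count = 0  # equals M * N * T once every pixel has been visited
    for frame_o, frame_d in zip(original, decoded):
        for row_o, row_d in zip(frame_o, frame_d):
            for p_o, p_d in zip(row_o, row_d):
                total_sq_err += (p_o - p_d) ** 2
                count += 1
    if count == 0:
        raise ValueError("video sequences must contain at least one pixel")
    mse = total_sq_err / count
    if mse == 0:
        # Identical sequences: the ratio is unbounded (illustrative convention).
        return math.inf
    # 20 * log10(peak / sqrt(MSE)), matching the formula above.
    return 20.0 * math.log10(peak / math.sqrt(mse))


# One 2x2 frame in which every decoded pixel is off by exactly 1.
original = [[[100, 100], [100, 100]]]
decoded = [[[101, 99], [101, 99]]]
print(round(psnr(original, decoded), 2))
```

In this example every pixel differs by one grey level, so the mean squared error is 1 and the result reduces to 20·log10(255), about 48.13 dB.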
Current video coding schemes use motion estimation (ME), discrete cosine transform (DCT)-based transforms and entropy coding to exploit temporal, spatial and data redundancy. Most of them conform to existing standards, such as the ISO/IEC MPEG-1, MPEG-2, and MPEG-4 standards, the ITU-T H.261, H.263, and H.264 standards, and China's Audio Video Standard (AVS).
The ISO/IEC MPEG-1 and MPEG-2 standards are used extensively by the entertainment industry to distribute movies, in applications such as video compact disk or VCD (MPEG-1), digital video disk or digital versatile disk or DVD (MPEG-2), recordable DVD (MPEG-2), digital video broadcast or DVB (MPEG-2), video-on-demand or VOD (MPEG-2), high definition television or HDTV in the US (MPEG-2), etc. The later-developed MPEG-4 is better than MPEG-2 in some respects and can achieve high-quality video at a lower bit rate, making it very suitable for video streaming over the Internet and digital wireless networks (e.g. 3G networks), the multimedia messaging service (the MMS standard from 3GPP), etc. MPEG-4 has been accepted into the next-generation high definition DVD (HD-DVD) standard and the MMS standard.
The ITU-T H.261, H.263, and H.264 standards are widely used for low-delay video phone and video conferencing systems. The latest, H.264, is currently the state-of-the-art video compression standard. It was developed jointly by MPEG and ITU-T within the framework of the Joint Video Team (JVT), and is known in the ISO/IEC standards as MPEG-4 Part 10 or MPEG-4 Advanced Video Coding (MPEG-4 AVC). H.264 has been adopted in the HD-DVD standard, the Digital Video Broadcast (DVB) standard, the MMS standard, etc. China's current Audio Video Standard (AVS) is also based on H.264; AVS 1.0 is designed for high definition television (HDTV) and AVS-M is designed for mobile applications.
H.264 offers better objective and subjective video quality than MPEG-1/2/4 and H.261/3. Its basic encoding algorithm is similar to that of H.263 or MPEG-4, except that an integer 4×4 discrete cosine transform (DCT) is used instead of the traditional 8×8 DCT, and there are additional features including intra prediction modes for I-frames, multiple block sizes and multiple reference frames for motion estimation/compensation, quarter-pixel accuracy for motion estimation, an in-loop deblocking filter, context-adaptive binary arithmetic coding, etc.
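As an illustration of the integer 4×4 transform mentioned above, the following minimal sketch applies the commonly cited H.264 forward core transform matrix to a 4×4 block. The helper names (`CF`, `matmul`, `transpose`, `forward_core_transform`) are hypothetical, and the scaling that the standard folds into quantization is deliberately omitted, so this is a simplified picture rather than a conforming implementation.

```python
# Commonly cited H.264 4x4 forward core transform matrix (integer DCT approximation).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def forward_core_transform(block):
    """Compute CF * block * CF^T using integer arithmetic only.

    The post-scaling normally merged into the quantizer is not applied here.
    """
    return matmul(matmul(CF, block), transpose(CF))


# A flat block has all of its energy in the DC coefficient.
flat_block = [[1] * 4 for _ in range(4)]
for row in forward_core_transform(flat_block):
    print(row)
```

Because every entry of the matrix is a small integer, the transform can be carried out with additions and shifts alone and is exactly invertible in integer arithmetic, which is one reason it differs from the floating-point 8×8 DCT of earlier standards.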
Despite the great effort put into producing higher-quality digital signals, data processing itself introduces distortion or artifacts into the signals; digital video sequences are almost always corrupted by noise arising from video acquisition, recording, processing and transmission. A main source is the noise introduced by the capture device (e.g. the camera sensors), especially when the scene is dark, which leads to a low signal-to-noise ratio. If the video being encoded is corrupted by noise, the decoded video will also be noisy and visually unpleasing to the audience.