A number of video superresolution systems and video transmission systems exist individually in the prior art, but there has not been an integrated system that combines real-time superresolution with advanced video coding and transmission. The main innovation is to integrate key elements of motion registration, non-uniform image interpolation, image regularization, and image post-processing, along with highly compressed video transmission, in a unique form that directly produces, in real time, high resolution videos at the decoder side from low resolution encoded videos. The invention also relates to video coding using international video coding standards such as those from the International Organization for Standardization (MPEG-1, MPEG-2, MPEG-4) and the International Telecommunication Union (H.263, H.264 and the upcoming H.265), in particular the current standard, H.264/AVC. This invention further relates to video file and stream wrappers such as the MPEG-2 transport stream and the MP4 file format, which can contain useful auxiliary information. This invention further relates to video quality enhancement for a wide variety of surveillance and reconnaissance systems, and even consumer electronics products such as handheld digital cameras. For illustrative purposes, we often focus on the specific resolutions of VGA and QVGA, and on the H.264 video standard, but the invention is substantially broader.
In many imaging applications, high resolution (HR) images are desired for further processing. High resolution images have a high pixel density and therefore offer more detail than low resolution (LR) images. For example, high resolution images may be useful in medical diagnosis. Similarly, high resolution images are desired in many defense applications such as surveillance and satellite imagery. Although high resolution images are desired almost everywhere, in most scenarios it is not possible to sense HR images directly, primarily because of the usual size, weight, power and cost constraints. To overcome these limitations, which are inherent in almost any sensing system, we extend the ranges and conditions under which a sensor can provide imagery by using signal processing techniques to enhance the resolution of an image from multiple sensed low resolution images. Such a system will improve the speed of F2T2EA (Find, Fix, Track, Target, Engage, and Assess). This technique of enhancing LR images into HR images is called superresolution (SR). Superresolving images at extended ranges admits many different solutions, each corresponding to a different problem to solve.
The basic physical problems to overcome are the limits that the physical geometry and optics of any camera system impose on resolution performance, and environmental factors such as turbulence, particulates, and humidity that degrade image quality. The advantage of the superresolution method is that it costs less than HR hardware and that the same LR hardware can be used for observations. It provides better image quality and resolution than a comparable imaging system of the same size, with the goal of exceeding the diffraction and seeing limits and negating severe environmental effects.
The basic concept of superresolution is described briefly in the following paragraphs. When there is a small motion between frames, the data in the low resolution images can be used to fill the pixels in the desired high resolution image. This small motion between frames is referred to as sub-pixel displacement, as shown in FIG. 1. Since imaging systems are discrete, the pixels do not capture all the information. For example, the information in the image between the pixels (◯) is averaged and hence lost. This information may instead be captured in the other frames because of the sub-pixel displacement shown in FIG. 1, a fact that can be used to fuse the low resolution frames into a high resolution frame. The pixels (Δ) have a strict sub-pixel displacement, whereas the pixels (+) have a pixel plus sub-pixel displacement. To reconstruct the HR image from the LR frames, the model that generates the LR images from the HR image is first specified, so that the reconstruction algorithms can be designed to perform the inverse operation of this model. The model showing the generation of K LR images $g_k$ from the HR image $i$ is shown in FIG. 2 and is given by
$$g_k = W_k i + \eta_k \tag{1}$$
where $i$ and $g_k$ are the reformatted vectors, $W_k$ models the generation process, and $\eta_k$ is a noise vector. FIG. 1 shows an example 100 of sub-pixel displacement. The grid of pixels (◯) is the reference image frame. The grids of pixels (Δ and +) with sub-pixel displacements are drawn with respect to the reference image frame.
FIG. 2 illustrates a model 200 for generating the LR images from the HR image. The motion block models the translation and rotation of the LR images. The blur block models the blur induced by motion and other causes. The decimation block models the down-sampling that results in lower resolution.
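For illustration only, the generation model of equation (1) and FIG. 2 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the model of any particular prior art system: motion is restricted to integer translations on the HR grid (rotation is omitted), blur and decimation are combined into a simple block average, and the noise is Gaussian. The function name `generate_lr_frames` and its parameters are hypothetical.

```python
import numpy as np

def generate_lr_frames(hr, shifts, factor, noise_sigma=0.0, seed=0):
    """Simulate g_k = W_k i + eta_k for each (dy, dx) shift given in HR pixels.

    Assumptions (not from the source): translation-only motion, block-average
    blur, and additive Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    h, w = hr.shape
    h_c, w_c = h - h % factor, w - w % factor  # crop to a multiple of factor
    frames = []
    for dy, dx in shifts:
        # Motion block: an integer shift on the HR grid is a sub-pixel shift
        # on the LR grid (circular wrap-around is used for simplicity).
        moved = np.roll(hr, shift=(dy, dx), axis=(0, 1))[:h_c, :w_c]
        # Blur and decimation blocks: average every factor x factor cell.
        cells = moved.reshape(h_c // factor, factor, w_c // factor, factor)
        lr = cells.mean(axis=(1, 3))
        # Noise vector eta_k.
        frames.append(lr + noise_sigma * rng.standard_normal(lr.shape))
    return frames

# Example: two noiseless LR frames from an 8x8 HR image, the second one
# displaced by one HR pixel (half an LR pixel) horizontally.
hr_image = np.arange(64, dtype=float).reshape(8, 8)
lr_frames = generate_lr_frames(hr_image, [(0, 0), (0, 1)], factor=2)
```

Each simulated frame sees a slightly different average of the HR scene, which is the property the superresolution steps described herein exploit.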
The steps of superresolving the K LR images involve inverting each of the blocks in the above model. The motion block is accounted for by finding the registration values between the reference LR image and the K−1 non-reference LR images. The effects of down-sampling are then accounted for by populating the desired HR image with the registered LR images and interpolating the remaining pixels. Finally, a de-blurring process is applied to remove the effects of the blur. Different methods can vary in these three stages of superresolution. These methods can be broadly divided into three categories.
In the first category, the SR process is straightforward: the LR images are registered relative to the reference LR image, the HR grid is estimated by non-uniform interpolation, and the de-blurring process uses methods such as Wiener filtering. One such method is described herein. In the second category, the interpolation step is replaced by estimating the HR image from the LR images in the frequency domain. In the third category, the conversion to an HR image involves a regularization step that compensates for an insufficient number of LR images and for error-prone blur models. These methods can in turn be divided into deterministic and stochastic reconstruction.
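As an illustration of the first category, the step of populating the HR grid with registered LR samples can be sketched as follows. This is a deliberately simplified sketch: each LR sample is placed at the nearest HR pixel and overlapping samples are averaged, whereas a full non-uniform interpolation would also estimate the HR pixels that receive no sample. The function name `shift_and_add` and the convention that shifts are given as (vertical, horizontal) displacements in LR pixel units are assumptions made for this example.

```python
import numpy as np

def shift_and_add(lr_frames, shifts, factor):
    """Place registered LR samples on the HR grid and average overlaps.

    shifts[k] = (ty, tx) is the displacement of frame k relative to the
    reference frame, in LR pixels. Returns the HR estimate and a boolean
    mask of the HR pixels that received at least one sample.
    """
    h, w = lr_frames[0].shape
    acc = np.zeros((h * factor, w * factor))
    cnt = np.zeros_like(acc)
    ys, xs = np.mgrid[0:h, 0:w]
    for frame, (ty, tx) in zip(lr_frames, shifts):
        # HR-grid position of every LR sample, rounded to the nearest pixel.
        hy = np.rint((ys + ty) * factor).astype(int)
        hx = np.rint((xs + tx) * factor).astype(int)
        ok = (hy >= 0) & (hy < h * factor) & (hx >= 0) & (hx < w * factor)
        np.add.at(acc, (hy[ok], hx[ok]), frame[ok])
        np.add.at(cnt, (hy[ok], hx[ok]), 1)
    filled = cnt > 0
    hr = np.zeros_like(acc)
    hr[filled] = acc[filled] / cnt[filled]
    return hr, filled
```

Pixels for which the mask is false would be interpolated from their filled neighbors, and the result would then be passed to the de-blurring stage.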
Gradient-Based Registration
In this section, we describe a prior art algorithm for superresolution. In this algorithm, the registration between the non-reference LR frames and the reference LR frame is a modified version of other prior art algorithms. The registration algorithm disclosed herein is a gradient-based registration algorithm, but it is valid only for shifts within one LR pixel width. Another registration technique disclosed herein is an iterative technique useful for larger shifts.
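The gradient-based estimate developed in equations (2) through (6) of this section reduces to solving a 2×2 linear system. A minimal single-pass sketch is given here for orientation. It assumes that images are NumPy arrays indexed as [row, column], so that the horizontal variable corresponds to the column index, and that central differences from `np.gradient` are an acceptable derivative approximation. The function name `estimate_shift` is hypothetical, and the iterative refinement for larger shifts is not included.

```python
import numpy as np

def estimate_shift(g1, gk):
    """Least-squares estimate of (tx, ty) such that gk(x, y) ~ g1(x + tx, y + ty)."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    d_y, d_x = np.gradient(g1)
    diff = gk - g1
    # Matrix M of equation (5) and vector V of equation (6).
    M = np.array([[np.sum(d_x * d_x), np.sum(d_x * d_y)],
                  [np.sum(d_x * d_y), np.sum(d_y * d_y)]])
    V = np.array([np.sum(diff * d_x), np.sum(diff * d_y)])
    tx, ty = np.linalg.solve(M, V)  # S = M^{-1} V
    return tx, ty

# Example with a smooth synthetic image displaced by a known sub-pixel amount.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
reference = np.sin(0.2 * xx) + np.cos(0.15 * yy)
displaced = np.sin(0.2 * (xx + 0.3)) + np.cos(0.15 * (yy - 0.2))
tx_est, ty_est = estimate_shift(reference, displaced)
```

Because the estimate relies on a first-order Taylor expansion, it is only approximate and degrades as the true shift grows, which is why an iterative scheme is used for larger shifts.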
Let the reference LR image be $g_1(x,y)$. Then the non-reference LR images can be represented as
$$g_k(x,y) = g_1(x + tx_k,\; y + ty_k) \tag{2}$$
where $tx_k$, $ty_k$ are the horizontal and vertical shifts which we need to estimate in this registration process. Considering the first three terms of the Taylor series we get
$$g_k(x,y) = g_1(x,y) + tx_k \frac{\partial g_1(x,y)}{\partial x} + ty_k \frac{\partial g_1(x,y)}{\partial y} \tag{3}$$
Since $x$, $y$ are continuous variables, we approximate them with discrete variables $m$, $n$ and then estimate $tx_k$, $ty_k$ by applying the least squares method. We apply least squares by minimizing the error term
$$E_k(tx_k, ty_k) \approx \sum \left[ g_k(m,n) - g_1(m,n) - tx_k \frac{\partial g_1(m,n)}{\partial m} - ty_k \frac{\partial g_1(m,n)}{\partial n} \right]^2 \tag{4}$$
By differentiating with respect to $tx_k$, $ty_k$ we get $M \cdot S = V$ or $S = M^{-1} \cdot V$ where $S = [tx_k, ty_k]^T$ and
$$M = \begin{bmatrix} \sum \left( \dfrac{\partial g_1(m,n)}{\partial m} \right)^2 & \sum \dfrac{\partial g_1(m,n)}{\partial m} \dfrac{\partial g_1(m,n)}{\partial n} \\ \sum \dfrac{\partial g_1(m,n)}{\partial m} \dfrac{\partial g_1(m,n)}{\partial n} & \sum \left( \dfrac{\partial g_1(m,n)}{\partial n} \right)^2 \end{bmatrix} \tag{5}$$
and
$$V = \begin{bmatrix} \sum \left( g_k(m,n) - g_1(m,n) \right) \dfrac{\partial g_1(m,n)}{\partial m} \\ \sum \left( g_k(m,n) - g_1(m,n) \right) \dfrac{\partial g_1(m,n)}{\partial n} \end{bmatrix} \tag{6}$$
To account for the larger shifts the iterative techniques in the prior art are used. First the initial registration values are estimated. Then the LR image $g_k(x,y)$ is shifted by that amount and the same procedure is applied until the registration values are smaller than a specified value.
Back Projection Algorithm
The basic idea of improving the high resolution image using back-projection is borrowed from computer aided tomography (CAT), where the X-ray beam moves all around the patient, scanning from hundreds of different angles, and the computer takes all this information and puts together a 3-D image of the body. In superresolution, the low resolution images are projections of the original scene after blurring and decimation. Herein, we use a model similar to that described by Farsiu, which has a regularization term along with the gradient back-projection term. Let $D$, $F$ and $H$ denote the decimation, warping and blurring operations, and let $X$ and $Y$ represent the original and low resolution images. The high resolution image is iteratively estimated by optimizing the following equation
$$\hat{X} = \arg\min_{X} \left[ \sum_{k=1}^{N} \left\| D_k H_k F_k X - Y_k \right\|_1 + \lambda \underbrace{\sum_{l=-P}^{P} \sum_{m=0}^{P}}_{l+m \ge 0} \alpha^{|m|+|l|} \left\| X - S_x^{l} S_y^{m} X \right\|_1 \right] \tag{7}$$
where $S_x^{l}$ and $S_y^{m}$ are matrices that shift the image $X$ by $l$ and $m$ pixels in the horizontal and vertical directions respectively. $N$ is the number of low resolution frames considered to generate one high resolution frame. The first term represents the similarity cost and the second term is the regularization term. The scalar weight $\alpha$ ($0 < \alpha < 1$) is applied to give a spatially decaying effect to the summation of the regularization terms. $\lambda$ is the regularization factor. The solution to the above equation using steepest descent, as given in the prior art, is adopted here:
$$\hat{X}_{n+1} = \hat{X}_{n} - \beta \left\{ \sum_{k=1}^{N} F_k^{T} H_k^{T} D_k^{T} \,\mathrm{sign}\!\left( D_k H_k F_k \hat{X}_{n} - Y_k \right) + \lambda \underbrace{\sum_{l=-P}^{P} \sum_{m=0}^{P}}_{l+m \ge 0} \alpha^{|m|+|l|} \left[ I - S_y^{-m} S_x^{-l} \right] \mathrm{sign}\!\left( \hat{X}_{n} - S_x^{l} S_y^{m} \hat{X}_{n} \right) \right\} \tag{8}$$
where $\beta$ is the scalar denoting the step size in the direction of the gradient. $S_x^{-l}$ and $S_y^{-m}$ denote the transposes of the matrices $S_x^{l}$ and $S_y^{m}$ respectively, which shift in the opposite directions.
Partition Weighted Sum (PWS) Filtering
The key to successful motion estimation and compensation at the subpixel level is accurate interpolation. Standard interpolators (e.g., bilinear, bicubic, and spline) tend to smooth images and may not fully preserve the fine image structure. One promising nonlinear filtering technique, the Partition Weighted Sum (PWS) filter, has recently been shown to be very effective in interpolation applications where resolution enhancement or preservation is critical. The PWS filter uses a moving window that spans a set of N samples and moves across the image in a raster scan fashion. At each image position the samples spanned by the window form a spatial observation vector, x. The PWS filter uses vector quantization (VQ) to partition the observation space and assign each observation vector to one of M partitions. Associated with each partition is a finite impulse response (FIR) Wiener filter that is “tuned” for data falling into that partition using suitable training data. After an observation vector is classified, the corresponding filter is applied. Because the filter is spatially adaptive, it is well suited to handle nonstationarities in the signal and/or noise statistics. A block diagram of the prior art PWS filter structure is shown in FIG. 3.
FIG. 3 depicts a block diagram illustrating a prior art partition weighted sum filter. A moving window provides the samples in the vector x. Based on the pixel structure present in that particular window position, one set of filter weights is selected and applied, as indicated by the position of the selection arrow.
Note that wi is an N×1 vector of weights for partition i, and the partition function p(·): R^N → {1, 2, ..., M} generates the partition assignment. Using VQ partitioning, x is compared with a codebook of representative vectors. An example of a codebook generated with the LBG algorithm is shown in FIG. 4. The index of the codeword closest in a Euclidean sense to the observation vector is selected as the partition index. The standard Wiener filter can be considered a special case of the PWS filter with only one partition. Previous work has shown the effectiveness of the PWS filter in an image de-noising application. More recently, this filter has been applied to image deconvolution, superresolution, and demosaicing. For interpolation applications, the filter estimates missing grid points using a weighted sum of the neighboring pixels that are present. The weights depend on the VQ partition for the local region. Thus, unlike bilinear interpolation, for example, the PWS approach uses more neighboring pixels and weights them differently depending on the intensity structure (edge, line, texture, etc.). This allows it to preserve detail that can be lost with other interpolators.
FIG. 4 is a vector quantization codebook for a prior art PWS filter. This codebook corresponds to an M=25 vector codebook for a 5×5 moving window filter. Notice how the codebook captures a variety of common structures including flat, edges, lines, etc. Thus, filters can be tuned for each type of structure in the PWS framework.
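The PWS filtering steps described above (form the observation vector, pick the nearest codeword, apply that partition's weights) can be illustrated with a minimal sketch. It assumes the codebook and the per-partition weight vectors have already been trained; the function name `pws_filter`, the edge-replicating padding, and the array layouts are illustrative choices and are not taken from the prior art description.

```python
import numpy as np

def pws_filter(image, codebook, weights, window=3):
    """Apply a partition weighted sum filter to a 2-D image.

    codebook: (M, N) array of representative vectors, N = window * window.
    weights:  (M, N) array holding one FIR weight vector per partition
              (assumed to be trained beforehand, e.g. as Wiener filters).
    """
    pad = window // 2
    # Assumption: replicate border pixels so every position has a full window.
    padded = np.pad(image.astype(float), pad, mode="edge")
    out = np.empty(image.shape, dtype=float)
    for r in range(image.shape[0]):          # raster scan: top to bottom
        for c in range(image.shape[1]):      # and left to right
            # Observation vector x: the samples spanned by the moving window.
            x = padded[r:r + window, c:c + window].ravel()
            # Partition function p(x): nearest codeword in the Euclidean sense.
            i = int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))
            # Weighted sum using the weight vector w_i of that partition.
            out[r, c] = weights[i] @ x
    return out
```

With a single codeword (M=1) the partition step has no effect and the sketch reduces to one fixed linear filter, which matches the remark that the standard Wiener filter is a special case of the PWS filter.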
Other developments are motivated by recent results in sparse signal representation, which ensure that linear relationships among high-resolution signals can be precisely recovered from their low-dimensional projections. We now briefly summarize what has been done in other areas.
We try to infer the high-resolution patch for each low-resolution patch from the input. For this local model, we have two dictionaries, Dl and Dh: Dh is composed of high-resolution patches and Dl is composed of the corresponding low-resolution patches. We subtract the mean pixel value from each patch, so that the dictionary represents image textures rather than absolute intensities. For each input low-resolution patch y, we find a sparse representation with respect to Dl. The corresponding high-resolution patches in Dh are then combined according to these coefficients to generate the output high-resolution patch x. The problem of finding the sparsest representation of y can be formulated as:
$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|F D_l \alpha - F y\|_2^2 \le \epsilon \tag{9}$$
where F is a (linear) feature extraction operator. The main role of F in (9) is to provide a perceptually meaningful constraint on how closely the coefficients must approximate y. Although the optimization problem (9) is NP-hard in general, recent results [15, 16] indicate that as long as the desired coefficients are sufficiently sparse, they can be efficiently recovered by instead minimizing the l1-norm, as follows:
$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|F D_l \alpha - F y\|_2^2 \le \epsilon \tag{10}$$
Lagrange multipliers offer an equivalent formulation
$$\min_{\alpha} \; \lambda \|\alpha\|_1 + \frac{1}{2} \|F D_l \alpha - F y\|_2^2 \tag{11}$$
where the parameter λ balances sparsity of the solution and fidelity of the approximation to y.
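The text does not name a solver for the l1-regularized problem (11). As an illustration only, the following sketch uses iterative soft thresholding (ISTA), one common choice for objectives of this form. Here `A` stands in for the product F·Dl and `b` for F·y; these names, the iteration count, and the choice of ISTA itself are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise shrinkage: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    """Minimize lam * ||a||_1 + 0.5 * ||A a - b||_2^2 by iterative soft thresholding.

    A plays the role of F @ D_l and b the role of F @ y in (11).
    """
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the
    # gradient of the quadratic term (assumes A is not the zero matrix).
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ a - b)                      # gradient of the data term
        a = soft_threshold(a - step * grad, step * lam)
    return a
```

A larger `lam` drives more coefficients to exactly zero, while a smaller `lam` favors a closer fit to `b`, which is the sparsity/fidelity tradeoff controlled by λ.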
Solving (11) individually for each patch does not guarantee compatibility between adjacent patches. We enforce compatibility between adjacent patches using a one-pass algorithm similar to that of [17]. The patches are processed in raster-scan order in the image, from left to right and top to bottom. We modify (10) so that the superresolution reconstruction Dhα of patch y is constrained to closely agree with the previously computed adjacent high-resolution patches. The resulting optimization problem is
$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|F D_l \alpha - F y\|_2^2 \le \epsilon_1, \quad \|P D_h \alpha - w\|_2^2 \le \epsilon_2 \tag{12}$$
where the matrix P extracts the region of overlap between the current target patch and the previously reconstructed high-resolution image, and w contains the values of the previously reconstructed high-resolution image on the overlap. The constrained optimization (12) can be similarly reformulated as:
$$\min_{\alpha} \; \lambda \|\alpha\|_1 + \frac{1}{2} \|\tilde{D} \alpha - \tilde{y}\|_2^2 \tag{13}$$
where
$$\tilde{D} = \begin{bmatrix} F D_l \\ \beta P D_h \end{bmatrix} \quad \text{and} \quad \tilde{y} = \begin{bmatrix} F y \\ \beta w \end{bmatrix}.$$
The parameter β controls the tradeoff between matching the low-resolution input and finding a high-resolution patch that is compatible with its neighbors. Given the optimal solution α* to (13), the high-resolution patch can be reconstructed as x=Dhα*.
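To make the stacking in (13) concrete, the following sketch builds the stacked dictionary and target from their parts. The function name `stack_patch_problem` and the dense-matrix representation of F and P are assumptions made for illustration; the source does not specify how these operators are stored.

```python
import numpy as np

def stack_patch_problem(F, D_l, D_h, P, y, w, beta):
    """Build the stacked dictionary and target used in (13).

    F:    feature extraction matrix applied to low-resolution patches
    D_l:  low-resolution dictionary (columns are atoms)
    D_h:  high-resolution dictionary (columns are atoms)
    P:    matrix selecting the overlap region of a high-resolution patch
    y:    low-resolution input patch (flattened)
    w:    previously reconstructed high-resolution values on the overlap
    beta: weight of the neighbor-compatibility term
    """
    D_tilde = np.vstack([F @ D_l, beta * (P @ D_h)])
    y_tilde = np.concatenate([F @ y, beta * w])
    return D_tilde, y_tilde
```

Because the two blocks are stacked, the squared residual of the stacked system equals the low-resolution data term plus β² times the overlap term, so β weights how strongly agreement with the neighboring patches is enforced.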
Notice that (10) and (12) do not demand exact equality between the low-resolution patch y and its reconstruction Dlα. Because of this, and also because of noise, the high-resolution image X0 produced by the sparse representation approach of the previous section may not satisfy the reconstruction constraint exactly. We eliminate this discrepancy by projecting X0 onto the solution space of DHX=Y, computing
$$X^{*} = \arg\min_{X} \|X - X_0\| \quad \text{s.t.} \quad DHX = Y \tag{14}$$
The solution to this optimization problem can be efficiently computed using the back-projection method, originally developed in computed tomography and applied to superresolution in [18, 19]. The update equation for this iterative method is
$$X_{t+1} = X_t + \big((Y - DHX_t)\uparrow s\big) * p \tag{15}$$
where Xt is the estimate of the high-resolution image after the t-th iteration, p is a "backprojection" filter, and ↑s denotes upsampling by a factor of s.
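The update (15) can be sketched under a deliberately simple imaging model. The sketch assumes that the combined operator DH averages each s-by-s block of the high-resolution image, and that upsampling followed by the filter p amounts to pixel replication (an s-by-s box kernel). Both are illustrative assumptions; the source does not fix H or p, and the function names are hypothetical.

```python
import numpy as np

def downsample_blur(X, s):
    """Assumed DH: box blur then decimation, i.e. the mean of each s-by-s block.

    Requires both image dimensions to be multiples of s.
    """
    h, w = X.shape
    return X.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample_backproject(E, s):
    """Assumed '(E upsampled by s) * p' with p an s-by-s box kernel: pixel replication."""
    return np.kron(E, np.ones((s, s)))

def back_project(X0, Y, s, n_iter=10):
    """Iterate X <- X + ((Y - DH X) upsampled by s) * p, starting from X0."""
    X = X0.astype(float).copy()
    for _ in range(n_iter):
        residual = Y - downsample_blur(X, s)      # error in the low-resolution domain
        X = X + upsample_backproject(residual, s) # spread the error back to high resolution
    return X
```

Under these particular assumptions the constraint DHX = Y is already satisfied after the first pass, since each block is shifted by exactly its own residual. With a more realistic blur H and filter p, several iterations would generally be needed.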
We take result X* from backprojection as our final estimate of the high-resolution image. This image is as close as possible to the initial superresolution X0 given by sparsity, while satisfying the reconstruction constraint. The entire superresolution process is summarized as Algorithm 1.
Algorithm 1 (Superresolution Via Sparse Representation)
1. Input: training dictionaries Dl and Dh, a low-resolution image Y.
2. For each 3×3 patch y of Y, taken starting from the upper-left corner with 1 pixel overlap in each direction,
        Solve the optimization problem with D̃ and ỹ defined in (13):
        $$\min_{\alpha} \; \lambda \|\alpha\|_1 + \frac{1}{2} \|\tilde{D} \alpha - \tilde{y}\|_2^2$$
        Generate the high-resolution patch x=Dhα*. Put the patch x into a high-resolution image X0.
3. End
4. Using back-projection, find the closest image to X0 which satisfies the reconstruction constraint:
        $$X^{*} = \arg\min_{X} \|X - X_0\| \quad \text{s.t.} \quad DHX = Y$$
5. Output: superresolution image X*
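The patch traversal in step 2 of Algorithm 1 can be sketched as a generator that visits 3×3 patches in raster-scan order with a 1-pixel overlap and removes each patch's mean, as described earlier for the dictionaries. The name `raster_patches` and the decision to skip partial patches at the right and bottom borders are assumptions made for illustration.

```python
import numpy as np

def raster_patches(Y, size=3, overlap=1):
    """Yield (row, col, patch) for size-by-size patches of Y in raster-scan order.

    Consecutive patches overlap by `overlap` pixels in each direction.
    Each patch is returned with its mean removed, so it represents texture
    rather than absolute intensity. Partial border patches are skipped.
    """
    step = size - overlap
    for r in range(0, Y.shape[0] - size + 1, step):      # top to bottom
        for c in range(0, Y.shape[1] - size + 1, step):  # left to right
            patch = Y[r:r + size, c:c + size].astype(float)
            yield r, c, patch - patch.mean()
```

Each yielded patch would then be passed to the sparse coding step, and the resulting high-resolution patch would be written into X0 at the corresponding scaled position.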
It would be desirable to provide a method and apparatus to achieve real-time superresolution and video transmission that directly produces high resolution video on the decoder side from low quality low resolution video on the encoder side. In numerous computer vision applications, enhancing the quality and resolution of captured video is critical. Acquired video is often grainy and low quality due to motion, transmission bottlenecks, etc. Superresolution greatly decreases camera jitter to deliver a smooth, stabilized, high quality video. Superresolution has been used with video coding for many applications.
In this innovation, we integrate standard image/video processing techniques in a unique form in which we encode the global motion vectors and include them in the encoded low-resolution video transmitted over the communication channels. On the decoder side, from the decoded video and motion vectors, we integrate motion registration, non-uniform interpolation, and image post-processing techniques to accomplish superresolution. We further enhance the superresolved video quality using a state-of-the-art regularization technique in which the image is iteratively modified by applying back-projection to obtain a sharp and undistorted image. Finally, the invention also relates to a compressive sensing technique, which is utilized as the image filtering step. We directly use the interframe sparsity property among video frames in order to remove the requirement of having trained dictionaries on the decoder side for each user. The results show that the proposed technique gives high-quality, high-resolution videos and minimizes effects due to camera jerks. This technique has been ported to hardware for product development. We have shown the performance improvement of the hardware superresolution over the software version (C code).