The present invention relates to image enhancement using a semantic-based technique.
Digital images are typically represented as an array of pixels. Similarly, digital video is typically represented as a series of images or frames, each of which contains an array of pixels. Each pixel includes information, such as intensity and/or color information. In many cases, each pixel is represented as a set of three colors, each of which is defined by an eight-bit color value.
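By way of illustration only, the pixel representation described above may be sketched as follows. The helper names (make_image, pack_rgb, unpack_rgb), the row-major layout, and the red-green-blue channel order are assumptions made for this sketch and are not taken from any particular system.

```python
# Illustrative sketch only: helper names, row-major layout, and RGB channel
# order are assumptions, not part of any described system.

def make_image(width, height, fill=(0, 0, 0)):
    """Return an image as a list of rows, each row a list of (R, G, B) pixels."""
    return [[fill for _ in range(width)] for _ in range(height)]


def pack_rgb(r, g, b):
    """Pack three eight-bit color values into a single 24-bit integer."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each color value must fit in eight bits")
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Recover the three eight-bit color values from a 24-bit integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


# A 4x2 image with one orange pixel in row 1, column 3.
frame = make_image(4, 2)
frame[1][3] = (255, 128, 0)

# A video is a series of such frames.
video = [frame, make_image(4, 2)]
```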
In many cases, image and/or video encoding and/or transmission systems degrade the quality of the image content in order to reduce the storage requirements and/or the bandwidth requirements for transmission. After an image and/or video is encoded and/or transmitted, a restoration technique is applied to attempt to recover the high-quality original image content from the degraded version. The degradation of the image content may have numerous causes, such as image transmission, image coding, or limitations of capture or display devices. Enhancement of a degraded image, on the other hand, attempts to improve the appearance of the image and/or video.
In other cases, the image content is provided at a first, lower resolution in a progressive or interlaced scanning format (e.g., 720×480 pixels). The image content may be provided in a non-degraded manner or in a degraded manner. The lower resolution image content may be enhanced in some manner to be suitable for displaying on a display having a resolution greater than that of the image content, such as a 4K display (e.g., 3840×2160 pixels).
Restoration and/or enhancement of the image and/or video is often a processing step in an image/video display system, especially in large-sized displays. One of the goals may be to restore and enhance the visual appearance of important components of the image and/or video, for example edges, textures and other detail. Another goal is to limit the introduction of undesirable visual artifacts and/or amplification of existing artifacts during restoration and enhancement. A specific example is to limit the introduction or amplification of existing noise in the image and/or video, such as camera noise or compression noise. Another example is to limit introduction of artifacts near edges and contours known as “halo”, “undershoot” and “overshoot”.
Many different techniques have been used to attempt to perform image (inclusive of video) detail enhancement in order to restore an image. Many such techniques are based upon a hierarchical framework using a Laplacian pyramid to decompose the image into multiple levels, including a smooth low frequency image and other high frequency components. Each level is then enhanced and combined together to form the enhanced image. While decomposing the image, edge preservation techniques may be used to reduce halo effects.
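By way of illustration only, the hierarchical approach described above may be sketched on a one-dimensional signal as follows. The [1, 2, 1]/4 smoothing kernel, the linear-interpolation upsampling, the per-level gains, and all function names are assumptions chosen for brevity. The sketch does not include the edge preservation techniques mentioned above, so boosted detail may overshoot near strong edges.

```python
# Illustrative sketch only: kernel, interpolation, gains, and names are
# assumptions; no edge-preserving step is included.

def blur(signal):
    """Smooth with a [1, 2, 1] / 4 kernel, replicating the border samples."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append((left + 2.0 * signal[i] + right) / 4.0)
    return out


def downsample(signal):
    """Blur, then keep every second sample."""
    return blur(signal)[::2]


def upsample(signal, length):
    """Expand to `length` samples by linear interpolation."""
    out = []
    for i in range(length):
        pos = i / 2.0
        lo = min(int(pos), len(signal) - 1)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def build_pyramid(signal, levels):
    """Return (details, base): high frequency levels and the smooth residual."""
    details = []
    current = [float(v) for v in signal]
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse, len(current))
        details.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    return details, current


def reconstruct(details, base, gains=None):
    """Combine the levels, scaling each detail level by its gain."""
    if gains is None:
        gains = [1.0] * len(details)
    current = base
    for detail, gain in zip(reversed(details), reversed(gains)):
        up = upsample(current, len(detail))
        current = [u + gain * d for u, d in zip(up, detail)]
    return current


signal = [10, 10, 10, 10, 50, 50, 50, 50]
details, base = build_pyramid(signal, levels=2)
restored = reconstruct(details, base)                    # gains of 1: original signal
enhanced = reconstruct(details, base, gains=[2.0, 1.5])  # boosted detail levels
```

With all gains equal to one, the levels recombine into the original signal, so any change in the output is attributable to the per-level gains.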
Another technique to perform image detail enhancement involves applying a bilateral filter to obtain different components of the image under multiple lighting conditions and enhancing the details of the image by combining these components. The range of the bilateral filter may be modified to simultaneously perform both detail enhancement and noise removal. Another technique includes acquiring information about oscillations of the image from local extrema at multiple scales and using this information to build a hierarchy which is used to enhance details of the image. Yet another technique involves using wavelets to build a multi-resolution analysis framework to decompose an image into its smooth and detail components, where the wavelets are specifically constructed according to the edge content of the image to reduce halo effects.
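By way of illustration only, a bilateral filter may be sketched on a one-dimensional signal as follows. This simplified sketch splits a single signal into a base component and a detail component and does not model the multiple lighting conditions referred to above. The Gaussian weights, the parameter names (radius, sigma_s, sigma_r), and the gain are assumptions made for this sketch.

```python
# Illustrative sketch only: Gaussian weights, parameter names, and the
# single-signal base/detail split are assumptions.
import math


def bilateral_filter(signal, radius=3, sigma_s=2.0, sigma_r=10.0):
    """Edge-preserving smoothing: weights fall off with distance and with
    difference in value, so samples across a strong edge barely mix."""
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        norm = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            value = math.exp(-((signal[i] - signal[j]) ** 2) / (2.0 * sigma_r ** 2))
            weight = spatial * value
            total += weight * signal[j]
            norm += weight
        out.append(total / norm)  # norm >= 1 because the j == i term has weight 1
    return out


def enhance_detail(signal, gain=2.0, **filter_args):
    """Split into base + detail with the bilateral filter and boost the detail."""
    base = bilateral_filter(signal, **filter_args)
    return [b + gain * (s - b) for s, b in zip(signal, base)]


noisy_step = [10, 12, 9, 11, 10, 100, 102, 99, 101, 100]
base = bilateral_filter(noisy_step, sigma_r=5.0)
enhanced = enhance_detail(noisy_step, gain=2.0, sigma_r=5.0)
```

In this sketch, sigma_r plays the role of the range of the filter: a small value keeps the step between the two halves of the signal intact while smoothing within each half.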
Another technique to perform image detail enhancement uses a filter to perform multi-scale decomposition of images. The filter is edge-preserving and the smoothing is based on a Weighted Least Squares (WLS) optimization framework. This may be mathematically represented as calculating the minimum of,
$$\sum_{p} \left( \left( u_p - g_p \right)^2 + \lambda \left( a_{x,p}(g) \left( \frac{\partial u}{\partial x} \right)_p^2 + a_{y,p}(g) \left( \frac{\partial u}{\partial y} \right)_p^2 \right) \right)$$
where g is the input image, u is the output image and subscript p is the spatial location of the pixel. This function tries to keep u as close as possible to g and achieves smoothness by minimizing the partial derivatives of u. The smoothness weights are given by a_x and a_y, which depend on g, while λ controls the amount of smoothing. A greater λ implies more smoothing. For example, this technique may be used in a Laplacian pyramid framework to obtain abstraction at different levels.
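By way of illustration only, the minimization above may be sketched for a one-dimensional signal as follows, where the partial derivative is replaced by a forward difference and only a single weight per sample pair is needed. The form of the weight, 1 / (|g[p+1] - g[p]|^alpha + eps), and the parameters alpha and eps are assumptions made for this sketch, since the weights are described above only as functions of g. Setting the derivative of the discretized function to zero yields a tridiagonal linear system, which the sketch solves by forward elimination and back substitution.

```python
# Illustrative sketch only: the weight formula, alpha, eps, and all names are
# assumptions; the text above states only that the weights depend on g.

def smoothness_weights(g, alpha=1.2, eps=1e-4):
    """Weights that are small across strong edges of g and large in flat regions."""
    return [1.0 / (abs(g[p + 1] - g[p]) ** alpha + eps) for p in range(len(g) - 1)]


def wls_energy(u, g, lam, a):
    """Discretized function: data term plus weighted squared forward differences."""
    data = sum((up - gp) ** 2 for up, gp in zip(u, g))
    smooth = sum(a[p] * (u[p + 1] - u[p]) ** 2 for p in range(len(u) - 1))
    return data + lam * smooth


def wls_smooth(g, lam=1.0, alpha=1.2, eps=1e-4):
    """Return the u that minimizes wls_energy for the input signal g."""
    g = [float(v) for v in g]
    n = len(g)
    if n < 2:
        return g
    a = smoothness_weights(g, alpha, eps)
    # Tridiagonal system (I + lam * L) u = g, where L is a weighted Laplacian.
    lower = [0.0] * n
    diag = [1.0] * n
    upper = [0.0] * n
    for p in range(n - 1):
        w = lam * a[p]
        diag[p] += w
        diag[p + 1] += w
        upper[p] = -w
        lower[p + 1] = -w
    # Forward elimination.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = g[0] / diag[0]
    for p in range(1, n):
        denom = diag[p] - lower[p] * c[p - 1]
        c[p] = upper[p] / denom
        d[p] = (g[p] - lower[p] * d[p - 1]) / denom
    # Back substitution.
    u = [0.0] * n
    u[-1] = d[-1]
    for p in range(n - 2, -1, -1):
        u[p] = d[p] - c[p] * u[p + 1]
    return u


g = [10, 12, 9, 11, 10, 100, 102, 99, 101, 100]
u = wls_smooth(g, lam=1.0)
```

Because the weight across the large step in g is small, the step is largely retained in u while the small fluctuations on either side are smoothed. With lam set to zero the output equals the input.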
As previously described, there are many different techniques to provide image enhancement together with increased resolution. For example, D. Glasner, S. Bagon, M. Irani, Super-resolution from a single image, ICCV 2009, describe the use of redundancies in the input image to construct a pyramid having low-res/high-res image pairs and use a learning-based method to perform super-resolution of the input image. J. Sun, J. Sun, Z. Xu, H. Y. Shum, Gradient Profile Prior, CVPR 2008, describe the use of a large database of natural images to learn the distribution of gradient profiles and modify the gradient information of the input image to fit this distribution in order to obtain sharp edges and consequently perform super-resolution of images. J. Yang, J. Wright, T. Huang and Y. Ma, Image super-resolution via sparse representation, IEEE TIP 2010, describe a dictionary-based super-resolution method based on ideas in sparse signal processing and show how a joint compact dictionary can be trained to learn the correspondence between high-resolution and low-resolution training image patches. H. He, W. C. Siu, Single image super-resolution using Gaussian Process Regression, CVPR 2010, describe a super-resolution technique using a Gaussian Process Regression model without any training dataset. R. Fattal, Upsampling via imposed edge statistics, SIGGRAPH 2007, describes a super-resolution method based on the relationship between the edge statistics, derived from local intensity continuity, of the low-resolution and the high-resolution images. W. Freeman, T. Jones, E. Pasztor, Example-based super-resolution, IEEE Computer Graphics and Applications 2002, describe a technique to hallucinate high frequency details from a training set of high-resolution and low-resolution image pairs. Y. W. Tai, S. Liu, M. Brown, S. Lin, Super resolution using edge prior and single image detail synthesis, CVPR 2010, describe the use of an extended gradient profile technique using exemplar texture patches to obtain improved details in the image. J. Sun, J. Zhu, M. Tappen, Context constrained hallucination for image super-resolution, CVPR 2010, describe an image super-resolution technique formulated as an energy minimization framework that enforces different criteria such as the fidelity of the high-resolution image to the input image, the fidelity of pixels to discrete candidate examples and the smoothness of edges. This method analyzes textural characteristics of a region surrounding each pixel to search a database for segments with similar characteristics. All references described herein are incorporated by reference in their entirety.
Many existing techniques for detail enhancement, such as those mentioned above, are effective in enhancing the image and/or video. However, such techniques may still result in images that are not aesthetically pleasing to the viewer.
The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention, taken in conjunction with the accompanying drawings.