Aspects of the exemplary embodiments disclosed herein relate to a system and method for the assessment of quality of photographic images.
Digital photographic images are produced by professional photographers and amateurs in increasing numbers. Such images may be made accessible through a public website where they can be rated for quality and other characteristics by viewers of the website. There has been considerable effort in the field of image quality assessment to design quality metrics that can predict the perceived image quality automatically. See, for example, Z. Wang, et al., The handbook of video databases: design and applications, Chapter 41, pages 1041-1078, CRC press, 2003. One objective has been to extract descriptors from the digital image that correlate well with human preference. See, H. Sheikh, et al., “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, 15(11):3440-3451, November 2006. The presence or absence of specific signal-level degradations such as random or structured noise (e.g., salt and pepper noise, JPEG artifacts, ringing) and blur were used in the past to define the quality of a photographic image. However, high-definition digital sensors are now readily available which allow photographers to overcome such degradations. More recently, image quality assessment has focused on high-level features that go beyond such signal-level qualities and instead address whether an image complies with best practices, such as “does the image obey the rule of thirds?” See, R. Datta, et al., “Studying aesthetics in photographic images using a computational approach,” ECCV (3), pp. 288-301, 2006 (hereinafter, “Datta 2006”); R. Datta, et al., “Learning the consensus on visual quality for next-generation image management,” MULTIMEDIA '07: Proc. 15th Intern'l Conf. on Multimedia, pp. 533-536, 2007 (hereinafter, “Datta 2007”); and R. Datta, et al., “Algorithmic inferencing of aesthetics and emotion in natural images: An exposition,” 15th IEEE Intern'l Conf. on Image Processing, pp. 105-108, October 2008 (hereinafter, “Datta 2008”).
The features which relate to image quality are often referred to as aesthetic features, because they are designed for capturing specific visual elements, such as color combinations, composition, framing, and the like which are not directly related to the content of the image but which have an impact on the perceived quality of the image. Aesthetic feature extraction schemes are described, for example, in the above-mentioned Datta references and in Y. Ke, X. Tang, and F. Jing, “The design of high-level features for photo quality assessment,” in CVPR, 2006. The features used in these methods are typically created by taking into account the perceptual factors that capture visual preference. These aesthetic features can be split into two broad classes: low-level features and high-level features.
Among the many low-level features available to describe light and color, the most popular involve simple statistics evaluated over the entire image. In particular, the mean and standard deviation are generally computed in several perceptually and non-perceptually coherent color spaces such as Lab, RGB, YUV, or HSV. Other descriptors are based on blur estimation techniques, in which a blurred image is modeled as the result of a Gaussian smoothing filter applied to an otherwise sharp image. Other low-level features calculate the dynamic range of an image by evaluating its gray-scale histogram. Colorfulness can be assessed by extracting 3-D color histograms and by calculating the Earth Mover's Distance of each from a reference chromatic model.
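By way of illustration only, two of the low-level features mentioned above (per-channel mean and standard deviation, and a dynamic range taken from a gray-scale histogram) might be sketched as follows. The function names, the representation of an image as a list of (r, g, b) tuples with values from 0 to 255, and the use of the Rec. 601 luma weights for the gray-scale conversion are assumptions made for this sketch and are not taken from the cited references.

```python
from statistics import mean, pstdev


def channel_stats(pixels):
    """Return a (mean, standard deviation) pair for each color channel.

    `pixels` is assumed to be a non-empty list of (r, g, b) tuples.
    """
    channels = zip(*pixels)  # regroup the pixel values channel by channel
    return [(mean(c), pstdev(c)) for c in channels]


def gray_histogram(pixels):
    """Build a 256-bin gray-scale histogram (Rec. 601 luma weights assumed)."""
    hist = [0] * 256
    for r, g, b in pixels:
        gray = int(round(0.299 * r + 0.587 * g + 0.114 * b))
        hist[max(0, min(gray, 255))] += 1  # clamp against rounding overshoot
    return hist


def dynamic_range(hist):
    """Distance between the lowest and highest occupied histogram bins."""
    occupied = [level for level, count in enumerate(hist) if count > 0]
    return occupied[-1] - occupied[0] if occupied else 0


if __name__ == "__main__":
    image = [(0, 0, 0), (255, 255, 255), (128, 64, 32)]
    print(channel_stats(image))
    print(dynamic_range(gray_histogram(image)))
```

The same statistics could be evaluated in another color space (e.g., Lab or HSV) by converting the pixel values before calling `channel_stats`; the conversion itself is outside the scope of this sketch.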
High-level features focus on the analysis of objects and homogeneous image regions. Typical approaches are based on segmentation techniques, saliency extraction methods, or geometric contexts. The underlying idea is to capture composition and framing properties by looking at the position of the main subject or of dominant regions of the picture. In particular, the distance between object centroids and reference points in the image (e.g., the four intersection points specified by the rule of thirds) can be calculated.
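As a minimal sketch of the centroid-based composition feature described above, the distance from a subject centroid to the nearest rule-of-thirds intersection point might be computed as follows. The function names and the normalization by the image diagonal are assumptions introduced here for illustration; the centroid is assumed to have been obtained beforehand, for example from a segmentation or saliency step.

```python
import math


def rule_of_thirds_points(width, height):
    """Return the four intersection points of the rule-of-thirds grid."""
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for x in xs for y in ys]


def thirds_distance(centroid, width, height):
    """Distance from `centroid` to the nearest intersection point.

    The result is divided by the image diagonal (an assumed normalization)
    so that values are comparable across image sizes.
    """
    cx, cy = centroid
    diagonal = math.hypot(width, height)
    nearest = min(
        math.hypot(cx - x, cy - y)
        for x, y in rule_of_thirds_points(width, height)
    )
    return nearest / diagonal


if __name__ == "__main__":
    # A subject placed exactly on an intersection point scores 0.
    print(thirds_distance((100, 100), 300, 300))
    # A centered subject is farther from every intersection point.
    print(thirds_distance((150, 150), 300, 300))
```

A smaller value indicates a subject placed closer to one of the four reference points; how such a value is weighted against other features is left open here.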
Despite the proliferation of annotated image data available through social networks, photo sharing websites, and the like, which could be used as training data, challenges for image quality assessment remain. First, the annotations of such data are intrinsically noisy because, when dealing with human preference, unanimous consensus is rare. Instead, general trends with varying proportions of outliers are often observed. While the amount of data used to train an automated system could be increased, this does not always solve the problem. A second challenge concerns the design of features to capture human preference. The features currently in use do not always correlate well with human perception. In other words, they are not powerful enough to capture all the visual information required for quality assessment.
There remains a need for a system and method which can improve image quality assessment.