In the last fifteen years or so accessing, generating and exchanging information has fundamentally shifted for Governments, commercial enterprises, private and public organizations, and the general public. In those fifteen or so years the Internet has gone from a niche application to an essential element of the lives of most individuals in the developed world. As of Jul. 1, 2009 it was estimated that the number of Internet users had exceeded 1.67 billion people out of a world population of approximately 6.8 billion, i.e. 25% of the world's population. These users are accessing information contained in approximately 22 billion pages hosted on over 110 million websites (http://www.domaintools.com/internet-statistics).
Over the same period of time how the Internet is accessed has shifted dramatically as well. No longer are users sitting at desktop personal computers (PCs) in front of 15″ or 17″ CRT displays interfaced to large metal cases hosting for example a single Intel® 486 processor operating at 50 MHz or 100 MHz with 32 MB memory and a 16 GB hard-drive, accessing dial-up connectivity at 56 kb/s. Today their desktop PC is most likely to be a laptop PC working alone or in conjunction with an LCD display of dimension 17″, 19″, 21″, etc up to 32″ or more for graphical designers, allowing them to unplug and move to another location to continue working. Such a laptop may for example contain an AMD Athlon™ Dual-Core 2.00 GHz processor with 4 GB memory and a 500 GB hard-drive, with Internet connectivity at 5 Mb/s, 10 Mb/s or more through wireless WiFi (IEEE 802.11) or WiMAX (IEEE 802.16) interfaces.
Additionally a multitude of other portable electronic devices now provide their users with Internet access including for example personal digital assistants (PDAs) and cellular telephones (e.g. Apple iPhone, Research in Motion's Blackberry, Palm Pre, Samsung Chocolate), gaming consoles (e.g. Microsoft Xbox, Nintendo DSi, Nintendo Wii), and audiovisual media players (e.g. Apple iPod). Accordingly users can access the Internet essentially anywhere and anytime with one of several devices they typically possess. Further, recent device developments such as the Apple iPhone® with integrated silicon MEMS devices allow for dynamic rotation of the mobile device display between landscape and portrait formats as the user rotates their device. Further, operating systems such as Microsoft's Windows and Apple's Mac OS X allow users to dynamically change the size and effective orientation of web pages on their computers, and newer introductions such as Microsoft Windows 7 allow users to dynamically move and directly display content from their laptop PC on another device such as another laptop, television, PDA etc.
As a result the original consideration of images on mobile devices as simply wallpaper and screen savers or web site content as being displayed on large portrait orientated PC displays has been destroyed. Audiovisual content posted to the Internet within any web page is dynamically accessed, dynamically adjusted, and is highly manipulated. A news image may be accessed within seconds by millions of users with displays from typical cellular telephone 240×320 and 320×480 pixel displays of 2.5″ or 3.2″ through to 15″ or 17″ displays of 1920×1080p supporting HDTV and above to 32″, 42″ LCD, plasma displays, and projectors as users employ their televisions as monitors.
Further, users' expectations have increased during this time. Applications such as Microsoft Word and Corel WordPerfect have evolved from being simple word processing applications to entry level desktop publishing suites supporting graphics and audiovisual content and the generation of web pages. At the same time desktop publishing software has expanded to facilitate direct handling of XML and HTML languages and multiple interfaces to digital audio, digital photo, and digital video applications, and to allow direct publication in printed formats, secured digital content, and web content.
However, despite all these advances the content published onto Internet web pages is in the vast majority of cases fixed, even from leading content providers such as Yahoo and Google. Hence, as the viewing user adjusts the dimensions of their web page, for example allowing them to view the Internet content whilst working on another application without having to move from one application to another, then essentially their web page acts similar to a window adjusting the amount of the web page they can view but the audiovisual content is typically fixed in size. In the other cases, for example Google image search, the content is adjusted to a limited extent according to the dimensions of the web browser page, for example the number of images across the web browser page changes. However, the image sizes remain constant and the user must now scroll further to view all the images and move to the next page. In others the page layout adjusts to display the text according to the web browser page size but again the dimensions of the image have been fixed. Today image manipulation in respect of adjusting displayed dimensions of an image is essentially limited to the desktop publisher's domain when generating the web page content. The user's ability to control the display of the web page content is limited to either adjusting the web browser page size or adjusting the zoom that the web browser displays content with.
It would be beneficial for audiovisual content presented to a user to be dynamically displayed according to a variety of factors including but not limited to the dimensions of the web browser page, image display device dimensions, and image display device resolution for example. In this manner disadvantages of the prior art that will become evident in the descriptions of these approaches will be removed.
Amongst the earliest prior art techniques for image adjustment to reflect a change in displayed dimensions is cropping, such as shown in FIG. 1, where two desktop publisher snapshot images 100 and 150 are shown. First desktop publisher snapshot image 100, from Adobe Photoshop Lightroom®, shows an image of a bride 110 together with a cropped highlighted region 120 which the user will select as the cropped image to employ. Similarly second desktop publisher snapshot image 150, from Adobe Photoshop, shows a cityscape 160 together with a cropped cityscape region 170 which the user has selected as the cropped image to employ. Second desktop publisher snapshot image 150 also has icon 180 that projects an automatically generated mask onto the cityscape 160 at either a predetermined pixel count or physical dimension. However, this prior art approach only works to reduce an image dimension; it cannot scale the image up and, if automatically generated, may remove significant content in the image. Cropping does not scale the source image even when reducing the displayed dimensions and has therefore typically been limited to date to desktop publishing.
Within the prior art there are many approaches to automate the cropping operation by detecting content and cropping in dependence upon the content. Examples include A. T. Schowkta in U.S. Pat. No. 7,133,050 entitled “Automated Image Resizing and Cropping”, Suh et al in “Automated Thumbnail Cropping and its Effectiveness” (UIST'03 Proc. 16th ACM Symposium User Interface Software and Technology, ACM Press, New York, pp. 95-104, 2003), A. Santella et al in “Gaze-Based Interaction for Semiautomatic Photo Cropping” (Proc. SIGCHI Conference on Human Factors in Computing Systems, pp. 771-780, 2006) and E. G. Callway in US Patent Application 2007/0,152,990 entitled “Image Analyzer and Adaptive Image Scaling Circuit and Methods”.
Within the prior art such cropping methodologies have been employed in conjunction with linear and non-linear scaling methodologies to provide images of variable size. Linear and non-linear scaling allows the generation of images that are both larger and smaller than the original whilst cropping adjusts the image content. Such a non-linear technique being shown in FIG. 2 by resizing tool window 200, as provided by SB Software (Nonlinear Image Resizing Tool, Version 1.0, www.sb-software.com). As shown within resizing tool 200 an original image 210 of dimensions 747×923 pixels has been selected for resizing to resized image 220 of dimensions 1024×768 pixels representing an aspect ratio change from 0.81:1 to 1.33:1. As indicated by resizing setting toolbar 230 the user can apply nonlinear factors that range from squeezing the centre of the image and stretching the edges of the image through to the reverse of stretching the centre of the image and squeezing the edges of the image. Such a non-linear scaling whilst an improvement over linear scaling in many instances can still result in unnatural images, particularly as the human visual process is highly sensitive to distortion and non-linearity.
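The non-linear horizontal scaling described above can be illustrated with a minimal Python sketch. It is not the algorithm of the SB Software tool, which is not disclosed here; the cubic mapping, the parameter `k`, and the function names are assumptions chosen purely for illustration. A positive `k` stretches the centre and squeezes the edges, a negative `k` does the reverse, and `k = 0` gives ordinary linear scaling.

```python
def nonlinear_source_positions(src_w, dst_w, k=0.0):
    """Map each output column to a (fractional) source column.

    Hypothetical mapping s(u) = (1 - k) * u + k * u**3 on u in [-1, 1].
    It is monotonic for -0.5 < k < 1 and keeps the image edges fixed.
    """
    if dst_w < 2 or src_w < 1:
        raise ValueError("need src_w >= 1 and dst_w >= 2")
    if not -0.5 < k < 1.0:
        raise ValueError("k must lie in (-0.5, 1) to keep the mapping monotonic")
    positions = []
    for j in range(dst_w):
        u = 2.0 * j / (dst_w - 1) - 1.0           # normalised output coordinate
        s = (1.0 - k) * u + k * u ** 3            # normalised source coordinate
        x = (s + 1.0) / 2.0 * (src_w - 1)         # back to source pixel units
        positions.append(min(max(x, 0.0), float(src_w - 1)))
    return positions


def resample_row(row, dst_w, k=0.0):
    """Resample one row of grey levels using linear interpolation."""
    src_w = len(row)
    out = []
    for x in nonlinear_source_positions(src_w, dst_w, k):
        x0 = int(x)
        x1 = min(x0 + 1, src_w - 1)
        t = x - x0
        out.append((1.0 - t) * row[x0] + t * row[x1])
    return out


if __name__ == "__main__":
    # Width change of the example above: 747 source columns to 1024 output columns.
    cols = nonlinear_source_positions(747, 1024, k=0.5)
    print(cols[0], cols[511], cols[-1])
```

Applying `resample_row` to every row of an image would give a horizontal-only non-linear resize; a vertical pass would be handled the same way on columns.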
Extensions of this technique to reduce such visual irregularities and reduce the user perceptions that image manipulation has been undertaken have included A. Soroushi in U.S. Pat. No. 7,355,610 entitled “Method and Apparatus for Expanding Image Data to Create Enlarged Images for Display”, Y-H. Lee in US Patent Application 2007/0,147,708 entitled “Adaptive Image Size Conversion Apparatus and Method Thereof”, and C-H. Chou in US Patent Application 2007/0,104,394 entitled “Method and System for Digital Image Magnification and Reduction.” However, whilst such approaches address the automation aspect of dynamically adjusting images to different display devices or varying web browser page dimensions, they have drawbacks in that they either require significant processing complexity, even where they can be implemented in the firmware of devices, or require additional specific hardware.
It would be apparent that a solution addressing high volume consumer applications of image display would beneficially be provided without requiring additional hardware and in a software / firmware form that operates within a wide range of portable devices with varying processing capabilities. Further such firmware should beneficially operate rapidly to provide real time image resizing and with low power consumption to extend the portable device lifetime to the user. Such a focus within the prior art is typically absent as most prior art applications have focused on desktop publishing type applications such as Adobe Photoshop, Corel PhotoShop, Microsoft PowerPoint, and Microsoft Publisher for example wherein the user is primarily authoring and generating content for publication either in physical or online media formats.
Referring to FIG. 3 there is presented an image scaling flow according to the prior art of S-H Lee in US Patent Application 2008/0,019,439 entitled “Apparatus and Method for Low Distortion Display in a Portable Communication Terminal”. As shown in first step 300A an image 310 that requires resizing has been received for display by a portable device, not shown for clarity. Accordingly the process of Lee divides the image 310 in second step 300B into a plurality of image segments 321 through 327 in preparation for applying the transformation to each image segment 321 through 327. In third step 300C a linear or non-linear scaling is applied to each image segment 321 through 327 thereby generating scaled image segments 331 through 337. The scaling applied to each of image segments 321 through 327 to generate scaled image segments 331 through 337 is different, such that the content is scaled to an increased percentage of the image to be displayed to the user but in a manner that is supposed to reduce perceived distortion.
However, Lee applies a predetermined scaling according to a mathematical function, for example a cosine function, such that weighting in the scaled image is given to the central portion of the content which is expanded and the outer portions are reduced when the overall image is to be reduced dimensionally. Whilst other mathematical functions may be employed such as a sine, hyperbolic tangent, sinc etc for example the appropriate mathematical function should be determined by the content of the image which requires in an automatic scaling application, that the image be first processed to determine the distribution of content and hence appropriate function to apply. Equally, Lee only teaches applying the function in one dimension whereas it would be beneficial to provide the methodology in two dimensions when considering the target portable devices etc. Other examples within the prior art include P. O. Vale in U.S. Pat. No. 7,385,615 entitled “System and Method for Scaling Images to Fit a Screen on a Mobile Device According to a Non-Linear Scale Factor”.
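The idea of a predetermined, centre-weighted scaling of image segments can be illustrated with a short Python sketch. It is not taken from Lee; the weighting formula, the `emphasis` parameter, and the function name are assumptions for illustration only. The image width is split into equal segments (seven, mirroring segments 321 through 327) and each segment receives an output width proportional to a cosine weight that peaks at the image centre.

```python
import math


def cosine_segment_scales(src_w, dst_w, n_segments=7, emphasis=0.5):
    """Return (output_width, scale_factor) for each equal-width source segment.

    Hypothetical weighting: w = 1 + emphasis * cos(pi * (c - 0.5)), where c is
    the segment centre normalised to [0, 1]. The weight is largest at the image
    centre and smallest at the edges. Output widths always sum to dst_w.
    """
    if n_segments < 1 or src_w <= 0 or dst_w <= 0:
        raise ValueError("sizes and segment count must be positive")
    seg_w = src_w / n_segments                      # source width of one segment
    weights = []
    for i in range(n_segments):
        centre = (i + 0.5) / n_segments
        weights.append(1.0 + emphasis * math.cos(math.pi * (centre - 0.5)))
    total = sum(weights)
    result = []
    for w in weights:
        out_w = dst_w * w / total                   # share of the target width
        result.append((out_w, out_w / seg_w))       # width and per-segment scale
    return result


if __name__ == "__main__":
    for out_w, scale in cosine_segment_scales(320, 240):
        print(f"output width {out_w:6.2f} px, scale factor {scale:.3f}")
```

Because the weights are fixed in advance, the same segments are favoured whatever the image contains, which is the limitation noted above: a content-dependent choice would require analysing the image first.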
A further alternative is taught by H. Chao et al in US Patent Application 2008/0,095,470 entitled “Digital Image Auto-Resizing” and shown schematically in FIG. 4 as applied to an initial image 410. As shown Chao teaches that the image is broken into two portions, a first portion 420 where the content will be scaled at a first scaling factor, and a second portion 440 which will be scaled at a second scaling factor.
Accordingly first portion 420 is broken into four elements, first to fourth elements 421 through 424 respectively, which will be scaled to fit the new overall window to present the scaled image 460, the scaling being performed in a manner that limits the reduction in the portion of the scaled image given to the second portion 440. Hence, first element 421 and fourth element 424 would be scaled only in the horizontal axis whilst second element 422 and third element 423 would be scaled only in the vertical axis. As such the scaled replicas of first to fourth elements 421 through 424 respectively are combined to form scaled first portion 430. The second portion 440 is scaled to generate scaled second portion 450 and is then combined with scaled first portion 430 to generate the scaled image 460 to be presented to the user. Again a drawback of Chao is that selecting the portions of the image, namely first and second portions 420 and 440 respectively, can significantly impact the resultant scaled image 460 and the viewer's perception or satisfaction as a result. Other examples of such blocked scaling of images include K. Berkner et al in U.S. Pat. No. 7,548,654 entitled “Header Based Scaling and Cropping of Images Compressed Using Multi-Scale Transforms” and S. J. Kaasila et al in U.S. Pat. No. 7,287,220 entitled “Methods and Systems for Displaying Media in a Scaled Manner and/or Orientation”.
Extensions of such cutting, scaling and re-pasting include those reported by V. Setlur et al in “Automatic Image Re-Targeting” (Proc. 18th ACM Symposium on User Interface Software and Technology, pp. 153-162, 2005), J. Jia et al in “Drag-and-Drop Pasting” (Proc. SIGGRAPH 2006, Vol. 25, No. 3, pp. 631-637, July 2006), J. Wang et al in “Simultaneous Matting and Compositing” (Microsoft Technical Report MSR-TR-2006-63, May 2006), C. Jacobs et al in “Adaptive Grid-Based Document Layout” (Proc. ACM SIGGRAPH, pp. 838-847, 2003), W. T. Freeman et al in U.S. Pat. No. 6,919,903 entitled “Texture Synthesis and Transfer for Pixel Images”, and I. Clarke et al in US Patent Application 2006/0,072,853 entitled “Method and Apparatus for Resizing Images.”
A further extension of this approach within the prior art was described by B. S. Hallberg et al in U.S. Pat. No. 6,563,964 entitled “Image Down-Sampling Using Redundant Pixel Removal” wherein the image to be reduced in size was non-uniformly down-sampled to remove aliasing within the high spatial frequency information content such that low spatial frequency information content is preferentially removed. This required that the image be processed by a spatial frequency estimator that compared groups of pixels in order to produce a classification of the image. Subsequently a path generator and path scorer analyze potential deletion paths within the image and the path with highest score, the one giving minimal distortion and aliasing, is selected for pixel removal. This process being repeated until a desired number of image rows and/or columns have been removed. As such Hallberg teaches that the entire image is arbitrarily analyzed rather than the preceding prior art wherein sampling of the image for determination of scaling was predetermined by applying a mask, template or mathematical function. However, Hallberg as noted only addresses reduction and is primarily focused to the problem of reducing the display of textual based information such as directory listings etc in applications such as Windows Explorer as the display type varied rather than arbitrary window generation as users adjust web browser pages etc.
The approach of Hallberg was extended by S. Aviden et al as reported in U.S. Pat. No. 7,477,800 entitled “Method for Re-Targeting Images” and their publication “Seam Carving for Content Aware Image Resizing” (ACM Transactions on Graphics SIGGRAPH 2007, Volume 26, Number 3, Article 10, July 2007). Aviden coined the term “seam carving” to refer to a simple image operator that provides adjustment of an image's size by carving-out or inserting pixels in different parts of the image. The determination of “seams” to carve or insert being made in respect of an energy function that defines the importance of pixels. A “seam” being defined by a connected path of low energy pixels crossing the image from one side to another representing the minimum energy path across the image. Removal of these “seams” providing for reduction in the image dimension in horizontal and/or vertical dimensions whilst insertion of these “seams” providing for expansion of the image. Aviden states that the image operator produces, in effect, a content-aware resizing of the image.
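The notion of a “seam” as a connected minimum-energy path can be made concrete with a short Python sketch. It uses the standard dynamic-programming formulation for a vertical seam on a precomputed energy map; the function names and the list-of-lists image representation are assumptions for illustration and are not taken from the cited patent.

```python
def find_vertical_seam(energy):
    """Return one column index per row for the minimum-energy vertical seam.

    energy is a list of rows (lists of numbers). The seam is 8-connected:
    between consecutive rows the column may change by at most one.
    """
    h, w = len(energy), len(energy[0])
    # cost[y][x] = minimum cumulative energy of any seam ending at (x, y)
    cost = [list(energy[0])]
    for y in range(1, h):
        prev, row = cost[-1], []
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            row.append(energy[y][x] + min(prev[lo:hi + 1]))
        cost.append(row)
    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[-1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(image, seam):
    """Delete one pixel per row, narrowing the image by one column."""
    return [row[:x] + row[x + 1:] for row, x in zip(image, seam)]


if __name__ == "__main__":
    energy = [[9, 1, 9],
              [9, 9, 1],
              [9, 1, 9]]
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    seam = find_vertical_seam(energy)
    print(seam)                               # [1, 2, 1]
    print(remove_vertical_seam(image, seam))  # [[1, 3], [4, 5], [7, 9]]
```

A horizontal seam can be found with the same routine applied to the transposed energy map, and inserting rather than deleting a pixel along the seam widens the image instead.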
Additional extensions of this work have been reported by M Klingemann (see flash blog http://www.quasimondo.com/archives/000652.php of September 2007) using an energy function generated through convolving the image with a blurred offset version of itself, the offset being a few pixels. H. Welles has also published open source implementations of the “seam carving” method of Aviden (see Ariadne and Seamstress algorithms at http://seam-carver.sourceforge.net).
Aviden teaches that the digital image to be dimensionally adjusted is initially converted into a so-called “energy map” wherein every pixel in the image is mapped to a pixel within the “energy map.” Subsequently the cumulative energy for a continuous 1-pixel wide “seam” is calculated from one side of the image to the other side. The two preferred energy functions taught are outlined below in Equations 1 and 2. Aviden teaches that no single energy function works well across all images but that most have similar ranges of resizing before visual artifacts are introduced.
$$e_{1}(I(x,y)) = \left|\frac{\partial}{\partial x} I(x,y)\right| + \left|\frac{\partial}{\partial y} I(x,y)\right| \qquad (1)$$

$$e_{HoG}(I(x,y)) = \frac{\left|\frac{\partial}{\partial x} I(x,y)\right| + \left|\frac{\partial}{\partial y} I(x,y)\right|}{\max\left(HoG(I(x,y))\right)} \qquad (2)$$

where I(x, y) is a particular pixel, and HoG (I(x, y)) is taken to be a histogram of orientated gradients at every pixel (see N. Dalal et al “Histograms of Orientated Gradients for Human Detection” Intl. Conf. Computer Vision and Pattern Recognition, Vol. 2, pp 886-893). Aviden teaches using an 8-bin histogram computed over an 11×11 window around a pixel for HoG(I(x, y)).
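The gradient-based energy of Equation 1 can be approximated on a discrete image as sketched below in Python. The sketch takes the absolute value of each partial derivative, as in the cited seam carving publication, and estimates the derivatives with central differences in the interior and one-sided differences at the borders; that discretisation, the greyscale list-of-lists image, and the function name are assumptions rather than anything specified in the source. Equation 2 is not implemented, since it additionally requires a histogram of gradients per pixel.

```python
def e1_energy(image):
    """Return an energy map e1 = |dI/dx| + |dI/dy| for a greyscale image.

    image is a list of equal-length rows of numbers. Derivatives are
    estimated by finite differences (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])
    energy = []
    for y in range(h):
        y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
        row = []
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            # Divide by the actual pixel span so borders use one-sided differences.
            dx = (image[y][x1] - image[y][x0]) / (x1 - x0) if x1 > x0 else 0.0
            dy = (image[y1][x] - image[y0][x]) / (y1 - y0) if y1 > y0 else 0.0
            row.append(abs(dx) + abs(dy))
        energy.append(row)
    return energy


if __name__ == "__main__":
    # A vertical edge between the second and third columns.
    image = [[0, 0, 10, 10] for _ in range(3)]
    for row in e1_energy(image):
        print(row)  # [0.0, 5.0, 5.0, 0.0] on every row
```

Pixels adjacent to the edge receive high energy while flat regions receive zero, so a minimum-energy seam would preferentially pass through the flat regions.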
Referring to FIG. 5 the method of Aviden is presented using images taken from the publication “Seam Carving for Content Aware Image Resizing” (ACM Transactions on Graphics SIGGRAPH 2007, Volume 26, Number 3, Article 10, July 2007). A source image 510 is shown, and the intention is to change the aspect ratio from say 4:3 to 16:9. Applying a conventional linear scaling to source image 510 results in linear image 520. Applying the method of “seam carving” of Aviden begins with the generation of the “energy map” 530 from source image 510. From this single “energy map” a horizontal seam map 540 is generated together with vertical seam map 550 that define the cost of removing a seam in each direction. Based upon the determination to remove either a horizontal and/or vertical seam a carved image 560 is generated. If the carved image 560 is not at the target image size then the process cycles back to recalculate the energy map 530 and repeats until the final image dimension is achieved.
Aviden teaches that resizing an image from 240×320 pixels to 128×160 pixels, such as reflects an image shifted from the inner display of a Blackberry Pearl Flip cellular telephone to its outer display, would be achieved by removing 112 vertical “seams” and 160 horizontal “seams”. Removal of each seam requires that the “energy map” is recalculated to determine which “seam” is to be removed next. Accordingly the removal of the 112 vertical and 160 horizontal “seams” requires the generation of 272 “energy maps” which is computationally intense, particularly so if Equation (2) was employed. As such Aviden teaches that a designer may author a multi-size image once and a client application, depending upon the image size needed, performs the requisite number of “seam” removals or additions such that the resizing can occur quickly in real time to fit the layout or display. The authoring is the computationally intense generation of the large number of “energy maps” and processing of the “seam” determinations to generate the multiple image sizes. The information relating to the multiple image sizes would for example be stored as a header within the image file. Such an approach of header encoding is taught, albeit not in relation to “seam carving”, for example by K. Berkner in U.S. Pat. No. 7,548,654 as outlined supra.
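The seam counts quoted above follow directly from the change in width and height, as the following small Python sketch shows. The function name and the (width, height) tuple convention are assumptions for illustration; the sketch counts one energy map recalculation per seam, as the paragraph above assumes.

```python
def seam_budget(src, dst):
    """Return (vertical_seams, horizontal_seams, energy_maps) for a resize.

    src and dst are (width, height) tuples in pixels. A vertical seam changes
    the width by one pixel and a horizontal seam changes the height by one.
    """
    (src_w, src_h), (dst_w, dst_h) = src, dst
    vertical = abs(src_w - dst_w)
    horizontal = abs(src_h - dst_h)
    return vertical, horizontal, vertical + horizontal


if __name__ == "__main__":
    print(seam_budget((240, 320), (128, 160)))  # (112, 160, 272)
```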
However, a user accessing the Internet and retrieving images is not going to access only images generated by publishers with desktop publishing software that included the “seam carving” information for multiple image sizes embedded within. Further such an approach also affects even the retrieval of audiovisual content by increasing the file size. As of mid-2009 the indexable web contained at least 22 billion pages (http://www.worldwidewebsize.com) hosted on over 110 million websites (http://www.domaintools.com/internet-statistics). Simply searching using Google for images with “photo” returns over 700 million results whilst popular social networking websites such as Facebook are reported at peak times to have 300,000 images uploaded a second by registered members. It would be evident that even if “seam carving” was introduced into all image generating devices, such as desktop publishing software, digital cameras, cellular telephones etc, by virtue of being embedded as part of an international standard such as Portable Network Graphics (PNG), Tagged Image File Format (TIFF), and Moving Picture Experts Group (MPEG) for example, it would take a significant period of time to become the dominant format for digital audiovisual content accessible to Internet users.
Accordingly it would be beneficial to provide a method of resizing digital images that was independent of their method of generation, i.e. portable consumer electronics or desktop software, independent of the platform upon which the images were to be displayed, i.e. low cost consumer portable devices or laptop computers, the display they are to be displayed upon, i.e. 128×160 pixel 1.8″ cellular telephone display, 1600×900 pixel 17.3″ laptop, user activity such as flipping the Apple iPhone from a 320×480 pixel portrait orientation to 480×320 pixel landscape orientation in a fraction of a second, and the source image format.
It would be further beneficial if the method of resizing was also content aware, i.e. provided scaling that did not remove significant image elements or distort images at typical resizing factors unless expressly permitted by the user. Such permission may be provided within desktop publishing or image manipulation software such as Adobe Photoshop, Corel Paint Shop Pro, or Ulead PhotoImpact for example. It would be further beneficial if the method permitted the protection of content during resizing or explicitly weighted content for removal during resizing or editing, was fast, and was easily incorporated into the firmware of devices as well as desktop publishing software.