1. Field of the Invention
The present invention relates to the field of image representation. More particularly, the present invention relates to a method of representing images using mathematical models of geometric and brightness characteristics of the image, known as Content Oriented Representation (“CORE”).
2. Background Information
Analysis and processing of digital images play an important role in nearly all disciplines of modern industry and economic activity. From medical imaging to industrial quality control and diagnostics, to entertainment and advertising, efficient image analysis, representation and processing is the primary component of overall imaging system performance.
Presently, there are forms of content-oriented representation of images known in the field of image representation. Partial implementations exist, known generally as “vector formats” and “vectorizations,” which are representations of visual images by geometric entities, such as vectors, curves, and the like. Usually, vectorized images are significantly more compact and easier to process than identical images represented by conventional techniques relying on use of pixels, for example.
Currently available products incorporate limited vector formats, including for example, “Photoshop,” developed by Adobe Systems Incorporated, “Flash” and “Shockwave,” developed by Macromedia, Inc., and W3C Scalable Vector Graphics (SVG). However, vectorization methods employed by these products provide cartoon-like images and animations. In other words, they fail to adequately represent high resolution, photo-realistic images of the real world. This is because only simple, cartoon-like images allow for representation by edge partitions, which are not necessarily present in typical photo-realistic images. In contrast, high resolution, real world pictures present an enormous variety of forms and highly complex visual patterns, which conventional vectorization methods fail to capture. In fact, high resolution, real world images present such an enormous variety of forms and complex visual patterns, that visually accurate vectorization is practically impossible under the existing methods.
Existing vectorization techniques are confined by certain limitations, which must be overcome to adequately provide content-oriented representation of high resolution photo realistic images. The basic requirements of effective image representation include the following: (i) the resulting image has no visible distortions; (ii) the number of parameters in the resulting image is an order of magnitude less than the number of pixels in the original image; (iii) the parameters have simple visual interpretations; (iv) all reasonable image transformations are expressible in terms of the representation parameters, so that all the image processing operations are possible; and (v) with respect to video sequences and video compression, subsequent frames of the resulting image behave coherently, such that the models remain basically the same, while only respective geometric parameters change continuously.
Although the existing methods of image representation, processing and compression, such as DCT transform and the JPEG compression standard, as well as various wavelets transforms and compression schemes, may satisfy the first requirement above, they fail with respect to the remaining four requirements. Current methods of image representation are based on linear transformations of the image to a certain basis, which contains initially the same number of elements as the number of pixels in the original image. Subsequent quantization and filtering reduces the number of parameters, but in an unpredictable fashion. Also, visual interpretation of the reduced number of parameters may be difficult.
Moreover, because video sequences represent exactly the motion of certain objects and patterns (i.e., geometric transformations of the initial scene), the DCT or the wavelets representations of video sequences behave in an incoherent and unpredictable manner. Therefore, existing video compression techniques, such as MPEG, use JPEG compression for the first frame and perform motion compensation on a pixel level, as opposed to a compressed data level. This results in a tremendous reduction in efficiency.
A method for image representation and processing is described by James H. Elder and Rick M. Goldberg in “Image Editing in the Contour Domain,” IEEE (1998), based on edge capturing, together with the “blur scale” and the brightness values. Although providing additional efficiency to image representation and processing in the “geometric image domain,” the disclosed method does not solve the main problems of the existing methods. In particular, the method relies on only edges, while ignoring more complicated characteristic lines. Likewise, the method ignores possible geometric proximities and crossings between edges. Reconstruction of brightness values between the edges relies on solving Laplace transform equations, which appears to be an ad hoc operation that does not take into account actual image brightness. Furthermore, the method does not include any tools for representing background and texture visual patterns. The geometric accuracy of the suggested edge detection method (i.e., marking the nearest pixel and edge direction) is not sufficient for a faithful image reconstruction. Lastly, the ad hoc Gaussian blur model does not capture the actual cross-sections of the edges.
Advances in vectorization of high resolution images, however, have continued to evolve. For example, U.S. Pat. No. 5,410,643 to YOMDIN et al., the disclosure of which is expressly incorporated herein by reference in its entirety, describes a method for image data representation by mathematical models of a certain type. However, the visual quality and compression ratio is low and image processing on the compressed data is impractical. In U.S. Pat. No. 5,510,838 to YOMDIN et al., the disclosure of which is expressly incorporated herein by reference in its entirety, the images are represented by four types of models: edges, ridges, hills and background. Edges and ridges are represented by mathematical models that include polygonal lines representing the center lines of the edges or ridges and the corresponding brightness (or color) profiles. The brightness profiles are kept at the vertices of the polygonal lines and interpolated along segments of these lines. Hills, which correspond to small patches on the image, are represented by paraboloid-like mathematical models. Background is represented by low degree polynomials placed on a predetermined artificial grid.
More particularly, U.S. Pat. No. 5,510,838 discloses a method for detection of edges, ridges, hills and background, based on approximation of the image by second and third order polynomials on overlapping 4×4 and 5×5 pixel cells and further analysis of these polynomials. This method provides better quality and compression than the method disclosed in U.S. Pat. No. 5,410,643, for example. However, the image quality and resolution are not sufficient for most practical applications, the compression ratio is inferior to that of other conventional methods and processing the compressed data is cumbersome and complicated.
Other practical disadvantages of the invention disclosed in the U.S. Pat. No. 5,510,838 include the following: (i) the image is subdivided into cells of a size of 6 to 48 pixels, which cells are represented independently, thereby reducing image quality and the compression ratio; (ii) there is no accounting for visual adjacencies between the models; (iii) approximation of edges and ridges by polygonal lines causes visual degradation (e.g., a “staircase effect”); (iv) the resolution of the detection method is insufficient due to second degree polynomial approximations on 4×4 pixel windows; (v) representation of the background is unstable and inefficient, resulting in visual degradation of the image, low compression and cumbersome processing; and (vi) the only tool for representing background textures, which are areas of dense fine scale patterns of low visual significance, is combining a large number of hills, resulting in visual degradation and significant reduction in compression, especially for images with rich textures.
Some of the problems identified above were addressed in U.S. Pat. No. 5,960,118, to BRISKIN et al., the disclosure of which is expressly incorporated herein by reference in its entirety. For example, basic visual adjacencies between the image models were introduced. Also, polygonal approximation of edges and ridges was replaced by second order splines. Also, 4×4 pixel windows were replaced by 3×3 pixel windows in the original polynomial approximation. U.S. Pat. No. 5,960,118 also discloses a completely new method for representing, compressing and rendering photo realistic 3D-virtual worlds. As a result, both image quality and compression ratio were improved. However, U.S. Pat. No. 5,960,118 did not eliminate the necessity of subdividing images into cells of 6 to 48 pixels. Also, representation of background and textures is still problematic.
One major problem continues to be that the conventional methods only provide a “semi-local” image representation based on mathematical models and fail to provide “global” representation of the entire image. As stated above, the initial steps include subdividing the image into cells between 6 and 48 pixels in size (e.g., about 20 pixels) and representing the image completely independently in each cell. Dissecting the image into independent blocks causes significant disadvantages and significantly reduces processing efficiency. The process effectively renders each cell a basic element of the image representation, yet the cells are completely artificial in nature (e.g., the cells have no relation to the original image itself and do not represent any perceivable visual pattern).
The artificial separation of the image into cells is detrimental in all applications of vectorized images. With respect to compression, for example, most edges and ridges on an image are longer than a single cell. Therefore, representation of the edges and ridges must be divided into segments having end points at the cell boundaries. These end points must be memorized, even though they have no visual significance, requiring additional computer storage. Moreover, splitting edges and ridges into segments precludes the possibility of taking into account global geometric correlations of data along these curves. Similarly, subdivision of an image into cells requires explicit treatment of the connected components of background elements inside each cell. This introduces complicated geometric patterns, which are completely irrelevant to the original image.
With respect to quantization of parameters, subdivision into cells requires quantization to be performed on each cell separately. This may result in different quantizations of color and geometric parameters in adjacent cells, which result in visual discontinuities between the adjacent cells even for relatively fine quantization steps. (It is well known that human visual perception is highly sensitive to discontinuities along simple lines, such as cell boundaries.) All conventional representation methods based on cell subdivision share this disadvantage. For example, JPEG compression has a “blocking effect,” apparent even for relatively fine quantization of the coefficients.
Also, separation into cells negatively affects geometric transformations. It is well known that continuous geometric transformations never respect the predefined subdivision of an image into cells. An attempt to express such transformations on the cell level leads to the necessary introduction of cell intersections and “geometrically deformed” cells. This procedure is complex and often fails in the initial stages, making geometric processing of cell subdivisions virtually impossible.
The present invention overcomes the problems associated with the prior art, as described below.
In view of the above, the present invention through one or more of its various aspects and/or embodiments is presented to accomplish one or more objectives and advantages, such as those noted below.
The present invention overcomes the shortcomings of existing image representation techniques based on vectorization. It provides a complete content-oriented representation of high resolution, photo-realistic images. It further provides an efficient tool for mainstream imaging applications, including mathematical model representation without visible distortions of captured images. Furthermore, the representation involves an order of magnitude fewer parameters, having simple visual interpretations, than the number of pixels in the original image.
In the applications of the present invention, image processing operations, such as pattern definition, detection and separation, image analytic continuation and completion, are completely based on geometric and visual integrity of characteristic lines. They depend on the assumption that geometric and color patterns of the models faithfully represent visual patterns that actually exist on the image itself. Consequently, the present invention enables operations that are impossible under methods requiring artificial separation of the image into cells.
An aspect of the present invention provides a method for representing an image that includes identifying multiple characteristic lines of the image, identifying visual relationships among the characteristic lines, which include proximities and crossings, and defining a background of the image, which includes a slow-scale background and background patches. The method may further include assembling mathematical models that represent the characteristic lines, the visual relationships and the background, respectively. The mathematical models representing the characteristic lines are aggregated with the mathematical models representing the visual relationships among the characteristic lines and with the background of the image. The data representing the mathematical models may then be stored, transmitted and/or processed.
According to an aspect of the present invention, the proximities are identified by identifying boundary lines corresponding to each of the characteristic lines and approximating each boundary line by spline curves. Each spline curve is subdivided into multiple spline segments, each spline segment being less than a predetermined number of pixels in length. The spline segments are processed to determine and mark couples of joined spline segments. Proximity intervals are then defined based on at least one chain of couples. Further, each proximity interval may be represented by a proximity interval center line that adjoins boundaries of the proximity interval. A mathematical model of the proximity interval can be determined based on the mathematical model of a proximity interval characteristic line defined by the proximity interval center line.
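By way of illustration only, the chaining of coupled segments into proximity intervals might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: boundary lines are given as polylines rather than spline curves, a "couple" is taken to be a segment of one line whose midpoint lies within a hypothetical `max_gap` of the nearest segment midpoint of the other line, and the names `subdivide`, `proximity_intervals`, `max_len` and `max_gap` are introduced here for the example.

```python
import math

def subdivide(polyline, max_len):
    """Split a polyline into straight segments no longer than max_len."""
    segments = []
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / max_len))
        for i in range(n):
            a = (x0 + (x1 - x0) * i / n, y0 + (y1 - y0) * i / n)
            b = (x0 + (x1 - x0) * (i + 1) / n, y0 + (y1 - y0) * (i + 1) / n)
            segments.append((a, b))
    return segments

def midpoint(segment):
    (x0, y0), (x1, y1) = segment
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def proximity_intervals(line_a, line_b, max_len, max_gap):
    """Return chains of couples (index in line_a, index in line_b).

    Assumption: two segments are coupled when their midpoints are at most
    max_gap apart; consecutive coupled segments of line_a form one chain.
    """
    mids_a = [midpoint(s) for s in subdivide(line_a, max_len)]
    mids_b = [midpoint(s) for s in subdivide(line_b, max_len)]
    intervals, chain = [], []
    for i, ma in enumerate(mids_a):
        j = min(range(len(mids_b)), key=lambda k: math.dist(ma, mids_b[k]))
        if math.dist(ma, mids_b[j]) <= max_gap:
            chain.append((i, j))          # extend the current chain of couples
        elif chain:
            intervals.append(chain)       # chain broken: close the interval
            chain = []
    if chain:
        intervals.append(chain)
    return intervals

# Two lines that approach each other only in the middle of their extent.
line_a = [(0, 0), (10, 0)]
line_b = [(0, 5), (4, 1), (7, 1), (10, 5)]
for interval in proximity_intervals(line_a, line_b, max_len=1.0, max_gap=1.5):
    print("proximity interval over segments of line_a:", [i for i, _ in interval])
```

A center line for each interval could then be taken, for example, through the midpoints of the coupled segment pairs; that choice is likewise an assumption of this sketch.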
According to an aspect of the present invention, the crossings are identified by identifying elementary crossings, first order crossings, second order crossings and proximity crossings. Identification of the elementary crossings is based on at least geometric adjacencies among endpoints and spline segments of multiple edges and ridges. Identification of the first order crossings is based on at least a detection of intersecting characteristic lines among the characteristic lines of the image. Identification of the second order crossings is based on at least a detection of identified first order crossings within a predetermined distance of one another. Identification of the proximity crossings is based on proximities that are less than a predetermined distance in length. Mathematical models of the various identified crossings are determined based on mathematical models of the characteristic lines that form the crossings.
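As a rough illustration of the first order and second order crossings described above, the following sketch detects intersections between characteristic lines and then groups intersections that lie within a predetermined distance of one another. It assumes characteristic lines are given as polylines, ignores parallel or collinear overlaps, and uses hypothetical names (`first_order_crossings`, `group_nearby`, `max_dist`) that do not come from the disclosure.

```python
import math

def segment_intersection(p, q, r, s):
    """Return the intersection point of segments pq and rs, or None."""
    d = (q[0] - p[0]) * (s[1] - r[1]) - (q[1] - p[1]) * (s[0] - r[0])
    if d == 0:
        return None  # parallel or collinear segments are ignored in this sketch
    t = ((r[0] - p[0]) * (s[1] - r[1]) - (r[1] - p[1]) * (s[0] - r[0])) / d
    u = ((r[0] - p[0]) * (q[1] - p[1]) - (r[1] - p[1]) * (q[0] - p[0])) / d
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
    return None

def first_order_crossings(lines):
    """Intersection points between every pair of distinct polylines."""
    points = []
    for a in range(len(lines)):
        for b in range(a + 1, len(lines)):
            for p, q in zip(lines[a], lines[a][1:]):
                for r, s in zip(lines[b], lines[b][1:]):
                    hit = segment_intersection(p, q, r, s)
                    if hit is not None:
                        points.append(hit)
    return points

def group_nearby(points, max_dist):
    """Group point indices connected by hops of at most max_dist."""
    groups, unvisited = [], list(range(len(points)))
    while unvisited:
        stack, group = [unvisited.pop()], []
        while stack:
            i = stack.pop()
            group.append(i)
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_dist]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
        groups.append(sorted(group))
    return sorted(groups)

lines = [[(0, 0), (10, 0)], [(2, -1), (2, 1)], [(3, -1), (3, 1)], [(8, -1), (8, 1)]]
crossings = first_order_crossings(lines)
# Groups with more than one member are candidates for second order crossings.
second_order = [g for g in group_nearby(crossings, max_dist=1.5) if len(g) > 1]
print(crossings, second_order)
```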
According to another aspect of the present invention, the slow-scale background of the image is identified by constructing a background grid, which includes fixing a resolution of the background grid and identifying multiple points within the background grid. Approximating polynomials representative of each of the points within the background grid are constructed based on a predetermined degree for the polynomials. The mathematical model representing the slow-scale background includes the multiple points and the corresponding approximating polynomials.
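For illustration, an approximating polynomial at a single background grid point could be obtained by a least square fit over nearby pixels. The sketch below assumes a first degree polynomial a + b·dx + c·dy, a square neighborhood of hypothetical half-width `radius`, and an optional set of excluded pixels; none of these specifics are taken from the disclosure.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_plane(image, cx, cy, radius, excluded=frozenset()):
    """Least square fit of a + b*dx + c*dy around grid point (cx, cy).

    image is a list of rows (image[y][x]); pixels listed in `excluded`
    (for example, pixels on characteristic lines) are skipped.
    """
    basis, values = [], []
    for y in range(max(0, cy - radius), min(len(image), cy + radius + 1)):
        for x in range(max(0, cx - radius), min(len(image[0]), cx + radius + 1)):
            if (x, y) not in excluded:
                basis.append((1.0, x - cx, y - cy))
                values.append(image[y][x])
    # Normal equations (B^T B) coeffs = B^T v for the three basis functions.
    normal = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * v for r, v in zip(basis, values)) for i in range(3)]
    return solve(normal, rhs)

# A synthetic, exactly linear background: brightness = 5 + 2x + 3y.
image = [[5 + 2 * x + 3 * y for x in range(7)] for y in range(7)]
a, b, c = fit_plane(image, cx=3, cy=3, radius=2)
print(round(a, 6), round(b, 6), round(c, 6))
```

The fit fails (division by zero) if fewer than three non-collinear pixels remain in the neighborhood; a practical version would need to guard against that.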
Identifying the multiple points within the background grid may include a signal expansion algorithm. The signal expansion algorithm includes identifying a boundary of the slow-scale background, dividing the background boundary into subpieces having a predetermined length and including in the background grid the boundary endpoints associated with each of the subpieces. In addition, multiple points to be included in the background grid are identified, each point being at least a first predetermined distance from all of the other points and the boundary endpoints. Location data is transferred from each point and each boundary endpoint to data structures of neighboring points, which include all other points of the multiple points, all other boundary endpoints and pixels that are located within a second predetermined distance of each point and each boundary endpoint. The transferred location data is stored in relation to the neighboring points in the receiving data structures.
The brightness of the background of the image may be reconstructed, which includes retrieving for each point within the background grid the corresponding approximating polynomial and performing a signal expansion algorithm for each point without crossing boundaries corresponding to the characteristic lines of the image. A brightness value correlating to each point is computed and translated to a corresponding pixel of the represented image.
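The reconstruction step can be pictured with a simplified stand-in for the signal expansion algorithm. The sketch below assumes that a breadth-first expansion over 4-connected pixels is an acceptable approximation: each grid point spreads its "signal" outward, the expansion never enters pixels marked as belonging to characteristic lines, and each reached pixel is evaluated with the first degree polynomial of the grid point whose signal arrived first. The function name and data layout are hypothetical.

```python
from collections import deque

def reconstruct_background(width, height, grid, blocked):
    """Fill background brightness without crossing characteristic lines.

    grid    : {(gx, gy): (a, b, c)} polynomial a + b*dx + c*dy per grid point
    blocked : set of (x, y) pixels lying on characteristic lines
    Pixels that no signal reaches are left as None.
    """
    owner = {point: point for point in grid}
    queue = deque(grid)
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and (nx, ny) not in blocked and (nx, ny) not in owner:
                owner[(nx, ny)] = owner[(x, y)]   # signal arrives from this grid point
                queue.append((nx, ny))
    out = [[None] * width for _ in range(height)]
    for (x, y), (gx, gy) in owner.items():
        a, b, c = grid[(gx, gy)]
        out[y][x] = a + b * (x - gx) + c * (y - gy)
    return out

# A vertical characteristic line at x = 3 separates a dark and a bright region.
blocked = {(3, y) for y in range(3)}
grid = {(1, 1): (10, 0, 0), (5, 1): (50, 0, 0)}
for row in reconstruct_background(7, 3, grid, blocked):
    print(row)
```

Because the expansion stops at the blocked column, brightness from one side does not leak to the other, which is the property the paragraph above relies on.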
According to an aspect of the present invention, the background patches of the image are identified by identifying closed characteristic lines that enclose an area less than or equal to a predetermined size, and short characteristic lines that have a length less than or equal to a predetermined distance. Also, fine scale patches are identified, which includes convoluting the image through a Gaussian filter, identifying at least one of a local maximum value and a local minimum value, and approximating a fine scale patch shape around the local maximum value and/or the local minimum value. The background patches may be approximated as mathematical models by determining mathematical models corresponding to the identified closed characteristic lines and the identified short characteristic lines. The fine scale patches are identified as mathematical models by determining corresponding mathematical models of at least one of a re-scaled Gaussian function and a re-scaled paraboloid.
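The fine scale patch detection described above (Gaussian filtering followed by a search for local maxima and minima) might look like the following in outline. The kernel size, the clamped border handling, the 8-neighbor test and the `min_contrast` threshold are all assumptions made for this sketch.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(image, sigma=1.0, radius=2):
    """Separable Gaussian filter; border pixels are replicated (clamped)."""
    h, w = len(image), len(image[0])
    k = gaussian_kernel(sigma, radius)
    clamp = lambda i, n: min(max(i, 0), n - 1)
    rows = [[sum(k[j + radius] * image[y][clamp(x + j, w)] for j in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + radius] * rows[clamp(y + j, h)][x] for j in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]

def local_extrema(image, min_contrast):
    """Interior pixels exceeding (or undercutting) all 8 neighbors by min_contrast."""
    found = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            v = image[y][x]
            nbrs = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]
            if v - max(nbrs) >= min_contrast:
                found.append((x, y, "max"))
            elif min(nbrs) - v >= min_contrast:
                found.append((x, y, "min"))
    return found

# One bright and one dark spot on a flat background.
image = [[0.0] * 11 for _ in range(11)]
image[3][3] = 100.0
image[7][7] = -100.0
print(local_extrema(smooth(image), min_contrast=1.0))
```

Each detected extremum would then serve as the center around which a re-scaled Gaussian or paraboloid is fitted; that fitting step is not shown here.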
Representing the background of the image may further include identifying textures. Identifying textures includes identifying patches in the background, marking pixels located within the identified patches, constructing polynomials based on a least square approximation of a brightness value of pixels that have not been marked as being located within the identified patches and filtering out selected patches based on a predetermined size limit and a difference between a brightness of the patches and a brightness of the slow-scale background at each of the points within the patches. The patches other than the filtered out patches are aggregated with the constructed polynomials. A difference between a brightness of the aggregated patches and a brightness of the image is determined to form a residual background, which is represented via a wavelets system.
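The last step above, forming a residual background and representing it via a wavelets system, can be illustrated on a single row of pixels. The disclosure does not name a particular wavelet; the sketch below assumes, purely for illustration, one level of an unnormalized Haar transform, and the function names are hypothetical.

```python
def residual(image_row, model_row):
    """Residual background: image brightness minus aggregated model brightness."""
    return [p - m for p, m in zip(image_row, model_row)]

def haar_step(signal):
    """One level of an unnormalized Haar transform of an even-length list."""
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details

def inverse_haar_step(averages, details):
    """Exact inverse of haar_step."""
    out = []
    for a, d in zip(averages, details):
        out += [a + d, a - d]
    return out

image_row = [12, 14, 20, 18]   # observed brightness
model_row = [10, 10, 20, 20]   # brightness of patches plus polynomials (assumed values)
res = residual(image_row, model_row)
averages, details = haar_step(res)
print(res, averages, details, inverse_haar_step(averages, details))
```

In a compression setting the small detail coefficients would typically be quantized or dropped; that choice is outside the scope of this sketch.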
Another aspect of the present invention provides a method for representing an image including identifying multiple characteristic lines of the image and identifying proximities between at least two of the characteristic lines based on a chain of coupled spline segments, which are derived from boundary lines corresponding to each of the at least two characteristic lines and are less than a predetermined length. The method further includes identifying crossings between at least two of the characteristic lines. The crossings include elementary crossings based on at least geometric adjacencies among endpoints of edges and ridges, first order crossings based on at least a detection of the plurality of characteristic lines, and second order crossings based on at least an image pattern that is common to more than one of the identified first order crossings. A slow-scale background of the image is identified based on a signal expansion algorithm performed on multiple points within a background grid and corresponding approximating polynomials representative of each of the points within the background grid. Background patches of the image are identified based on identifying of at least one of closed characteristic lines enclosing an area less than a predetermined size and short characteristic lines having a length less than a predetermined distance. Mathematical models corresponding to the characteristic lines, the proximities, the crossings, the slow-scale background and the background patches are determined and aggregated to represent the image.
The method for representing an image may further include defining textures, which includes marking pixels located within the identified patches, constructing polynomials based on a least square approximation of brightness values corresponding to pixels that have not been marked, and filtering out selected patches based on size and brightness parameters. The patches not filtered out are aggregated with the constructed polynomials. A difference between a brightness of the aggregated patches and a brightness of the image is determined to form a residual background, which is represented by a wavelets system.
Another aspect of the present invention provides a method for globally representing an image. The method includes covering the image with multiple overlapping subsections, each subsection having a predetermined size and shape, wherein the size is greater than or equal to a locality size. Each subsection is processed independently, which includes identifying characteristic lines in the subsection, identifying visual relationships among the characteristic lines (e.g., proximities and crossings) and assembling mathematical models representing the characteristic lines and the visual relationships, respectively. The mathematical models representing the characteristic lines are aggregated with the mathematical models representing the visual relationships among the characteristic lines. The method further includes determining for each of the characteristic lines in each overlapping portion of the subsections whether a non-conforming model exists for the same characteristic line in the overlapping portion and interpolating among the non-conforming models. A background of the subsections is defined, which includes slow-scale background and background patches. The overlapping portions of the subsections are filtered.
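A simple way to picture the covering of an image with overlapping subsections is given below. The sketch assumes square subsections of a hypothetical side `tile` that overlap their neighbors by `overlap` pixels; in terms of the paragraph above, `tile` would be chosen greater than or equal to the locality size.

```python
def overlapping_tiles(width, height, tile, overlap):
    """Return (x0, y0, x1, y1) boxes (x1, y1 exclusive) covering the image.

    Neighboring boxes share `overlap` pixels; boxes at the right and bottom
    borders are clipped to the image.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    step = tile - overlap
    xs = range(0, max(width - overlap, 1), step)
    ys = range(0, max(height - overlap, 1), step)
    return [(x, y, min(x + tile, width), min(y + tile, height)) for y in ys for x in xs]

for box in overlapping_tiles(10, 10, tile=6, overlap=2):
    print(box)
```

Each box can then be processed independently, with the shared strips serving as the overlapping portions in which non-conforming models are reconciled.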
Interpolating among non-conforming models in each overlapping portion of the subsections may include choosing a first representative point on a central line corresponding to each characteristic line in a first subsection having a non-conforming model of the characteristic line in at least a second subsection that overlaps the first subsection, and choosing a second representative point on a central line corresponding to the characteristic line in the second subsection. The first representative point and the second representative point are joined by a spline segment. At least a cross-section of the spline segment is then determined to represent at least an interpolated cross-section of the characteristic line.
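The joining of two representative points by a spline segment, with an interpolated cross-section along it, might be sketched as follows. The disclosure does not fix the spline type or the cross-section parameters; this example assumes a cubic Hermite segment with given end tangents and a linear blend of two hypothetical cross-section parameters (`width` and `brightness`).

```python
def hermite_point(p0, t0, p1, t1, s):
    """Point at parameter s in [0, 1] on a cubic Hermite segment.

    p0, p1 are the representative points; t0, t1 are tangent vectors there.
    """
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return tuple(h00 * a + h10 * ta + h01 * b + h11 * tb
                 for a, ta, b, tb in zip(p0, t0, p1, t1))

def blend_cross_section(c0, c1, s):
    """Linearly interpolate cross-section parameters between the two models."""
    return {key: (1 - s) * c0[key] + s * c1[key] for key in c0}

# Representative points of the same characteristic line in two subsections.
p0, p1 = (0, 0), (4, 2)
t0 = t1 = (1, 0)                                  # assumed tangent directions
c0 = {"width": 2, "brightness": 100}              # model from the first subsection
c1 = {"width": 4, "brightness": 120}              # non-conforming model from the second
for s in (0.0, 0.5, 1.0):
    print(s, hermite_point(p0, t0, p1, t1, s), blend_cross_section(c0, c1, s))
```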
Defining the background of the subsections may include constructing a background grid in each image subsection, which includes fixing a resolution of the background grid and identifying points within the background grid. The points may be identified by performing a signal expansion algorithm. Approximating polynomials representative of each point are constructed in the subsection background grid based on a predetermined degree for the polynomials. Background patches of each image subsection are identified, which includes identifying closed characteristic lines that enclose an area less than or equal to a predetermined size, identifying short characteristic lines that have a length less than or equal to a predetermined distance, and identifying fine scale patches. Identifying fine scale patches includes convoluting the image through a Gaussian filter, identifying at least one of a local maximum value and a local minimum value, and approximating a fine scale patch shape around the local maximum value and/or local minimum value. Background textures of each image subsection may also be identified by marking pixels located within the identified background patches in the image subsection, constructing polynomials based on a least square approximation of a brightness value of pixels that have not been marked, filtering out selected patches based on a predetermined size limit and a difference between a brightness of the patches and a brightness of the slow-scale background at each of the points within the patches, aggregating patches not filtered out with the constructed polynomials, determining a difference between a brightness of the aggregated patches and a brightness of the image to form a residual background, and representing the residual background by a wavelets system.
The filtering of overlapping portions of the subsections may include deleting grid points of the overlapping portions of each subsection, maintaining the following conditions: a distance between any two points of the grid is at least R/2, where R is the resolution of the grid; every grid point is closer than R to at least one other grid point; and every grid point belongs to a boundary of the background or is farther than R/2 from the boundary of the background. All redundant representations of the background patches of each image subsection and all redundant representations of the background textures of each image subsection are deleted.
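The three grid conditions stated above can be checked mechanically. The following sketch assumes grid points and boundary points are given as coordinate pairs, provides a checker for the conditions, and pairs it with a naive greedy deletion that enforces only the minimum-spacing condition; the greedy strategy and the function names are assumptions of this example, not the disclosed procedure.

```python
import math

def thin_grid(points, resolution):
    """Greedily delete points closer than R/2 to an already kept point."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= resolution / 2 for q in kept):
            kept.append(p)
    return kept

def satisfies_conditions(points, boundary, resolution):
    """Check the three conditions on a filtered background grid.

    1. any two grid points are at least R/2 apart;
    2. every grid point is closer than R to at least one other grid point;
    3. every grid point lies on the boundary or is farther than R/2 from it.
    """
    r = resolution
    boundary = set(boundary)
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        if any(math.dist(p, q) < r / 2 for q in others):
            return False
        if others and not any(math.dist(p, q) < r for q in others):
            return False
        if p not in boundary and any(math.dist(p, b) <= r / 2 for b in boundary):
            return False
    return True

# Grid points merged from two overlapping subsections (R = 8).
merged = [(0, 0), (4, 0), (8, 0), (9, 0), (12, 0), (16, 0)]
print(satisfies_conditions(merged, [], 8))        # (8, 0) and (9, 0) are too close
filtered = thin_grid(merged, 8)
print(filtered, satisfies_conditions(filtered, [], 8))
```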
Yet another aspect of the present invention provides a method for creating a composite image by superimposing representations of corresponding multiple images of a common scene from a common perspective. The method includes representing a first image of the multiple images, which includes identifying multiple characteristic lines, identifying visual relationships among the characteristic lines (e.g., proximities and crossings) and defining background elements (e.g., slow-scale backgrounds and background patches) of the first image. Each of the characteristic lines, the visual relationships and the background elements includes at least a geometric parameter and a brightness parameter. Brightness parameters of each of the remaining images of the multiple images are then sequentially determined by isolating the geometric parameters of the characteristic lines, the visual relationships and the background elements of the first image, and deriving for each remaining image corresponding brightness parameters for the characteristic lines, the visual relationships and the background elements corresponding to the isolated geometric parameters from the first image.
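The idea of fixing geometric parameters from the first image and deriving only brightness parameters for the remaining images can be shown on a one-dimensional toy example. The sketch assumes each "image" is a row of pixels containing a single edge, that the geometric parameter is the edge position, and that the brightness parameters are the mean values on either side; these simplifications and the function names are hypothetical.

```python
def locate_edge(signal):
    """Geometric parameter: index of the largest jump between neighboring pixels."""
    return max(range(1, len(signal)), key=lambda i: abs(signal[i] - signal[i - 1]))

def brightness_params(signal, edge):
    """Brightness parameters: mean value on each side of a fixed edge position."""
    left, right = signal[:edge], signal[edge:]
    return (sum(left) / len(left), sum(right) / len(right))

def represent_channels(channels):
    """Take geometry from the first channel, brightness from every channel."""
    edge = locate_edge(channels[0])
    return edge, [brightness_params(channel, edge) for channel in channels]

red = [10, 10, 10, 200, 200, 200]
green = [50, 50, 50, 60, 60, 60]
blue = [0, 0, 0, 0, 0, 0]      # no visible edge here; geometry is still shared
edge, params = represent_channels([red, green, blue])
print(edge, params)
```

Only one set of geometric parameters is stored for all channels, which is where the saving in this arrangement would come from.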
The multiple images may originate from corresponding different image sources. The multiple images may respectively include a red color separation, a green color separation and a blue color separation. Alternatively, the first image of the multiple images may include a luma color separation.
Another aspect of the present invention provides a computing apparatus for implementing representation of a digital image. The computing apparatus includes a computing device for executing computer readable code, an input device for receiving the digital image and interfacing with a user, at least one data storage device for storing computer data, and a programming code reading device that reads computer executable code. The computing device is in communication with the input device, the data storage device and the programming code reading device. The computer executable code causes the computing device to identify multiple characteristic lines of the digital image, identify visual relationships among the plurality of characteristic lines, including proximities and crossings, and define a background of the image, including slow-scale background and background patches. The computer executable code further stores data representing at least one of the multiple characteristic lines, the visual relationships and the background in the at least one data storage device.
The computer executable code may further cause the computing device to assemble mathematical models representing the multiple characteristic lines, the visual relationships and the background, respectively. The mathematical models representing the characteristic lines are then aggregated with the mathematical models representing the visual relationships among the characteristic lines and the background of the image. The computer executable code may store data representing the mathematical models representing the characteristic lines, the visual relationships and the background in the at least one data storage device.
In another aspect of the present invention, the computer executable code causes the computing device to identify multiple characteristic lines of the image and to identify proximities between at least two of the characteristic lines based on a chain of coupled spline segments, which are derived from boundary lines corresponding to each of the at least two characteristic lines and are less than a predetermined length. The computer executable code further causes the computing device to identify crossings between at least two of the characteristic lines. The crossings include elementary crossings based on at least geometric adjacencies among endpoints and spline segments of multiple edges and ridges, first order crossings based on at least a detection of intersecting characteristic lines among the multiple characteristic lines, second order crossings based on at least a detection of identified first order crossings within a predetermined distance of one another, and proximity crossings based on proximities that are less than a predetermined distance in length. A slow-scale background of the digital image is identified based on a signal expansion algorithm to identify multiple points within a background grid and multiple approximating polynomials representative of each of the points within the background grid. Background patches of the digital image are identified based on identification of at least one of closed characteristic lines enclosing an area less than a predetermined size and short characteristic lines having a length less than a predetermined distance. The computing device further determines mathematical models corresponding to the characteristic lines, the proximities, the crossings, the slow-scale background and the background patches, and aggregates the mathematical models to represent the image. 
The computer executable code stores data representing at least one of the multiple characteristic lines, the proximities, the crossings, the slow-scale background, the background patches and the mathematical models in the at least one data storage device.
In another aspect of the present invention, the computer executable code causes the computing device to define textures. Defining textures includes marking pixels located within the identified patches, constructing polynomials based on a least square approximation of brightness values corresponding to pixels that have not been marked and filtering out selected patches based on size and brightness parameters. The patches not filtered out are aggregated with the constructed polynomials. A difference between a brightness of the aggregated patches and a brightness of the digital image is determined to form a residual background, which is represented via a wavelets system.
In yet another aspect of the present invention, the computer executable code causes the computing device to cover the image with overlapping subsections, each of which has a predetermined size and shape, wherein the size is greater than or equal to a locality size. Each subsection is processed independently, such that the processing includes identifying multiple characteristic lines in the subsection, identifying visual relationships among the characteristic lines, assembling mathematical models representing the characteristic lines and the visual relationships, and aggregating the mathematical models representing the characteristic lines with the mathematical models representing the visual relationships. For each of the characteristic lines in each overlapping portion of the subsections, interpolation is performed among any non-conforming models that exist for the same characteristic line in the overlapping portion. A background of the subsections is defined, including slow-scale background and background patches, and overlapping portions of the subsections are filtered. The computer executable code also stores data representing at least one of the characteristic lines, the visual relationships among the characteristic lines and the background of the digital image in the at least one data storage device.
Another aspect of the present invention provides a computing apparatus for implementing representation of a composite image from multiple images. The computing apparatus includes a computing device for executing computer readable code, an input device for receiving multiple digital images of a common scene from a common perspective and interfacing with a user, at least one data storage device for storing computer data, and a programming code reading device that reads computer executable code. The computing device is in communication with the input device, the data storage device and the programming code reading device. The computer executable code causes the computing device to represent a first image of the multiple images, which includes identifying multiple characteristic lines, identifying visual relationships among the characteristic lines (e.g., proximities and crossings) and defining background elements (e.g., slow-scale backgrounds and background patches) of the first image. Each of the characteristic lines, the visual relationships and the background elements includes at least a geometric parameter and a brightness parameter. Brightness parameters of each of the remaining images of the multiple images are then sequentially determined by isolating the geometric parameters of the characteristic lines, the visual relationships and the background elements of the first image, and deriving for each remaining image corresponding brightness parameters for the characteristic lines, the visual relationships and the background elements corresponding to the isolated geometric parameters from the first image. The computer executable code further causes data representing at least one of the multiple characteristic lines, the visual relationships among the characteristic lines and the background elements of each of the multiple digital images to be stored in the at least one data storage device.