1. Field of the Invention
The present invention relates to the field of image and motion video representation, encoding, and/or compression.
2. Description of the Prior Art
Analysis, processing, compressing, and channel encoding of digital images and motion video play an important role in nearly all disciplines. Raw images, commonly represented on a pixel basis, and videos, commonly represented as a temporal sequence of image frames, possess a number of limitations that prevent their use in most practical applications, including their unwieldy size, their lack of perceptual or semantic representation, and their failure to degrade gracefully when information is lost.
The prior art generally seeks to encode images and videos using one of two representations. Functional decomposition representation techniques analyze the raw two-dimensional arrays of data against a discrete mathematical basis, transforming the data into a different mathematical domain (such as the frequency domain, via a windowed Fourier transform or wavelets), where quantization processes can omit or decimate information in a manner thought, a priori, to preserve some human-recognizable aspects of the image or video. JPEG and MPEG-4 Part 10 (AVC) are examples of this approach. These methods do not directly analyze the image for semantic information, and consequently subject the rendered images or videos to artificial pixelation or blurring under higher compression or loss rates, or at differing image scales, belying the implementations' fundamental lack of information about the perceptually distinct parts of the image or video. Motion video representation techniques of this kind use motion compensation to encode and compress the motion of similar visual content between frames. Motion compensation retains the pixel-based aspect of the underlying still-image representation, providing a series of instructions to move and adapt the contents of previously or subsequently rendered frames into the current one. Pixel-based, rather than semantic-based, techniques are unable to animate the semantic shapes of the image, and thereby lose opportunities for efficiency and for operation at differing scales and loss rates.
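The transform-and-quantize approach described above can be sketched as follows. This is an illustrative toy, not the JPEG or AVC pipeline itself: a naive one-dimensional DCT-II pair with uniform scalar quantization, showing that coarser quantization steps discard progressively more of the signal.

```python
import math

def dct(signal):
    """Naive DCT-II: project the signal onto a cosine basis."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * k * (i + 0.5) / n)
                for i, x in enumerate(signal))
            for k in range(n)]

def idct(coeffs):
    """Inverse (normalized DCT-III) recovering the signal."""
    n = len(coeffs)
    return [coeffs[0] / n + (2.0 / n) * sum(
                coeffs[k] * math.cos(math.pi * k * (i + 0.5) / n)
                for k in range(1, n))
            for i in range(n)]

def quantize(coeffs, step):
    """Uniform quantization: precision is discarded in the transform domain."""
    return [step * round(c / step) for c in coeffs]

row = [52, 55, 61, 66, 70, 61, 64, 73]      # one row of pixel samples
mild = idct(quantize(dct(row), 1))          # fine step: near-lossless
harsh = idct(quantize(dct(row), 200))       # coarse step: detail lost
```

Note that the quantizer operates purely on basis coefficients; nothing in this pipeline knows which coefficients carry perceptually distinct content, which is the shortcoming the passage above describes.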
Formal decomposition representation techniques in the prior art attempt to apply heuristics to detect or adapt normally continuous mathematical forms from the underlying raw media, which are then encoded. However, the prior art is severely lacking in the ability to do this successfully across a variety of images and compression environments.
Vectorization techniques, such as those provided within the Adobe family of products “Illustrator”, “Shockwave”, and “Flash”, as well as standards-based vector representations, including two-dimensional and three-dimensional packages, represent the underlying content through mathematical primitive geometric shapes and shading instructions. These implementations, however, are not methods for producing realistic images or videos from raw sources. They are thus limited to artistic, often cartoon-like, shapes and shadings created by hand or with automated “tracing” tools operated under human supervision, tools whose function is to outline, but not realistically represent, raw pixel-based subject matter. These methods provide no general means for lossy compression of the vectorized information, or for producing reasonably realistic or semantically plausible images or videos under significant scaling or information loss.
A method for formal image representation and processing is described by James H. Elder and Rick M. Goldberg in “Image Editing in the Contour Domain,” IEEE (1998), based on capturing edges together with a “blur scale” and brightness values. This method extracts an edge representation of the image; this “edge representation” of the prior art requires that image information (specifically, the intensities) be retained for both sides of the edge as intrinsic properties of the “edge”, preventing the edge from serving simply as a boundary between semantic regions. This non-semantic approach to edges prevents the prior art from accurately representing the color and texture between the edges, as noted by the prior art itself, which requires a blurring process to remove cartoon-like artifacts. Stated differently, the areas of the image between the edges require long-range extrapolation of the intensities or colors bound to the line. However, it is not in general the case that long-range extrapolation of local properties of an image around an edge represents the properties of the image away from edges. Thus, the prior art fails to adequately represent the image for encoding or compressing in the general case. The edge representation therefore does not lend itself to compression or to reasonable recovery of semantic information under scaling or loss: partial omission of edge data, even when edge topology is preserved, increases the uncertainty of the intensity or color of the image around that edge, and not just the uncertainty of the placement of the edge itself, making adequate loss recovery difficult, if not impossible.
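The failure of long-range extrapolation can be made concrete with a small numeric sketch. The scanline values and edge position below are hypothetical, chosen only to illustrate the argument: a region that shades smoothly away from an edge is misrepresented when the edge-adjacent intensity is carried across the whole region.

```python
# Hypothetical 1D scanline: a sharp edge at index 8, followed by a
# smoothly shaded region (e.g. a gradually lit wall) away from the edge.
scanline = [10] * 8 + [100 + 5 * i for i in range(8)]

edge_right = scanline[8]        # intensity sampled just right of the edge

# Edge-bound representation: extrapolate that single edge-adjacent
# intensity across the entire region to the right of the edge.
extrapolated = [edge_right] * 8

actual = scanline[8:]
worst = max(abs(a - b) for a, b in zip(actual, extrapolated))
# The error grows with distance from the edge, even though the edge
# itself (its position and immediate intensity) is represented exactly.
```

The edge data is perfectly preserved here, yet the interior of the region is wrong everywhere except immediately beside the edge, which is precisely the objection raised above.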
A further method for image representation and processing is described within MPEG-4 Part 19, “Synthesized Texture Streams”, based on the concept of representing the image by “characteristic lines”, touch-ups known as “patches”, and possible encoded residuals. The characteristic lines, as described within the prior art, are lines within the image that have similar “cross-sections”, or brightness patterns viewed across or perpendicular to the characteristic lines. The prior art requires that the characteristic lines be tied to this brightness (or similar) information recovered from the cross section. The representation of the image by characteristic lines with line color profiles derived from the cross section suffers from many of the same problems as contour domain analysis, including non-semantic representation and inadequate or incorrect recreation of the color or intensity of an image away from the characteristic lines. The authors of this prior art incorrectly conclude that reasonable image representation is impossible without such brightness information bound to or associated with the characteristic lines. However, binding brightness information to a characteristic line is artificial and limiting, such as in the case of moving video. In a moving video, a characteristic line would be created from the edge of one object as it partially obscures another. When the foremost object moves, the brightness information of the image, and of the generating subject matter, along the characteristic line on the side of the obscured object would change in ways such that the information in one frame would likely have nothing to do with similar information in another, as various parts of the obscured object come into view. As an example, the shoulder of a person walking in the foreground of the image may be adjacent to, at first, a building, then a tree, then the sky, as the person moves through a scene.
The image information of the building, tree, and sky is not related, and so that half of the cross section of the characteristic line would have no meaningful frame-to-frame correlation, preventing efficient compression or even cross-frame line correlation. This is another reason why binding brightness to characteristic lines is inadequate. Furthermore, the characteristic line definition, and the processes the prior art depends on to detect such lines by determining the cross-sectional structure of adjacent ridges or edges, add unnecessary and constraining complexity to the process and any implementation thereof, and reduce the potential efficiency of the encoding or subsequent compression. Confusion arises when adjacent structures are combined into a single characteristic line even though they belong to separate and distinct objects in the subject matter that merely happen to be adjacent in that one image or section thereof. Such confusion leads to an inability to efficiently represent the motion or separation of the structures, and can produce a “surface tension” effect, where accidental adjacencies are combined into structures not present in the subject matter and appear to the observer as warping or merging of what should be distinct areas of the image. Finally, such combinations suffer from confusion over scale and distance: what constitutes adjacency is highly dependent on the subject matter and the scale of the image, and therefore requires human observation or a priori information to be provided to the process, information that is not typically known or expected in image representation methods.
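The shoulder-against-changing-background example can be quantified with a hypothetical numeric sketch. All sample values below are invented for illustration: the foreground half of the cross-section stays correlated from frame to frame, while the occluded-side half, sampling first a building, then a tree, does not.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sample lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Foreground side of the cross-section: the walking person's shoulder,
# nearly the same brightness in consecutive frames (hypothetical values).
shoulder = [[30, 32, 31, 33],
            [31, 32, 30, 33]]

# Occluded side of the same line: a building in frame 1, a tree in
# frame 2, the sky in frame 3 (hypothetical values).
background = [[120, 118, 122, 119],   # building
              [67, 70, 66, 57],       # tree
              [200, 201, 199, 200]]   # sky

fg_corr = pearson(shoulder[0], shoulder[1])      # high: compressible
bg_corr = pearson(background[0], background[1])  # near zero: not
```

A cross-section bound to brightness must re-encode the occluded-side half in every frame, since nothing in frame 1 predicts frame 2, whereas a line treated purely as a region boundary carries no such burden.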
Recognizing that long-distance extrapolation of characteristic line information does not provide the correct colors for the regions between the lines, the prior art introduces additional techniques to compensate, but these fail to provide an adequate solution. One attempted compensation lies in the notion of “patches”, either as elliptical primitives as described in Part 19, or as a general concept of a compact geometric shape. Patches are an ad hoc scheme that represents texture using a large number of simply-behaving geometric primitives, and can readily be seen to provide no significant benefit over directly storing the pixel values (or residuals) in all but the most contrived images. For example, if each pixel intensity were chosen from a uniformly distributed random variable, the patch method would require one patch per pixel to represent this information, whereas the amount of actual perceptual information contained in such a shading is minimal, describable simply by the bounds of the uniform distribution. Furthermore, patch representations do not offer a significant semantic representation of the varying colors or shades of an image between edges or distinct divisions. A proper patch representation, as envisioned by the prior art, thus both stores the image representation inefficiently and fails to capture semantic information about the texture between the edges, and so fails to provide a representation usable under increasing loss rates, compression rates, or varying scales. For rendering a represented image, the prior art relies on “just so” combinations of weighting factors and techniques as part of aggregation and recreation.
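The uniform-noise counting argument above can be sketched numerically. The parameter count per patch (four values: position, radius, intensity) is a hypothetical but generous assumption for illustration; any fixed per-patch cost yields the same conclusion.

```python
import random

random.seed(7)
W = H = 16
# A texture whose pixels are i.i.d. uniform on [40, 200]: no pixel is
# predictable from its neighbours, so each would need its own "patch".
pixels = [random.randint(40, 200) for _ in range(W * H)]

# Patch scheme: one compact primitive per unpredictable pixel, each
# assumed to carry (x, y, radius, intensity) — an illustrative count.
patch_params = 4 * len(pixels)

# Statistical description of the same perceptual content: merely the
# two bounds of the uniform distribution.
stat_params = 2

ratio = patch_params / stat_params   # 512x overhead for this 16x16 patch field
```

The patch encoding is strictly larger than the raw pixel array it was meant to replace, while the perceptually equivalent statistical description is two numbers, which is the inefficiency the passage above identifies.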
The confusion in the prior art over whether a part of the image should take its brightness from the influence of the lines, the background, or texture leads to reconstruction mechanisms with a high degree of uncertainty about whether the recreated image fairly represents the subject matter, producing inefficiency in creation, representation, and rendering alike. In such cases, the amount of information remaining in the residual is nearly as significant as the information present in the original image, and effective compression is accordingly reduced.
These failings extend to U.S. Pat. No. 6,801,210 (Yomdin, issued Oct. 5, 2004) and U.S. Pat. No. 6,760,483 (Yomdin, issued Jul. 6, 2004), which anticipate MPEG-4 Part 19. By combining lines with brightness information, using patches to make up for the inaccuracies of the reconstruction, and failing to use the lines directly to bound regions throughout the reconstruction process, the prior art fails to produce an image that reasonably represents the semantic content of the original image across differing scales, compression levels, and loss rates.
Other methods in the prior art attempt to divide the image into cells of limited size, within which analysis occurs. This approach fails to adequately or efficiently represent the longer-scale semantic information present in the image, and thus may reduce compressibility and sacrifice reconstruction quality. The prior art also suffers from poor detection and extraction of semantic features of an image, often being subject to confusion over whether a relatively rapid change in image information represents a line, edge, ridge, or background texture. The advantages of a general-purpose feature detector are absent from the prior art, which instead places too heavy a dependence on characteristic lines, ridges, or couplings, dependencies that are necessary for the prior art to function and that severely constrain it.
The present invention overcomes the problems associated with the prior art, as described below.