1. Field of Art
The embodiments disclosed herein generally relate to the field of video compression, and more specifically, to using object decomposition to improve the selection of dictionary predictor entries in example-based compression.
2. Background of the Invention
Many current visual compression techniques rely on an encoder to predict the structure of an image or video based on another image, and to communicate the predictor-selection information to a decoder. The decoder reconstructs the image using the predictor-selection information. Typically, the predictor-selection information is combined with a residual signal to compensate for differences between the reconstructed image and the original image, thereby bringing each part of the reconstructed image into closer alignment with the original image.
For example, the H.264 standard for video compression predicts the structure of each macroblock of a given frame using motion fields from other reference frames. Because the number of reference frames is limited (typically 1-8 frames), as is the number of distinct offsets available within each reference frame (typically 100-1,000 offsets), the encoder can search through the available predictions to select the one that is best in terms of compression rate and rate-distortion performance.
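The bounded search described above can be illustrated with a minimal sketch. It is not the H.264 reference procedure: the function names (`sad`, `crop`, `best_predictor`), the sum-of-absolute-differences distortion measure, the offset-magnitude proxy for rate, and the weighting parameter `lam` are all assumptions chosen for illustration. The sketch only shows that, when the number of reference frames and offsets is small, every candidate predictor can be scored exhaustively.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_predictor(target, top, left, reference_frames, search_range, lam=1.0):
    """Exhaustively score every (reference frame, offset) candidate.

    target           -- square block to be predicted (list of rows)
    top, left        -- position of the block in the current frame
    reference_frames -- list of 2-D frames (lists of rows)
    search_range     -- maximum absolute offset tried in each direction
    lam              -- hypothetical weight trading distortion against rate

    Returns (cost, reference_index, dy, dx) for the cheapest candidate,
    or None if no candidate lies inside any reference frame.
    """
    size = len(target)
    best = None
    for ref_index, frame in enumerate(reference_frames):
        height, width = len(frame), len(frame[0])
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                # Skip candidates that fall outside the reference frame.
                if y < 0 or x < 0 or y + size > height or x + size > width:
                    continue
                distortion = sad(target, crop(frame, y, x, size))
                # Assumption: offset magnitude stands in for the bits
                # needed to signal the predictor selection.
                rate = abs(dy) + abs(dx)
                cost = distortion + lam * rate
                if best is None or cost < best[0]:
                    best = (cost, ref_index, dy, dx)
    return best
```

The work grows with the number of reference frames multiplied by the number of offsets tried per frame, so an exhaustive scan of this kind is practical only while both counts remain small.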
Example-based compression also uses predictor selection and transmission to encode an image or video. However, the number of predictors available to the encoder in example-based compression is much larger than the number available in H.264 and other modern video codecs. Typically, a dictionary used in example-based compression may comprise a massive collection of predictors, on the order of millions rather than thousands. Due to the size of the dictionary used in example-based compression, the speed with which current techniques select the best predictor from the dictionary needs improvement.
Furthermore, in example-based compression, the quality of the encoding of an image region is highly dependent on the complexity of the region and the availability of a similar region within the example-based dictionary. Videos of dynamic scenes containing multiple objects occluding each other and exhibiting rapid motions can generate complex visuals with specific characteristics. Due to the complexity of the visuals within these videos, the method used to select dictionary predictor entries to encode the visuals needs improvement.