Motion compensation, intra-frame prediction, and transform are operations employed in video coding to exploit spatio-temporal redundancy based on the mean squared error (MSE) criterion. However, current video coding techniques are limited in that they mainly focus on exploiting pixel-wise redundancies. These techniques generally attempt to achieve high compression performance by using an increasing number of coding modes to handle regions with different properties in image and video coding. Consequently, intensive computational effort is required to perform mode selection subject to the principle of rate-distortion optimization. Furthermore, it is generally accepted that minimizing overall pixel-wise distortion, such as MSE, does not guarantee good perceptual quality of reconstructed visual objects, especially in low bit-rate scenarios. As a result, such techniques typically code highly detailed texture regions, e.g., water and grass, inefficiently.