Video compression is used in a variety of fields, such as digital TV, Internet streaming video, and DVD-Video, and has emerged as a core element of broadcasting and entertainment media. The success of digital TV and DVD-Video is based on the MPEG-2 standard announced 15 years ago. Although the usefulness of this technique has been sufficiently proven, it is now outdated. It is apparent that now is the time to replace MPEG-2 with more effective and efficient techniques that take advantage of advanced processing capability. Although which technique should replace MPEG-2 is still debated, H.264/AVC (Advanced Video Coding) is one of the most likely candidates.
H.264/AVC is a standard for representing encoded visual information, developed by the Video Coding Experts Group (VCEG), which is a study group of the International Telecommunication Union (ITU-T).
H.264/AVC does not define a codec (encoder/decoder) as such; it defines only the syntax of an encoded video bitstream and the method for decoding that bitstream. An encoded picture is divided into a plurality of macroblocks, each of which has 16×16 luminance samples and the color difference samples associated with those luminance samples. The macroblocks of each picture are grouped into slices, and within a slice the macroblocks are arranged sequentially in scan order. Each macroblock belongs to an I slice, a P slice or a B slice. An I slice includes only intra macroblocks, whereas a P slice or a B slice may include both inter macroblocks and intra macroblocks. An intra macroblock is predicted, using intra prediction, from reference samples that have already been decoded and reconstructed in the current slice.
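The division of a picture into macroblocks can be illustrated with a minimal sketch. It assumes a plain raster scan (left to right, top to bottom) and picture dimensions that are exact multiples of 16; the function name `macroblock_origins` is hypothetical and is not taken from the standard.

```python
def macroblock_origins(width, height, mb_size=16):
    """Return the top-left (x, y) of every macroblock in raster-scan order.

    Assumption: width and height are multiples of mb_size.
    """
    if width % mb_size or height % mb_size:
        raise ValueError("picture dimensions must be multiples of the macroblock size")
    return [(x, y)
            for y in range(0, height, mb_size)   # rows, top to bottom
            for x in range(0, width, mb_size)]   # columns, left to right


# A 64x32 luminance picture holds 4 columns x 2 rows of macroblocks.
origins = macroblock_origins(64, 32)
```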
FIG. 1 is a functional block diagram showing an intra prediction performed by a conventional H.264/AVC encoder.
The H.264/AVC encoder includes two data flow paths: a “forward” path (from left to right) and a “reconstruction” path (from right to left). In the “forward” path, a prediction block is generated from a reconstructed reference block, and a residual pixel block is generated by subtracting the generated prediction block from the video block currently being encoded. In the “reconstruction” path, a reference pixel block for the video block to be encoded next is generated by adding the generated residual pixel block and the prediction block.
The “forward” and “reconstruction” paths will be described in further detail with reference to FIG. 1. First, in the “forward” path, a prediction mode determining unit 10 determines an intra prediction mode for an inputted N×N video block of the spatial domain. A prediction block generating unit 20 generates a prediction block for the inputted N×N video block of the spatial domain from a stored reference block, based on the determined intra prediction mode.
A residual block generating unit 30 generates an N×N residual block by subtracting the generated prediction block from the inputted N×N video block of the spatial domain. A transform unit 40 transforms the N×N residual block into an N×N residual coefficient block of the transform domain using a block-based transform method. Preferably, a discrete cosine transform (DCT) or an integer DCT is used as the block-based transform method.
A quantization unit 50 quantizes the N×N residual coefficient block of the transform domain, and an encoding unit 60 generates a bitstream by encoding the quantized N×N residual coefficient block using any one of a predictive encoding method, a variable-length encoding method, and an arithmetic coding method.
Meanwhile, in the “reconstruction” path, an inverse quantization unit 70 inverse-quantizes the quantized N×N residual coefficient block outputted from the quantization unit 50, and an inverse transform unit 80 generates an N×N residual block of the spatial domain by inverse-transforming the inverse-quantized N×N residual coefficient block. A reference block generating unit 90 reconstructs a reference block, which is used for generating subsequent prediction blocks, by adding the N×N residual block and the prediction block.
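The two paths can be sketched for a single 4×4 block as follows. This is only an illustration under simplifying assumptions: it uses a floating-point orthonormal DCT and a uniform quantizer with a hypothetical step size `qstep` rather than the integer transform and quantizer of H.264/AVC, and it omits entropy coding and clipping of reconstructed samples. All function names are hypothetical.

```python
import math

N = 4

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

C = dct_matrix(N)
CT = transpose(C)

def forward_path(block, pred, qstep):
    """Residual -> 2-D transform -> quantization (units 30, 40 and 50)."""
    residual = [[block[r][c] - pred[r][c] for c in range(N)] for r in range(N)]
    coeffs = matmul(matmul(C, residual), CT)
    return [[round(v / qstep) for v in row] for row in coeffs]

def reconstruction_path(levels, pred, qstep):
    """Inverse quantization -> inverse transform -> add prediction (units 70, 80 and 90)."""
    coeffs = [[v * qstep for v in row] for row in levels]
    residual = matmul(matmul(CT, coeffs), C)
    return [[round(pred[r][c] + residual[r][c]) for c in range(N)] for r in range(N)]


# Example: the input block differs from its prediction by a constant 8.
pred = [[100] * N for _ in range(N)]
block = [[108] * N for _ in range(N)]
levels = forward_path(block, pred, qstep=8)
reference = reconstruction_path(levels, pred, qstep=8)
```

In this example only the DC position of `levels` is non-zero, and `reference` reproduces the input block exactly; with a coarser step size and a less uniform residual, the reconstructed reference block would generally differ from the input, which is why the encoder predicts from reconstructed samples rather than from original ones.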
As described above, the H.264/AVC encoder is specified to compress a video image using an N×N residual coefficient block, and a prediction block used for generating the N×N residual coefficient block is determined through an intra prediction mode of an N×N video block to be encoded. For the H.264/AVC encoder, nine intra prediction modes are defined for a luminance 4×4 video block, and four intra prediction modes are defined for a luminance 16×16 video block. For a color difference video, four intra prediction modes are defined for an 8×8 video block.
The nine intra prediction modes used for the 4×4 video block will be described in further detail with reference to FIG. 2.
1) Prediction Mode 0 (Vertical): The vertical mode is a prediction mode using four pixels of video block (X) placed above the block currently being encoded. Pixel A is filled in the four pixels of the first column of the block, and pixel B is filled in the four pixels of the second column of the block. Pixels C and D are likewise filled in the four pixels of the third and fourth columns of the block, respectively.
2) Prediction Mode 1 (Horizontal): The horizontal mode is a prediction mode using four pixels of video block (Z) placed at the left of the block currently being encoded. Pixel I is filled in the four pixels of the first row of the block, and pixel J is filled in the four pixels of the second row of the block. Pixels K and L are likewise filled in the four pixels of the third and fourth rows of the block, respectively.
3) Prediction Mode 2 (DC): The DC mode is a prediction mode using the average of the four pixels I, J, K and L of video block (Z) placed at the left of the block currently being encoded and the four pixels A, B, C and D of video block (X) placed above it.
4) Prediction Mode 3 (Diagonal Down-Left): The diagonal down-left mode is a prediction mode using four pixels of video block (X) placed above the block currently being encoded and four pixels of video block (Y) placed at its upper right. Pixels are filled at an angle of 45 degrees between the lower left and the upper right of the block to be encoded.
5) Prediction Mode 4 (Diagonal Down-Right): The diagonal down-right mode is a prediction mode using four pixels of video block (X) placed above the block currently being encoded, one pixel Q of video block (S) placed at its upper left, and four pixels of video block (Z) placed at its left. Pixels are filled in the direction of 45 degrees toward the lower right of the block to be encoded.
6) Prediction Mode 5 (Vertical-Right): The vertical-right mode is a prediction mode using four pixels of video block (X) placed above the block currently being encoded, one pixel Q of video block (S) placed at its upper left, and four pixels of video block (Z) placed at its left. Pixels are filled in the direction of 26.6 degrees to the right of vertical (width/height = 1/2).
7) Prediction Mode 6 (Horizontal-Down): The horizontal-down mode is a prediction mode using four pixels of video block (X) placed above the block currently being encoded, one pixel Q of video block (S) placed at its upper left, and four pixels of video block (Z) placed at its left. Pixels are filled in the direction of 26.6 degrees below horizontal.
8) Prediction Mode 7 (Vertical-Left): The vertical-left mode is a prediction mode using four pixels of video block (X) placed above the block currently being encoded and one pixel E of video block (Y) placed at its upper right. Pixels are filled in the direction of 26.6 degrees to the left of vertical.
9) Prediction Mode 8 (Horizontal-Up): The horizontal-up mode is a prediction mode using four pixels of video block (Z) placed at the left of the block currently being encoded. Pixels are interpolated in the direction of 26.6 degrees above horizontal.
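The three simplest modes can be sketched as follows. The sketch assumes that the neighboring pixels A to D (above) and I to L (left) are all available, and it computes the DC value as a rounded integer average; the directional modes 3 to 8, which interpolate between neighboring pixels, are left out. The function name `predict_4x4` is hypothetical.

```python
def predict_4x4(mode, above, left):
    """Build a 4x4 prediction block as a list of rows.

    above = [A, B, C, D]: pixels directly above the block.
    left  = [I, J, K, L]: pixels directly to the left of the block.
    """
    if mode == 0:  # vertical: every column repeats the pixel above it
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: every row repeats the pixel to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: all 16 pixels take the rounded mean of the 8 neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0 to 2 are sketched here")


above = [10, 20, 30, 40]   # A, B, C, D
left = [50, 60, 70, 80]    # I, J, K, L
vertical = predict_4x4(0, above, left)
horizontal = predict_4x4(1, above, left)
dc_block = predict_4x4(2, above, left)
```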
Meanwhile, a multimedia terminal, such as a cellular phone or a digital TV, uses small images (hereinafter referred to as ‘thumbnail images’) to preview the video data stored in the terminal so that the stored data can be searched.
FIG. 3 is a view showing an example of thumbnail images used in a cellular phone. As shown in FIG. 3, a large number of images or moving images stored in the cellular phone are first displayed on a display unit in the form of thumbnail images. When a user browses the displayed thumbnail images for an image or a moving image to play back and selects one of them, the selected image is played back in its original size.
Conventional methods for generating a thumbnail image from an original video image can be largely divided into two types. The first method generates a thumbnail image by down-sampling the video image in the spatial domain. The second method generates a thumbnail image by extracting only the DC coefficient from each video coefficient block of the transform domain constituting the video image. The DC coefficient, located at the upper left of each video coefficient block of the transform domain, represents the average value of the corresponding video pixel block of the spatial domain, so the video image generated by extracting only the DC coefficients is the same as the image generated by down-sampling the original video image by 1/N.
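The equivalence of the two methods can be checked with a small sketch. It assumes that each coefficient block is the orthonormal 2-D DCT of the corresponding N×N pixel block (not of a residual block), in which case the DC coefficient equals the sum of the block's pixels divided by N, that is, N times the block mean. The function names are hypothetical.

```python
def block_mean_thumbnail(image, n):
    """Spatial-domain method: replace every n x n pixel block by its mean."""
    h, w = len(image), len(image[0])
    return [[sum(image[y + dy][x + dx] for dy in range(n) for dx in range(n)) / (n * n)
             for x in range(0, w, n)]
            for y in range(0, h, n)]

def dc_coefficient(block):
    """DC term of an orthonormal 2-D DCT-II: sum of all samples divided by n."""
    n = len(block)
    return sum(sum(row) for row in block) / n

def dc_thumbnail(image, n):
    """Transform-domain method: keep one rescaled DC coefficient per block."""
    h, w = len(image), len(image[0])
    thumb = []
    for y in range(0, h, n):
        row = []
        for x in range(0, w, n):
            block = [image[y + dy][x:x + n] for dy in range(n)]
            row.append(dc_coefficient(block) / n)  # DC / n equals the block mean
        thumb.append(row)
    return thumb


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
spatial = block_mean_thumbnail(image, 2)
from_dc = dc_thumbnail(image, 2)
```

Both calls produce the same 2×2 thumbnail for this 4×4 image, which corresponds to down-sampling by 1/N in each direction with N = 2.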