1. Field of the Invention
The present invention is applicable, for example, to the encoding and decoding of video data according to the H.264/MPEG-4 AVC (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC) standard. The present invention reduces the required capacity of a cache memory in a configuration in which image data are encoded and decoded by plural arithmetic processing means operating simultaneously in parallel. To this end, slices are assigned sequentially and cyclically to plural arithmetic processing sections, the arithmetic processing sections encode and decode the image data simultaneously in parallel, and the processing of each slice is set so that the reference macroblocks of the macroblock being processed in each slice partly overlap the reference macroblocks of a macroblock in the immediately preceding slice.
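As a minimal sketch of the cyclic slice assignment summarized above, the Python below maps slice index y to section y mod N and computes which reference macroblocks are shared by two macroblocks processed at the same time in adjacent slices. It assumes that one slice is one macroblock row, that the reference pattern is the H.264/AVC intra-prediction neighbourhood described in the Background Art (FIG. 46), and that the following slice trails the preceding one by a hypothetical horizontal `lag`; the lag value and the names `assign_slices`, `shared_refs`, and `lag_is_safe` are illustrative assumptions, not taken from the specification.

```python
# Hypothetical sketch: cyclic slice assignment and reference overlap.
# Assumptions (not from the specification): one slice = one macroblock row,
# the trailing slice runs `lag` macroblocks behind the preceding one, and the
# reference pattern is the H.264/AVC intra-prediction neighbourhood.

# (dx, dy) offsets of the reference macroblocks of (X, Y).
H264_INTRA_REFS = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def assign_slices(num_slices, num_sections):
    """Assign slice y to arithmetic processing section y mod num_sections."""
    return {y: y % num_sections for y in range(num_slices)}


def refs(x, y, offsets=H264_INTRA_REFS):
    """Set of reference macroblock addresses of (x, y)."""
    return {(x + dx, y + dy) for dx, dy in offsets}


def shared_refs(x, y, lag):
    """References common to (x, y) and the concurrent macroblock (x - lag, y + 1)."""
    return refs(x, y) & refs(x - lag, y + 1)


def lag_is_safe(lag, offsets=H264_INTRA_REFS):
    """True if (x - lag, y + 1) only references macroblocks of slice y that lie
    strictly to the left of (x, y), i.e. that are already processed."""
    return all(dx - lag < 0 for dx, dy in offsets if dy == -1)


print(assign_slices(6, 3))        # {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2}
print(shared_refs(10, 4, lag=2))  # {(9, 4)}
print(lag_is_safe(1), lag_is_safe(2))  # False True
```

Under these assumptions, a lag of 2 lets the two concurrent macroblocks share one reference macroblock, (X−1,Y), while a lag of 1 is unsafe because the trailing macroblock would reference the macroblock still being processed in the preceding slice.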
2. Background Art
In the related art, various kinds of video equipment encode and decode image data of moving images with H.264/MPEG-4 AVC (hereinafter referred to as H.264/AVC), WMV9 (Windows Media Video 9), MPEG-4 (ISO/IEC 14496 Information Technology-Generic Coding of Audio-Visual Objects), MPEG-2 (ISO/IEC 13818-2 International Standard MPEG-2 Video), MPEG-1 (ISO/IEC 11172-2 International Standard MPEG-1 Video), and the like. In this encoding processing and decoding processing, macroblocks are processed sequentially in raster scan order.
That is, taking a so-called 4:2:0 video signal as an example, as shown in FIGS. 41A to 41D, this type of encoding processing divides the brightness signal Y into 16-pixel×16-pixel macroblocks and the color-difference signals Cr, Cb into 8-pixel×8-pixel macroblocks. For the brightness signal Y, discrete cosine transform processing is performed on each 8-pixel×8-pixel block obtained by halving one macroblock in both the horizontal and the vertical direction. For the color-difference signals Cr, Cb, discrete cosine transform processing is performed on each macroblock as a whole. In H.264/AVC, orthogonal transform processing and discrete Hadamard transform processing are performed on each 4-pixel×4-pixel block obtained by halving each of these blocks again in both directions. The coefficient data resulting from the transform processing are then subjected to quantizing processing and variable length coding processing.
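The block partitioning described above can be checked with a short sketch that tiles the 16×16 luma area and the two 8×8 chroma areas of one 4:2:0 macroblock. The function names `split` and `blocks_per_macroblock` are hypothetical, and the sketch only counts transform blocks; it does not perform any transform.

```python
# Sketch: transform blocks produced by one 4:2:0 macroblock.
# Luma Y covers 16x16 samples; chroma Cb and Cr cover 8x8 samples each.


def split(width, height, block):
    """Top-left corners of the block x block tiles covering a width x height area."""
    return [(x, y) for y in range(0, height, block) for x in range(0, width, block)]


def blocks_per_macroblock(block):
    """(luma, chroma) counts of block x block transform blocks per macroblock."""
    luma = len(split(16, 16, block))
    chroma = 2 * len(split(8, 8, block))  # one Cb area plus one Cr area
    return luma, chroma


print(blocks_per_macroblock(8))  # (4, 2)   8x8 DCT blocks
print(blocks_per_macroblock(4))  # (16, 8)  H.264/AVC 4x4 blocks
```

With 8×8 blocks, each chroma macroblock is a single transform block, which matches the statement that Cr and Cb are transformed macroblock by macroblock.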
Accordingly, in this type of encoding processing and decoding processing, each macroblock (MB) is identified by a two-dimensional address, namely a horizontal and vertical address (X,Y), as shown in FIG. 42A. Further, as shown in FIG. 42B, the horizontal and vertical address (X,Y) is converted into a one-dimensional address for accessing a memory, and the image data of the respective macroblocks held in the memory are processed sequentially.
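A minimal sketch of the address conversion, assuming a row-major (raster scan) layout in which the one-dimensional address is Y × (picture width in macroblocks) + X; the exact mapping of FIG. 42B is not reproduced here, and `mb_address` and `mb_position` are hypothetical names.

```python
# Sketch: raster-scan conversion between (X, Y) and a one-dimensional address.
# Assumes macroblocks are stored row by row, `width_in_mbs` per row.


def mb_address(x, y, width_in_mbs):
    """One-dimensional address of macroblock (x, y)."""
    if not (0 <= x < width_in_mbs and y >= 0):
        raise ValueError("macroblock position out of range")
    return y * width_in_mbs + x


def mb_position(addr, width_in_mbs):
    """Inverse conversion: one-dimensional address back to (x, y)."""
    y, x = divmod(addr, width_in_mbs)
    return x, y


# A 1920-pixel-wide picture is 120 macroblocks wide.
print(mb_address(3, 2, 120))   # 243
print(mb_position(243, 120))   # (3, 2)
```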
In this processing, transmission efficiency is improved by referring to the processing results of adjacent macroblocks. Specifically, in intra prediction of MPEG-1 and MPEG-2, as shown in FIG. 43, the macroblock (X,Y) is processed by referring to the processing result of the adjacent macroblock (X−1,Y) on the scan start end side of the same slice. In FIG. 43 and the subsequent drawings, reference relationships are indicated by arrows. Hereinafter, a macroblock that is referred to is called a reference macroblock; in the example of FIG. 43, the macroblock (X−1,Y) is the reference macroblock of the macroblock (X,Y). Here, a slice is a unit of processing in the slice layer and is formed by plural macroblocks that are contiguous in the horizontal direction.
In intra prediction of MPEG-4, as shown in FIG. 44, the adjacent macroblock (X−1,Y) on the scan start end side in the same slice, the adjacent macroblock (X,Y−1) directly above in the immediately preceding slice, and the adjacent macroblock (X−1,Y−1) on the scan start end side of (X,Y−1) in that preceding slice are set as the reference macroblocks of the macroblock (X,Y), and the macroblock (X,Y) is processed by referring to the processing result of the macroblock (X−1,Y), (X,Y−1), or (X−1,Y−1).
In motion vector prediction of MPEG-4, as shown in FIG. 45, the adjacent macroblock (X−1,Y) on the scan start end side in the same slice, the adjacent macroblock (X,Y−1) directly above in the immediately preceding slice, and the adjacent macroblock (X+1,Y−1) on the scan termination end side of (X,Y−1) in that preceding slice are set as the reference macroblocks of the macroblock (X,Y), and the motion vector of the macroblock (X,Y) is predicted by referring to the motion vector of the macroblock (X−1,Y), (X,Y−1), or (X+1,Y−1).
In intra prediction of H.264/AVC, as shown in FIG. 46, the adjacent macroblock (X−1,Y) on the scan start end side in the same slice, the adjacent macroblock (X,Y−1) directly above in the immediately preceding slice, the adjacent macroblock (X−1,Y−1) on the scan start end side of (X,Y−1) in that preceding slice, and the adjacent macroblock (X+1,Y−1) on the scan termination end side of (X,Y−1) in that preceding slice are set as the reference macroblocks of the macroblock (X,Y), and the macroblock (X,Y) is processed by referring to the processing result of the macroblock (X−1,Y), (X,Y−1), (X−1,Y−1), or (X+1,Y−1).
In motion vector prediction of H.264/AVC, as shown in FIG. 47, the adjacent macroblocks (X,Y−1), (X+1,Y−1), and (X−1,Y) are set as the reference macroblocks of the macroblock (X,Y), as in the motion vector prediction of MPEG-4, and the motion vector of the macroblock (X,Y) is predicted by referring to the motion vector of the macroblock (X,Y−1), (X+1,Y−1), or (X−1,Y).
In deblocking filter processing of H.264/AVC, as shown in FIG. 48, the adjacent macroblock (X−1,Y) on the scan start end side in the same slice and the adjacent macroblock (X,Y−1) directly above in the immediately preceding slice are set as the reference macroblocks of the macroblock (X,Y), and the macroblock (X,Y) is processed by referring to the processing result of the macroblock (X,Y−1) or (X−1,Y).
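The reference relationships listed above (FIGS. 43 to 48) can be summarized as a table of (dx, dy) offsets from the macroblock (X,Y). The sketch below encodes that table and returns the reference macroblocks that fall inside the picture; the mode labels and `reference_macroblocks` are hypothetical names, and the sketch models only picture boundaries, not the full neighbour-availability rules of each standard.

```python
# Sketch: reference macroblocks of (X, Y) per processing type, as (dx, dy) offsets.
# Only picture-boundary clipping is modelled; other availability rules are not.

REFERENCE_OFFSETS = {
    "MPEG-1/2 intra": [(-1, 0)],
    "MPEG-4 intra": [(-1, 0), (0, -1), (-1, -1)],
    "MPEG-4 MV prediction": [(-1, 0), (0, -1), (1, -1)],
    "H.264/AVC intra": [(-1, 0), (0, -1), (-1, -1), (1, -1)],
    "H.264/AVC MV prediction": [(0, -1), (1, -1), (-1, 0)],
    "H.264/AVC deblocking": [(-1, 0), (0, -1)],
}


def reference_macroblocks(x, y, mode, width_in_mbs, height_in_mbs):
    """Reference macroblocks of (x, y) for `mode` that lie inside the picture."""
    return [
        (x + dx, y + dy)
        for dx, dy in REFERENCE_OFFSETS[mode]
        if 0 <= x + dx < width_in_mbs and 0 <= y + dy < height_in_mbs
    ]


# Left-edge macroblock of the second slice in a 120 x 68 macroblock picture.
print(reference_macroblocks(0, 1, "H.264/AVC intra", 120, 68))  # [(0, 0), (1, 0)]
```

Every offset in the table points either to the preceding slice (dy = −1) or to the left in the same slice (dy = 0, dx < 0), so in raster scan order each reference macroblock has already been processed when (X,Y) is reached.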
The encoding processing and decoding processing described above may be performed by an arithmetic processing means such as a central processing unit.
In a data processing system using such an arithmetic processing means, the processing speed is increased by using a cache memory.
That is, as shown in FIG. 49, in a data processing system 1 using an arithmetic processing means, a cache memory 2 is formed by a memory that can be accessed at high speed, such as an SRAM, and a main memory 4 is formed by a memory that consumes less power than the cache memory 2 but is harder to access at high speed. Data including commands for a data processing means 3 are stored in the main memory 4, and the commands and part of the data stored in the main memory 4 are loaded into and held in the cache memory 2. The cache memory 2 stores the commands and data together with TAG information for managing the address of each data item.
In the data processing system 1, when a command and data are to be used again, the data processing means 3 first accesses the cache memory 2, as shown by arrow A, and searches for the desired command and data. When the target command and data are present in the cache memory 2, the data processing means 3 reads and uses the command and data held in the cache memory 2. When they are not present in the cache memory 2, the data processing means 3 reads the target command and data from the main memory 4 and uses them, as shown by arrow B, and also stores them in the cache memory 2. The cache memory 2 may also be configured in software within the main memory so that data can be managed at a higher speed.
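To illustrate the hit/miss behaviour described above, the sketch below models a toy direct-mapped cache with one data word per line and a tag per line. The direct-mapped organization, the line size, and the class name `DirectMappedCache` are assumptions for illustration; the section does not specify how the cache memory 2 is organized.

```python
# Hypothetical toy model of the cache memory 2 / main memory 4 interaction.
# Assumption: direct-mapped, one word per line; the real organization is unspecified.


class DirectMappedCache:
    def __init__(self, main_memory, num_lines):
        self.main_memory = main_memory  # list standing in for main memory 4
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # TAG information per cache line
        self.data = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines
        tag = address // self.num_lines
        if self.tags[index] == tag:     # hit: served from the cache (like arrow A)
            self.hits += 1
            return self.data[index]
        self.misses += 1                # miss: fetch from main memory (like arrow B)
        value = self.main_memory[address]
        self.tags[index] = tag          # keep a copy for later reuse
        self.data[index] = value
        return value


cache = DirectMappedCache(list(range(100, 164)), num_lines=8)
for addr in [0, 1, 0, 8, 0]:
    cache.read(addr)
print(cache.hits, cache.misses)  # 1 4
```

The second read of address 0 hits, but address 8 maps to the same line and evicts it, so the final read of address 0 misses again; this is the kind of capacity pressure that a smaller working set, such as overlapping reference macroblocks, would relieve.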
Regarding configurations that perform encoding processing and decoding processing using an arithmetic processing means, JP-A-2006-42364 (patent document 1), for example, proposes a scheme that reduces the total number of cycles needed to load the image data to be processed from a memory. Further, JP-A-2000-115806 (patent document 2) proposes a scheme that increases the speed of image data processing by using a cache memory.
When encoding processing and decoding processing are performed by processing image data simultaneously with plural arithmetic processing means in parallel, the processing speed can be made higher than when the image data are processed by a single arithmetic processing means. Further, using a cache memory in such a parallel configuration is expected to raise the processing speed even further. In this case, if the capacity of the cache memory can be reduced by making effective use of the parallel configuration, both the circuit size and the power consumption can be reduced.