Generally, the amount of data used to represent video data is very large. Accordingly, an apparatus handling such video data encodes the video data by using high-efficiency coding before transmitting the video data to another apparatus or before storing the video data in a storage device. “High-efficiency coding” refers to a coding process that converts a certain data stream into another data stream by compressing the amount of data of the data stream.
One known coding method employed in high-efficiency coding for video data is intra-picture predictive (intra-predictive) coding. This coding method exploits the high spatial correlation existing within video data, and encodes a picture without using encoded images of other pictures. A picture encoded by the intra-predictive coding method can be decoded by using only information from that picture itself.
Another known coding method employed in high-efficiency coding is inter-picture predictive (inter-predictive) coding. This coding method exploits the property that video data has high temporal correlation: in video data, a picture at a given instant in time and a picture that follows it are often highly similar to each other. Generally, a video encoding apparatus encodes an original picture by dividing it into a plurality of coding blocks. The video encoding apparatus obtains a reference picture by decoding a previously encoded picture, and searches the reference picture on a block-by-block basis for a reference region that is similar to the coding block. The video encoding apparatus then calculates a prediction error image representing the difference between the reference region and the coding block, thereby removing temporal redundancy. The video encoding apparatus achieves a high compression ratio by encoding the prediction error image and the motion vector information indicating the location of the reference region. Generally, inter-predictive coding provides higher compression efficiency than intra-predictive coding.
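By way of illustration only, the block-by-block search described above can be sketched as an exhaustive block-matching search. The sketch below is not taken from any of the coding schemes discussed here. The function names `sad` and `motion_search`, the use of the sum of absolute differences as the similarity measure, and the representation of a picture as a list of rows of integer samples are all assumptions made for the example.

```python
def sad(ref, cur, rx, ry, bx, by, size):
    """Sum of absolute differences between a reference region and a coding block."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(ref[ry + dy][rx + dx] - cur[by + dy][bx + dx])
    return total


def motion_search(ref, cur, bx, by, size, search_range):
    """Exhaustively search `ref` around (bx, by) for the region most similar
    to the coding block of `cur` whose top-left sample is at (bx, by).

    Returns the motion vector (mx, my) and the prediction error block.
    Assumes the coding block itself lies inside the picture, so the
    zero-displacement candidate is always valid.
    """
    height, width = len(ref), len(ref[0])
    best = None  # (cost, mx, my)
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = bx + mx, by + my
            # Skip candidate regions that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(ref, cur, rx, ry, bx, by, size)
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    _, mx, my = best
    # Prediction error: coding block minus the selected reference region.
    residual = [
        [cur[by + dy][bx + dx] - ref[by + my + dy][bx + mx + dx]
         for dx in range(size)]
        for dy in range(size)
    ]
    return (mx, my), residual
```

When the coding block is an exact displaced copy of a region of the reference picture, the prediction error block returned by this sketch is all zeros, so only the motion vector carries information. A practical encoder would typically use a faster search strategy and a cost that also accounts for the bits needed to encode the motion vector.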
Typical video coding schemes that employ the above-described predictive coding methods and that are widely used today include the Moving Picture Experts Group Phase 2 (MPEG-2), MPEG-4, and H.264 MPEG-4 Advanced Video Coding (H.264 MPEG-4 AVC) defined by the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC). In these coding schemes, the predictive coding method selected to encode a picture, i.e., the intra-predictive coding or the inter-predictive coding, is explicitly indicated, for example, in a video stream containing the encoded video data. The selected predictive coding method is referred to as the coding mode. When the selected coding mode is the intra-predictive coding mode, the video encoding apparatus can select only the intra-predictive coding method as the prediction method to be actually used. On the other hand, when the selected coding mode is the inter-predictive coding mode, the video encoding apparatus can select the inter-predictive coding method as the prediction method to be actually used. Further, when the inter-predictive coding method is selected, the video encoding apparatus can select any one vector mode from among a plurality of vector modes that differ in the method of encoding motion vectors.
In the above video coding schemes, an I picture, a P picture, and a B picture are defined. The I picture is a picture that is encoded using only information within the picture. The P picture is a picture that is inter-encoded using information of one previously encoded picture. The B picture is a picture that is bidirectionally predictive-encoded using information of two previously encoded pictures. The time directions pointing to the two reference pictures to which the B picture refers are designated L0 and L1, respectively. One of the two reference pictures to which the B picture refers may be a picture that is earlier in time than the B picture, and the other may be a picture that is later in time than the B picture. In this case, the direction L0 is a direction that points forward in time from the picture to be encoded, i.e., the B picture, and the direction L1 is a direction that points backward in time from the picture to be encoded. Alternatively, both of the two reference pictures may be earlier in time than the B picture. In this case, the directions L0 and L1 both point forward in time from the picture to be encoded. Further, both of the two reference pictures may be later in time than the B picture. In this case, the directions L0 and L1 both point backward in time from the picture to be encoded.
In the most recently developed High Efficiency Video Coding (HEVC), the method of dividing a picture into blocks differs from the existing coding schemes. FIG. 1 is a diagram illustrating one example of how a picture is divided according to HEVC.
As illustrated in FIG. 1, the picture 100 is divided into coding blocks referred to as Coding Tree Units (CTUs), and the CTUs 101 are encoded in raster scan order. The size of each CTU 101 is selectable from among sizes of 64×64 to 16×16 pixels. However, the size of each CTU 101 is the same within the same sequence unit.
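By way of illustration only, the division of a picture into CTUs encoded in raster scan order can be sketched as follows. The function name `ctu_grid` is hypothetical, and the sketch assumes that a picture whose dimensions are not multiples of the CTU size is covered by rounding the number of CTU columns and rows up, so that CTUs at the right and bottom edges may extend past the picture boundary.

```python
def ctu_grid(width, height, ctu_size):
    """Return the top-left coordinates of every CTU in raster scan order
    (left to right within a row, rows from top to bottom)."""
    if ctu_size not in (16, 32, 64):
        raise ValueError("CTU size is assumed to be 16, 32, or 64 pixels")
    cols = -(-width // ctu_size)   # ceiling division
    rows = -(-height // ctu_size)
    return [(col * ctu_size, row * ctu_size)
            for row in range(rows)
            for col in range(cols)]


# Example: a 1920x1080 picture with 64x64 CTUs has 30 columns and 17 rows.
grid = ctu_grid(1920, 1080, 64)
print(len(grid))   # 510
```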
Each CTU 101 is further divided into a plurality of Coding Units (CUs) 102 using a quadtree structure. The CUs 102 in each CTU 101 are encoded in Z scan order. The size of each CU 102 is variable and is selected from among CU partitioning modes of 8×8 to 64×64 pixels. The CU 102 is the unit at which a decision is made as to whether to select the intra-predictive coding mode or the inter-predictive coding mode as the coding mode. Each CU 102 is partitioned into Prediction Units (PUs) 103 or Transform Units (TUs) 104 for processing. The PU 103 is the unit at which the prediction is performed in accordance with the selected coding mode. For example, in the intra-predictive coding mode, the PU 103 is the unit at which a prediction mode is applied and, in the inter-predictive coding mode, the PU 103 is the unit at which motion compensation is performed. The size of the PU 103 is selectable from among PU partitioning modes PartMode=2N×2N, N×N, 2N×N, N×2N, 2N×nU, 2N×nD, nR×2N, and nL×2N. On the other hand, the TU 104 is the orthogonal transform unit, and the size of the TU 104 is selected from among sizes of 4×4 to 32×32 pixels. The TUs 104 are formed by partitioning using a quadtree structure and are processed in Z scan order. For convenience, in the present specification, the prediction unit will be referred to as the first sub-block, and the coding unit as the second sub-block.
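By way of illustration only, the quadtree division of a CTU into CUs visited in Z scan order, and the geometry of the PU partitioning modes, can be sketched as follows. The function names `z_scan` and `pu_partitions`, the `should_split` callback that stands in for the encoder's split decision, and the ASCII mode names such as "2NxnU" are assumptions made for the example. The asymmetric modes are taken to divide the CU at one quarter of its size, which is the geometry understood to be used in HEVC.

```python
def z_scan(x, y, size, should_split, min_size=8):
    """Yield (x, y, size) for each CU of a quadtree in Z scan order:
    top-left, top-right, bottom-left, bottom-right, applied recursively."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        for oy in (0, half):
            for ox in (0, half):
                yield from z_scan(x + ox, y + oy, half, should_split, min_size)
    else:
        yield (x, y, size)


def pu_partitions(part_mode, cu_size):
    """Return the PUs of a CU as (x, y, width, height) tuples, with
    coordinates relative to the top-left sample of the CU."""
    s, n, q = cu_size, cu_size // 2, cu_size // 4
    table = {
        "2Nx2N": [(0, 0, s, s)],
        "NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        "2NxN":  [(0, 0, s, n), (0, n, s, n)],
        "Nx2N":  [(0, 0, n, s), (n, 0, n, s)],
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],      # small PU on top
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],  # small PU at bottom
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],      # small PU on left
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],  # small PU on right
    }
    return table[part_mode]


# Example: split a 64x64 CTU once, then split only its top-left 32x32 CU.
split = lambda x, y, size: size == 64 or (size == 32 and (x, y) == (0, 0))
cus = list(z_scan(0, 0, 64, split))
# cus == [(0, 0, 16), (16, 0, 16), (0, 16, 16), (16, 16, 16),
#         (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```

In every partitioning mode the PUs returned by this sketch tile the CU exactly, so their areas sum to the area of the CU.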
Generally, the amount of computation needed for encoding video data increases as the number of pixels contained in a picture increases. In view of this, a study is being conducted on reducing the time needed for encoding by dividing each picture contained in video into a plurality of regions and by encoding each region using a separate encoder.
In one known method of dividing a picture into a plurality of regions, the picture is divided into basic units referred to as slices. In this case, the encoders encode the input slices independently of one another by regarding each slice as one picture, and the encoded data output from the respective encoders are multiplexed together for output. By thus using different encoders for different slices, the encoders can each be constructed using a processing unit having a low processing capability; this may serve, for example, to reduce the production cost of the encoding apparatus as a whole.
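By way of illustration only, the arrangement in which separate encoders process separate slices and their outputs are multiplexed can be sketched as follows. The sketch makes several assumptions that are not part of the description above: each slice is taken to be a horizontal band of picture rows, general-purpose `zlib` compression stands in for a real video encoder, worker threads stand in for the separate encoders, and the multiplexed stream simply prefixes each encoded slice with a 4-byte length. All function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(rows, num_slices):
    """Divide a picture (a list of rows, each a bytes object) into at most
    num_slices horizontal bands."""
    per_slice = -(-len(rows) // num_slices)  # ceiling division
    return [rows[i:i + per_slice] for i in range(0, len(rows), per_slice)]


def encode_slice(slice_rows):
    """Stand-in encoder: compresses one slice without reference to the others."""
    return zlib.compress(b"".join(slice_rows))


def encode_picture(rows, num_slices):
    """Encode each slice with a separate worker and multiplex the results."""
    slices = split_into_slices(rows, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        encoded = list(pool.map(encode_slice, slices))  # preserves slice order
    # Multiplex: each encoded slice is preceded by its length in bytes.
    return b"".join(len(e).to_bytes(4, "big") + e for e in encoded)


def decode_picture(stream):
    """Demultiplex the stream and decode the slices in order."""
    out, pos = b"", 0
    while pos < len(stream):
        length = int.from_bytes(stream[pos:pos + 4], "big")
        pos += 4
        out += zlib.decompress(stream[pos:pos + length])
        pos += length
    return out
```

Because each slice in this sketch is encoded without reference to the others, the workers need no shared state while encoding a single picture.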
In a system that uses a plurality of encoders to encode respectively different regions, encoded data of the entire picture encoded in the past is stored as shared information accessible from the respective encoders. In this case, in order to reduce the hardware resources needed, a study has been conducted on reducing the memory capacity for temporarily storing the shared information by reducing the amount of shared information data (for example, refer to Japanese Laid-open Patent Publication Nos. H07-135654, H10-276437, and 2000-165883).