When video data is encoded, encoding methods based on standards such as MPEG (Moving Picture Experts Group)-2, MPEG-4, and H.264/MPEG-4 AVC (Advanced Video Coding) are used to increase compression efficiency. These encoding methods can use an intra-coded picture (I picture), which is encoded without prediction from other pictures, or an inter-picture prediction coded picture (P picture or B picture), which is encoded using prediction from a past picture, or from past and future pictures, in display order relative to the target picture to be encoded. Note that a “picture” corresponds to a field when interlaced video is encoded field by field, and to a frame when the video is non-interlaced (progressive). Further, when interlaced video is encoded, two fields can be combined into one frame and encoded frame by frame, in which case a “picture” corresponds to a frame. Hereinafter, inter-picture prediction coding may be referred to as inter-coding.
The intra-coded picture is used to encode the first picture of a video image. It is also used to restore a normal image after the image has been disturbed by a transmission error that occurred while the encoded data was being transmitted, to allow a normal image to be reproduced when playback starts partway through the video image, and for similar purposes.
An encoding device on the side that transmits encoded data and a decoding device on the side that receives encoded data are each generally provided with a buffer memory of suitable capacity for accumulating encoded data. The buffer memory is called a VBV (Video Buffering Verifier) buffer in MPEG-2 and MPEG-4 Part 2, and a CPB (Coded Picture Buffer) in MPEG-4 AVC. The capacity of the buffer memory is expressed, for example, as (transmission rate × 0.5 seconds). Because the compression efficiency of an intra-coded picture is lower than that of an inter-coded picture, the coding amount of an intra-coded picture is larger than that of an inter-coded picture. Accordingly, when intra-coded pictures and inter-coded pictures are both present, the coding amount fluctuates from picture to picture. The buffer memory serves to absorb the influence of this fluctuation in the coding amount.
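The buffer sizing described above can be illustrated with a minimal sketch. It assumes a constant-rate channel and a simplified model in which all bits of a picture enter the buffer at once and the channel drains a fixed number of bits per picture period. The function names, the 6 Mbit/s rate, the 30 pictures per second, and the picture sizes are hypothetical values chosen for illustration; they are not taken from the standards mentioned above.

```python
def buffer_capacity_bits(rate_bps, window_s=0.5):
    """Capacity expressed as (transmission rate x window), e.g. rate x 0.5 s."""
    return rate_bps * window_s


def buffering_delay_s(capacity_bits, rate_bps):
    """Time needed to drain a completely full buffer at the transmission rate."""
    return capacity_bits / rate_bps


def peak_fullness_bits(picture_bits, rate_bps, pictures_per_second):
    """Peak occupancy of a simplified transmit-side buffer (assumed model)."""
    drain_per_picture = rate_bps / pictures_per_second
    fullness = 0.0
    peak = 0.0
    for bits in picture_bits:
        fullness += bits                      # the coded picture enters the buffer
        peak = max(peak, fullness)
        fullness = max(0.0, fullness - drain_per_picture)  # channel drains it
    return peak


# Hypothetical example values.
RATE_BPS = 6_000_000
PICTURES_PER_SECOND = 30

capacity = buffer_capacity_bits(RATE_BPS)
mixed = [700_000] + [100_000] * 5   # one large intra-coded picture, five small ones
uniform = [200_000] * 6             # same total bits, equal picture sizes

print("capacity (bits):", capacity)
print("worst-case buffering delay (s):", buffering_delay_s(capacity, RATE_BPS))
print("peak fullness, mixed (bits):", peak_fullness_bits(mixed, RATE_BPS, PICTURES_PER_SECOND))
print("peak fullness, uniform (bits):", peak_fullness_bits(uniform, RATE_BPS, PICTURES_PER_SECOND))
```

In this model both sequences carry the same total number of bits, but the sequence with one large picture reaches a higher peak occupancy than the sequence with equal picture sizes, which is why a larger buffer, and hence a longer delay, is needed when picture sizes fluctuate.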
Further, when a B picture is used, the I picture and the P picture that come later than the B picture in display order must be decoded earlier than the B picture. Accordingly, the order of pictures after encoding differs from the input order of the images constituting the video image. That is, pictures are reordered. FIG. 10 is an explanatory view describing the reordering of pictures. When a video image is input into an encoding device in the order exemplified in FIG. 10(A), the order of pictures in the stream of encoded data to be transmitted differs from the order shown in FIG. 10(A), as exemplified in FIG. 10(B). Note that, in FIGS. 10(A) and (B), “B” indicates a B picture, “I” indicates an I picture, and “P” indicates a P picture, and numerals indicate the input order. Further, in FIG. 10(B), a prime mark is attached to I, B, and P only to distinguish the data: I, B, and P in FIG. 10(A) indicate constituents of the video image before encoding, while the primed I, B, and P in FIG. 10(B) indicate constituents of the bit stream after encoding.
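The reordering rule can be sketched as follows. This is an illustrative sketch only: it assumes a non-layered structure in which every B picture waits for the next I or P picture in input order, and the function name and the input sequence are hypothetical rather than a reproduction of FIG. 10.

```python
def to_coding_order(display_order):
    """Reorder picture labels from input (display) order to coded order.

    Assumption: each B picture is emitted after the next I or P picture
    in input order, because that picture must be decoded first.
    """
    coded = []
    pending_b = []
    for name in display_order:
        if name.startswith("B"):
            pending_b.append(name)       # hold until the next I/P picture arrives
        else:
            coded.append(name)           # the I or P picture goes out first
            coded.extend(pending_b)      # then the B pictures that preceded it
            pending_b.clear()
    coded.extend(pending_b)              # trailing B pictures with no later I/P
    return coded


# Hypothetical input sequence; the numerals indicate input order.
display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(to_coding_order(display))
```

For the hypothetical input above, P3 is moved ahead of B1 and B2, and P6 ahead of B4 and B5. The encoder therefore cannot emit B1 until P3 has been input, which is the source of the reordering delay.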
Transmitted encoded data temporarily stays in a buffer memory, and, as shown in FIG. 10, pictures are reordered; both cause a delay. That is, the time at which the decoding device outputs the reproduced video image is delayed relative to the time at which the video image is input into the encoding device. Hereinafter, encoding in which both intra-coded pictures and inter-coded pictures are included in an encoded video image may be referred to as usual delay encoding. Note that delays also arise in the encoding process, in the decoding process, and in the transmission path, but the following description focuses on the delay due to encoded data temporarily staying in a buffer memory and the delay due to reordering of pictures.
In usual delay encoding, there are reference pictures, which are referred to by other pictures in inter-picture prediction, and non-reference pictures, which are not referred to by other pictures. I pictures and P pictures are reference pictures, and B pictures are non-reference pictures. Note that, in MPEG-4 AVC, the reference structure can be layered so that a B picture can also be referred to. For example, for a picture group input in the order I0, B1, B2, B3, and P4, where I0 is encoded as an I picture, P4 as a P picture, and B1, B2, and B3 as B pictures, a reference structure can be adopted in which B2 is a reference picture: B2 refers to the two pictures I0 and P4, B1 refers to the two pictures I0 and B2, and B3 refers to the two pictures B2 and P4. In any case, since a non-reference picture is not referred to by other pictures, an error occurring in a non-reference picture does not propagate to other pictures. However, a decrease in the image quality of a reference picture influences other pictures.
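The layered reference structure in this example can be written down as a small dependency table, from which the reference pictures, the non-reference pictures, and one valid decoding order follow. The sketch below is illustrative: the function names are hypothetical, and the entry stating that P4 refers to I0 is an assumption added for the sketch (the text specifies only the references of B1, B2, and B3).

```python
# Which pictures each picture refers to, following the I0..P4 example.
# Assumption: P4 refers to I0, the only earlier picture in the group.
REFS = {
    "I0": [],
    "B1": ["I0", "B2"],
    "B2": ["I0", "P4"],
    "B3": ["B2", "P4"],
    "P4": ["I0"],
}


def display_index(name):
    """Input-order position encoded in the label, e.g. 'B2' -> 2."""
    return int(name[1:])


def reference_pictures(refs):
    """Pictures that at least one other picture refers to."""
    return {dep for deps in refs.values() for dep in deps}


def decoding_order(refs):
    """One valid decoding order: a picture is decoded after all its references."""
    done = []
    remaining = dict(refs)
    while remaining:
        ready = [p for p, deps in remaining.items()
                 if all(d in done for d in deps)]
        if not ready:
            raise ValueError("reference structure contains a cycle")
        nxt = min(ready, key=display_index)  # break ties by input order
        done.append(nxt)
        del remaining[nxt]
    return done


referenced = reference_pictures(REFS)
print("reference pictures:", sorted(referenced, key=display_index))
print("non-reference pictures:", sorted(set(REFS) - referenced, key=display_index))
print("decoding order:", decoding_order(REFS))
```

Under these assumptions B2 appears among the reference pictures alongside I0 and P4, while B1 and B3 remain non-reference pictures, so an error in B1 or B3 cannot propagate to any other picture.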
In view of this, when usual delay encoding is performed in an encoding device, control is often performed such that the quantization level of a picture that is to be a non-reference picture is increased to restrain the coding amount after encoding, and the quantization level of a picture that is to be a reference picture is decreased to prevent a decrease in image quality (see, for example, Patent Literature (PTL) 1).
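A minimal sketch of this kind of control is shown below. It is not the method of PTL 1: it assumes that the quantization level is expressed as an H.264/AVC-style quantization parameter in the range 0 to 51, and the function name and the offset values are hypothetical.

```python
def picture_qp(base_qp, is_reference, ref_delta=2, nonref_delta=3,
               min_qp=0, max_qp=51):
    """Pick a quantization parameter for one picture (illustrative only).

    Reference pictures get a lower value (finer quantization, better quality);
    non-reference pictures get a higher value (coarser quantization, fewer bits).
    The deltas are hypothetical tuning values.
    """
    qp = base_qp - ref_delta if is_reference else base_qp + nonref_delta
    return max(min_qp, min(max_qp, qp))  # clamp to the assumed valid range


# Hypothetical picture group: I0 and P4 are reference pictures, B1 is not.
for name, is_ref in [("I0", True), ("B1", False), ("P4", True)]:
    print(name, picture_qp(30, is_ref))
```

Because errors and quality loss in a non-reference picture do not propagate, spending fewer bits there costs less overall quality than spending fewer bits on a reference picture.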
Note that, in FIG. 11, “B” indicates a B picture, “I” indicates an I picture, and “P” indicates a P picture. In FIG. 11, numerals indicate an input order. Further, a picture indicated by an arrow corresponds to a reference picture.
As described above, a delay occurs when usual delay encoding is used, but in cases such as bidirectional communication using a video image, it is preferable to prevent the delay while maintaining moderate compression efficiency. The delay can be restrained by substantially equalizing the coding amounts of the respective pictures, which allows the capacity of the buffer memory to be decreased, and by restraining the reordering of pictures. To avoid reordering pictures, either inter-coded pictures are not used, or, when inter-coded pictures are used, only one-way prediction is used.
When the bandwidth of the transmission path is wide, it is preferable to use only intra-coded pictures, without inter-coded pictures, in order to reduce unevenness in the coding amounts of the respective pictures and avoid reordering pictures while maintaining moderate compression efficiency and preserving the effect of refresh. When the bandwidth of the transmission path is narrow, on the other hand, slice refresh is used, for example. Slice refresh is a technique for refreshing a screen using only inter-coded pictures, without using intra-coded pictures. The technique is also called intra slice refresh.
FIG. 12 is an explanatory view showing a state where an image (a screen) is refreshed by an I picture. In FIG. 12, each of areas Rj−1, Rj, and Rj+1 surrounded by a dashed rectangle shows a prediction allowance range. As shown in FIG. 12, error propagation is confined to the prediction allowance range by restricting reference beyond that range.
FIG. 13 is an explanatory view describing slice refresh. Slice refresh does not refresh a whole image (one screen) with an I picture as exemplified in FIG. 12. Instead, as shown in FIG. 13, a part of a picture, namely a slice (a belt-shaped set of one or several macroblocks), is set as an intra-coding area, the slice of the intra-coding area is moved from one consecutive picture to the next, and after a predetermined time has passed, the slice of the intra-coding area has swept the whole screen (see, for example, PTL 2). In FIG. 13, each of areas Rj−2/Rj−1, Rj, and Rj+1 surrounded by a dashed line shows a prediction allowance range. Note that, in the present description and drawings, a subregion constituting an image, such as a “slice,” may be expressed as a “segment.” In particular, a target segment to be refreshed (a refreshed area) may be expressed as a “refreshed segment.” Further, hereinafter, the expression “segment” refers either to a set of macroblocks having a given shape, which is not limited to a belt shape, or to a single macroblock. For example, in a case where the number of macroblocks constituting a picture is n and the refresh is performed so that the intra-coding area sweeps the whole screen when N pictures have passed, a subregion of a given picture consisting of n/N macroblocks may be used as a refreshed segment. Further, in FIG. 13, an “intra-coding segment” corresponds to a refreshed area. An “ordinary encoding segment” is an area in which intra-coding or inter-coding is used.
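The moving refreshed segment can be sketched as a simple schedule. For the intra-coding area to cover a picture of n macroblocks within N pictures, each picture must refresh about n/N macroblocks. The sketch below assumes macroblocks numbered 0 to n−1 in raster order and a refreshed segment made of consecutive macroblocks, rounded up when n is not a multiple of N; the function name and the example sizes are hypothetical.

```python
import math


def refreshed_segment(picture_index, num_macroblocks, refresh_period):
    """Macroblock indices that are intra-coded in the given picture.

    num_macroblocks is n, refresh_period is N. Each picture refreshes
    ceil(n / N) consecutive macroblocks (assumed layout), and the segment
    position wraps around every N pictures.
    """
    segment_size = math.ceil(num_macroblocks / refresh_period)
    phase = picture_index % refresh_period
    start = phase * segment_size
    end = min(start + segment_size, num_macroblocks)
    return range(start, end)   # empty if earlier segments already covered all


# Hypothetical sizes: n = 12 macroblocks, refreshed over N = 4 pictures.
n, N = 12, 4
covered = set()
for k in range(N):
    segment = refreshed_segment(k, n, N)
    covered.update(segment)
    print("picture", k, "refreshes macroblocks", list(segment))

print("whole screen refreshed:", covered == set(range(n)))
```

With these sizes each picture intra-codes three macroblocks, so the extra coding amount of the intra-coded area is spread evenly over the N pictures instead of being concentrated in one I picture.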
Further, as shown in the explanatory view of FIG. 14, prediction allowance ranges corresponding to areas Rj−2, Rj−1, Rj, and Rj+1 surrounded by dashed lines are defined. Within each of the prediction allowance ranges, refresh can be performed even when no slice is defined as an intra-coding area and inter-coding is also allowed.
However, the encoding efficiency is generally higher when intra-coding is performed in a prediction allowance range, and therefore the refresh by the intra-coding segment exemplified in FIG. 13 is often used. Hereinafter, the refresh exemplified in FIGS. 13 and 14 is referred to as gradual refresh.
In a television broadcast system, in addition to cases where video and audio recorded on a storage medium are provided to audiences, there are cases where captured video and audio are provided to audiences in real time, such as sports programs and news reports. When a shooting location is away from the broadcast station, video and audio are transmitted from the shooting location to the broadcast station through a plurality of relay stations (see, for example, PTL 3). After that, the video and audio are broadcast from the broadcast station to the reception equipment of the audiences. Hereinafter, a shooting location may be referred to as a video acquisition spot.
In digital television broadcasting, video data encoded in an imaging device placed at a video acquisition spot is generally transmitted to a broadcast station. Then, video and audio are transmitted as digital data from the broadcast station to the reception equipment of the audiences. Further, in some cases, received encoded data is decoded in a relay station and then re-encoded before being transmitted to the broadcast station.