A video contains a large amount of information. Therefore, when a video is recorded or transmitted as digital data, it must be compression-encoded with high efficiency. To achieve high-efficiency compression, video compression encoding combines various elemental technologies.
Predictive coding is one of the elemental technologies for video compression encoding. When the pixels of a video are encoded sequentially, predictive coding generates a predicted value for the pixel currently being encoded from one or more temporally and/or spatially neighboring pixels. It then encodes the differential signal between the original signal and the predicted signal instead of encoding the original signal directly. Each pixel in a video generally has a high correlation with its temporally and/or spatially neighboring pixels, so the differential signal is typically small. Predictive coding therefore achieves high-efficiency compression.
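The following Python sketch shows the differential principle on a single row of pixel values. For illustration only, it assumes a left-neighbor predictor and lossless coding of the residual. Practical encoders quantize the differential signal and predict from reconstructed values. The function names are hypothetical.

```python
def dpcm_encode(row):
    """Encode a row of pixel values as prediction residuals.

    Assumed predictor: the left neighbor (the first pixel is predicted as 0).
    """
    residuals = []
    prev = 0
    for x in row:
        residuals.append(x - prev)  # differential signal: original minus predicted
        prev = x                    # lossless, so the decoder sees the same value
    return residuals


def dpcm_decode(residuals):
    """Invert dpcm_encode by adding each residual to the running prediction."""
    row = []
    prev = 0
    for r in residuals:
        prev = prev + r
        row.append(prev)
    return row


row = [100, 102, 103, 103, 104, 110]
res = dpcm_encode(row)
print(res)  # [100, 2, 1, 0, 1, 6]
assert dpcm_decode(res) == row
```

Because neighboring values are close, most residuals are small. Small residuals are what make the differential signal cheaper to entropy-code than the original values.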
Predictive coding technologies include intra-picture predictive coding and inter-picture predictive coding. Intra-picture predictive coding performs prediction by referring to a group of pixels in the same picture. Inter-picture predictive coding performs prediction by referring to a group of pixels in a different picture (referred to as a reference picture). A video generally contains some motion, so inter-picture predictive coding is commonly combined with motion compensation, which increases prediction efficiency by using spatial displacement information. Note that a “picture” denotes the processing unit of a screen. It corresponds to a field when a video of the interlace format is encoded field by field. It corresponds to a frame when a video of the non-interlace (progressive) format is encoded, or when a video of the interlace format is encoded frame by frame.
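The following Python sketch illustrates motion compensation. It performs a full search for the displacement (motion vector) that best predicts a block of the current picture from a reference picture, using the sum of absolute differences (SAD) as the matching cost. The SAD criterion, the search range, and all names are assumptions chosen for the sketch. They are not a method defined by this document.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the current block at (bx, by) and the reference block
    displaced by the motion vector (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def motion_search(cur, ref, bx, by, size, search_range):
    """Full search over [-search_range, search_range] in both directions.

    Returns (cost, dx, dy) for the lowest-SAD candidate that stays inside
    the reference picture.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block falls outside the picture.
            if not (0 <= by + dy and by + dy + size <= h
                    and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


ref = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        ref[y][x] = 200      # object in the reference picture
        cur[y][x + 1] = 200  # same object moved one pixel to the right

print(motion_search(cur, ref, 2, 1, 2, 2))  # (0, -1, 0)
```

The decoder needs only the motion vector (-1, 0) and a zero residual to reconstruct the moved block, which is why displacement information raises prediction efficiency.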
In a video, correlation is generally higher temporally than spatially. Among predictive coding technologies, inter-picture predictive coding therefore achieves particularly high-efficiency compression. However, temporal correlation may drop sharply over all or part of the screen. This happens, for example, when movement changes the positional relationship between the foreground and the background, or at a scene change introduced by editing. Therefore, general video coding formats, including ISO/IEC 14496-10 Advanced Video Coding (AVC) described in Non Patent Literature 1, implement an adaptive predictive coding method. This method encodes each partial image obtained by subdividing a picture, adaptively switching either between inter-picture predictive coding and intra-picture predictive coding, or among inter-picture predictive coding, intra-picture predictive coding, and intra-picture coding without prediction. The unit size varies among video coding formats, but a rectangular area of 16 pixels vertically by 16 pixels horizontally (referred to as a macroblock) is typically used. Hereinafter, intra-picture coding may be referred to as intra coding.
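The following Python sketch illustrates per-block adaptive switching between inter-picture and intra-picture coding. The cost measures are deliberately crude, hypothetical stand-ins: SAD against the co-located reference block for inter coding, and deviation from the block mean for intra coding. The 2×2 block size replaces the 16×16 macroblock only to keep the example small. Actual AVC encoders typically use rate-distortion based decisions.

```python
def block(pic, bx, by, size):
    """Extract the size x size block whose top-left corner is (bx, by)."""
    return [row[bx:bx + size] for row in pic[by:by + size]]


def intra_cost(blk):
    # Hypothetical intra cost: deviation from the block mean (a crude DC prediction).
    vals = [v for row in blk for v in row]
    mean = sum(vals) / len(vals)
    return sum(abs(v - mean) for v in vals)


def inter_cost(blk, ref_blk):
    # Hypothetical inter cost: SAD against the co-located block (zero motion).
    return sum(abs(a - b) for ra, rb in zip(blk, ref_blk) for a, b in zip(ra, rb))


def choose_modes(cur, ref, size):
    """Pick 'inter' or 'intra' for each block; ties go to inter."""
    modes = {}
    for by in range(0, len(cur), size):
        for bx in range(0, len(cur[0]), size):
            blk = block(cur, bx, by, size)
            ci = intra_cost(blk)
            cp = inter_cost(blk, block(ref, bx, by, size))
            modes[(bx, by)] = "inter" if cp <= ci else "intra"
    return modes


ref = [[10, 30, 10, 30],
       [30, 10, 30, 10],
       [10, 30, 10, 30],
       [30, 10, 30, 10]]
cur = [row[:2] + [200, 200] for row in ref]  # right half changed, e.g. by an edit
print(choose_modes(cur, ref, 2))
```

In this example, the unchanged left half is predicted well from the reference picture and is inter-coded. The changed right half has lost its temporal correlation and is intra-coded.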
When the adaptive predictive coding method is used, differences in correlation between pictures cause differences in prediction efficiency, and hence differences in compression efficiency. As a result, the code amount varies from picture to picture. This variation occurs regardless of the transmission band of the transmission path over which the data group obtained by compression encoding (referred to as a bitstream) is transmitted. Therefore, a general video encoding device and a general video decoding device each include a buffer memory that stores the bitstream. The buffer absorbs the variation in code amount so that transmission stays within a predetermined transmission band. In the AVC standard, this buffer memory is called a coded picture buffer (CPB). The required capacity differs greatly depending on the characteristics of the system to which the encoding device and the decoding device are applied. A typical example is a capacity of 0.5 second multiplied by the transmission bit rate.
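As a worked example of this sizing rule, assume an illustrative transmission bit rate R of 10 Mbit/s. This value is chosen here for illustration and is not taken from any particular system.

```latex
B_{\mathrm{CPB}} = 0.5\,\mathrm{s} \times R
                 = 0.5\,\mathrm{s} \times 10\,\mathrm{Mbit/s}
                 = 5\,\mathrm{Mbit}
                 = 625\,\mathrm{kB}
```

A buffer filled to this capacity takes B/R = 0.5 s to drain at rate R. This indicates the order of delay such a buffer can introduce.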
In various situations, such as channel switching in television broadcasting or special playback of stored content, decoding must start in the middle of a bitstream when a video is recorded and transmitted. This is called random access. When decoding starts in the middle of a bitstream generated by inter-picture predictive coding, the video cannot be decoded normally, because inter-picture prediction refers to reference pictures that have not been decoded. Therefore, a general video encoding device implementing adaptive predictive coding performs encoding control (referred to as refresh) that inserts intra coding at appropriate points. This ensures that a normal decoded video is obtained within a predetermined period, even when decoding starts in the middle of a bitstream.
One refresh method is instantaneous refresh, which inserts a picture whose entire area is encoded by intra-picture coding (referred to as an intra-coded picture). FIG. 14 illustrates an example of instantaneous refresh. In the drawing, the area indicated by a dashed border is a group of pictures that serves as the unit of refresh control, called a refreshing group of pictures (RGOP). In the example of FIG. 14, the video encoding device inserts an intra-coded picture every four pictures. It also ensures that the three pictures following each intra-coded picture do not use any picture encoded before that intra-coded picture as a reference picture. In other words, inter-picture prediction is performed only within a refreshing group of pictures. As a result, even when decoding starts in the middle of a bitstream, the decoding device can obtain a correct decoded image by starting the decoding process from the leading intra-coded picture of a refreshing group of pictures. In the AVC described in Non Patent Literature 1, an intra-coded picture that prevents subsequent pictures from referring to any picture encoded before it is called an instantaneous decoding refresh (IDR) picture, and a special picture identifier is assigned to the IDR picture in the bitstream. An AVC-compliant video encoding device can therefore notify a video decoding device of the correct timing to start decoding for refresh by encoding the leading picture of each refreshing group of pictures as an IDR picture.
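The refresh structure of FIG. 14 and its effect on random access can be sketched in Python as follows. The sketch makes three assumptions: pictures are numbered from 0 in coding order, an IDR picture leads each refreshing group of pictures of fixed size, and a picture may reference any earlier picture in its own group. All function names are hypothetical.

```python
def picture_types(num_pictures, rgop_size):
    """Instantaneous refresh: an IDR picture leads each refreshing group of
    pictures (RGOP); the remaining pictures are inter-coded ('P')."""
    return ["IDR" if i % rgop_size == 0 else "P" for i in range(num_pictures)]


def reference_allowed(cur, ref, rgop_size):
    """Inter prediction may refer only to earlier pictures in the same RGOP."""
    return ref < cur and ref // rgop_size == cur // rgop_size


def first_decodable(start, types):
    """A decoder joining at picture `start` must wait for the next IDR picture."""
    for i in range(start, len(types)):
        if types[i] == "IDR":
            return i
    return None


types = picture_types(12, 4)
print(types)  # ['IDR', 'P', 'P', 'P', 'IDR', 'P', 'P', 'P', 'IDR', 'P', 'P', 'P']
print(first_decodable(5, types))  # 8
```

Because no reference crosses a group boundary, every picture from the first IDR picture onward can be decoded correctly.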
However, instantaneous refresh has the problem of increasing transmission delay. As described above, a video generally has a high correlation, particularly in the temporal direction. An intra-coded picture cannot use inter-picture prediction, so it requires a larger code amount than other pictures to maintain a given image quality. A larger difference in generated code amount between pictures increases the buffer memory capacity required in the video encoding device and the video decoding device. A larger buffer capacity, in turn, increases the transmission delay between the encoding device and the decoding device. Instantaneous refresh is therefore unsuitable for applications requiring strong real-time performance, such as remote control of equipment through video.
On the other hand, one refresh method that meets the demand for lower transmission delay is gradual refresh. Gradual refresh divides the screen into partial areas (referred to as segments) and refreshes the screen one segment at a time, spreading the refresh over a plurality of pictures until it is completed. A typical form of gradual refresh is intra-slice refresh, which is disclosed, for example, in Patent Literature 1. A slice is a set of coding unit blocks in a picture. It is a segment that is independent of the other coding unit blocks in the picture. An intra slice is a slice in which intra coding is selected for all coding units and in which no prediction uses pixels of other slices, including other slices of the same frame. Intra-slice refresh encodes part of each picture as an intra slice and moves the intra-slice area from picture to picture. This ensures that every area of the picture is encoded as an intra slice at least once within a predetermined period.
FIG. 15 illustrates an example of intra-slice refresh. Below, the picture at time t is denoted by P(t). In FIG. 15, P(ti−4), P(ti), and P(ti+4) are the start pictures of gradual refresh, and each refresh is completed over four pictures. The period from the start of a refresh to its end (four pictures in FIG. 15) is called a refreshing period. In FIG. 15, the partial areas painted black are intra slices. During each refreshing period, the image is refreshed by making every area of the screen an intra slice at least once.
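The following Python sketch computes the position of the intra slice. It assumes that the screen is divided into as many horizontal slices as there are pictures in the refreshing period, and that the intra slice moves down by one slice per picture. The text does not state the exact layout of FIG. 15, so this is an assumption. Function and parameter names are hypothetical.

```python
def intra_slice_index(t, t_start, refresh_period):
    """Index (0 = top) of the slice coded as an intra slice in picture P(t).

    Assumes refresh_period horizontal slices and one-slice-per-picture
    movement, restarting at slice 0 whenever t - t_start is a multiple
    of refresh_period.
    """
    return (t - t_start) % refresh_period


def covered_within_period(t_start, refresh_period):
    """Set of slices coded as intra during one refreshing period."""
    return {intra_slice_index(t, t_start, refresh_period)
            for t in range(t_start, t_start + refresh_period)}


# With ti = 8 and a four-picture refreshing period, refresh restarts at
# P(ti-4), P(ti) and P(ti+4), as in FIG. 15.
print([intra_slice_index(t, 8, 4) for t in range(4, 13)])  # [0, 1, 2, 3, 0, 1, 2, 3, 0]
```

Every slice index appears exactly once per refreshing period, which is the coverage condition that intra-slice refresh must satisfy.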
In the intra-slice refresh illustrated in FIG. 15, the referable range for prediction is limited outside the intra slices as well, to guarantee that refresh is achieved. This is the same approach as in the instantaneous refresh illustrated in FIG. 14. In FIG. 15, each area indicated by a dashed border is a group of partial areas that serves as the unit of refresh control, referred to as a refreshing group of segments (RGOS). Like a refreshing group of pictures in instantaneous refresh, each refreshing group of segments has a reference restriction. It must not refer to a different refreshing group of segments that starts at a picture preceding the first picture containing an area belonging to it. A video decoding device that receives a bitstream encoded in this way can start decoding from the leading picture of a refreshing period. It then obtains a correct decoded image, without disturbance over the whole screen, for every picture from the end of that refreshing period onward. Hereinafter, the leading picture of a refreshing period is referred to as a synchronization starting picture. The first picture from which a correct decoded image without disturbance over the whole screen is obtained, when decoding starts from the synchronization starting picture, is referred to as a synchronization completed picture.
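The effect of the reference restriction can be illustrated with a small Python simulation. It tracks which slices a decoder reconstructs correctly after joining the bitstream at an arbitrary picture. The model is a simplification assumed for illustration:
- There is one horizontal slice per picture of the refreshing period N.
- Slice k of picture P(t) is intra-coded, where k = (t − t_start) mod N.
- Slices already refreshed in the current period refer only to the refreshed area of the previous picture.
- Slices not yet refreshed may refer to anything.

The function name is hypothetical.

```python
def first_clean_picture(start, t_start, n_slices, num_pictures):
    """Return the first picture index >= start at which every slice is
    decoded correctly when decoding begins at `start`, or None."""
    clean = [False] * n_slices  # no picture before `start` has been decoded
    for t in range(start, num_pictures):
        k = (t - t_start) % n_slices   # intra slice of picture t
        refreshed_ok = all(clean[:k])  # refreshed area of picture t-1
        everything_ok = all(clean)     # whole area of picture t-1
        clean = [
            True if i == k             # intra slice: no references needed
            else refreshed_ok if i < k # refreshed: refers to refreshed area only
            else everything_ok         # not yet refreshed: may refer anywhere
            for i in range(n_slices)
        ]
        if all(clean):
            return t
    return None


print(first_clean_picture(0, 0, 4, 20))  # 3
print(first_clean_picture(2, 0, 4, 20))  # 7
```

With a four-picture refreshing period starting at picture 0, joining at the synchronization starting picture 0 yields a fully correct picture at 3. Joining mid-period at picture 2 means waiting until picture 7, the synchronization completed picture of the next refreshing period.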
Incidentally, gradual refresh does not necessarily use intra slices. It is generally realized by restricting the prediction reference relationship between refreshing groups of segments. FIG. 16 illustrates an example of general gradual refresh. To guarantee that refresh is achieved, the areas belonging to each refreshing group of segments, indicated by a dashed border, must not refer to a different refreshing group of segments that starts at a picture preceding the leading picture of their own group. More generally, the restriction can be relaxed when the decoded image after completion of refresh is guaranteed to be sufficiently similar to the image decoded from the beginning of the bitstream. In that case, an area belonging to a refreshing group of segments may refer to another refreshing group of segments that starts at a picture preceding its own leading picture.
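The restriction between refreshing groups of segments can be expressed as a simple check, sketched below in Python. The sketch encodes the rule literally as stated: a segment may refer to its own group, or to another group whose leading picture is not earlier than the leading picture of its own group. Group identifiers, the group_start mapping, and the function names are hypothetical. The relaxed case, in which approximate matching is tolerated, is not modeled.

```python
def reference_is_allowed(src_group, dst_group, group_start):
    """Gradual-refresh reference restriction between RGOSs.

    group_start maps a (hypothetical) RGOS identifier to the index of its
    leading picture. A segment in src_group must not refer to a different
    RGOS whose leading picture precedes that of src_group.
    """
    if src_group == dst_group:
        return True
    return group_start[dst_group] >= group_start[src_group]


def violations(references, group_start):
    """Return the (src, dst) reference pairs that break the restriction."""
    return [(s, d) for s, d in references
            if not reference_is_allowed(s, d, group_start)]


group_start = {"A": 0, "B": 4, "C": 8}  # hypothetical RGOS leading pictures
refs = [("B", "B"), ("B", "C"), ("B", "A"), ("C", "B")]
print(violations(refs, group_start))  # [('B', 'A'), ('C', 'B')]
```

An encoder could apply such a check when pruning candidate references during motion search, so that the restriction holds by construction.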
In gradual refresh, the increase in code amount caused by refresh is distributed over the whole refreshing period. Unlike the intra-coded picture in instantaneous refresh, no single picture incurs an increased code amount over the whole screen. The variation in code amount between pictures is therefore smaller than with instantaneous refresh. As a result, the required buffer memory capacity decreases, and the transmission delay between the encoding device and the decoding device is reduced.
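The buffer effect can be illustrated with a constant-rate buffer model in Python. The code amounts in the example are invented for illustration. Instantaneous refresh is modeled as one 400 kbit intra-coded picture followed by three 100 kbit pictures. Gradual refresh is modeled as the same 700 kbit total per refreshing period, spread evenly at 175 kbit per picture. In both cases the channel removes 175 kbit per picture interval. The function name is hypothetical.

```python
def peak_buffer_bits(picture_bits, bits_per_interval):
    """Peak occupancy of an encoder output buffer drained at a constant rate.

    Each picture's bits enter the buffer at once; during the following
    picture interval the channel removes bits_per_interval bits (the
    level never goes below empty).
    """
    level = peak = 0
    for bits in picture_bits:
        level += bits
        peak = max(peak, level)
        level = max(0, level - bits_per_interval)
    return peak


instantaneous = [400, 100, 100, 100] * 3  # kbit per picture, IDR every 4 pictures
gradual = [175] * 12                      # same total, spread evenly
print(peak_buffer_bits(instantaneous, 175), peak_buffer_bits(gradual, 175))  # 400 175
```

In this model, the instantaneous pattern peaks at 400 kbit, while the gradual pattern never exceeds 175 kbit. The buffer, and the delay it introduces at a given bit rate, is therefore less than half as large.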
On the other hand, gradual refresh has no explicit refresh start point such as an IDR picture. Therefore, a video encoding device that generates a bitstream using gradual refresh multiplexes information on the synchronization starting picture and the synchronization completed picture (refresh information) into the bitstream and transfers it to the decoding device. The decoding device can then start decoding from the synchronization starting picture and resume image display from the synchronization completed picture.
The AVC described in Non Patent Literature 1 defines a data group called a recovery point supplemental enhancement information message (recovery point SEI message) as a means for an encoding device to transfer refresh information to a decoding device. An AVC-compliant encoding device transfers the recovery point SEI message to the decoding device by multiplexing it into the bitstream. According to this message, an AVC-compliant decoding device can start decoding from the synchronization starting picture and resume image display from the synchronization completed picture. The recovery point SEI message contains information on the synchronization starting picture and the synchronization completed picture, and it supports both instantaneous refresh and gradual refresh.
The list in FIG. 17 shows the syntax elements of the recovery point SEI message, which is used to transfer refresh information from an AVC-compliant encoding device to an AVC-compliant decoding device.
recovery_frame_cnt is a parameter that indicates the synchronization completed picture. For example, it indicates P(ti−4), P(ti), and P(ti+4) in FIG. 14, and P(ti−1), P(ti+3), and P(ti+7) in FIG. 15. The paired synchronization starting picture is indicated to the video decoding device by the presence of the recovery point SEI message itself. An AVC-compliant video decoding device starts decoding from the synchronization starting picture with which the recovery point SEI message is associated. It continues decoding up to the synchronization completed picture indicated by recovery_frame_cnt, and thereby obtains a decoded image without disturbance over the whole screen. The video decoding device can therefore select and display only undisturbed images from a bitstream generated using gradual refresh.
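The synchronization completed picture is identified through frame_num. As we read the AVC semantics, the recovery point is the picture whose frame_num equals the frame_num of the picture carrying the SEI message plus recovery_frame_cnt, modulo MaxFrameNum. The sketch below applies that rule. For the FIG. 15 case, it further assumes that every picture is a reference frame, so that frame_num advances by one per picture. Under that assumption, recovery_frame_cnt is N − 1 for a refreshing period of N pictures, and 0 for an IDR picture as in FIG. 14. Function names are hypothetical; consult the standard for the normative definition.

```python
def recovery_frame_num(frame_num, recovery_frame_cnt, max_frame_num):
    """frame_num of the synchronization completed picture: the frame_num of
    the picture carrying the SEI message plus recovery_frame_cnt, modulo
    MaxFrameNum."""
    return (frame_num + recovery_frame_cnt) % max_frame_num


def recovery_frame_cnt_for_period(refresh_period):
    # Assumption: every picture is a reference frame, so frame_num advances
    # by one per picture and refresh completes refresh_period - 1 pictures
    # after the synchronization starting picture.
    return refresh_period - 1


cnt = recovery_frame_cnt_for_period(4)  # four-picture refreshing period, as in FIG. 15
print(cnt, recovery_frame_num(14, cnt, 16))  # 3 1  (frame_num wraps at MaxFrameNum = 16)
```

The modulo step matters in practice because frame_num wraps around, so the synchronization completed picture may carry a smaller frame_num than the synchronization starting picture.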
exact_match_flag is a parameter that indicates whether, at the synchronization completed picture indicated by recovery_frame_cnt, the decoded image obtained when decoding starts from the synchronization starting picture exactly matches the decoded image obtained when the bitstream is received and decoded from its beginning.
broken_link_flag is a parameter that indicates whether, when decoding starts from the synchronization starting picture, visibly unacceptable disturbance may occur in the pictures up to the synchronization completed picture indicated by recovery_frame_cnt.
changing_slice_group_idc is a parameter that indicates whether, when decoding starts from the synchronization starting picture, the pictures up to the synchronization completed picture indicated by recovery_frame_cnt contain a partial area whose decoding can be omitted without affecting completion of refresh.
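To make the list of FIG. 17 concrete, the following Python sketch parses a recovery point SEI payload. It assumes the syntax order and descriptors of the AVC recovery point SEI message:
- recovery_frame_cnt as ue(v) (unsigned Exp-Golomb)
- exact_match_flag as u(1)
- broken_link_flag as u(1)
- changing_slice_group_idc as u(2)

It also assumes that the payload bytes have already been extracted from the SEI NAL unit with emulation prevention bytes removed. The class and function names are hypothetical.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n):
        """Read an n-bit unsigned integer, u(n)."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self):
        """Read an unsigned Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


def parse_recovery_point(payload):
    """Read the four recovery point syntax elements in order."""
    r = BitReader(payload)
    return {
        "recovery_frame_cnt": r.ue(),
        "exact_match_flag": r.u(1),
        "broken_link_flag": r.u(1),
        "changing_slice_group_idc": r.u(2),
    }


# Bits 00100 | 1 | 0 | 00 encode recovery_frame_cnt = 3, exact_match_flag = 1,
# broken_link_flag = 0, changing_slice_group_idc = 0, then alignment bits.
payload = bytes([0x24, 0x40])
print(parse_recovery_point(payload))
# {'recovery_frame_cnt': 3, 'exact_match_flag': 1, 'broken_link_flag': 0, 'changing_slice_group_idc': 0}
```

Only recovery_frame_cnt uses a variable-length code. The three flags together occupy four fixed-length bits, so the whole message is typically one or two bytes long.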
This concludes the description of an example of conventional multiplexing of refresh information with reference to the list in FIG. 17.