High Efficiency Video Coding (HEVC) is a new video coding standard currently being developed in the Joint Collaborative Team-Video Coding (JCT-VC). JCT-VC is a collaborative project between the Moving Picture Experts Group (MPEG) and the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T). The current committee draft includes a number of new tools that make HEVC considerably more efficient than prior-art video coding standards, in particular H.264/AVC.
HEVC is a hybrid codec that uses multiple reference pictures for inter-prediction. It includes a picture marking process in which reference pictures can be marked as “used for short-term reference”, “used for long-term reference”, or “unused for reference”. If marked “unused for reference”, the reference picture is turned into a non-reference picture and cannot be used for inter-prediction any more. A picture marked “unused for reference” cannot be re-marked later to be used for short-term or long-term reference.
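The one-way nature of the marking can be illustrated with a minimal Python sketch. The names Marking and remark are hypothetical and chosen only for illustration. The sketch enforces only the rule stated above (a picture marked "unused for reference" cannot be revived) and deliberately leaves transitions between the two "used" states unconstrained, since the text does not address them.

```python
from enum import Enum


class Marking(Enum):
    # The three marking states named in the text.
    SHORT_TERM = "used for short-term reference"
    LONG_TERM = "used for long-term reference"
    UNUSED = "unused for reference"


def remark(current: Marking, new: Marking) -> Marking:
    """Return the new marking, rejecting any attempt to revive an unused picture."""
    if current is Marking.UNUSED and new is not Marking.UNUSED:
        raise ValueError("a picture marked 'unused for reference' cannot be re-marked")
    return new
```

Calling remark(Marking.SHORT_TERM, Marking.UNUSED) succeeds, whereas remark(Marking.UNUSED, Marking.SHORT_TERM) raises an error.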
The marking process in HEVC is controlled by Reference Picture Sets (RPSs). An RPS is a set of picture identifiers which identify reference pictures. The RPS is sent in each slice, and reference pictures which are indicated in the RPS will be kept in the Decoder Picture Buffer (DPB) and marked as “used for short-term reference” or “used for long-term reference”.
As an example, the RPS information may contain the values “−4, −6, 4”. This means that the current picture can predict from, i.e., copy pixels from, the picture four frames back (in display order) since the figure −4 is present. It will also be able to predict from the picture six frames back and even from the picture four frames in the future. Thus, the decoder can discard all the images in its buffer except for the three described above. This is a robust way for the decoder to discard pictures. In practice, the decoder may have to keep these images until they are displayed, but they will not be used for inter-prediction again. For simplicity, in the remainder of this disclosure this situation will be considered as if the decoder can discard these images.
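The effect of the example RPS on the decoder buffer can be sketched as follows. This is a simplified illustration, not the normative process: the function name apply_rps, the current POC value of 10 and the buffer contents are assumptions, and the RPS values are treated as POC offsets relative to the current picture.

```python
def apply_rps(current_poc, rps_deltas, dpb_pocs):
    """Split the buffered POCs into those kept as references and those dropped."""
    # Each RPS value is an offset from the current picture's POC.
    wanted = {current_poc + d for d in rps_deltas}
    kept = sorted(p for p in dpb_pocs if p in wanted)
    dropped = sorted(p for p in dpb_pocs if p not in wanted)
    return kept, dropped


# Hypothetical buffer state for a current picture with POC=10.
kept, dropped = apply_rps(10, [-4, -6, 4], [2, 4, 6, 8, 12, 14])
print(kept)     # [4, 6, 14]
print(dropped)  # [2, 8, 12]
```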
Sometimes the information comprised in an RPS can be rather lengthy. As an example, the following RPS is used for test purposes during HEVC standardization: “−3, −2, 1, 2, 5, 6”. Encoding this example RPS may require up to 33 bits, which is a noticeable amount, in particular at very low bit rates and small image sizes.
One key observation is that the RPSs are typically not completely random. Rather, they can be reused over and over again. As an example, we consider a sequence of 18 images from a configuration file used for test purposes in HEVC standardization.
Pictures in HEVC are identified by their Picture Order Count (POC) values (PicOrderCntVal), also known as full POC values. These numbers represent the output order, also referred to as the display order, of the pictures. That is, a picture with POC=57 will be displayed directly after a picture with POC=56. However, the images are not always transmitted in the order they are displayed. For instance, the encoder may first transmit the picture with POC=0, followed by POC=8, followed by POC=4, and so forth. The decoder has to keep track of the pictures and display them in the correct order. In the example from HEVC standardization, the 18 pictures will be transmitted in the order indicated in FIG. 1.
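The difference between transmission order and display order can be sketched in Python as below. The first three POC values (0, 8, 4) are taken from the text; the rest of the order is hypothetical and is not the order of FIG. 1. The sketch also assumes contiguous POC values and an unbounded buffer, whereas a real decoder outputs pictures under buffer-size constraints.

```python
def output_order(decode_order):
    """Yield POCs in display order as soon as each one can be displayed."""
    waiting = set()
    next_poc = min(decode_order)  # assumes POC values are contiguous
    for poc in decode_order:
        waiting.add(poc)
        # Release every picture that is next in line for display.
        while next_poc in waiting:
            waiting.remove(next_poc)
            yield next_poc
            next_poc += 1


# 0, 8, 4 are from the text; the remaining order is an assumption.
print(list(output_order([0, 8, 4, 2, 1, 3, 6, 5, 7])))
# [0, 1, 2, 3, 4, 5, 6, 7, 8]
```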
As can be seen in FIG. 1, some RPSs are sent several times. For instance, the RPS sent for POC=6 (−2, −4, −6, 2) is the same as for POC=14. Therefore, the standard allows RPSs to be sent in Sequence Parameter Sets (SPSs) in addition to sending RPSs in slice headers. SPSs comprise data which does not need to be sent for every slice. Typically, SPSs are sent only once per sequence, or as often as random access is desired. For instance, if the bit stream is broadcast, it may be sufficient to send the SPS every second, since this would make it possible to switch channels every second. It should be noted that, for channel switching every second to be possible, a Clean Random Access (CRA) picture, or an Instantaneous Decoding Refresh (IDR) picture, also needs to be sent every second. A CRA picture is a picture that is not predicted from any previous picture, and no picture which follows the CRA picture in output order predicts from any picture that precedes the CRA picture in output order. An IDR picture is a CRA picture for which no picture which follows the IDR picture in decoding order may refer to any picture that precedes the IDR picture in decoding order.
In the SPS it is possible to specify the eight recurring RPSs of the example shown in FIG. 1 and assign indices to them, as is shown in FIG. 2. The information sent in the slice header now only has to refer to an RPS index in the SPS, as is illustrated in FIG. 3. Using RPS indices requires fewer bits than sending the RPSs themselves.
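The idea of replacing repeated RPSs by indices into a table can be sketched as follows. The function name build_rps_table is hypothetical, and the input list is an assumption made for illustration; it reuses RPS values quoted in the text but does not reproduce the contents of FIG. 1 or FIG. 2.

```python
def build_rps_table(per_picture_rps):
    """Collect distinct RPSs in first-seen order and map each picture to an index."""
    table, index_of, slice_indices = [], {}, []
    for rps in per_picture_rps:
        key = tuple(rps)
        if key not in index_of:
            # First occurrence: store the RPS once in the table.
            index_of[key] = len(table)
            table.append(key)
        # Every picture then only needs to carry the table index.
        slice_indices.append(index_of[key])
    return table, slice_indices


# Hypothetical sequence in which the first and last pictures share an RPS.
table, indices = build_rps_table([(-2, -4, -6, 2), (-4, -6, 4), (-2, -4, -6, 2)])
print(table)    # [(-2, -4, -6, 2), (-4, -6, 4)]
print(indices)  # [0, 1, 0]
```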
Using RPS indices, as is illustrated in FIG. 3, considerably reduces the number of bits required for sending RPSs, since the bulk of the data is sent in the SPS instead, which is sent less frequently. Still, it turns out that it is possible to compress the RPS information even further. By comparing two rows in FIG. 1, one can notice a similarity between them. For instance, every number in the RPS for POC=1 is equal to a corresponding number in the RPS for POC=6 if “5” is added to the latter. That is, the first value “−1” in the RPS for POC=1 is equal to −6+5. The second value “1” is equal to −4+5. The third value “3” is equal to −2+5. The only exception to this rule is the second to last number “5” in the RPS for POC=1. It would need a “0” in the RPS for POC=6, but an image cannot predict from itself.
As it turns out, every RPS in FIG. 1 can be predicted from another RPS. This leads to the following way of describing RPS data in an SPS, e.g., the RPS data in FIG. 1:
For each row, i.e., RPS, it is specified from which other RPS inter-prediction should be made. For instance, predicting from the preceding RPS is indicated by sending the value “−1” in delta_idx_minus1, which is a parameter in the SPS (see short-term RPS syntax FIG. 4).
Then, the value to add, “5” in the example, is transmitted using the values delta_rps_sign and abs_delta_rps_minus1, which are parameters in the SPS (see short-term RPS syntax FIG. 4).
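The shift between the RPS for POC=6 and the RPS for POC=1 can be checked with a short Python sketch. The function names are hypothetical. The way the offset is reconstructed from delta_rps_sign and abs_delta_rps_minus1 is an assumption here (a sign flag of 1 meaning a negative offset, and the magnitude coded minus one); the text only states that these two parameters carry the value.

```python
def delta_from_syntax(delta_rps_sign, abs_delta_rps_minus1):
    """Rebuild the signed offset; assumes sign flag 1 means a negative offset."""
    return (1 - 2 * delta_rps_sign) * (abs_delta_rps_minus1 + 1)


def shift_rps(ref_rps, delta_rps):
    """Add delta_rps to every entry of ref_rps, dropping any result equal to 0."""
    return [d + delta_rps for d in ref_rps if d + delta_rps != 0]


delta = delta_from_syntax(delta_rps_sign=0, abs_delta_rps_minus1=4)
print(delta)                              # 5
print(shift_rps([-6, -4, -2, 2], delta))  # [-1, 1, 3, 7]
# The value 5 in the RPS for POC=1 would need this entry in the RPS for POC=6:
print(5 - delta)                          # 0
```

The shifted list covers the values −1, 1 and 3 named in the text, and the last line shows why the value 5 is the exception: it would require an entry of 0 in the source RPS.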
By sending RPS information in this way, many bits can be saved. The saving is about 50% of the bits used for sending RPSs in the SPS, measured for an older version of the configuration files used for testing HEVC reference picture structures. Since SPS data is a very small part of the total video bit stream data, the effect on the total bit rate is much smaller than 50%, but compressing data efficiently is still important.
Typically, an RPS is sent once, in an SPS, and subsequent slices simply indicate which RPS should be used, by using an RPS index. In some situations, however, the encoder may want to use an RPS which is not in the SPS. The encoder then has the option of sending the RPS explicitly, i.e., in a slice, as is described above. Whether inter-prediction is used or the RPS is encoded value-by-value is signaled for each RPS using the parameter inter_ref_pic_set_prediction_flag, which is part of the short-term RPS syntax shown in FIG. 4. If inter_ref_pic_set_prediction_flag is equal to zero, the value-by-value method of transmitting RPSs is used; otherwise, inter-prediction is used for signaling RPSs.
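The encoder-side choice between referring to an RPS in the SPS and sending it explicitly in the slice can be sketched as follows. This is only an illustration: the function name signal_rps and the returned labels "index" and "explicit" are hypothetical and do not correspond to syntax elements, and the table contents are assumed.

```python
def signal_rps(rps, sps_table):
    """Choose how a slice signals its RPS: by SPS index if possible, else explicitly."""
    key = tuple(rps)
    if key in sps_table:
        # The RPS is already in the SPS, so a short index is enough.
        return ("index", sps_table.index(key))
    # Otherwise the full RPS has to be carried in the slice itself.
    return ("explicit", key)


sps_table = [(-2, -4, -6, 2), (-4, -6, 4)]  # hypothetical SPS contents
print(signal_rps([-4, -6, 4], sps_table))   # ('index', 1)
print(signal_rps([-1, -3], sps_table))      # ('explicit', (-1, -3))
```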