Video coding may be performed in intra mode and/or inter mode. Intra mode exploits redundancies within a video frame, and inter mode exploits redundancies between video frames. In inter mode, pixel luma/chroma predictions are obtained from already coded/decoded pictures called reference pictures. Depending on the number of reference pictures used for prediction, inter mode is categorized into uni-prediction mode (or uni-directional mode), bi-prediction mode (B mode), and possibly tri-prediction mode, etc., where, respectively, 1, 2 and 3 reference pictures are used. Within this document, these different modes, i.e. uni-prediction, bi-prediction, etc., will be referred to as “reference modes”.
Advanced Video Coding (AVC), also known as H.264 and MPEG-4 Part 10, is the state-of-the-art standard for 2D video coding from ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) and MPEG (Moving Picture Experts Group). The AVC codec is a hybrid codec, which exploits redundancy both between frames and within a frame.
In AVC, indicators of the relevant reference pictures are placed in order into two reference lists. The indicators are denoted reference indices, and are numbered from 0 to N, i.e. (0, 1, ..., N). The first list, List 0 (L0), primarily manages the past reference pictures, i.e. reference pictures preceding a current picture in time, and the second list, List 1 (L1), typically manages the future reference pictures, i.e. reference pictures subsequent to a current picture in time. For low delay video coding, L1 can also manage past reference pictures. Each list can hold indices of up to 15 reference pictures (i.e. N=14).
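The list arrangement described above can be illustrated with a minimal sketch. It is not the normative AVC list construction procedure. It assumes that each picture is identified by a display-order number (called `poc` here), that L0 places past pictures first with the nearest one first, and that L1 places future pictures first. The function name `build_reference_lists` and the parameter `max_refs` are hypothetical and chosen for illustration only.

```python
def build_reference_lists(current_poc, decoded_pocs, max_refs=15):
    """Return (list0, list1) of display-order numbers of reference pictures.

    Assumption: pictures are identified by a display-order number, and the
    position of an entry in a list is its reference index (0, 1, ..., N).
    """
    # Past pictures, nearest to the current picture first.
    past = sorted((p for p in decoded_pocs if p < current_poc), reverse=True)
    # Future pictures, nearest to the current picture first.
    future = sorted(p for p in decoded_pocs if p > current_poc)

    # L0 primarily manages past pictures, L1 typically future pictures.
    list0 = (past + future)[:max_refs]
    list1 = (future + past)[:max_refs]
    return list0, list1


# Current picture 4 with two past and two future reference pictures.
l0, l1 = build_reference_lists(4, [0, 2, 6, 8])

# Low delay case: only past pictures exist, so L1 also holds past pictures.
ld0, ld1 = build_reference_lists(4, [0, 2, 3])
```

In the low delay case of this sketch both lists end up with the same past pictures, which corresponds to the situation where L1 also manages past reference pictures.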
Further, in AVC, an indicator, or reference mode index, specifying the selection of one of the reference picture lists (e.g. for uni-prediction), or both reference picture lists (e.g. for bi-prediction), is coded together with the partition structure in Macro Block (MB) mode/sub-MB mode, while the indicators, or reference picture indices, specifying the selected reference pictures in the respective lists are coded as separate syntax elements. “Partition structure” refers to partitions, e.g. 16×16, 16×8 or 8×16, of a 16×16 MB. A partition, e.g. 16×16, is typically associated with one motion vector (MV) and one reference index when uni-prediction is used, and with two MVs and two reference indices when bi-prediction is used. An MV has a horizontal component MVx and a vertical component MVy that describe how pixels of the current partition are produced from the corresponding reference picture, such as Ipred(x,y)=Iref(x-MVx,y-MVy).
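The relation Ipred(x,y)=Iref(x-MVx,y-MVy) can be illustrated with a minimal sketch. It assumes integer-valued motion vectors, a reference picture stored as a list of pixel rows, and clamping of coordinates that fall outside the picture; sub-pixel interpolation is left out. The averaging of two predictions in `bi_predict` is likewise a simplifying assumption. All function and parameter names are hypothetical.

```python
def predict_partition(ref, x0, y0, width, height, mvx, mvy):
    """Uni-prediction: Ipred(x, y) = Iref(x - MVx, y - MVy).

    ref is a list of rows; (x0, y0) is the top-left pixel of the partition.
    Assumption: integer MVs, out-of-picture coordinates are clamped.
    """
    rows, cols = len(ref), len(ref[0])
    pred = []
    for y in range(y0, y0 + height):
        row = []
        for x in range(x0, x0 + width):
            rx = min(max(x - mvx, 0), cols - 1)  # clamp horizontally
            ry = min(max(y - mvy, 0), rows - 1)  # clamp vertically
            row.append(ref[ry][rx])
        pred.append(row)
    return pred


def bi_predict(pred_a, pred_b):
    """Bi-prediction sketch: rounded average of two uni-predictions."""
    return [[(a + b + 1) >> 1 for a, b in zip(ra, rb)]
            for ra, rb in zip(pred_a, pred_b)]


# 4x4 reference picture where each pixel value encodes its position.
ref0 = [[10 * r + c for c in range(4)] for r in range(4)]
ref1 = [[v + 4 for v in row] for row in ref0]

# 2x2 partition at (1, 1): one MV and one reference picture.
uni = predict_partition(ref0, 1, 1, 2, 2, mvx=1, mvy=1)

# Same partition with two MVs and two reference pictures.
bi = bi_predict(uni, predict_partition(ref1, 1, 1, 2, 2, mvx=0, mvy=0))
```

The uni-prediction call uses one MV and one reference picture, while the bi-prediction call combines two such predictions, mirroring the one-MV and two-MV cases described above.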
The number of reference pictures associated with a picture or partition depends on the reference mode associated with the same partition, i.e. whether it is uni-prediction or bi-prediction, etc. When decoding the reference information in a decoder, both the reference mode index and the one or more reference picture indices associated with a picture or partition must be correctly decoded, in order for the decoder to be able to decode the picture or partition correctly. Incorrect decoding of either the reference mode index or any of the one or more reference picture indices may result in erroneous interpretation of the reference information.
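The dependency between the reference mode and the number of reference picture indices can be illustrated with a minimal sketch. It assumes a simplified, hypothetical symbol stream in which the reference mode is immediately followed by its reference picture indices; this is not the actual AVC syntax, and the names `REFS_PER_MODE` and `read_reference_info` are chosen for illustration only.

```python
# Number of reference pictures used by each reference mode.
REFS_PER_MODE = {"uni": 1, "bi": 2, "tri": 3}


def read_reference_info(symbols):
    """Read one reference mode and the indices it requires.

    Assumption: symbols is an already entropy-decoded sequence in which the
    mode comes first, followed by one reference index per reference picture.
    Returns (mode, reference_indices, remaining_symbols).
    """
    it = iter(symbols)
    mode = next(it)
    count = REFS_PER_MODE[mode]
    ref_indices = [next(it) for _ in range(count)]
    return mode, ref_indices, list(it)


# Correctly decoded bi-prediction: two indices are consumed.
ok = read_reference_info(["bi", 0, 2, "next"])

# Mode misread as uni-prediction: only one index is consumed, and the
# second index is left behind to be misinterpreted as later data.
bad = read_reference_info(["uni", 0, 2, "next"])
```

In this sketch an error in the mode alone changes how many indices are read, so the symbols that follow are also misinterpreted.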
The current methods of coding reference information, such as the method of AVC described above, require a relatively large number of bits in order to convey the reference information associated with each block. This has been identified as a source of reduced coding efficiency.