Scalable Video Coding (SVC), the scalable extension of H.264, is a coding technique developed to solve the problems of low compression efficiency, lack of support for combined scalability, and high implementation complexity that arise from the layered-coding-based scalability attempted in earlier standards such as Moving Picture Experts Group 2 (MPEG-2) and MPEG-4.
SVC encodes multiple video layers into a single bit sequence. The layers of SVC include one base layer and scalable layers that can be successively stacked over the base layer.
Each scalable layer can represent up to the maximum bit rate, frame rate, and resolution assigned to it, building on the information of the lower layers.
The more scalable layers SVC stacks, the wider the range of bit rates, frame rates, and resolutions it can support. SVC is therefore well suited to multimedia content services in a Universal Multimedia Access (UMA) environment, because it can address together the variability of bandwidth in a heterogeneous network environment, the variability in performance and resolution of receiving terminals, and the varied preferences of content consumers.
A Video Coding Layer (VCL) of an SVC encoder generates, slice by slice, the base layer encoding information and the scalability encoding information of the scalable layers.
Each slice is packaged into Network Abstraction Layer (NAL) units by the NAL and stored in an SVC bitstream.
Although an RTP payload format for loading the NAL units of the SVC is currently disclosed in the Internet-Draft “draft-wenger-avt-rtp-svc-02.txt”, an SVC bitstream has a complicated structure: a single bitstream stores the encoding information for SNR, temporal, and spatial scalability together with base layer encoding information that is compatible with H.264. For this reason, no research has yet presented an effective RTP packetizing method that supports the RTP payload format of the SVC.
There are a total of seven RTP packet types for the NAL units of the SVC: a Single NAL Unit (SNU), a Single-Time Aggregation Packet-A (STAP-A), STAP-B, Multi-Time Aggregation Packet 16 (MTAP16), MTAP24, Fragmentation Unit-A (FU-A), and FU-B.
The SNU type can load only one NAL unit in one RTP packet, whereas the STAP can load multiple NAL units that belong to the same presentation time instant in one RTP packet. The STAP is divided into the STAP-A type, which loads NAL units in an RTP packet in decoding order, and the STAP-B type, which loads NAL units in an RTP packet without regard to decoding order for interleaving purposes.
The MTAP can load multiple NAL units belonging to different presentation time instants in one RTP packet and inherently supports interleaving. The MTAP is divided into the MTAP16 type, which has a 16-bit time offset, and the MTAP24 type, which has a 24-bit time offset, according to the size of the time offset field that indicates the difference in presentation time instant between the NAL units.
Of these seven RTP packet types, only those required for a given application field are grouped into each of three RTP packet modes. FIG. 1 shows the RTP packet types that can be supported by the three RTP packet modes: an SNU mode, a non-interleaved mode, and an interleaved mode.
The SNU mode of FIG. 1 supports only the SNU type, which loads in an RTP packet a single NAL unit whose “NAL_unit_type” is in the range 1 to 23 shown in FIG. 2, and so its application field is restricted.
On the other hand, the non-interleaved mode supports the STAP-A and the FU-A as well as the SNU type, and thus its practical application range is wide.
The interleaved mode adds an interleaving function to the non-interleaved mode, but has the drawback that it cannot support the SNU type. Because the interleaving function loads the NAL units in the RTP packet in an order different from the decoding order, a burst error in a channel can be dealt with effectively, but RTP packetization, de-packetization, and the SVC decoding procedure become very complicated.
Therefore, in view of implementation complexity and application range, the non-interleaved mode is suitable as the RTP packetization mode that must be supported in a commercial SVC streaming service, and the interleaved mode can be considered as an option for a service in an environment with a high channel error rate.
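The relationship between the three modes and the packet types described above can be sketched as a small lookup table. This is only an illustration: the names `MODE_PACKET_TYPES`, `PAYLOAD_TYPE_NUMBER`, and `is_allowed` are hypothetical. The entries for the SNU and non-interleaved modes follow the text; the entry for the interleaved mode and the type numbers other than 24 and 28 are assumptions taken from RFC 3984, since FIG. 1 is not reproduced here.

```python
# Type numbers of the six aggregation/fragmentation packet types.
# 24 (STAP-A) and 28 (FU-A) are stated in the text; the others are
# assumed from RFC 3984. An SNU packet has no number of its own: it
# carries the NAL unit's own NAL_unit_type (1 to 23).
PAYLOAD_TYPE_NUMBER = {
    "STAP-A": 24,
    "STAP-B": 25,
    "MTAP16": 26,
    "MTAP24": 27,
    "FU-A": 28,
    "FU-B": 29,
}

# Packet types permitted in each packetization mode.
MODE_PACKET_TYPES = {
    "SNU": frozenset({"SNU"}),
    "non-interleaved": frozenset({"SNU", "STAP-A", "FU-A"}),
    # Assumed from RFC 3984; the text only states that SNU is excluded.
    "interleaved": frozenset({"STAP-B", "MTAP16", "MTAP24", "FU-A", "FU-B"}),
}


def is_allowed(mode: str, packet_type: str) -> bool:
    """Return True if packet_type may be used in the given mode."""
    return packet_type in MODE_PACKET_TYPES[mode]
```

For example, `is_allowed("non-interleaved", "FU-A")` is true, while `is_allowed("interleaved", "SNU")` is false.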
The SNU type of the non-interleaved mode loads in one RTP packet a single NAL unit whose “NAL_unit_type” is in the range 1 to 23 shown in FIG. 2.
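As a minimal sketch of the SNU case, the following splits the first byte of a NAL unit into its F, NRI, and type fields and checks whether the unit can be carried as an SNU packet. The 1-bit F, 2-bit NRI, 5-bit type layout is that of the H.264 NAL unit header; the names `NalHeader`, `parse_nal_header`, and `snu_payload` are hypothetical, and the sketch assumes (as in RFC 3984) that the SNU payload is the NAL unit itself with no extra payload header.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int              # forbidden_zero_bit (1 bit)
    nri: int            # nal_ref_idc (2 bits)
    nal_unit_type: int  # 5 bits


def parse_nal_header(nal: bytes) -> NalHeader:
    """Split the first byte of a NAL unit into F, NRI, and type."""
    if not nal:
        raise ValueError("empty NAL unit")
    first = nal[0]
    return NalHeader(f=first >> 7, nri=(first >> 5) & 0x3,
                     nal_unit_type=first & 0x1F)


def snu_payload(nal: bytes) -> bytes:
    """Return the RTP payload for an SNU packet.

    The NAL unit is carried unchanged; only types 1 to 23 qualify.
    """
    if not 1 <= parse_nal_header(nal).nal_unit_type <= 23:
        raise ValueError("NAL_unit_type outside 1..23 cannot be sent as SNU")
    return nal
```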
The STAP-A type of the non-interleaved mode, in contrast, has an RTP payload format structure as shown in FIG. 3; it aggregates several NAL units corresponding to the same presentation time instant and loads them in one RTP packet.
Unlike the SNU type, the STAP-A type of the non-interleaved mode has an additional 1-byte RTP payload header (STAP-A NAL HDR) inserted, as shown in FIG. 3. The F field of the payload header is set to “1” if at least one of the NAL units to be loaded together has an F field value of “1” in its header.
The NRI field of the payload header is set to the maximum value of the NRI field values indicated in each of the headers of the NAL units to be loaded together.
In the “Type” field of the payload header, “NAL_unit_type” of No. 24 in FIG. 3 is set in order to show that this is a STAP-A type.
In addition, separately from the payload header, a 2-byte “NALU_Size” field giving the size of each NAL unit is inserted immediately before that NAL unit.
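The STAP-A rules above (F set when any aggregated unit has F set, NRI set to the maximum, type 24, and a 2-byte size before each unit) can be sketched as follows. The function name `build_stap_a` is hypothetical, and the sketch assumes the size field is in network byte order, as in RFC 3984; it builds only the RTP payload, not the RTP header.

```python
import struct
from typing import Sequence

STAP_A_TYPE = 24  # NAL_unit_type value that identifies a STAP-A payload


def build_stap_a(nal_units: Sequence[bytes]) -> bytes:
    """Aggregate NAL units of one time instant into a STAP-A payload."""
    if not nal_units:
        raise ValueError("at least one NAL unit is required")
    f = 0
    nri = 0
    body = bytearray()
    for nal in nal_units:
        if not nal or len(nal) > 0xFFFF:
            raise ValueError("NAL unit size must fit the 2-byte NALU_Size field")
        f |= nal[0] >> 7                     # F is 1 if any unit has F = 1
        nri = max(nri, (nal[0] >> 5) & 0x3)  # NRI is the maximum over all units
        body += struct.pack("!H", len(nal))  # 2-byte size (big-endian assumed)
        body += nal
    header = (f << 7) | (nri << 5) | STAP_A_TYPE
    return bytes([header]) + bytes(body)
```

A caller would still need to check that the resulting payload fits within the MTU before choosing this packet type.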
The FU-A type of the non-interleaved mode is used when the size of one NAL unit exceeds the Maximum Transmission Unit (MTU) of the network. It divides the NAL unit into two or more parts, none of which exceeds the MTU size, and loads each part in its own RTP packet, which prevents packet fragmentation in a router or gateway during transmission.
FIG. 4 illustrates the structure of an RTP payload format for the FU-A type. The RTP payload header is composed of a total of 2 bytes including one byte of “FU_indicator” and one byte of “FU_header”.
The F field and NRI field of “FU_indicator” take the values indicated in the header of the NAL unit unchanged.
“NAL_unit_type” of “No. 28” in FIG. 1 is set in the “Type” field of “FU_indicator” in order to show that this is the FU-A type.
The S field and E field of “FU_header” indicate that the part loaded in the packet is the start part or the end part of a NAL unit, respectively.
In the “Type” field of the “FU_header”, the “NAL_unit_type” value indicating encoding contents contained in the NAL unit is set, as shown in FIG. 2.
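The FU-A layout described above can be sketched as follows. The name `fragment_fu_a` and the parameter `max_payload` (the RTP payload budget, that is, the MTU minus the IP, UDP, and RTP header sizes) are hypothetical. The sketch also assumes, following RFC 3984, that the original 1-byte NAL unit header is not repeated in the fragments, because its fields are carried by “FU_indicator” and “FU_header”, and that the reserved bit of “FU_header” is 0.

```python
from typing import List

FU_A_TYPE = 28  # NAL_unit_type value that identifies an FU-A payload


def fragment_fu_a(nal: bytes, max_payload: int) -> List[bytes]:
    """Split one NAL unit into FU-A payloads of at most max_payload bytes."""
    if len(nal) < 2:
        raise ValueError("NAL unit has no payload to fragment")
    if max_payload < 3:
        raise ValueError("max_payload must leave room for 2 header bytes + data")
    nal_header = nal[0]
    fu_indicator = (nal_header & 0xE0) | FU_A_TYPE  # F and NRI copied as they are
    nal_type = nal_header & 0x1F                    # goes into FU_header Type
    data = nal[1:]                                  # NAL header byte is dropped
    chunk = max_payload - 2                         # room left after the 2 header bytes
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    if len(pieces) < 2:
        raise ValueError("NAL unit fits in one packet; FU-A is not needed")
    packets = []
    for index, piece in enumerate(pieces):
        s_bit = 0x80 if index == 0 else 0                # start part
        e_bit = 0x40 if index == len(pieces) - 1 else 0  # end part
        fu_header = s_bit | e_bit | nal_type
        packets.append(bytes([fu_indicator, fu_header]) + piece)
    return packets
```

A receiver can rebuild the NAL unit header from the F and NRI bits of “FU_indicator” and the Type bits of “FU_header”, then append the fragment data in order.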
That is, as described above, although the RTP packet types for the NAL units stored in an SVC bitstream are defined in the standard, no criteria or method have been suggested for determining which packet type is suitable for a given NAL unit.
Consequently, the present invention proposes a practical RTP packetization algorithm which can effectively load NAL units of an SVC in an RTP payload while maintaining the specification of the RTP payload format.