1. Field of the Invention
The present invention relates to an apparatus for transmitting an encoded video stream and a method for transmitting the same.
2. Discussion of the Related Art
Playback of digital video is realized by displaying a sequence of still images, or frames, at a constant rate measured in frames per second (fps). For smooth video playback, frames must satisfy strict playout deadlines. Therefore, all network and playback delays must remain within the time constraint permitted between successive frames.
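The timing constraint described above can be illustrated with a minimal sketch. The function names are hypothetical, and treating the constraint as the sum of one network delay and one playback delay compared against the inter-frame interval is a simplifying assumption made only for illustration.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between successive frames, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def meets_deadline(network_delay_ms: float, playback_delay_ms: float, fps: float) -> bool:
    """Assumed model: total delay must stay under the inter-frame interval."""
    return network_delay_ms + playback_delay_ms < frame_interval_ms(fps)


if __name__ == "__main__":
    # At 25 fps the interval between frames is 40 ms.
    print(frame_interval_ms(25))
    print(meets_deadline(20, 10, 25))  # 30 ms of delay fits in 40 ms
    print(meets_deadline(30, 15, 25))  # 45 ms of delay does not
```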
The H.264 codec is a highly efficient coding standard. Like traditional video compression techniques, H.264 uses predictive methods to reconstruct video sequences. This concept is derived from the fact that digital video typically exhibits spatial redundancy, which denotes similarities between pixels within a frame, and temporal redundancy, which denotes similarities between pixels in adjacent frames. Frames are divided into units of macroblocks (MBs), which are 16×16-pixel regions, and each MB is either intra-coded or inter-coded. An intra-coded MB is reconstructed using information from the current frame. Inter-coded MBs are reconstructed using information only from the previous frame (predicted), or from both previous and future frames (bi-predicted).
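As a rough illustration of the 16×16 macroblock partitioning, the following sketch counts the macroblocks needed to cover a frame. The function name and the example resolutions are assumptions chosen for illustration, as is the rounding up of a partial macroblock at the frame edge.

```python
MB_SIZE = 16  # macroblock edge length in pixels, as stated in the text


def macroblock_grid(width: int, height: int):
    """Return (columns, rows, total) of macroblocks covering a frame.

    Partial macroblocks at the right or bottom edge are rounded up
    (an assumption of this sketch).
    """
    if width <= 0 or height <= 0:
        raise ValueError("frame dimensions must be positive")
    cols = -(-width // MB_SIZE)   # ceiling division
    rows = -(-height // MB_SIZE)
    return cols, rows, cols * rows


if __name__ == "__main__":
    print(macroblock_grid(1920, 1080))  # (120, 68, 8160)
    print(macroblock_grid(1280, 720))   # (80, 45, 3600)
```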
Frames can contain a mixture of different MB types and are labeled according to the types of references made for prediction. For example, a B-frame holds MBs that are bi-predicted; however, it may also contain intra-predicted MBs. A P-frame will contain MBs that are predicted from past frames and may also contain intra-predicted MBs. I-frames contain only intra-predicted MBs and do not reference other frames.
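The labeling rule described above can be sketched as follows. The function name and the single-letter MB type codes are hypothetical. The sketch assumes a frame is labeled by the most general kind of prediction found among its MBs.

```python
def label_frame(mb_types):
    """Label a frame as 'I', 'P', or 'B' from the types of its macroblocks.

    mb_types: iterable of 'I' (intra), 'P' (predicted), 'B' (bi-predicted).
    """
    kinds = set(mb_types)
    if not kinds:
        raise ValueError("a frame must contain at least one macroblock")
    unknown = kinds - {"I", "P", "B"}
    if unknown:
        raise ValueError(f"unknown macroblock types: {sorted(unknown)}")
    if "B" in kinds:
        return "B"  # any bi-predicted MB makes it a B-frame
    if "P" in kinds:
        return "P"  # predicted MBs, possibly mixed with intra MBs
    return "I"      # intra MBs only


if __name__ == "__main__":
    print(label_frame(["I", "I", "I"]))  # I
    print(label_frame(["P", "I", "P"]))  # P
    print(label_frame(["B", "I", "B"]))  # B
```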
An encoded video consists of a sequence of Groups of Pictures (GOPs). Each GOP is a set of coded pictures that specifies the order of I-, P-, and B-frames.
FIG. 1 shows the frame ordering of a GOP in related art. The arrows indicate sources of predicted MBs from adjacent frames.
As shown in FIG. 1, only I-frames and P-frames serve as reference frames. Due to this interdependency between frames, errors can propagate through frames in a GOP when packet losses occur.
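The error propagation described above can be sketched with a simplified dependency model. The function name and the example GOP pattern are hypothetical. The model assumes that each P-frame references the nearest preceding I- or P-frame, and that each B-frame references the nearest preceding and nearest following I- or P-frame; the actual reference structure of FIG. 1 may differ.

```python
def affected_frames(gop, lost):
    """Return sorted indices of frames that are lost or depend on a damaged frame.

    gop:  string of frame types in display order, e.g. "IBBPBBP"
    lost: iterable of frame indices whose packets were lost
    """
    lost = set(lost)
    anchors = [i for i, t in enumerate(gop) if t in "IP"]  # reference frames
    damaged = set()

    # Errors propagate forward along the chain of reference frames.
    for i in anchors:
        previous = [a for a in anchors if a < i]
        if i in lost:
            damaged.add(i)
        elif gop[i] == "P" and previous and previous[-1] in damaged:
            damaged.add(i)

    # A B-frame is affected if it is lost or if either of its references is damaged.
    for i, t in enumerate(gop):
        if t != "B":
            continue
        previous = [a for a in anchors if a < i]
        following = [a for a in anchors if a > i]
        refs = previous[-1:] + following[:1]
        if i in lost or any(r in damaged for r in refs):
            damaged.add(i)

    return sorted(damaged)


if __name__ == "__main__":
    gop = "IBBPBBP"
    print(affected_frames(gop, {1}))  # [1]: a lost B-frame affects only itself
    print(affected_frames(gop, {3}))  # [1, 2, 3, 4, 5, 6]
    print(affected_frames(gop, {0}))  # [0, 1, 2, 3, 4, 5, 6]
```

Under this model, losing a B-frame damages only that frame, whereas losing an I- or P-frame damages every frame that references it directly or indirectly.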
An HD video frame encoded using the H.264 standard is typically subdivided into multiple slices. Slices are classified by the types of MBs they contain.
FIG. 2 shows the hierarchical syntax of the H.264 bitstream in related art.
As shown in FIG. 2, the syntax is abstracted in two layers: the Video Coding Layer (VCL) that holds the actual compressed video data and the Network Abstraction Layer (NAL) that encapsulates the compressed data and additional information in a form suitable for packet-based networks.
The Network Abstraction Layer consists of a series of NAL units, which are the minimum units of data decodable by an H.264 decoder. Three common NAL unit types are the Sequence Parameter Set (SPS), the Picture Parameter Set (PPS), and slices.
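A minimal sketch of how a receiver might identify these NAL unit types is given below. The one-byte header layout (one forbidden bit, two bits of nal_ref_idc, five bits of nal_unit_type) and the type codes 1, 5, 7, and 8 are taken from the H.264 specification rather than from the description above. The function names and the lookup table are hypothetical.

```python
# Subset of nal_unit_type codes from the H.264 specification.
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    7: "SPS",
    8: "PPS",
}


def parse_nal_header(first_byte: int):
    """Split the first byte of a NAL unit into its three header fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must fit in one byte")
    forbidden_zero_bit = first_byte >> 7
    nal_ref_idc = (first_byte >> 5) & 0x03
    nal_unit_type = first_byte & 0x1F
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


def nal_type_name(first_byte: int) -> str:
    """Human-readable name for the NAL unit type, or 'other' if not in the table."""
    _, _, nal_unit_type = parse_nal_header(first_byte)
    return NAL_TYPE_NAMES.get(nal_unit_type, "other")


if __name__ == "__main__":
    for byte in (0x67, 0x68, 0x65, 0x41):
        print(hex(byte), parse_nal_header(byte), nal_type_name(byte))
```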
The SPS contains parameters common to an entire video, such as the profile and level the coded video conforms to. Therefore, if a packet containing SPS information is lost, the entire video cannot be decoded. The PPS contains common parameters that are applied to a sequence of frames, such as the entropy coding mode. If the PPS for a sequence of frames is lost, then these frames cannot be decoded. As mentioned before, a slice is the main unit for constructing a frame, and a frame can have either a single slice or multiple slices.
Each slice contains a slice header followed by video data containing MBs. A slice header contains information common to all the MBs within a slice. If a slice header is lost, then the entire slice cannot be decoded even if the slice data is properly received.
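The dependency chain described above (SPS, then PPS, then slice header, then slice data) can be sketched as a simple check. The function name and the boolean flags are hypothetical, and the sketch assumes a single SPS and a single PPS apply to all of the slices under consideration.

```python
def decodable_slices(sps_received, pps_received, slice_headers_received):
    """Return one flag per slice telling whether any of its data can be decoded.

    sps_received:           True if the SPS arrived
    pps_received:           True if the PPS for these frames arrived
    slice_headers_received: list of booleans, one per slice
    """
    if not (sps_received and pps_received):
        # Without the parameter sets, no slice can be decoded.
        return [False] * len(slice_headers_received)
    # With the parameter sets present, a slice needs its own header.
    return [bool(header_ok) for header_ok in slice_headers_received]


if __name__ == "__main__":
    headers = [True, True, False, True]
    print(decodable_slices(True, True, headers))   # [True, True, False, True]
    print(decodable_slices(True, False, headers))  # [False, False, False, False]
```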
FIG. 3A and FIG. 3B show the effect of packet loss on a frame in related art. FIG. 3A shows the original transmitted frame, while FIG. 3B shows the received frame with some information missing due to packet loss.
As shown in FIG. 3B, the slice header for Slice 4 is lost, rendering the entire slice undecodable. In contrast, the slice header for Slice 5 is received but the last two Real-time Transport Protocol (RTP) packets are lost, so most of the slice can still be decoded.
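The contrast between Slice 4 and Slice 5 can be sketched at the packet level. The function name and the packet counts are hypothetical and are not taken from the figures. The sketch assumes that the first RTP packet of a slice carries the slice header and that decoding stops at the first missing packet.

```python
def decodable_packet_count(received):
    """Count the leading packets of a slice that can be decoded.

    received: list of booleans, one per RTP packet of the slice, in
              transmission order; packet 0 is assumed to carry the
              slice header.
    """
    count = 0
    for ok in received:
        if not ok:
            break  # assumed: nothing after the first gap is decodable
        count += 1
    return count


if __name__ == "__main__":
    slice_4 = [False, True, True, True]   # header packet lost
    slice_5 = [True] * 6 + [False] * 2    # last two packets lost
    print(decodable_packet_count(slice_4))  # 0
    print(decodable_packet_count(slice_5))  # 6
```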