This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
The Digital Video Broadcasting (DVB) Project is a European initiative to provide a common specification for delivering high-bandwidth digital multimedia contents to set-top boxes and television sets in a direct-to-home setting. This initiative has been adopted by several countries worldwide. The basic core standards are classified based on the physical transmission mechanisms they are specialized for. These standards are the Digital Video Broadcasting-Satellite (DVB-S); Digital Video Broadcasting-Cable (DVB-C); and Digital Video Broadcasting-Terrestrial (DVB-T).
DVB-T is also referred to as the “Common 2 k/8 k specification.” The multi-carrier modulation system used by DVB-T provides additional robustness in the presence of noise. It also enables the possibility to transmit in a large single frequency network (SFN), reusing frequencies within the network. Orthogonal Frequency Division Multiplexing (OFDM) is used by DVB-T in two modes: the 2K mode, which uses 1705 carriers, and the 8K mode, which uses 6817 carriers. The size of the SFN depends on the modes used: the 2K mode having a smaller SFN with a single transmitter than an 8K mode.
DVB-T mobile services have been launched in various locations. Using diversity antenna receivers, services which targeted fixed antenna reception can now also be received on the move. However, even though DVB-T has passed most suitability requirements for mobile applications, some concerns regarding data delivery for small handheld, battery-operated devices remained. Handheld mobile terminals require specific features from the transmission system serving them. These features include (1) extended receiver battery life; (2) improved radio frequency (RF) performance for mobile single antenna reception; (3) countering high levels of noise in a hostile transmission environment; and (4) efficient handovers. To address these requirements, the Digital Video Broadcasting Handheld (DVB-H) standard was developed. DVB-H uses the same basic concepts as DVB-T but adds features to improve mobility, power consumption and SFN usability.
DVB systems were originally designed to transmit digital multimedia contents to consumers directly to their homes. However, it was also recognized that the same transmission system is useful for broadcasting to consumers other types of data such as firmware updates for set-top boxes, games for set-top boxes, program guides, Internet services, and proprietary data such as stock market information. This broadcasting of data is referred to as datacasting. Depending on the different types of applications that can use datacasting and their requirements, six different profiles were defined. These profiles are: (1) data piping; (2) data streaming; (3) multi-protocol encapsulation (MPE); (4) data carousels; (5) object carousels; and (6) other protocols. For addressable data, such as data using Internet Protocol (IP) for transmission, the MPE profile is the most appropriate profile. DVB-H is designed to be IP-based, and it therefore uses MPE as the datacasting profile.
MPE-Forward Error Correction (MPE-FEC) is an optional multiplexer-layer FEC code based on Reed-Solomon (RS) codes. MPE-FEC is included in the DVB-H specifications to counter high levels of transmission errors. In MPE-FEC, the RS parity data is packed into a special FEC section referred to as MPE-FEC so that an MPE-FEC-ignorant receiver can simply ignore these sections. The computation of MPE-FEC is performed in the link layer, over IP packets before encapsulation into MPE sections.
In the following, the values correspond to the current standard. An MPE-FEC frame is arranged as a matrix with 255 columns and a flexible number of rows. Currently, column heights of 256, 512, 768 and 1024 bytes are supported. FIG. 1 shows the structure of an MPE-FEC frame. Each position in the matrix hosts an information byte. The first 191 columns are dedicated to Open Systems Interconnection (OSI) layer 3 datagrams, such as IP packets, and possible padding. This portion of the MPE-FEC frame is referred to as the application data table (ADT). The next 64 columns of the MPE-FEC frame are reserved for the RS parity information. This portion is referred to as the RS data table (RSDT).
The ADT can be completely or partially filled with datagrams. The remaining space, when the ADT is partially filled, is padded with zero bytes. Padding is also performed when there is no space left in the MPE-FEC frame to fill the next complete datagram. The RSDT is computed across each row of the ADT using RS (255, 191). It is not necessary to compute the entire 64 columns of parity bytes, and some of the right-most columns of the RS data table can be completely discarded. This procedure is referred to as puncturing. The padded and punctured columns are not sent over the transmission channel.
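The frame dimensions described above can be illustrated with a minimal sketch. The function name and the parameters padding_columns and punctured_columns are hypothetical, and treating padding as whole columns is a simplifying assumption (in practice padding is applied at byte granularity).

```python
ADT_COLUMNS = 191                   # application data columns (from the text)
RS_COLUMNS = 64                     # RS parity columns (from the text)
ALLOWED_ROWS = (256, 512, 768, 1024)


def mpe_fec_frame_layout(rows, padding_columns=0, punctured_columns=0):
    """Return the bytes actually sent for one MPE-FEC frame.

    padding_columns and punctured_columns are illustrative names;
    padded and punctured columns are not transmitted.
    """
    if rows not in ALLOWED_ROWS:
        raise ValueError("rows must be one of %s" % (ALLOWED_ROWS,))
    if not 0 <= padding_columns < ADT_COLUMNS:
        raise ValueError("padding_columns out of range")
    if not 0 <= punctured_columns <= RS_COLUMNS:
        raise ValueError("punctured_columns out of range")

    data_columns = ADT_COLUMNS - padding_columns
    parity_columns = RS_COLUMNS - punctured_columns
    return {
        "adt_bytes_sent": rows * data_columns,
        "rs_bytes_sent": rows * parity_columns,
        # Fraction of transmitted bytes that carry application data.
        "code_rate": data_columns / (data_columns + parity_columns),
    }
```

For a full 1024-row frame without padding or puncturing, the sketch gives 195,584 ADT bytes and 65,536 parity bytes, i.e., a code rate of 191/255.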
The strict constraint on power consumption was a significant shortcoming of DVB-T and hence made it unsuitable for handheld mobile terminals. Handheld mobile devices have a limited source of power. The power consumed in receiving, decoding and demodulating a standard full-bandwidth DVB-T signal would use up a substantial amount of battery life in a short period of time. Time slicing of the MPE-FEC frames was used to solve this problem. Time slicing is similar to time division multiplexing (TDM). In TDM, multiple data streams are sent over the same channel by assigning each data stream unique slots in time. An advantage of TDM is its flexibility by allowing dynamic variations in the number of signals sent in the channel and the ability to constantly adjust time intervals to make optimal usage of the channel bandwidth.
When time-slicing is used, the data of a time-sliced service is sent into the channel as bursts so that the receiver, using the control signals, remains inactive when no bursts are to be received. This reduces the power consumption in the receiver terminal. The bursts are sent at a significantly higher bit rate, and the inter-time-slice period, also referred to as the off-time, is usually proportional to the average bitrate of the service(s) conveyed in the bursts. FIG. 2(a) shows the time-slicing of bursts with the various parameters that characterize it. Each burst typically consists of two parts, the ADT and RSDT. Consequently, the burst time consists of the burst time for ADT and the burst-time for RSDT. Analogously, after the transmission of the ADT of a burst, no application data of a certain program is transmitted for a time duration, referred to herein as the effective off-time.
A method referred to as the “delta-t method” is used to indicate the time interval that a receiver can switch off before it can switch back on to receive the next time slice of the service. The delta-t method is used to signal the time from the start of the currently-received MPE (or MPE-FEC) section to the start of the next burst. Delta-t times are indicated in every MPE section header, as illustrated in FIG. 2(b), so that the loss of an MPE section or multiple sections does not affect the capability of the receiver to accurately switch on at the beginning of the next time sliced burst. When time-slicing is in use, the time-slice start and stop times are computed using the delta-t and the maximum_burst_duration fields in the headers of the time-sliced MPE sections. A time-sliced burst cannot start before the delta-t time which is signaled by the MPE section headers of the previous time-sliced burst and cannot end later than the time indicated by delta-t+maximum_burst_duration. The maximum allowed jitter as specified for example in the standard ETSI EN 301 192 V1.4.1 (2004-11) Digital Video Broadcasting (DVB); DVB specification for data broadcasting can also be taken into account.
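The window in which the next burst may arrive can be sketched as follows. This is a minimal illustration under the assumption that all times are expressed in seconds on a common receiver clock; the function and parameter names are hypothetical, and the delta-t jitter allowance is deliberately left out.

```python
def next_burst_window(section_start_s, delta_t_s, max_burst_duration_s):
    """Return (earliest_start, latest_end) of the next time-sliced burst.

    section_start_s      -- start time of the currently received section
    delta_t_s            -- delta-t value signaled in that section header
    max_burst_duration_s -- maximum_burst_duration of the elementary stream
    """
    if delta_t_s < 0 or max_burst_duration_s <= 0:
        raise ValueError("delta-t must be >= 0 and burst duration > 0")
    # The burst cannot start before the signaled delta-t has elapsed ...
    earliest_start = section_start_s + delta_t_s
    # ... and cannot end later than delta-t + maximum_burst_duration.
    latest_end = earliest_start + max_burst_duration_s
    return earliest_start, latest_end
```

Because every section header carries its own delta-t, the same computation can be repeated for any correctly received section of the burst, which is why the loss of individual sections does not prevent the receiver from waking up on time.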
When a burst of data is received by a DVB-H capable receiver, the data is buffered to be processed and presented during the off period between bursts. The burst size Sb, defined as the amount of network layer bits received in a burst-duration, has to be less than the buffer available at the receiver for the particular service. The maximum burst duration tb is also signaled for every time-sliced elementary stream so that, under poor reception conditions, the receiver can infer when the burst has ended.
The layer 3 datagrams are always carried in MPE sections regardless of whether MPE-FEC is used, which keeps the scheme fully backward compatible with MPE-FEC-ignorant receivers. The last section of an ADT contains a table_boundary flag that signals the end of layer 3 datagrams within the ADT. In a time-sliced scenario, an MPE-FEC-aware receiver, upon encountering a set table_boundary flag, checks whether all ADT sections were received correctly, for example using a Cyclic Redundancy Check (CRC), and, if so, discards all remaining sections in the burst. If some of the ADT sections contain errors, then the RSDT sections are received and are used to attempt to correct the errors. An MPE-FEC-ignorant receiver simply ignores the MPE-FEC sections (the RSDT part of an MPE-FEC matrix) and switches off the receiver until the next burst.
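The receiver behavior described above can be sketched as follows. The Section type and the function name are hypothetical, and the sketch assumes that all ADT sections of a burst precede its RSDT sections and that a per-section CRC result is already available.

```python
from dataclasses import dataclass


@dataclass
class Section:
    crc_ok: bool                  # result of the per-section CRC check
    table_boundary: bool = False  # set on the last ADT section


def process_burst(sections, fec_aware=True):
    """Split a burst into the ADT sections and the RSDT sections kept.

    Sections after the table boundary are kept only by an MPE-FEC-aware
    receiver that saw at least one corrupted ADT section.
    """
    adt, rsdt = [], []
    boundary_seen = False
    need_parity = False
    for section in sections:
        if not boundary_seen:
            adt.append(section)
            if section.table_boundary:
                boundary_seen = True
                need_parity = fec_aware and not all(s.crc_ok for s in adt)
        elif need_parity:
            rsdt.append(section)
        # Otherwise the remaining sections of the burst are skipped.
    return adt, rsdt
```

In a real receiver the decision to skip the RSDT would also switch off the radio front end until the next burst; the sketch only models which sections are retained.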
Aural and visual information are important components of most multimedia services and applications operating over transmission systems. In order to transmit aural and visual information in the current generation of popular networks, compression arrangements have been standardized. Most of these compression arrangements use known human perceptual qualities along with efficient binary data coding schemes to reduce redundant information and compress the input information. Both audio and video compression arrangements process continuous blocks of uncompressed samples to use the psycho-acoustic and psycho-visual information for redundancy reduction.
In point-to-multipoint (PTM)-type communications, simulcasting is often used to deliver data to receivers with heterogeneous capability. In a PTM communication scenario when a sender is sending a single media stream to multiple receivers with heterogeneous capability, a fair distribution system should deliver the media to the receiver commensurate with the capabilities of the individual receivers. In practice, however, this is a difficult proposition to achieve. The “fairness” issue arises from the fact that PTM transmission involves a trade-off between bandwidth efficiency and granularity of control over the quality of reception to an individual receiver. In a single-rate PTM transmission, the media transmission rate is chosen to match the lowest receiver capacity in a particular session. This solution is sub-optimal both in terms of bandwidth usage and receiver heterogeneity. The simulcasting approach is used to address this issue of fair distribution, using the transmission of several streams of identical source media at varying transmission and presentation characteristics. For example, two streams of different picture sizes can be transmitted.
The use of time-slicing in DVB-H indicates that data of a program is sent to the receiver in high-bit-rate bursts at specific time intervals. When a receiver tunes into a program, it either tunes into the channel during the time interval when the time-sliced program data is being transmitted or during the off-time.
Two possibilities exist when a receiver tunes in. The first possibility is that the receiver tunes in during the ADT transmission of the time-sliced burst of the required program. A special case of tuning in during a burst is that the receiver tunes in just at the beginning of the time-sliced burst of the required program. The second possibility is that the receiver tunes in between consecutive time-sliced bursts of the required program. When the receiver tunes into a channel in the beginning or middle of the ADT transmission of a time-sliced burst, it can start data reception without any delay. However, when the receiver tunes into the channel after the ADT transmission of the time-sliced burst for the program has ended, it has to wait for an additional period of time before the next time-sliced burst for the program is transmitted. This delay can be anything from zero (exclusive) to the effective off-time period. FIGS. 3(a)-3(c) show the three different scenarios that can occur when a receiver tunes into a service transmitted in a time-sliced DVB-H channel. In FIG. 3(a), the tuning in occurs at the beginning of a burst n. In FIG. 3(b), the tuning in occurs in the middle of burst n. In FIG. 3(c), the tuning in occurs in between bursts n and n+1.
To estimate the probability that a receiver tunes into a time-sliced burst of a service, it is helpful to assume that the service bit rate is bs and the total DVB-H channel bandwidth for all services transmitted through it is bc. If event Eb is defined as the event when a receiver tunes into the time-slice burst during its transmission, then P(Eb) is defined as the probability that this event occurs. This probability is given by
P(Eb) = bs/bc    (1)
In equation (1), it is assumed that the service is using the full capacity of the channel. It is also possible that a service does not use the full capacity of the channel. For example, a time-sliced set of DVB-H services can be multiplexed with continuous DVB-T services into a single MPEG-2 transport stream. In such a parallel service case, bc is defined to be the total bandwidth available for the set of DVB-H services. The probability P(Ei) that the receiver tunes into an off-time of the service time-sliced burst is then given by
P(Ei) = (bc − bs)/bc    (2)
P(Ei) = 1 − P(Eb)    (3)
Equations (1), (2) and (3) reveal that when bs is much smaller than bc, there is a very high probability that the receiver tunes into the service during the off-time of the service. This indicates that there is a high probability that the receiver has to wait for information when it tunes into a channel to receive a service.
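As a numerical illustration of equations (1) to (3), the following minimal sketch computes both probabilities. The function name and the example bit rates are assumptions chosen for illustration only.

```python
def tune_in_probabilities(service_bitrate, channel_bitrate):
    """Return (P(Eb), P(Ei)) according to equations (1) and (3).

    Both bit rates must use the same unit, e.g., bits per second.
    """
    if not 0 < service_bitrate <= channel_bitrate:
        raise ValueError("require 0 < service_bitrate <= channel_bitrate")
    p_burst = service_bitrate / channel_bitrate   # equation (1)
    p_off = 1.0 - p_burst                         # equation (3)
    return p_burst, p_off


# Hypothetical example: a 2.5 Mbit/s service in a 10 Mbit/s channel.
p_burst, p_off = tune_in_probabilities(2.5e6, 10e6)
```

With these example values, the receiver lands inside a burst of the service in one out of four tune-in attempts and has to wait in the remaining three.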
Program P is a streamed audio-visual presentation. The audio and the video components are coded separately, multiplexed together, and time-sliced for carriage over the DVB-H radio network. A burst of P contains audio-visual data in an interval [τs, τe]. The time period during which data of P is transmitted is referred to as the burst-time tb. The burst-time consists of two parts, burst-times for ADT (tbADT) and RSDT (tbRSDT). After the time interval tb, no data of program P is transmitted for a time duration of Δt, referred to as the off-time. Analogously, after the transmission of the ADT of a burst, no application data of program P is transmitted for a time duration of Δte, referred to as the effective off-time. The cycle-time δ is defined as δ=tb+Δt=tbADT+Δte, i.e., the time difference between the start times of consecutive time-sliced bursts. The tune-in initiation time τt is defined as that instant on the transmission curve time-line when the user decides to consume P and initiates action to receive data from the channel. The tune-in delay Δ(T-IN) is defined as the amount of time elapsed after τt to the moment when the rendering of P starts. This is also referred to as channel zapping delay, channel-switch delay, and start-up delay. Δ(T-IN) can be considered as a cumulative sum of the following component delays:
A1. Time-slice synchronization delay Δ(T-SYNC).
A2. Delay to compensate potentially incomplete reception of the first time-sliced burst Δ(COMP).
B. Reception duration of the first time-sliced burst Δ(RCPT).
C. Delay to compensate the size variation of FEC Δ(FEC).
D. Delay to compensate for the synchronization time between associated media streams (e.g. audio and video) Δ(M-SYNC).
E. Delay until media decoders are refreshed to produce correct output samples denoted by Δ(REFRSH).
F. Delay to compensate the varying bitrate of a media bitstream denoted by Δ(VBR-COMP).
G. Processing delays of the receiver and player implementations denoted by Δ(PROC).
Thus, Δ(T-IN) can be given as
Δ(T-IN) = Δ(T-SYNC) + Δ(COMP) + Δ(RCPT) + Δ(FEC) + Δ(M-SYNC) + Δ(REFRSH) + Δ(VBR-COMP) + Δ(PROC)
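The cumulative sum can be sketched as a small data structure. The class and field names are hypothetical; each field corresponds to one component delay listed above, expressed in seconds.

```python
from dataclasses import dataclass, fields


@dataclass
class TuneInDelays:
    """Component delays of the tune-in delay, in seconds."""
    t_sync: float = 0.0    # time-slice synchronization delay
    comp: float = 0.0      # compensation for incomplete first burst
    rcpt: float = 0.0      # reception duration of the first burst
    fec: float = 0.0       # compensation for FEC size variation
    m_sync: float = 0.0    # synchronization between media streams
    refrsh: float = 0.0    # decoder refresh delay
    vbr_comp: float = 0.0  # compensation for varying media bitrate
    proc: float = 0.0      # receiver and player processing delays

    def total(self):
        """Return the tune-in delay as the sum of all components."""
        return sum(getattr(self, f.name) for f in fields(self))
```

For example, TuneInDelays(t_sync=1.0, comp=0.5, rcpt=0.25).total() evaluates to 1.75 seconds when the remaining components are taken as zero.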
It should be noted that the above equation of Δ(T-IN) is a simplification, as the delays to acquire the required transport-level signaling, such as Program Specific Information/Service Information (PSI/SI) and Entitlement Control Messages (ECM) for conditional access (CA), are not considered. Furthermore, it is assumed that no application-layer content protection is used and hence related delays, e.g., for acquiring the content protection keys, are omitted from the discussion. Finally, the delay jitter of burst intervals (Delta-t Jitter) is not handled either, but it is straightforward to use as a guard interval in the activation of the radio reception.
The delay Δ(REFRSH) is usually applicable to video only, whereas in audio, Δ(REFRSH) would typically be equal to zero. The values of other delay components are often identical for both audio and video. Δ(T-SYNC), Δ(COMP), and Δ(RCPT) are discussed in more detail below.
As discussed above, there are two possibilities for the moment that the user initiated the switch of programs relative to the transmission of P. In the first possibility, tune-in occurs during a burst carrying P as illustrated in FIG. 3(b). In a special case, tune-in occurs exactly at the beginning of a burst carrying P (FIG. 3(a)). In the second possibility, tune-in occurs in between two consecutive bursts of P as illustrated in FIG. 3(c).
Before analysis of these scenarios, two delays are defined. The first, referred to as the time-slice synchronization delay Δ(T-SYNC), is defined as the time elapsed from the moment when the user initiates the desire to consume P to the moment when the receiver obtains data of P. The second, referred to as the incomplete data compensation delay Δ(COMP), is the delay incurred to compensate for the playback duration of data that was not received before tune-in initiation time τt in the burst. This delay is applicable only when tune-in occurs in the middle of the burst transmission.
When the receiver tunes in during the burst-time for ADT, the decoding and/or playback has to be delayed by an amount that is equivalent to the playback duration of those coded data units that occurred in the burst prior to the tune-in initiation time in order to guarantee playback without any pause. In the special case, when a receiver tunes into P exactly at the beginning of a burst, all data for decoding the burst becomes available and hence Δ(COMP)=0. It is noted that it may not be possible to apply FEC decoding for error correction of an incompletely received time-sliced burst, as the amount of data columns that were not received may outnumber the correction capability of the FEC code. To keep the following delay analysis and equations simple, it is assumed that data is transmitted in decoding order, audio and video frames are interleaved in ascending order of decoding times, the decoding order is identical to the output order and the sampling curve is linear. Given these assumptions, the delay to compensate the incomplete reception of the first time-sliced burst becomes
Δ(COMP) = δ − (τe − τt)
Assuming a uniform random distribution of tune-in times during the first received burst, Δ(COMP) ranges from 0 to δ and the expected Δ(COMP) becomes
E[Δ(COMP)] = δ/2
The probability of tuning in during a burst of a desired program is given by
P(Eb) = tbADT/δ
When the receiver tunes into the program during the effective off-time period, it has to wait until the next time-sliced burst for the desired program starts. This delay can be anything from zero to the off-time period Δt. If the time instant when receivers tune into the channel is assumed to be uniformly distributed, then the probability P(Eo) that a receiver tunes into an off-time is given by
P(Eo) = Δte/δ
The expected Δ(T-SYNC) is
E[Δ(T-SYNC)] = Δt/2
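The relations above can be evaluated together in a minimal sketch. The function and key names are hypothetical; the expected values are taken exactly as stated in the text, each applying to its own tune-in case, and the probability of tuning into an off-time is computed as the complement of P(Eb), which follows from δ=tbADT+Δte.

```python
def tune_in_statistics(tb_adt, tb_rsdt, off_time):
    """Evaluate the tune-in relations for one time-sliced service.

    tb_adt   -- burst-time of the ADT part (seconds)
    tb_rsdt  -- burst-time of the RSDT part (seconds)
    off_time -- off-time between bursts (seconds)
    """
    if tb_adt <= 0 or tb_rsdt < 0 or off_time < 0:
        raise ValueError("durations must be non-negative, tb_adt positive")
    cycle_time = tb_adt + tb_rsdt + off_time      # cycle-time = tb + off-time
    effective_off_time = cycle_time - tb_adt      # effective off-time
    return {
        "cycle_time": cycle_time,
        "p_burst": tb_adt / cycle_time,           # P(Eb)
        "p_off": effective_off_time / cycle_time, # P(Eo) = 1 - P(Eb)
        "expected_comp": cycle_time / 2,          # E[COMP], tune-in in burst
        "expected_t_sync": off_time / 2,          # E[T-SYNC], tune-in in off-time
    }
```

For instance, with an ADT burst-time of 0.25 s, an RSDT burst-time of 0.25 s and an off-time of 1.5 s, the cycle-time is 2 s and the receiver tunes in during the ADT transmission with probability 0.125.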
The reception duration of the time-sliced burst depends on the size of the first MPE-FEC frame containing the desired program, as well as the transmission bitrate for the MPE-FEC frame. DVB-H allows the service provider to select the size of the MPE-FEC frame in terms of the rows of the frame (256, 512, 768, or 1024), the number of application data columns in the frame, and the number of Reed-Solomon FEC columns in the frame. The transmission bitrate for the MPE-FEC frame depends on the bitrate of the MPEG-2 transport stream multiplex carrying the program which, in turn, depends largely on the modulation system used in the radio transmission. Furthermore, potential non-time-sliced services reduce the transmission bitrate of the time-sliced bursts accordingly.
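A rough estimate of the reception duration follows from the frame size and the bitrate available to the burst. The sketch below is an approximation under the assumption that MPE section headers, CRCs and transport stream packet overhead are ignored; the function and parameter names are hypothetical.

```python
def burst_reception_duration(rows, adt_columns, rs_columns, bitrate_bps):
    """Approximate reception duration of one MPE-FEC frame in seconds.

    Only the payload bytes of the frame are counted; encapsulation
    overhead is ignored in this sketch.
    """
    if rows not in (256, 512, 768, 1024):
        raise ValueError("unsupported number of rows")
    if not 0 <= adt_columns <= 191 or not 0 <= rs_columns <= 64:
        raise ValueError("column counts out of range")
    if bitrate_bps <= 0:
        raise ValueError("bitrate must be positive")
    frame_bits = rows * (adt_columns + rs_columns) * 8
    return frame_bits / bitrate_bps
```

A full 1024-row frame holds 2,088,960 bits, so at a hypothetical burst bitrate of 4,177,920 bit/s the reception duration is about half a second.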
It should be noted that if receivers started media decoding immediately when the first IP datagram of the program is received, i.e., during the reception of the MPE-FEC frame, a corrupted IP datagram might not be correctable by FEC decoding before its rendering time. Hence, receivers should buffer an entire MPE-FEC frame and apply FEC decoding, if necessary, before decoding of the media streams.
Advanced Video Coding (AVC), also known as H.264/AVC, is a video coding standard developed by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). AVC includes the concepts of a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The VCL contains the signal processing functionality of the codec: mechanisms such as transform, quantization, motion-compensated prediction, and loop filters. A coded picture consists of one or more slices. The NAL encapsulates each slice generated by the VCL into one or more NAL units.
Scalable Video Coding (SVC) provides scalable video bitstreams. A scalable video bitstream contains a non-scalable base layer and one or more enhancement layers. An enhancement layer may enhance the temporal resolution (i.e. the frame rate), the spatial resolution, and/or the quality of the video content represented by the lower layer or part thereof. In the SVC extension of AVC, the VCL and NAL concepts were inherited.
Multi-view Video Coding (MVC) is another extension of AVC. An MVC encoder takes input video sequences (called different views) of the same scene captured from multiple cameras and outputs a single bitstream containing all the coded views. MVC also inherited the VCL and NAL concepts.
Many video coding schemes utilize inter prediction, which is also referred to as temporal prediction and motion compensation. Inter prediction removes redundancy between subsequent pictures. H.264/AVC, like other current video compression standards, divides a picture into a mesh of rectangles, for each of which a similar block in one of the decoded reference pictures is indicated. The location of the prediction block is coded as a motion vector that indicates the position of the prediction block relative to the block being coded.
Decoded pictures used for predicting subsequent coded pictures and for future output are buffered in the decoded picture buffer (DPB). The DPB management processes, including the storage process of decoded pictures into the DPB, the marking process of reference pictures, and the output and removal processes of decoded pictures from the DPB, are specified to enable efficient utilization of the buffer memory.
The reference picture management process in H.264/AVC is summarized as follows. The maximum number of reference pictures used for inter prediction, referred to as M, is indicated in the active sequence parameter set. When a reference picture is decoded, it is marked as “used for reference.” If the decoding of the reference picture caused more than M pictures to be marked as “used for reference,” then at least one picture must be marked as “unused for reference.” The DPB removal process then removes pictures marked as “unused for reference” from the DPB if they are not needed for output either. Each short-term picture is associated with a variable PicNum that is derived from the syntax element frame_num, and each long-term picture is associated with a variable LongTermPicNum that is derived from the long_term_frame_idx, which is signaled by the memory management control operation (MMCO).
There are two types of operations for the reference picture marking: adaptive memory control and sliding window. The operation mode for reference picture marking is selected on a picture basis. The adaptive memory control requires the presence of MMCO commands in the bitstream. The memory management control operations enable explicit signaling of which pictures are marked as “unused for reference,” assigning long-term indices to short-term reference pictures, storage of the current picture as a long-term picture, changing a short-term picture to a long-term picture, and assigning the maximum allowed long-term index for long-term pictures. If the sliding window operation mode is in use and there are M pictures marked as “used for reference,” the short-term reference picture that was decoded first among those short-term reference pictures that are marked as “used for reference” is marked as “unused for reference.” In other words, the sliding window operation mode results in first-in-first-out buffering operations among short-term reference pictures. When some highest temporal layers are discarded, gaps in frame_num are present in the bitstream. In this case, the decoding process generates short-term “non-existing” pictures having the missing frame_num values. Such “non-existing” pictures are handled in the same way as normal short-term reference pictures in the sliding window reference picture marking process.
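The first-in-first-out behavior of the sliding window mode can be illustrated with a minimal sketch. It is a simplification under the assumption that only short-term reference pictures are present (long-term pictures and MMCO commands are not modeled); the function name and the use of plain picture identifiers are hypothetical.

```python
from collections import deque


def sliding_window_mark(short_term_refs, new_picture, max_refs):
    """Add a decoded reference picture using sliding window marking.

    short_term_refs -- deque of picture ids marked "used for reference",
                       oldest decoded picture first (modified in place)
    new_picture     -- id of the reference picture just decoded
    max_refs        -- M, the maximum number of reference pictures
    Returns the ids that became "unused for reference".
    """
    if max_refs < 1:
        raise ValueError("max_refs must be at least 1")
    unused = []
    # When M pictures are already marked, release the first decoded ones.
    while len(short_term_refs) >= max_refs:
        unused.append(short_term_refs.popleft())
    short_term_refs.append(new_picture)
    return unused


refs = deque()
for picture_id in range(5):
    released = sliding_window_mark(refs, picture_id, max_refs=3)
```

After the loop, refs holds the three most recently decoded pictures (2, 3 and 4), and the last call released picture 1.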
The hypothetical reference decoder (HRD), specified in Annex C of the H.264/AVC standard, is used to check bitstream and decoder conformances. The HRD contains a coded picture buffer (CPB), an instantaneous decoding process, a decoded picture buffer (DPB), and an output picture cropping block. The CPB and the instantaneous decoding process are specified similarly to any other video coding standard, and the output picture cropping block simply crops those samples from the decoded picture that are outside the signaled output picture extents. The DPB was introduced in H.264/AVC in order to control the required memory resources for decoding of conformant bitstreams. There are two reasons to buffer decoded pictures: for references in inter prediction and for reordering decoded pictures into output order. The DPB includes a unified decoded picture buffering process for reference pictures and output reordering. A decoded picture is removed from the DPB when it is no longer used as a reference and no longer needed for output. The maximum size of the DPB that bitstreams are allowed to use is specified in the Level definitions (Annex A) of H.264/AVC.
There are two types of conformance for decoders: output timing conformance and output order conformance. For output timing conformance, a decoder must output pictures at identical times compared to the HRD. For output order conformance, only the correct order of the output pictures is taken into account. The output order DPB is assumed to contain a maximum allowed number of frame buffers. A frame is removed from the DPB when it is no longer used as a reference and no longer needed for output. When the DPB becomes full, the earliest frame in output order is output until at least one frame buffer becomes unoccupied.
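The output order behavior of the DPB can be sketched as follows. This is a simplified illustration, not the normative process: frames are modeled as dictionaries with the hypothetical keys order (output order), is_ref (still used as a reference) and wait_output (not yet output), and the function name is an assumption.

```python
def bump_until_free(dpb, capacity):
    """Output frames until the DPB has at least one free frame buffer.

    dpb      -- list of frame dicts, modified in place
    capacity -- maximum allowed number of frame buffers
    Returns the output-order values of the frames that were output.
    """
    def drop_unneeded():
        # A frame leaves the DPB once it is neither a reference
        # nor waiting for output.
        dpb[:] = [f for f in dpb if f["is_ref"] or f["wait_output"]]

    output = []
    drop_unneeded()
    while len(dpb) >= capacity:
        waiting = [f for f in dpb if f["wait_output"]]
        if not waiting:
            raise RuntimeError("DPB holds only reference frames")
        earliest = min(waiting, key=lambda f: f["order"])
        earliest["wait_output"] = False   # the frame is output now
        output.append(earliest["order"])
        drop_unneeded()
    return output


dpb = [
    {"order": 4, "is_ref": True, "wait_output": True},
    {"order": 2, "is_ref": False, "wait_output": True},
    {"order": 6, "is_ref": True, "wait_output": True},
]
bumped = bump_until_free(dpb, capacity=3)
```

In this example the frame with output order 2 is output and, because it is not a reference, its buffer is released, leaving frames 4 and 6 in the DPB.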
In the H.264/AVC standard, the required DPB size for decoding a bitstream is specified by the syntax element max_dec_frame_buffering. The syntax element num_reorder_frames indicates the maximum number of frames that precede any frame in the coded video sequence in decoding order and follow it in output order. Based on this value, the decoder can start to output pictures as early as possible, thereby reducing the end-to-end delay without overflowing the DPB.