The term Digital Video Broadcasting (DVB) refers to a number of standards defining digital broadcasting techniques that utilize satellite (DVB-S), cable (DVB-C), or terrestrial (DVB-T) distribution media. Such standards cover source coding, channel coding, conditional access (PayTV and related data scrambling solutions), and various other issues. In the early 1990s the DVB Project was established by major European public and private sector organizations in the television sector to create a framework for the introduction of the MPEG-2 (Moving Picture Experts Group) audio/video compression standard into digital television services. The DVB Project has steadily gained popularity, and worldwide adoption is already at hand.
For satellite connections the DVB standard [1] defines a transmission system as depicted in FIG. 1. It adapts intra-service 108 and inter-service 112 multiplexed baseband signals (including video 102, audio 104, and possibly data 106) to a satellite channel during a number of processing steps collectively named herein a satellite channel adapter, see the dotted line with reference sign 110. Source coding is generally applied to said signals in accordance with reference [2].
The following processes are applied to the data stream: transport multiplex adaptation and randomization for energy dispersal 114, outer coding (i.e. Reed-Solomon block codes) 116, convolutional interleaving 118, inner coding (i.e. punctured convolutional code) 120, baseband shaping for modulation 122, and modulation 124.
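The energy-dispersal randomization may be illustrated with a short sketch. In DVB the scrambling sequence is produced by a pseudo-random binary sequence (PRBS) generator with polynomial 1 + x^14 + x^15 and initialization word 100101010000000; the sketch below follows that convention but, as a simplifying assumption, omits the sync-byte handling and periodic re-initialization of the real system:

```python
def energy_dispersal(data: bytes) -> bytes:
    """XOR data with a DVB-style PRBS (polynomial 1 + x^14 + x^15)."""
    reg = 0b100101010000000  # 15-bit register, DVB initialization word
    out = bytearray()
    for byte in data:
        keystream = 0
        for _ in range(8):
            bit = ((reg >> 14) ^ (reg >> 13)) & 1  # taps at stages 15 and 14
            reg = ((reg << 1) | bit) & 0x7FFF
            keystream = (keystream << 1) | bit
        out.append(byte ^ keystream)
    return bytes(out)
```

Because scrambling is a plain XOR with the keystream, applying the function twice restores the original data, which is exactly how the receiver descrambles.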
Further details about DVB-S transmission can be found in reference [1] and the publications cited therein.
Respectively, considering cable transmission of digital video signals, document [3] describes DVB-C components and features thereof. FIG. 2 discloses the main functional blocks of the sending direction in a cable system. BB interface block 202 adapts the input signal to the MPEG-2 transport layer framing structure (fixed-length packets) with sync bytes. During sync inversion and randomisation 204 the so-called Sync 1 byte is inverted and the data stream is randomised to ensure that a sufficient number of transitions occur in the signal, which eases synchronization. Thereafter the randomised transport packets are subjected to Reed-Solomon FEC (Forward-Error Correction) coding 206 to produce a codeword for error detection and correction. The error-protected transport packets are then interleaved with a convolutional interleaver 208, whereas the actual convolutional coding of DVB-S is not utilized at all. In step 210 the interleaved bytes are transformed into QAM (Quadrature Amplitude Modulation) symbols (m-tuples), after which differential coding 212 is applied to a number of most significant bits (MSB) in each symbol. Baseband shaping 214 includes mapping of m-tuples to I and Q signals followed by square-root raised cosine filtering. In the final stage, QAM modulation 216, the signal constellation has 16, 32, 64, 128, or 256 points. The modulated signal is then emitted to the physical interface, in this case a radio-frequency cable channel.
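The convolutional (Forney-type) interleaving used in steps 118 and 208 can be sketched as a bank of delay lines of linearly increasing length, used cyclically byte by byte. The parameters I = 12 branches and unit delay M = 17 bytes below match the values commonly used with RS(204, 188) packets in DVB; the sketch ignores sync-byte alignment:

```python
from collections import deque

class ConvInterleaver:
    """Forney convolutional (de)interleaver with I cyclically used branches."""
    def __init__(self, I: int = 12, M: int = 17, deinterleave: bool = False):
        self.I = I
        # Interleaver: branch j delays by j*M bytes; deinterleaver: by (I-1-j)*M.
        depths = [((I - 1 - j) if deinterleave else j) * M for j in range(I)]
        self.lines = [deque([0] * d) for d in depths]
        self.idx = 0

    def process(self, byte: int) -> int:
        line = self.lines[self.idx]
        self.idx = (self.idx + 1) % self.I   # advance to the next branch
        if not line:                         # zero-depth branch passes through
            return byte
        line.append(byte)
        return line.popleft()
```

Chaining an interleaver and a matching deinterleaver delays every byte by I*(I-1)*M positions but otherwise restores the stream; in between, any burst error is spread over many Reed-Solomon codewords.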
As a third alternative, FIG. 3 discloses an overview of the DVB-T system parts. Within the MUX adaptation/energy dispersal block 302 the signal is organized in packets (1 sync byte, 187 MPEG-2 data bytes) and randomised for energy dispersal. Next, outer coding block 304 applies Reed-Solomon coding to the input packets for error protection. Then, outer interleaving 306 is applied to the error-protected packets. The interleaved data is then directed to a convolutional coder, inner coder 308, with several possible puncturing rates. The inner interleaving 310 phase includes both bit-wise and (OFDM) symbol-wise interleaving stages for an input of one or two bit streams (see dotted arrow). For further information refer to the “hierarchical mode” in publication [4]. During mapping 312 the data stream is mapped to the constellation space. When frame adaptation 314 takes place, the signal is organized in frames of 68 OFDM symbols. In addition to data, the OFDM frames include pilot and TPS 320 (Transmission Parameter Signalling) signals for frame synchronization, channel estimation etc. Finally the signal is OFDM modulated 316 (with a plurality of carriers) and D/A converted to analogue form, after which the analogue signal is driven out to the air interface through front end 318.
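The OFDM modulation of step 316 amounts to an inverse discrete Fourier transform in which each mapped constellation symbol modulates its own subcarrier. The sketch below uses a naive inverse DFT over only eight carriers as an illustrative assumption; an actual DVB-T modulator uses 2k or 8k carriers plus pilot insertion and a guard interval, all omitted here:

```python
import cmath

def ofdm_modulate(symbols):
    """Naive inverse DFT: each complex (QAM) symbol modulates one subcarrier."""
    N = len(symbols)
    return [sum(X * cmath.exp(2j * cmath.pi * k * n / N)
                for k, X in enumerate(symbols)) / N
            for n in range(N)]

def ofdm_demodulate(samples):
    """Forward DFT recovers the per-carrier symbols at the receiver."""
    N = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / N)
                for n, x in enumerate(samples))
            for k in range(N)]
```

A round trip through modulator and demodulator returns the original constellation points, which is the key property the multi-carrier scheme relies on.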
Due to the tremendous success of the Internet during the 1990s, an additional model for providing DVB services, in this case over IP (Internet Protocol) networks, has recently been created, see specification [5]. It was obviously a tempting idea to utilize already existing data networks for transferring DVB data as well, without further need to invest in new hardware. DVB services over IP have been described with reference to a common layer model disclosed in FIG. 4. Dotted lines represent interfaces between different domains (horizontal separation) and layers (vertical separation). The element with darkened background, the management plane, can be used for general management and control purposes. The content provider is an entity or a number of entities providing the clients (subscribers) with the information flow, notice the elliptical patterns visualizing the flow, which is physically transferred by a service provider over a delivery network transparent to the IP traffic. Tasks of the content provider may include, for example, authentication/authorization services, service portal maintenance, service offering, service discovery mechanisms, metadata services, actual content services etc. Respectively, service provider (e.g. ISP) tasks may include addressing services, authentication/authorization services, naming services (DNS etc), basic IP connectivity service, session control means, service accounting, and a number of various value-added services like firewalls, caches etc. It is entirely possible, though, that the content and service aspects are in practice offered and technically implemented by a single entity. The home domain is the domain where the DVB services are consumed. It may refer to one or more terminal devices in a single network or, alternatively, to a number of networks including a number of devices.
As to the different layers of FIG. 4, physical layer 408 includes the lowest-level interfacing means to transfer data between the ends of a communications link. It determines e.g. connector shapes and sizes, “bit” definitions and synchronization aspects in relation to, for example, voltage levels and different time durations or other physical magnitudes. Reference numeral 408 also refers to the link layer, which handles media access control functions like addressing and, optionally, error control, flow control, and re-transmission of defectively received data packets. Network layer 406 handles routing, packet segmentation/re-assembly and similar functions relating to the whole end-to-end connection in question. In the case of IP networking such routing means the addition of the necessary IP addresses to sent packets. In principle, network layer 406 does not have to be aware of the lower-level physical/link layers 408. The transport layer, likewise referred to herein by the collective reference sign 406, performs end-to-end flow and error control functions and multiplexes a plurality of different services over just a single IP link, for example. Multiplexing can be implemented by means of different port numbers etc. Considering especially IP networks, popular choices for a transport layer protocol are UDP (User Datagram Protocol) and TCP (Transmission Control Protocol), the latter of which also provides error detection/control on top of mere multiplexing. Session layer 404 sets up and releases connections for applications' use. Application layer 402 includes applications and API(s) for interfacing them. In the DVB context application layer 402 is specifically named MHP (Multimedia Home Platform). Within the home domain, IP traffic for DVB services can be carried by utilizing, for example, common Ethernet (e.g. 100BASE-T) [6] or IEEE 1394 [7] physical/network layer technologies.
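The port-based transport-layer multiplexing mentioned above can be demonstrated with a minimal UDP loopback sketch; the two “services”, their payloads, and the use of OS-assigned ports are purely illustrative assumptions:

```python
import socket

# Two hypothetical services, each demultiplexed by its own UDP port.
video_rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
video_rx.bind(("127.0.0.1", 0))   # port 0: the OS picks a free port
audio_rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
audio_rx.bind(("127.0.0.1", 0))

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"video-payload", video_rx.getsockname())
tx.sendto(b"audio-payload", audio_rx.getsockname())

# Each receiver sees only the datagrams addressed to its own port.
assert video_rx.recv(1500) == b"video-payload"
assert audio_rx.recv(1500) == b"audio-payload"
```

A real receiver would demultiplex the service streams simply by binding one socket per advertised port.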
The DVB data encapsulated in IP packets can be either multicast or unicast to the subscribers depending on the service. For example, IP multicast can be used for PayTV-type transfer and IP unicast for video/audio-on-demand type service. For more information about DVB in the context of IP networking, refer to reference [5] and the publications cited therein.
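Receiving an IP-multicast service stream requires the subscriber host to join the corresponding multicast group. A minimal sketch using the standard socket API follows; the group address 239.1.2.3 and port 5004 are arbitrary illustrative choices from the administratively scoped range:

```python
import socket
import struct

GROUP = "239.1.2.3"   # example group from the administratively scoped range
PORT = 5004           # example port, not mandated by any DVB specification

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# ip_mreq: multicast group + local interface (INADDR_ANY lets the OS choose).
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
# sock.recv(1500) would now deliver datagrams multicast to GROUP:PORT.
```

After the membership report is sent, every host that has joined the group receives the same single stream, which is what makes multicast attractive for broadcast-like PayTV delivery.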
One of the most crucial design decisions relates to the selected source coding method. MPEG-2 is a powerful aggregate of video and audio coding methods that utilize a number of different compaction techniques with remarkably high compression ratios, with one major downside: the compression methods used are lossy, i.e. some data is irrevocably lost during the encoding process. Without such sacrifice the achievable compression ratios (typically from 1:6 to 1:30) would obviously not be nearly as impressive. MPEG-2 coding also requires a considerable amount of processing, which, however, is generally no longer a problem with modern high-performance processors.
FIG. 5 discloses a generic process of encoding audio/video signal 502 with an MPEG-2 compliant encoder 504 producing a standardized MPEG-2 stream as output. Audio/video server 506 receives and stores the encoded data stream, and eventually transmits it over transmission network 508 to receiver 510, e.g. a DVB set-top box connected to a television, or a DVB IRD (Integrated Receiver Decoder) card installed therein, comprising the necessary software/hardware means for decoding the stream for exploitation.
MPEG-type coding shares some parts with the common still-picture compression format JPEG, which utilizes characteristics of human vision and removes normally invisible and in that sense unnecessary information from a source picture during the encoding process. The encoding stage exploits e.g. the Discrete Cosine Transform (a spatial-to-frequency-domain transformation) and entropy coding. High-frequency changes in picture colour can be more easily omitted from the coded signal than high-frequency luminance (brightness) changes, to which the human eye is more sensitive. In addition to such intra-frame (i.e. intra-picture) aspects, MPEG also exploits temporal redundancy, i.e. static portions in consecutive video frames do not have to be coded for every frame; eventually, a content change within a certain area triggers the sending of a coded version thereof.
In MPEG, each pixel in a picture is parameterised with a luminance/brightness value (Y) and two colour vectors (U, V). Pixels are then grouped together to form blocks, and groups of blocks called macroblocks. Blocks are converted into the frequency domain by utilizing the DCT, which is rather similar to the common Fourier transform. The DCT results in a number of coefficients describing cosine functions of increasing frequency formed from the block. From such coefficients the spatial information carried by the blocks can later be resolved by the decoding unit. The DCT output is then quantized and Huffman coded. In Huffman encoding different symbols consume a variable number of bits: frequently used symbols consume fewer bits and less frequently used symbols more bits.
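The block transform can be sketched directly from its definition. The function below computes the 2-D DCT-II of one 8×8 block in pure Python; production encoders use fast factorized transforms, and the subsequent quantization and zig-zag scanning are omitted here:

```python
import math

def dct2d(block):
    """2-D DCT-II of an 8x8 block, as used in MPEG/JPEG intra coding."""
    N = 8
    def c(u):  # orthonormalization factor for the zero-frequency basis
        return math.sqrt(0.5) if u == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out
```

For a flat block the transform concentrates all energy into the single DC coefficient, which is what makes the later quantization and entropy coding so effective on smooth image areas.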
Considering next some temporal aspects of MPEG coding, it is clear that in a video signal comprising a sequence of pictures, referred to as frames hereinafter, the data contained in certain blocks may remain relatively unaltered for at least a short period of time, extending over the duration of a plurality of subsequent frames. That certainly depends on the source signal characteristics; for example, a news broadcast may include a clip wherein a newsreader sits at a desk and reports on recent developments in the national economy. It is probable that the changes between subsequent frames occur mostly in the blocks near the narrator's facial area, while the background, comprising a wall with paintings/posters etc., stays unchanged; camera movements are probably also minimal in this kind of informative programme. On the contrary, a fight scene in a modern action movie hardly contains any portions that stay fixed across a larger number of subsequent frames, to say the least.
Therefore, some blocks can occasionally be predicted on the basis of blocks in previous frames. Frames that contain these predicted blocks are called P-frames. However, to reduce the detrimental effect of transmission errors and to allow (re-)synchronization to the coded signal, complete frames that do not rely on information from other frames are also transmitted periodically (a few times a second). These in many ways crucial stand-alone frames are named intra-coded or I-frames. I-frames are likewise needed when a service subscriber starts receiving the service stream for the first time, or at least after a pause, and the receiver thus lacks the necessary data history for constructing valid decoded frames on the basis of mere differential data, for example. Bi-directional frames utilizing information from both prior and following frames are called B-frames.
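The periodic insertion of I-frames and the interleaving of P- and B-frames is governed by the encoder's group-of-pictures (GOP) structure. The sketch below assumes an illustrative 12-frame GOP with two B-frames between anchor frames; these values are typical choices, not mandated by the standard:

```python
def frame_type(n: int, gop_size: int = 12, b_frames: int = 2) -> str:
    """Frame type at display position n for a fixed, regular GOP structure."""
    pos = n % gop_size                   # position within the current GOP
    if pos == 0:
        return "I"                       # periodic stand-alone frame
    return "B" if pos % (b_frames + 1) else "P"

print("".join(frame_type(n) for n in range(12)))  # IBBPBBPBBPBB
```

A receiver tuning in mid-stream simply waits for the next position-0 frame, since only an I-frame can be decoded without history.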
The above process is taken further by encoding motion vectors, such that portions of a picture that move, or that can be borrowed from other locations in previous frames, are encoded using fewer bits. Four 8×8 pixel blocks are grouped together into 16×16 macroblocks. Macroblocks that do not change are not re-encoded in subsequent frames. For P-frames, the encoder searches the previous frame (or frames before and after, in the case of B-frames) in half-pixel increments for other macroblock locations that are a close match to the information contained in the current macroblock. If no adequately matching macroblocks are found in the neighbouring region, the macroblock is intra-coded and the DCT coefficients are fully encoded. If an adequate match is found in the search region, the full coefficients are not transmitted; instead a motion vector is used to point to the similar block(s).
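The macroblock search described above can be sketched as an exhaustive full-pixel block-matching loop with a sum-of-absolute-differences (SAD) cost. The 4×4 block size and ±2 pixel search radius below are scaled-down assumptions for brevity; MPEG-2 uses 16×16 macroblocks, half-pixel refinement, and much larger search windows:

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block(frame, top, left, size):
    """Extract a size x size sub-block from a 2-D list of pixel values."""
    return [row[left:left + size] for row in frame[top:top + size]]

def motion_search(prev, cur, top, left, size=4, radius=2):
    """Best (dy, dx) offset into prev for the cur block at (top, left)."""
    target = block(cur, top, left, size)
    best_mv, best_cost = None, float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ty, tx = top + dy, left + dx
            if 0 <= ty <= len(prev) - size and 0 <= tx <= len(prev[0]) - size:
                cost = sad(block(prev, ty, tx, size), target)
                if cost < best_cost:
                    best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

When the best cost is zero (or below a threshold), only the motion vector and a small residual need be transmitted; otherwise the macroblock falls back to intra coding.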
The spatial and temporal sides of MPEG coding are depicted in FIG. 6, wherein two wholly imaginary subsequent frames are coded in a computer equipped with an MPEG video encoder. Macroblocks 602 and 606, corresponding to the same location in the signal source, contain practically identical data in both frames, and encoding thereof may thus be omitted (in P-frames). Macroblocks 604 and 608, by contrast, contain a changing element, caused by a man walking by in an otherwise static meeting scene, and have to be re-encoded. However, as the change-causing element, the walking man, merely moves within the picture while his shape stays intact, motion vectors can be cleverly utilized to determine, in the more recent frame on the right, a reasonable match with a macroblock in the previous frame on the left, see the dotted arrow highlighting this.
Respectively, MPEG audio coding utilizes certain distinct properties of human hearing, like the auditory masking effect. Both temporal and spectral (frequency-plane) aspects are considered, with impressive compression ratios of about 1:10 achievable, again with only minor, if any, degradation perceptible in the decoded signal. MPEG-2 has five channels for directional audio and a special low-frequency channel. Moreover, the encoded signal may also encompass a plurality of alternative language channels.
As the mammoth MPEG-2 standard includes a rather large number of different video and audio modes, the preferred level of adoption, especially in the case of DVB services, is determined in reference [8] to facilitate the hardware manufacturers' tasks as to the compatibility issues that inevitably arise in an otherwise overly diverse context.
To provide the subscribers of DVB services with an option to genuinely affect the service delivery (service subscription/selection, service parameter adjustment), a return channel for carrying out such tasks must be established. In DVB the interaction specifications have generally been split into two sets. One is network-independent and can be regarded as a protocol stack extending approximately over ISO/OSI layers two to three (see [9]), whereas the second group of DVB specifications relates to the lower layers (approximately one to two) of the ISO/OSI model and therefore specifies the network-dependent tools for interactivity. For example, the DVB Return Channel through Cable specification (DVB-RCC), see reference [10], is available for the purpose, as are further specifications for fixed/cellular telephone interactivity and even satellite interactive systems. In the case of IP networks, standard IP unicast can be used for interaction with a service/content provider. The DVB Project web site http://www.dvb.org/ can be visited for listings of available DVB-related documentation.
However, notwithstanding the various existing data transfer arrangements for delivering DVB service or control data, situations may still occur in which the currently available resources do not suffice for achieving acceptable transfer times. For example, services like real-time games require short response times for providing the subscriber with a reasonable gaming experience. A gaming scenario is depicted in FIG. 7, where the service provider is game server 702 transmitting game information to one or more subscribers via a DSL or cable network 704 acting as a delivery network. At the receiver side, set-top box 708 receives and decodes the service data at decoder 709, being, for example, a dedicated video processing chip or a more general processing device with multiple different tasks allocated thereto, and forwards it to TV receiver or monitor 706 for visualization. Remote control 710 may be used for controlling the local devices or for sending service-related control instructions/requests/feedback (notice the arrows) to server 702 over the aforesaid delivery network 704 or some other optional transfer path available for such purpose. The overall delay sensed by the subscriber while utilizing the service consists of a plurality of components, a few obvious ones being source data encoding time, transmission delay (which may be asymmetric between the transfer directions depending on the connection type used), decoding delay, additional safety buffer delays, etc. As mentioned hereinbefore, a coded MPEG-2 stream typically includes a number of different frame types, the ones without differential nature (I-frames) naturally being larger in size than their predictive counterparts (P-frames).
Typically, decoder chips 709 in set-top boxes configured to decode the received, still-encoded video signal have been designed to work with a source data stream of a relatively stable input rate. Accordingly, to guarantee a reasonably flat input rate of the source video stream for decoding in the presence of transmission errors and variable transmission delays, the received data is buffered before being forwarded to the actual decoder 709.
In most broadcast/multicast type services modest buffering is acceptable, in contrast to certain interactive services like games, wherein any additional delay introduced into service data or feedback provision is disadvantageous irrespective of its origin. Response times should nearly always be minimized to offer the service user a maximally transparent use experience regardless of the applied transmission technique. Thereby, reception-side FIFO (First In First Out) buffers, comprising e.g. one or more subsequent I- or P-frames in the case of MPEG-2 service data, are on the one hand plainly problematic but on the other hand necessary components in contemporary systems equipped with standard video decoding means. Besides, video decoder chips may not be able to handle buffer underflow situations in which input data is not timely available, causing perceptible errors in the visualized decoded picture and delaying the proper decoding of subsequent frames until the next I-frame is received. Some decoder chips may not even survive the data loss and continue normal functioning thereafter.
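The reception-side FIFO buffering and the underflow hazard described above can be sketched with a minimal prefill buffer; the prefill depth of three frames and the None-based underflow signalling are illustrative assumptions, not taken from any particular decoder:

```python
from collections import deque

class JitterBuffer:
    """FIFO smoothing buffer feeding a constant-rate decoder."""
    def __init__(self, prefill: int = 3):
        self.q = deque()
        self.prefill = prefill       # frames to accumulate before starting
        self.started = False
        self.underflows = 0

    def push(self, frame):
        """Called on frame arrival from the (jittery) delivery network."""
        self.q.append(frame)
        if len(self.q) >= self.prefill:
            self.started = True

    def pop(self):
        """Called once per decoder tick; None signals an underflow."""
        if not self.started or not self.q:
            self.underflows += 1
            return None              # decoder must conceal this tick
        return self.q.popleft()
```

The prefill depth trades added latency against underflow robustness, which is exactly the tension between interactive services and broadcast services noted above.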