Digital encoding of various source signals has become increasingly important over the last decades as digital signal representation and communication have progressively replaced analogue representation and communication. For example, mobile telephone systems, such as the Global System for Mobile Communications (GSM), are based on digital speech encoding. The distribution of media content, such as video and music, is also increasingly based on digital content encoding.
Typically, an audiovisual content item comprises a number of different audiovisual components and types of data. For example, a content item corresponding to a movie or television program may include at least one video signal component, typically a plurality of different audio components, control data, synchronization data, metadata (e.g. characterizing the content), etc. For example, a movie may include a main video component, a secondary video component, a plurality of audio tracks (e.g. for different languages), subtitle data, and metadata identifying e.g. the movie title, main actors etc. Thus, a relatively large number of different data types often needs to be included in a single combined data stream for the audiovisual content item.
In order to accommodate a representation of an audiovisual content item which includes a range of different types of data, an audiovisual content item data stream may often be generated that comprises a plurality of (sub) audiovisual data streams providing audiovisual components for the audiovisual content item. In addition, data streams comprising control data, metadata etc. may be included.
The audiovisual content item data stream can comprise all data related to rendering of the content item. The audiovisual content item data stream is typically referred to as a transport stream, or possibly as a system stream, program stream or container stream. Each individual audiovisual data stream is typically referred to as an elementary stream.
In order to provide an efficient representation of the audiovisual content item, it is important that an effective data structure is defined for the audiovisual content item data stream. The use of a data structure comprising a number of separate audiovisual data streams, each of which represents an audiovisual component, provides for a flexible yet efficient approach. For example, the approach allows a flexible inclusion of different audio tracks for a given video component, e.g. audio signals corresponding to different languages may be provided in different audiovisual data streams.
A number of different structures for audiovisual content item data streams have been standardized. One of the most widespread and frequently used structures for audiovisual content item data streams is the MPEG-2 Transport Stream, which is used for example for digital television broadcast and Blu-ray Discs.
The MPEG-2 Transport Stream is an example of a data structure wherein the data stream is made up of a plurality of sequential time multiplexed data packets. Each data packet may provide data for a specific component of the audiovisual content item.
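The packet multiplexing described above can be illustrated with a minimal Python sketch. It assumes the fixed 188-byte packet size, the 0x47 sync byte and the 13-bit packet identifier (PID) of the MPEG-2 Transport Stream header; the function names and the make_packet helper are hypothetical and exist only for illustration. Adaptation fields, PES reassembly and error handling are deliberately left out.

```python
from collections import defaultdict

TS_PACKET_SIZE = 188  # fixed MPEG-2 Transport Stream packet length in bytes
SYNC_BYTE = 0x47      # first byte of every transport stream packet


def parse_header(packet: bytes) -> dict:
    """Extract a few fields from the 4-byte transport packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport stream packet")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],  # 13-bit packet identifier
        "continuity_counter": packet[3] & 0x0F,
    }


def group_by_pid(stream: bytes) -> dict:
    """Split a time-multiplexed byte stream into per-PID packet lists."""
    groups = defaultdict(list)
    # Any trailing partial packet is ignored in this sketch.
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        groups[parse_header(packet)["pid"]].append(packet)
    return dict(groups)


def make_packet(pid: int, counter: int) -> bytes:
    """Hypothetical helper: build a payload-only packet filled with zero bytes."""
    header = bytes([
        SYNC_BYTE,
        (pid >> 8) & 0x1F,        # upper 5 bits of the PID, flag bits cleared
        pid & 0xFF,               # lower 8 bits of the PID
        0x10 | (counter & 0x0F),  # payload only, plus continuity counter
    ])
    return header + bytes(TS_PACKET_SIZE - len(header))


# Two components interleaved in time: PID 0x100 and PID 0x101 (arbitrary values).
stream = make_packet(0x100, 0) + make_packet(0x101, 0) + make_packet(0x100, 1)
groups = group_by_pid(stream)
```

In this sketch, each key of groups corresponds to one component of the content item, and the packets for that component keep their original order in the stream.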
However, a problem with the conventional approach for audiovisual content item data streams is that the data structure is suboptimal for some purposes, and in particular tends not to provide optimal flexibility.
For example, audiovisual content item data streams such as the MPEG-2 Transport Stream do support alternative audio representations for a given scene by allowing different audio representations to be provided in different elementary streams. A receiver may then select between these alternative elementary streams to provide a desired audio track. E.g., an MPEG-2 Transport Stream may comprise an elementary stream comprising a video component along with two elementary streams that each provide an audio representation that can be rendered together with the video component. For example, one elementary audio stream may comprise the audio of the video component in the original language while another elementary audio stream may comprise the audio for the video component but with the speech dubbed in a different language. A decoder or renderer may then select between the alternative audio tracks for the video by selecting the appropriate elementary stream.
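The selection between alternative elementary audio streams can be sketched as follows. This is a simplified model and not the actual program map table syntax of the MPEG-2 Transport Stream: the ElementaryStream class, its fields, the select_streams function, the PID values and the language codes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ElementaryStream:
    """Hypothetical description of one elementary stream in a content item."""
    pid: int
    kind: str                       # "video" or "audio" in this sketch
    language: Optional[str] = None  # language code for audio streams


def select_streams(
    streams: List[ElementaryStream], preferred_language: str
) -> Tuple[ElementaryStream, ElementaryStream]:
    """Pick the video stream and exactly one of the alternative audio streams."""
    video = next((s for s in streams if s.kind == "video"), None)
    audio_tracks = [s for s in streams if s.kind == "audio"]
    if video is None or not audio_tracks:
        raise ValueError("content item needs a video and at least one audio stream")
    # Fall back to the first audio stream if the preferred language is absent.
    audio = next(
        (s for s in audio_tracks if s.language == preferred_language),
        audio_tracks[0],
    )
    return video, audio


# One video component with original-language and dubbed audio alternatives.
content_item = [
    ElementaryStream(pid=0x100, kind="video"),
    ElementaryStream(pid=0x101, kind="audio", language="eng"),
    ElementaryStream(pid=0x102, kind="audio", language="fra"),
]
video, audio = select_streams(content_item, preferred_language="fra")
```

Note that in this model each alternative is a complete audio representation, so the receiver uses one audio stream and discards the other, even though both were transmitted.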
However, such an approach does not provide optimum flexibility for the audio, and also results in a relatively high data rate due to the parallel audio representations provided by the alternative elementary streams.
It would accordingly be desirable to provide an improved approach for audiovisual content item data streams, and in particular an approach which provides additional flexibility and/or a reduced data rate.
However, a critical challenge is how to achieve such an enhancement while keeping a high degree of commonality with existing approaches. For example, it is desirable to be able to further enhance the MPEG-2 Transport Stream, but such enhancement should preferably maintain as much backwards compatibility as possible. Furthermore, the considerations required for enhancing approaches for audiovisual content item data streams may not be limited to which additional data should be provided, and how or in which format. Rather, additional challenges exist in determining how such data should be included in an audiovisual content item data stream so as to achieve not only an efficient audiovisual content item data stream but also an efficient operation and preferably optimized backwards compatibility.
Hence, an improved approach for audiovisual content item data streams would be advantageous.