The invention relates to media storage, transmission, reception and playback, in particular to media storage in or playback from a file having a media data container and a metadata container, such as a file based on the ISO (International Organization for Standardization) base media file format.
Various electronic devices are enabled to receive and present media data streams. Such media data streams can e.g. be received from a digital video broadcasting network that broadcasts media streams in accordance with e.g. the DVB-H Standard (Digital Video Broadcasting—Handhelds) or the DVB-T Standard (Digital Video Broadcasting—Terrestrial).
DVB-T uses a self-contained MPEG-2 (MPEG=Moving Picture Experts Group) transport stream containing elementary MPEG-2 video and audio streams according to the international standard ISO/IEC 13818 (IEC=International Electrotechnical Commission). The MPEG-2 transport stream is a multiplex used in many of today's broadcast systems. It is a stream multiplex of one or more media programs, each typically containing audio and video but also other data. MPEG-2 transport streams share a common clock per program and use time-stamped media samples (Access Units, AUs) in all media streams within a program. This enables synchronization of sender and receiver clocks and lip synchronization of audio and video streams.
For DVB-H, elementary audio and video streams are encapsulated in RTP (Real-Time Transport Protocol), UDP (User Datagram Protocol), IP (Internet Protocol), and MPE (Multi-Protocol Encapsulation) for IP data casting. RTP is used for effective real-time delivery of multi-media data over IP networks. Multiplexing is typically done by associating different network ports to each distinct media stream, e.g. one network port for video and another one for audio.
A streaming service is defined as a set of synchronized media streams delivered in a time-constrained or unconstrained manner for immediate consumption during reception. Each streaming session may comprise audio, video and/or real-time media data like timed text. A user receiving media data for a movie by means of a mobile television, for instance, can watch the movie and/or record it to a file. Commonly, for this purpose the data packets of the received media stream are de-packetized in order to store raw media data to the file. That is, received RTP packets or MPEG-2 packets are first de-packetized to obtain their payload in the form of media data samples, such as compressed video or audio frames. After de-packetizing, the obtained media data samples are replayed or stored to the file. The obtained media samples are commonly compressed in formats like the H.264/AVC (AVC=Advanced Video Coding) video format and/or the MPEG-4 HE-AACv2 (HE-AACv2=High-Efficiency Advanced Audio Coding version 2) audio format.
When media data samples having such video and/or audio formats are to be stored, they may be stored in the so-called 3GP file format, also known as the 3GPP (3rd Generation Partnership Project) file format, or in the MP4 (MPEG-4) file format. Both 3GP and MP4 are derived from the ISO base media file format, which is specified in the ISO/IEC international standard 14496-12:2005 "Information technology - Coding of audio-visual objects - Part 12: ISO base media file format". A file of this format comprises media data and metadata. For such a file to be operable, both must be present. The media data is stored in a media data container (mdat) related to the file and the metadata is stored in a metadata container (moov) of the file. Conventionally, the media data container comprises the actual media samples, i.e. it may comprise e.g. interleaved, time-ordered video and/or audio frames.
Each medium has its own metadata track (trak) in the metadata container moov that describes the media content properties. Additional containers (also called boxes) in the metadata container moov may comprise information about file properties, file content, etc.
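The container structure described above can be illustrated with a minimal Python sketch that lists the top-level boxes of a file. It relies on the general box layout of the ISO base media file format (a 32-bit big-endian size followed by a four-character type, with size 1 announcing a 64-bit size and size 0 meaning "until end of file"). The function names and the tiny in-memory demo file are assumptions made for illustration only and are not taken from this specification.

```python
import io
import struct


def iter_top_level_boxes(stream):
    """Yield (box_type, payload_offset, payload_size) for each top-level box."""
    while True:
        start = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            return  # end of stream (or truncated header)
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type.
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        elif size == 0:
            # The box extends to the end of the stream.
            size = stream.seek(0, io.SEEK_END) - start
        if size < header_len:
            raise ValueError("corrupt box size at offset %d" % start)
        yield box_type.decode("ascii"), start + header_len, size - header_len
        stream.seek(start + size)


def make_box(box_type, payload):
    """Hypothetical helper: build one box with a 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Illustrative demo file: a file type box, an empty moov and a 3-byte mdat.
demo_file = (
    make_box(b"ftyp", b"isom\x00\x00\x00\x00")
    + make_box(b"moov", b"")
    + make_box(b"mdat", b"\x01\x02\x03")
)
boxes = list(iter_top_level_boxes(io.BytesIO(demo_file)))
```

A player would typically descend into moov to find the trak boxes and use their tables to locate samples inside mdat; the sketch stops at the top level.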
Recently, so-called reception hint tracks for files based on the ISO base media file format have been defined by international standardization groups. Those reception hint tracks may be used to store multiplexed and/or packetized streams like e.g. a received MPEG-2 transport stream or RTP packets. Reception hint tracks may be used for client-side storage and playback of received data packets, which shall also be denoted as data samples in the remainder of this specification. Thereby, received MPEG-2 TS or RTP packets of one stream are directly stored in reception hint tracks as e.g. pre-computed samples or constructors. I.e., in the case of reception hint tracks, the data packets are stored as samples in the media data container of the file based on the ISO base media file format. Playback from reception hint tracks may be done by emulating the normal stream reception and reading the stored data packets from the reception hint track as if they were received over IP.
The ISO/IEC international standard 14496-12:2005 "Information technology - Coding of audio-visual objects - Part 12: ISO base media file format" defines a sample grouping as an assignment of each sample in a track to be a member of one sample group, based on a grouping criterion. As there may be more than one sample grouping for the samples in a track, each sample grouping has a type field to indicate the type of grouping.
Sample groups are defined in two steps. First, a type of the grouping is defined in a sample group description box (sgpd). In a second step, this description is assigned to samples in a sample-to-group box (sbgp). The sample groups mechanism is extensible and is currently used for AVC- and SVC-specific extensions and proprietary extensions.
A non-exhaustive description of the syntax is given below:
abstract class SampleGroupDescriptionEntry {
    // proprietary data
}
A simplified version of the SampleGroupDescriptionBox is given here. In the ISO file format, specialized versions exist depending on the handler type.
aligned(8) class SampleGroupDescriptionBox extends FullBox("sgpd") {
    unsigned int(32) grouping_type;
    unsigned int(32) entry_count;
    for (i=1; i<=entry_count; i++) {
        SampleGroupDescriptionEntry();
    }
}
Multiple groups can be defined in one instance of the box, and every sample may be a member of one group. The syntax of the SampleToGroupBox is given below.
aligned(8) class SampleToGroupBox extends FullBox("sbgp") {
    unsigned int(32) grouping_type;
    unsigned int(32) entry_count;
    for (i=1; i<=entry_count; i++) {
        unsigned int(32) sample_count;
        unsigned int(32) group_desc_index;
    }
}
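To make the field layout of the sample-to-group box more tangible, the following Python sketch serializes and parses its payload. It assumes the usual FullBox header of one version byte and three flag bytes (both set to zero here) and treats grouping_type as a four-character code. The helper names are hypothetical, and the outer box header (size and type) is deliberately left out.

```python
import struct


def build_sbgp_payload(grouping_type, entries):
    """Serialize version/flags, grouping_type, entry_count and the entry list."""
    payload = struct.pack(">I4sI", 0, grouping_type, len(entries))
    for sample_count, group_desc_index in entries:
        payload += struct.pack(">II", sample_count, group_desc_index)
    return payload


def parse_sbgp_payload(payload):
    """Return (grouping_type, [(sample_count, group_desc_index), ...])."""
    _version_flags, grouping_type, entry_count = struct.unpack_from(">I4sI", payload, 0)
    entries = [
        struct.unpack_from(">II", payload, 12 + 8 * i)  # 12 fixed bytes, 8 per entry
        for i in range(entry_count)
    ]
    return grouping_type.decode("ascii"), entries


# Round trip with two illustrative runs: 3 samples in group 1, 10 in group 3.
raw = build_sbgp_payload(b"colr", [(3, 1), (10, 3)])
parsed_type, parsed_entries = parse_sbgp_payload(raw)
```

Each entry costs eight bytes regardless of how many samples it covers, which is why the table stays small only when the group membership changes rarely.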
The following abstract example shall illustrate how sample groups work:
Let us assume that the “color” of each sample has to be described. For a complete set of samples all samples with the same color are grouped together.
First, it has to be specified which colors can occur. For each color, a “SampleGroupDescriptionEntry” is defined. A value for the grouping_type “color” is defined and all color description entries are stored in the SampleGroupDescriptionBox for the grouping_type color.
Second, the sample-to-group box for the "color" grouping_type describes which sample has which color. This is done in a differential way: every list entry describes how many consecutive samples have the same color. This allows very compact storage when colors change rarely, e.g. first a high number of samples have color 1, then a number of samples have color 2, and so on.
For three colors and a file of 50 samples the tables based on the above described syntax could look like this:
SampleGroupDescriptionBox ("sgpd") {
    grouping_type = "colr";
    entry_count = 3; // = number of sample group description entries
    // list of three sample group description entries:
    "Black"
    "White"
    "Red"
}
Sample-to-group box ("sbgp") {
    grouping_type = "colr";
    entry_count = 5; // = number of entries of the following list
    // list for all 50 samples:
    (3, 1)  // = the first 3 samples of the file are black
    (10, 3) // = the next 10 samples of the file are red
    (8, 2)  // = the next 8 samples of the file are white
    (20, 3) // = the next 20 samples of the file are red
    (9, 1)  // = the last 9 samples of the file are black
}
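How a reader would resolve the color of one particular sample from these two tables can be sketched in Python as follows. The sketch assumes 1-based sample numbers and 1-based description indices, as suggested by the example; the function name is hypothetical.

```python
# Tables from the example: three descriptions and five (sample_count, index) runs.
GROUP_DESCRIPTIONS = ["Black", "White", "Red"]
SAMPLE_TO_GROUP = [(3, 1), (10, 3), (8, 2), (20, 3), (9, 1)]


def color_of_sample(sample_number, sample_to_group=SAMPLE_TO_GROUP,
                    descriptions=GROUP_DESCRIPTIONS):
    """Return the color of a 1-based sample number by walking the run list."""
    if sample_number < 1:
        raise IndexError("sample numbers start at 1")
    first_in_run = 1
    for sample_count, group_desc_index in sample_to_group:
        if sample_number < first_in_run + sample_count:
            return descriptions[group_desc_index - 1]
        first_in_run += sample_count  # sum up the counts of the runs passed so far
    raise IndexError("sample number beyond the end of the table")


total_samples = sum(count for count, _ in SAMPLE_TO_GROUP)
```

Because only run lengths are stored, the look-up has to add up the counts of all preceding entries before it can answer for a sample near the end of the file.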
As described above, sample groups are well suited to classify samples into different categories, but they are not well suited when events related to, or properties of, individual samples need to be described in the file. The main reason is that sample groups describe a complete set of samples, so that samples which do not belong to a group entry still have to be listed as members of a "does not belong to any group" entry. Another reason is the slow look-up of the sample group a sample belongs to.
An event or property shall be understood as an index for a single sample or a relatively small number of samples. The event or property occurs on an indexed sample, but may influence an arbitrary number of following samples, e.g. random-access-points can be treated as events.
An example of an event, in terms of the above example, is a "color-change". If not the color itself but the change from one color to another has to be indexed, sample groups are not very well suited, because the "sample group" based index also has to include the unwanted information "no color change". Especially in the case of frequent changes, this may lead to an inefficient index table. Parsing for the events near the end of the file tends to be a complex operation, because all sample counts (also those of the "non-event" group samples) have to be summed up.
For trick-play modes (e.g. fast-forward, seeking into the file, etc.) the closest random-access-point to the desired entry point needs to be identified efficiently. Therefore, a table of the samples this event applies to may be examined for the right entry point. Random-access-points can exist at multiple levels: e.g., first the video decoder configuration is needed in the file, then the closest I-frame of the video track and, on top of that, the multiplex-level entry point (e.g. the PAT in case of MPEG-2 TS).
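The overhead can be made concrete with a small Python sketch based on the 50-sample color example. It first derives the sample numbers at which the color changes and then encodes these events as sample-group runs. Using index 1 for "color change" and index 0 as a "no event" marker is an assumption of this sketch, as are the function names.

```python
from itertools import accumulate

COLOR_RUNS = [(3, 1), (10, 3), (8, 2), (20, 3), (9, 1)]  # runs of the color example


def change_events(sample_to_group):
    """1-based sample numbers that start a new run (adjacent runs differ here)."""
    run_ends = list(accumulate(count for count, _ in sample_to_group))
    return [end + 1 for end in run_ends[:-1]]


def events_as_sample_groups(events, total_samples):
    """Encode sorted event sample numbers as (sample_count, index) runs."""
    runs = []
    next_sample = 1
    for event in events:
        gap = event - next_sample
        if gap > 0:
            runs.append((gap, 0))              # filler run: "no event"
        if gap == 0 and runs and runs[-1][1] == 1:
            runs[-1] = (runs[-1][0] + 1, 1)    # merge directly adjacent events
        else:
            runs.append((1, 1))                # the event sample itself
        next_sample = event + 1
    tail = total_samples - next_sample + 1
    if tail > 0:
        runs.append((tail, 0))
    return runs


events = change_events(COLOR_RUNS)
group_runs = events_as_sample_groups(events, 50)
```

In this sketch four events expand into nine sample-group entries, five of which carry only the "no event" filler.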
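If the samples carrying a random-access-point event are available as a sorted table of sample numbers, the entry point for a seek can be found by binary search. The Python sketch below assumes that "closest" means the last random-access point at or before the target sample, which is the usual choice when decoding has to start before the desired position. The table contents and the function name are purely illustrative.

```python
from bisect import bisect_right

# Hypothetical table of 1-based sample numbers flagged as random-access points.
RANDOM_ACCESS_SAMPLES = [5, 30, 55]


def preceding_random_access_point(sorted_points, target_sample):
    """Return the last random-access point <= target_sample, or None if none exists."""
    position = bisect_right(sorted_points, target_sample)
    if position == 0:
        return None  # the target lies before the first random-access point
    return sorted_points[position - 1]


entry_point = preceding_random_access_point(RANDOM_ACCESS_SAMPLES, 54)
```

The search touches only a logarithmic number of table entries and needs no summation of sample counts. For multi-level access, one such table per level (decoder configuration, I-frame, multiplex entry point) could be queried in turn.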
An additional problem is that sample groups do not allow the association of a sample to multiple group descriptions. This complicates stacking of events and will not give a compact representation, if sample groups are used for solving this indexing issue.