The Moving Picture Experts Group (MPEG) has standardized the ISO base media file format, which specifies a general file format that serves as a base for a number of more specific file formats, such as the 3GP file format. The file structure is object-oriented and a file is formed by a series of objects called boxes. The structure of a box is inferred from its type. Some boxes only contain other boxes, whereas most boxes contain data. All data of a file is contained in boxes.
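By way of illustration only, the box structure described above can be sketched in Python. The sketch assumes the common box header layout of the ISO base media file format (a 32-bit big-endian size followed by a four-character type, with a size of 1 signalling a 64-bit size field and a size of 0 signalling a box that runs to the end of the file). The helper names read_box_header, make_box and list_boxes are hypothetical and chosen for this example.

```python
import struct

def read_box_header(buf, offset=0):
    """Return (box_type, box_size, header_size) for the box starting at offset."""
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    header_size = 8
    if size == 1:
        # A size of 1 means a 64-bit size follows the type field.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_size = 16
    elif size == 0:
        # A size of 0 means the box extends to the end of the file.
        size = len(buf) - offset
    if size < header_size:
        raise ValueError("corrupt box size")
    return box_type.decode("ascii"), size, header_size

def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (hypothetical helper for test data)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload

def list_boxes(buf):
    """List (type, offset, size) for each top-level box in buf."""
    boxes = []
    offset = 0
    while offset < len(buf):
        box_type, size, _ = read_box_header(buf, offset)
        boxes.append((box_type, offset, size))
        offset += size  # the size field is the only way to find the next box
    return boxes

# A container box ('moov' holding an empty 'trak') followed by a data box.
sample = make_box("moov", make_box("trak")) + make_box("mdat", b"\x00" * 4)
print(list_boxes(sample))
```

Note that the walk advances purely by the size field of each box, which is why the length of one box must be known before the next one can be located.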
A file can be divided into an initial track, contained in a movie box of type ‘moov’, and a number of incremental track fragments, contained in movie fragment boxes of type ‘moof’. Each track fragment extends the multimedia presentation in time. The movie box and the movie fragment boxes are metadata boxes containing the information needed by a user terminal or client to decode and render the media presentation. The actual media data is stored in media data boxes of type ‘mdat’.
Track fragments make it possible to distribute the metadata into multiple chunks and thereby avoid the situation where the complete file structure has to be known by the client at the start of playback or rendering. As long as metadata is delivered as a sequence of track fragments, these track fragments can be created live during the transmission, and/or chosen between different versions with different bitrates.
Seeking in ISO/3GP files depends on the file structure. When track fragments are used, seeking to a given position is very difficult since the client can only move one track fragment ahead at a time. The reason for this is that the client needs to know the length of a track fragment to find the start of the next track fragment.
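The sequential nature of such seeking can be illustrated with a minimal Python sketch. It assumes 32-bit box sizes only and a file consisting of a ‘moov’ box followed by alternating ‘moof’ and ‘mdat’ boxes; the names box and find_fragment are hypothetical and used only for this example.

```python
import io
import struct

def box(box_type, payload=b""):
    """Build a box with a 32-bit size field (hypothetical helper for test data)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def find_fragment(stream, n):
    """Return (offset, headers_read) for the n-th 'moof' box, counting from 0.

    Every box before the target has to be visited, because the start of a
    box is only known once the size of the preceding box has been read.
    """
    offset = 0
    seen = 0
    headers_read = 0
    while True:
        stream.seek(offset)
        header = stream.read(8)
        if len(header) < 8:
            raise IndexError("file holds fewer fragments than requested")
        size, box_type = struct.unpack(">I4s", header)
        headers_read += 1
        if size < 8:
            raise ValueError("64-bit and open-ended sizes are not handled in this sketch")
        if box_type == b"moof":
            if seen == n:
                return offset, headers_read
            seen += 1
        offset += size

# One 'moov' box followed by three fragments ('moof' + 'mdat' each).
data = box(b"moov", b"\x00" * 4) + (box(b"moof", b"\x00" * 8) + box(b"mdat", b"\x00" * 100)) * 3
print(find_fragment(io.BytesIO(data), 2))
```

In this sketch the number of headers read grows linearly with the index of the requested fragment, which reflects the difficulty described above.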
Furthermore, when media tracks are split up into fragments, it is unlikely that the track fragments of different tracks will be perfectly aligned. This is at least partly due to different sampling frequencies. For example, video may be sampled at a rate of 30 frames per second, i.e. approximately 33 ms between frames, and audio may be grouped into units each 20 ms long. Even in this simple example, it is evident that the track fragments will rarely be aligned.
This becomes problematic in random access, when playback of a clip does not start at the beginning, e.g. after a seek. A track fragment may be requested, but it will be played out of synchronization: since the client does not know the lengths of the previous track fragments, it has no way of knowing how far the track fragments of the different tracks are offset from each other.
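The accumulated offset can be made concrete with a short worked example in Python, using the 30 frames per second video and 20 ms audio units mentioned above. The target fragment duration of 250 ms and the rule that each fragment holds the fewest whole units reaching that target are assumptions made for this illustration only, as is the function name fragment_starts.

```python
from fractions import Fraction

VIDEO_UNIT = Fraction(1, 30)  # one video frame at 30 frames per second, in seconds
AUDIO_UNIT = Fraction(1, 50)  # one 20 ms audio unit, in seconds
TARGET = Fraction(1, 4)       # assumed target fragment duration of 250 ms

def fragment_starts(unit, target, count):
    """Start times of `count` fragments, each holding the fewest whole
    units whose total duration is at least `target`."""
    units_per_fragment = -(-target // unit)  # ceiling division
    duration = units_per_fragment * unit
    return [i * duration for i in range(count)]

video = fragment_starts(VIDEO_UNIT, TARGET, 4)  # 8 frames per fragment
audio = fragment_starts(AUDIO_UNIT, TARGET, 4)  # 13 units per fragment
for i, (v, a) in enumerate(zip(video, audio)):
    offset_ms = float((v - a) * 1000)
    print(f"fragment {i}: video starts {offset_ms:.2f} ms after audio")
```

Under these assumptions each video fragment is 1/150 s longer than the corresponding audio fragment, so the offset grows with every fragment. A client that starts at fragment 3 without knowledge of the earlier fragment lengths would render the two tracks 20 ms apart.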
A trivial solution to this problem could be to give track fragments an explicit time stamp. Then the timing relationship between track fragments would be well known. But this would remove an important property from track fragments—the fact that they do not have explicit time stamps makes the insertion of commercials, splicing of programs, etc. very simple. In fact, the addition of explicit time stamps to track fragments would put constraints on how track fragments can be used that are not present today.
There is therefore a need in the art for an efficient solution that enables seeking and random access in a stream of media content which has been fragmented and defined by a sequence of track fragments, while still achieving synchronized rendering. In particular, there is a need for such a solution that does not require the use of explicit time stamps in the track fragments.