The present disclosure relates generally to data processing, and more particularly to methods and systems for processing and accessing metadata in a media file.
Media files, such as MPEG-4 files, comprise media data and metadata describing the media data. The metadata provides data sample information that media applications use to process the media data in the file. Media files are organized into several structural elements. For example, MPEG-4 files are composed of structural elements called boxes. Each box may comprise media data, metadata, or other sub-boxes. For example, a sample table box (STBL) records time information and file information of media data. According to the information recorded in the STBL box, applications can obtain the time, type, data size, and position of samples in the media file and then perform playback, random seek, or other functions on the media file. The STBL box also includes several sub-boxes, comprising a decoding time to sample box (STTS) as shown in FIG. 1, a sample size box (STSZ), a sample to chunk box (STSC) as shown in FIG. 3, a chunk offset box (STCO), a sync sample box (STSS) as shown in FIG. 2, a sample description table (STSD), and others.
The STTS box contains at least one entry for recording the time duration of samples of media data. FIG. 1 shows an STTS box 100, comprising 5 entries that describe a total of 45 samples in the media data. It should be noted that data in the STTS box is recorded using run-length coding to reduce the storage space thereof. That is, the time duration of each of media samples #1 to #6, #7 to #20, #21 to #31, #32 to #33, and #34 to #45 is 66, 67, 63, 64, and 66 time units, respectively, and the total time duration of the 45 samples of the media data is 2947 time units (6*66+14*67+11*63+2*64+12*66=2947).
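By way of illustration only, the run-length coding described for FIG. 1 can be sketched as follows. The (sample count, duration) tuple layout and the names STTS_FIG1 and expand_stts are assumptions made for this sketch and are not taken from the disclosure or from any particular library.

```python
# Hypothetical in-memory form of the STTS box of FIG. 1:
# each entry is (number of consecutive samples, duration of each sample).
STTS_FIG1 = [(6, 66), (14, 67), (11, 63), (2, 64), (12, 66)]


def expand_stts(entries):
    """Undo the run-length coding: return one duration per sample."""
    durations = []
    for sample_count, sample_delta in entries:
        durations.extend([sample_delta] * sample_count)
    return durations


durations = expand_stts(STTS_FIG1)
sample_total = len(durations)   # number of samples described by the box
time_total = sum(durations)     # total duration in time units
```

Five entries stand in for 45 per-sample durations, which is the storage saving the run-length coding is intended to provide.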
Traditionally, a linear search through these boxes is performed to locate a specific sample with a target decoding time. In a linear search, the time durations of the respective samples are accumulated, starting from the very first media sample, until the accumulated time duration equals or exceeds the target decoding time. For example, to locate a specific sample with a specific decoding time of time unit 2000, a linear search is performed by accumulating the time duration of the first thirty samples. Since the total time duration of the first thirty samples is 1964 (6*66+14*67+10*63=1964) and the total time duration of the first thirty-one samples is 2027 (6*66+14*67+11*63=2027), which exceeds the specific decoding time 2000, sample #30 is then located. The linear search calculation is time-consuming if there is a large number of entries in these boxes.
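A minimal sketch of such a linear search is given below, assuming the STTS entries of FIG. 1 are held as (sample count, duration) tuples. The function name linear_search_stts and its return convention are assumptions of this sketch: it reports the number of samples accumulated before the running total would equal or exceed the target, which mirrors the worked example above, and it leaves any other indexing convention to the caller.

```python
# Hypothetical in-memory form of the STTS box of FIG. 1:
# each entry is (number of consecutive samples, duration of each sample).
STTS_FIG1 = [(6, 66), (14, 67), (11, 63), (2, 64), (12, 66)]


def linear_search_stts(entries, target_time):
    """Accumulate per-sample durations from the first sample onward.

    Returns (samples_accumulated, accumulated_time) at the point where
    adding one more sample would make the total equal or exceed
    target_time. If the target lies beyond the last sample, all samples
    are accumulated and the full totals are returned.
    """
    accumulated_time = 0
    samples_accumulated = 0
    for sample_count, sample_delta in entries:
        for _ in range(sample_count):
            if accumulated_time + sample_delta >= target_time:
                return samples_accumulated, accumulated_time
            accumulated_time += sample_delta
            samples_accumulated += 1
    return samples_accumulated, accumulated_time


located_sample, elapsed = linear_search_stts(STTS_FIG1, 2000)
```

For the target of 2000 time units, the loop visits thirty samples one by one before stopping, so the cost grows with the position of the target in the file.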
In addition, the media samples in the media data are grouped into chunks. The STSC box records the mapping relationship between samples and chunks. Based on the mapping relationship recorded in the STSC box, one can identify the chunk in which a target sample resides, and further obtain other related data using the chunk information. FIG. 3 shows an STSC box 300, in which the number of samples in each of chunks #1 to #2, chunks #3 to #5, chunks #6 to #8, and chunk #9 and later chunks is 3, 4, 7, and 6, respectively. Data in the STSC box is recorded using run-length coding to reduce the storage space thereof. To locate a specific sample, the chunk corresponding to the specific sample is sought by performing a linear search that accumulates the number of samples from the first chunk. The calculation is also time-consuming if the number of entries in the STSC box is large.
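The chunk lookup described above can be sketched in the same way. This is an illustration only: the (first chunk, samples per chunk) tuple layout, the name find_chunk, and the 1-based numbering of samples and chunks are assumptions of the sketch, and any further fields an actual STSC entry may carry are omitted.

```python
# Hypothetical in-memory form of the STSC box of FIG. 3:
# each entry is (first chunk of the run, samples in each chunk of the run).
# The last entry applies to chunk #9 and every later chunk.
STSC_FIG3 = [(1, 3), (3, 4), (6, 7), (9, 6)]


def find_chunk(entries, sample_number):
    """Return the 1-based chunk that contains the 1-based sample_number.

    Walks chunk by chunk from the first chunk, accumulating the number
    of samples, until the running count reaches the requested sample.
    """
    if not entries or sample_number < 1:
        raise ValueError("need at least one entry and a sample number >= 1")
    entry_index = 0
    chunk = entries[0][0]
    accumulated_samples = 0
    while True:
        # Switch to the next run once its first chunk is reached.
        if (entry_index + 1 < len(entries)
                and chunk == entries[entry_index + 1][0]):
            entry_index += 1
        samples_per_chunk = entries[entry_index][1]
        if samples_per_chunk <= 0:
            raise ValueError("samples per chunk must be positive")
        accumulated_samples += samples_per_chunk
        if accumulated_samples >= sample_number:
            return chunk
        chunk += 1


chunk_of_sample_20 = find_chunk(STSC_FIG3, 20)
```

Under these assumptions, samples #1 to #6 fall in chunks #1 and #2, samples #7 to #18 in chunks #3 to #5, and samples #19 to #39 in chunks #6 to #8, so sample #20 resolves to chunk #6 after six chunks have been visited.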