1. Field of the Invention
The present invention relates to a multiplexer that multiplexes media data such as video data, audio data and the like and a demultiplexer that reads and demultiplexes a bit string where media data such as video data, audio data and the like are multiplexed.
2. Description of the Related Art
The recent increase in the capacity of communication networks and the development of transmission techniques have remarkably popularized online video distribution services that distribute a video file of multimedia content including video, audio, text, still pictures and the like to a personal computer. Also, the Third Generation Partnership Project (3GPP), an international standardization group whose object is to standardize the so-called third-generation mobile communication systems, is moving to define the transparent end-to-end packet switched streaming service (TS26.234) as a standard related to wireless video distribution, and the video distribution service is expected to be further provided to mobile communication terminals such as mobile phones and PDAs.
When distributing a video file in the video distribution service, a multiplexer reads media data such as video, still pictures, audio, text and the like and multiplexes the header information necessary for playing back the media data with the entity data of the media data so as to generate video file data. The MP4 file format is attracting attention as a multiplex file format for this video file data.
This MP4 file format is a multiplex file format that is under standardization by ISO/IEC (International Organization for Standardization / International Electrotechnical Commission) JTC1/SC29/WG11, an international standardization group, and it is expected to become widespread because it is also employed by the above-mentioned TS26.234 of 3GPP.
Here, the data structure of the MP4 file will be explained.
The MP4 file stores the header information and the entity data of media data in units of objects called boxes and is made up of plural boxes that are arranged hierarchically.
FIG. 1 is a diagram for explaining the structure of a box included in a conventional MP4 file.
The box 901 is made of a box header part 902 where the header information of the box 901 is stored and a box data storage part 903 where data included in the box 901 (such as a sub-box of the box and a field for describing the information) is stored.
This box header part 902 has fields of a box size 904, a box type 905, a version 906 and a flag 907.
The box size 904 is the field describing the size information of the whole box 901, including the byte size assigned for this field.
The box type 905 is the field describing the identifier for identifying the type of the box 901. This identifier is generally represented by a string of four alphabetic characters. Note that, in this specification, there are cases where a box is referred to by this identifier.
The version 906 is the field where a version number showing the version of the box 901 is described, and the flag 907 is the field describing flag information that is set for each box 901. This version 906 and the flag 907 are not always necessary for all boxes 901, and a box 901 that does not have these fields may exist.
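By way of illustration only, the box header part 902 described above can be read as in the following minimal sketch. The field widths (a 32-bit big-endian box size, a four-character box type, an 8-bit version and 24 bits of flag information) are assumptions taken from the common layout of the ISO base media file format and are not stated in this specification; the function name parse_box_header and the full_box parameter are hypothetical.

```python
import struct


def parse_box_header(buf, offset=0, full_box=False):
    """Parse one box header starting at `offset`.

    Assumed layout: 4-byte big-endian size, 4-byte type, and, only for
    boxes that carry them, 1 byte of version plus 3 bytes of flags.
    """
    size, raw_type = struct.unpack_from(">I4s", buf, offset)
    header = {"size": size, "type": raw_type.decode("ascii"), "header_len": 8}
    if full_box:
        # Version and flags share one 32-bit word: top byte is the version.
        (word,) = struct.unpack_from(">I", buf, offset + 8)
        header["version"] = word >> 24
        header["flags"] = word & 0xFFFFFF
        header["header_len"] = 12
    return header


# A 16-byte "tkhd" box: version 0, flags 0x000001, 4 bytes of dummy data.
example = struct.pack(">I4sI", 16, b"tkhd", 0x00000001) + b"\x00" * 4
hdr = parse_box_header(example, full_box=True)
```

Whether a given box carries the version 906 and the flag 907 depends on the box type, so the caller is assumed to know this in advance and to pass full_box accordingly.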
The MP4 file made of a series of boxes 901 that has this structure can be broadly divided into a basic part that is essential in the file structure and an extension part that is used as a need arises. First, the basic part of the MP4 file will be explained.
FIG. 2 is a diagram for explaining the basic part of a conventional MP4 file.
The basic part 911 of the MP4 file 910 is made of a file header part 912 and a file data part 913.
The file header part 912 is the part where header information of the whole file, such as information on the compression coding method of the video data, is stored, and it is made of a file type box 914 and a movie box 915.
The file type box 914 is a box identified by the identifier “ftyp” and stores the information for identifying the MP4 file. As a standardization group or a service provider can arbitrarily prescribe which media data are stored in the MP4 file and which compression coding method is used for the video data, the audio data and the like that are stored in the MP4 file, the information for identifying the prescription according to which the MP4 file is generated is stored in this file type box 914.
The movie box 915 is the box identified by the identifier “moov” and stores header information, such as a display duration, of the entity data stored in the file data part 913.
The file data part 913 is made of a movie data box 916 identified by the identifier “mdat”. Note that it is also possible to refer to an external file that is different from this MP4 file 910 instead of this file data part 913. In the case of referring to an external file in this way, the basic part 911 of the MP4 file 910 is made essentially of the file header part 912. In this specification, the case where entity data is included in the MP4 file 910 will be explained, not the case where an external file is referred to.
The movie data box 916 is a box for storing the entity data of the media data in units called samples. A sample is the smallest access unit in the MP4 file and corresponds to a video object plane (VOP) of video data coded by the Moving Picture Experts Group-4 (MPEG-4) Visual compression coding method, or to a frame of audio data.
Here, the lower hierarchy in the structure of the movie box 915 in the basic part of a conventional MP4 file will be explained.
FIG. 3 is a diagram for explaining the structure of the movie box in the conventional MP4 file.
As shown in FIG. 3A, the movie box 915 is made of the box header part 902 and the box data storage part 903 that have already been explained. And, the size information of the movie box 915 is described (“xxxx” in FIG. 3A) in the field of the box size 904 that constitutes the box header part 902, and the identifier “moov” of the movie box 915 is described in the field of the box type 905.
Also, the movie header box 917, where the header information of the basic part 911 of the MP4 file 910 is stored, and the track box 918, where the header information for each track such as the video track and the audio track is stored, are stored in the box data storage part 903 of the movie box 915. Note that a track here means the whole sample data of each medium included in the MP4 file 910, and the track of video, audio, text or the like is called a video track, an audio track, a text track or the like respectively. Also, in the case where a plurality of data items of the same medium are included in the MP4 file 910, a plurality of tracks exist for that medium. For example, in the case where two types of video data are included in the MP4 file 910, two video tracks exist.
The movie header box 917 is made of the box header part 902 and the box data storage part 903 that have already been explained, the size information of the movie header box 917 is described (“xxx” in FIG. 3A) in the field of the box size 904 that constitutes the box header part 902, and the identifier “mvhd” of the movie header box 917 is described in the field of the box type 905. And, information on the duration needed for playing back the content included in the basic part 911 of the MP4 file 910 and the like is stored in the box data storage part 903 of the movie header box 917.
Also, the size information of the track box 918 (“xx” in FIG. 3A) is described in the field of the box size 904 that constitutes the box header part 902 of the track box 918, and the identifier “trak” of the track box 918 is described in the field of the box type 905. And, the track header box 919 is stored in the box data storage part 903 of the track box 918.
The track header box 919 is the box that has fields for describing the header information for each track and is identified by the identifier “tkhd”. Fields describing a track ID for identifying the track type, the duration needed for playing back the track and the like are stored in the box data storage part 903 of this track header box 919.
In this way, boxes 901 are arranged hierarchically in the movie box 915, and header information for each track for video, audio or the like is stored in the track box 918 that can be identified by “trak”. And, header information for each sample of the track is stored in the lower boxes included in this track box 918.
When showing the structure of the movie box 915 shown in FIG. 3A as a tree, a diagram like FIG. 3B can be obtained.
In other words, it is shown that a movie header box 917 and a track box 918 are arranged as lower boxes of the movie box 915, that a track header box 919 is arranged as a lower box of the track box 918, and that the boxes 901 are thus arranged hierarchically.
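The hierarchical arrangement of FIG. 3B can be sketched, purely as an illustration, by the following recursive walk over a byte string. The 8-byte header layout (32-bit big-endian size followed by a four-character type) is an assumption not stated in this specification, and the helper names make_box and walk_boxes, as well as the dummy 4-byte payloads, are hypothetical.

```python
import struct

# Boxes whose data storage part is assumed to hold lower boxes.
CONTAINER_TYPES = {"moov", "trak"}


def make_box(box_type, payload=b""):
    """Build a box: size (including the 8-byte header), type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


def walk_boxes(buf, start=0, end=None, depth=0):
    """Return a list of (depth, type, size) in file order."""
    end = len(buf) if end is None else end
    result = []
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > end:
            break  # malformed box; stop rather than read past the parent
        box_type = raw_type.decode("ascii")
        result.append((depth, box_type, size))
        if box_type in CONTAINER_TYPES:
            # Lower boxes start right after the 8-byte header.
            result.extend(walk_boxes(buf, pos + 8, pos + size, depth + 1))
        pos += size
    return result


# moov containing mvhd and trak, with trak containing tkhd (dummy payloads).
moov = make_box(
    "moov",
    make_box("mvhd", b"\x00" * 4) + make_box("trak", make_box("tkhd", b"\x00" * 4)),
)
tree = walk_boxes(moov)
for depth, box_type, size in tree:
    print("  " * depth + box_type, size)
```

The printed indentation mirrors the tree of FIG. 3B: “mvhd” and “trak” appear one level below “moov”, and “tkhd” one level below “trak”.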
At the initial stage of standardizing the MP4 file format, the MP4 file 910 was made essentially of the above-mentioned basic part 911. However, an increase in the information amount of media data entails an increase in the file size, which produces various problems such as difficulty in application to streaming playback. Thus, an improvement was made that additionally uses an extension part in which a plurality of combinations of a header box and a data box are serially arranged.
FIG. 4 is a diagram showing the structure of a conventional MP4 file including an extension part.
As shown in FIG. 4, the MP4 file 920 to which the above-mentioned improvement is added is made of a basic part 911 and an extension part 921. As the MP4 file 920 including this extension part 921 can store all of the media data in the extension part 921, it is possible to omit the movie data box 916 of the MP4 file basic part 911.
The extension part 921 is made of a plurality of packets 922, each of which corresponds to a predetermined unit into which the media data are divided.
Each packet 922 is made of a pair of a movie fragment box 923 and a movie data box 916, and is also called a movie fragment.
The movie data box 916 stores the samples of each track for the above-mentioned predetermined unit. The movie fragment box 923 is the box for storing the header information corresponding to this movie data box 916 and is identified by the identifier “moof”. The structure of this movie fragment box 923 will be explained more specifically.
FIG. 5 is a diagram for explaining the structure of a conventional movie fragment box.
As shown in FIG. 5, a movie fragment header box 924 and a plurality of track fragment boxes 925 are stored in the box data storage part 903 of the movie fragment box 923.
The movie fragment header box 924 is the box identified by the identifier “mfhd” and stores the header information of the whole movie fragment box 923.
The track fragment box 925 is the box identified by the identifier “traf” and stores the header information for each track.
Note that a single track fragment box 925 is generally prepared for the header information of a single track, but it is also possible to prepare a plurality of track fragment boxes 925 for the header information of a single track. When the header information of a single track is divided and stored in a plurality of track fragment boxes 925 in this way, the track fragment boxes 925 are arranged in ascending order of the decoding time of their leading samples.
A track fragment header box 926 and one or more track fragment run boxes 927 are stored in the box data storage part 903 of this track fragment box 925.
The track fragment header box 926 is the box identified by the identifier “tfhd” and stores fields for describing the track ID for identifying the type of a track, information on default values such as the playback duration of a sample, and the like.
The track fragment run box 927 is the box identified by the identifier “trun” and stores the header information on a basis of a sample. This track fragment run box 927 will be explained with reference to FIG. 6.
FIG. 6 is a diagram for explaining the structure of a conventional track fragment run box 927.
The flag 907 is the field describing flag information set for each box 901. Here, the flag information shows whether each of the fields from the data offset 929 to the sample composition time offset 936, which follow the flag 907, is included in the track fragment run box 927.
The sample count 928 is the field describing the number of samples whose header information items are stored in the track fragment run box 927.
The data offset 929 is the field describing the pointer information showing in which part of the paired movie data box 916 the entity data of the leading sample is stored, the leading sample being the first of the samples whose header information items are stored in the track fragment run box 927.
The leading sample flag 930 is the field whose value overrides the value of the field of the later-explained sample flag 935 in the case where the leading sample of the track fragment run box 927 is a randomly-accessible sample. Here, random access means the processing operation, in a playback apparatus of the MP4 file, of moving the playback location in the middle of playback (for example, to a location 10 minutes later) or of starting playback from a point in the middle of the data. In addition, a randomly-accessible sample is, among video samples, a sample that constitutes a frame that can be decoded on its own without referring to other frame data, that is, an intra coded frame (a so-called intra frame). Note that all audio samples are randomly accessible because every audio sample can be decoded on its own.
The table 931 is the one in which the entries 932, each showing the header information items of a sample, are collected; the number of entries 932 equals the number shown in the sample count 928.
The entry 932 is a collection of fields showing the header information items of a sample, and which fields are included is indicated by the above-mentioned flag 907. The fields that can be included in the entry 932 are a sample duration 933 describing a sample playback duration, a sample size 934 describing a sample size, a sample flag 935 describing the flag information indicating whether the sample is randomly accessible or not, and a sample composition time offset 936 describing the differential value between the sample decoding time and the sample display time in order to handle samples using bidirectional prediction.
Note that, in the case where these fields are not included in the entry 932, the default values of these fields are described in the track fragment header box 926 or in the movie extends box (identifier “mvex”) in the movie box 915, and these default values are used for each of the sample header information items.
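As a minimal sketch of how the flag 907 selects the optional fields of the track fragment run box 927, the following example parses the part of the box that follows the box size and box type. The flag bit values and the 32-bit field widths are assumptions taken from the common layout of the ISO base media file format and are not stated in this specification; the function name parse_trun and the dictionary keys are hypothetical.

```python
import struct

# Assumed flag bits (not given in this specification).
DATA_OFFSET_PRESENT = 0x000001
FIRST_SAMPLE_FLAGS_PRESENT = 0x000004
PER_SAMPLE_FIELDS = (
    (0x000100, "duration"),
    (0x000200, "size"),
    (0x000400, "flags"),
    (0x000800, "composition_time_offset"),
)


def parse_trun(payload, defaults):
    """Parse version/flags, sample count, optional fields and entries.

    `defaults` supplies the value of any per-sample field that the flag
    information marks as absent from the entries.
    """
    word, sample_count = struct.unpack_from(">II", payload, 0)
    flags = word & 0xFFFFFF
    pos = 8
    run = {"sample_count": sample_count}
    if flags & DATA_OFFSET_PRESENT:
        run["data_offset"] = struct.unpack_from(">i", payload, pos)[0]
        pos += 4
    if flags & FIRST_SAMPLE_FLAGS_PRESENT:
        run["first_sample_flags"] = struct.unpack_from(">I", payload, pos)[0]
        pos += 4
    entries = []
    for index in range(sample_count):
        entry = dict(defaults)  # start from the defaults, then override
        for bit, name in PER_SAMPLE_FIELDS:
            if flags & bit:
                entry[name] = struct.unpack_from(">I", payload, pos)[0]
                pos += 4
        if index == 0 and "first_sample_flags" in run:
            # The leading sample flag overrides the sample flag of sample 0.
            entry["flags"] = run["first_sample_flags"]
        entries.append(entry)
    run["entries"] = entries
    return run


# Two samples; only data offset and sample size are present in the box.
payload = struct.pack(">IIiII", DATA_OFFSET_PRESENT | 0x000200, 2, 100, 300, 120)
defaults = {"duration": 30, "size": 0, "flags": 0, "composition_time_offset": 0}
run = parse_trun(payload, defaults)
```

In this example the sample duration is not carried per entry, so both entries take the default duration, while the sample sizes are read from the entries themselves.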
Also, the header information items of samples are described in the track fragment run box 927 in the order of decoding time. Therefore, when the apparatus that plays back the MP4 file searches for a sample header information item, it refers to the track ID in the track fragment header box 926 of each track fragment box 925, starting from the leading track fragment box 925 in the file, so as to find the track fragment box 925 that includes the header information of the desired track, and then searches for the header information of the sample starting from the leading track fragment run box 927 in that track fragment box 925.
Note that, in the case of the MP4 file 920 including this extension part 921, the information necessary for the whole track such as the initial information at the time of decoding is stored in the movie box 915.
Next, a structure example of the MP4 file including the extension part 921 structured in this way will be explained.
FIG. 7 is a diagram showing the structure example of the extension part of the MP4 file including the conventional extension part.
With reference to FIG. 7, the storage method of a content will be explained using two examples, in each of which the content playback duration is 60 seconds.
The MP4 file 940 shown as FIG. 7A has the structure of storing media data in both the basic part 941 and the extension part 942. In other words, a part of the media data from 0 to 30 seconds is stored in the mdat_1 (code 945) of the basic part 941, a part of the media data from 30 to 45 seconds is stored in the mdat_2 (code 947) of the extension part 942, and a part of the media data from 45 to 60 seconds is stored in the mdat_3 (code 949). In addition, the header information of mdat_1 (code 945) is stored in moov 944, the header information of mdat_2 (code 947) is stored in the moof_1 (code 946) and the header information of mdat_3 (code 949) is stored in the moof_2 (code 948).
In contrast, the MP4 file 950 shown in FIG. 7B has the structure of storing the media data in the extension part 952 only. In other words, the basic part 951 is made of ftyp 953 and moov 954 and does not include any mdat, a part of media data from 0 to 30 seconds is stored in mdat_1 (code 956) in the extension part 952, and a part of the media data from 30 to 60 seconds is stored in mdat_2 (code 958). In addition, the header information of mdat_1 (code 956) is stored in moof_1 (code 955), and the header information of mdat_2 (code 958) is stored in moof_2 (code 957).
Here, how the extension part of the above-mentioned MP4 file is generated will be explained with reference to FIG. 8 to FIG. 10.
FIG. 8 is a block diagram showing the structure of the conventional multiplexer.
The multiplexer 960 is an apparatus that multiplexes the media data and generates the extension part data of the MP4 file. Here, the extension part data of the MP4 file is generated by multiplexing video data and audio data.
The first input unit 961 captures video data into the multiplexer 960 and has the first data storage unit 962 store the video data. Also, the second input unit 964 captures audio data into the multiplexer 960 and has the second data storage unit 965 store the audio data.
The first analysis unit 963 reads out samples of video data items one by one from the first data storage unit 962 so as to analyze them and outputs the header information items of the video samples to the packetization part determination unit 967. Also, the second analysis unit 966 reads out samples of audio data one by one from the second data storage unit 965 so as to analyze them and outputs the header information items of the audio samples to the packetization part determination unit 967. The header information items of video samples and the header information items of audio samples include the information indicating the size or the playback durations of the samples, and the header information items of video samples include the information items showing whether the video samples are intra frames or not.
The packetization part determination unit 967 determines the packetization part of the video data and the audio data so that the number of samples included in each packet becomes constant and generates the header information items of the respective packets based on the obtained sample header information items.
FIG. 9 shows the processing operation flow of the conventional packetization part determination unit. Here, the number of samples stored in a packet is N, and the predetermined number of N is stored in a memory or the like of the multiplexer 960.
First, when the first analysis unit 963 obtains a video sample (S901) and outputs the video sample header information to the packetization part determination unit 967, the packetization part determination unit 967 adds a video sample header information to a packet generation table (S902).
Next, the packetization part determination unit 967 updates the number of video samples included in the packet (S903) and judges whether the number of the video samples included in the packet becomes N or not (S904).
Here, the above-mentioned processing from S901 to S903 is repeated in the case where the number of video samples included in the packet has not reached N (No in S904). In the case where the number has reached N (Yes in S904), the packetization part determination unit 967 packetizes the N video samples and finishes the processing operation (S905).
Likewise, the packetization part determination unit 967 packetizes the audio samples by performing the processing operation of the above-mentioned S901 to S905.
After that, the packetization part determination unit 967 repeats the processing operation of this flow until all the samples have been packetized.
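The flow of FIG. 9 can be sketched as follows, as an illustration only. The function name packetize_by_count, the dictionary key "intra" and the handling of a final packet with fewer than N samples are assumptions that do not appear in this specification.

```python
def packetize_by_count(sample_headers, n):
    """Group sample header items into packets of n samples each."""
    packets = []
    table = []  # packet generation table for the packet being built
    for header in sample_headers:  # S901: obtain the next sample
        table.append(header)       # S902: add its header to the table
        if len(table) == n:        # S903/S904: has the count reached N?
            packets.append(table)  # S905: packetize N samples
            table = []
    if table:
        # Assumption: leftover samples form a final, shorter packet.
        packets.append(table)
    return packets


# Hypothetical stream of 9 video samples with an intra frame every 4 samples.
headers = [{"index": i, "intra": i % 4 == 0} for i in range(9)]
packets = packetize_by_count(headers, 3)

# Position of each intra frame inside its packet.
intra_positions = [
    [pos for pos, h in enumerate(packet) if h["intra"]] for packet in packets
]
print(intra_positions)
```

Because the packet boundary is decided by the sample count alone, the position of the intra frame inside the packet changes from packet to packet in this hypothetical stream.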
FIG. 10 shows an example of the conventional packet generation table that stores the header information items of video samples. This packet generation table 968a describes, for each of the video samples, the sample size, the sample playback duration, and an intra coded frame flag showing whether the video sample is an intra frame or not. Here, the table shows that the leading video sample stored in the packet has a size of 300 bytes and a playback duration of 30 ms and is not an intra coded frame, and that the second video sample is an intra coded frame. In addition, these information items are added to the packet generation table 968a in sequence in the packetization part determination unit 967, and the table is outputted to the packet generation table storage unit 968 at the time when the information items up to the “N”th sample, that is, the last sample included in the packet, have been added.
Referring to FIG. 8 again, next, the packetization part determination unit 967 describes the header information items of N samples in the packet generation table 968a, and then it outputs the packet generation table 968a to the packet generation table storage unit 968 and a packet generation signal to the packet header generation unit 969.
The packet header generation unit 969, when obtaining the packet generation signal, reads out the packet sample header information from the packet generation table 968a that is held in the packet generation table storage unit 968 and generates moof data. Also, the packet header generation unit 969 outputs the generated moof data to the packet connection unit 971 and outputs, to the packet data generation unit 970, the mdat information including (i) pointer information indicating which parts of the first data storage unit 962 and the second data storage unit 965 store the entity data items of samples included in the packet and (ii) the size information items of samples.
The packet data generation unit 970 reads out the entity data items of samples from the first data storage unit 962 and the second data storage unit 965 based on the obtained mdat information so as to generate mdat data and outputs the mdat data to the packet connection unit 971.
After that, the packet connection unit 971 connects the moof data with the mdat data so as to output the data in the mp4 extension part for a single packet.
Finally, the outputted mp4 extension data for a single packet is captured into an apparatus that generates the MP4 file and the data of the mp4 extension part that is generated in sequence are arranged in sequence so that the extension part of the MP4 file is generated. After that, this file generation apparatus connects the basic part with the extension part of the MP4 file so as to generate an MP4 file.
However, at the time when the extension part of the MP4 file that is multiplexed by the conventional multiplexer like this is played back, there are problems listed below.
As a conventional multiplexer multiplexes data without considering the playback start time of samples included in the packet, there is a case where an audio sample that is synchronized with a video sample having a certain playback time is stored in a packet different from the packet that stores the video sample. This causes a problem that the efficiency of the data access deteriorates when the playback apparatus plays back the MP4 file.
Also, as a conventional multiplexer multiplexes data based on the number of samples included in a packet, randomly-accessible samples, that is, video samples corresponding to intra frames, are in most cases stored at a position that differs from packet to packet. Therefore, there is a problem that the calculation amount needed for searching samples becomes huge because the MP4 file playback apparatus must search all the video samples included in a packet when searching for randomly-accessible samples.
These problems will be explained in detail with reference to FIG. 11.
FIG. 11 is a diagram for explaining problems of a conventional multiplexer.
FIG. 11A illustrates the first problem that the efficiency of the data access deteriorates during the playback.
The header information items of the samples included in each mdat are stored in the moof immediately before that mdat. The header information item concerning the video sample whose playback start time is 20 s, which is stored in mdat_1, is stored in moof_1 as the leading sample, and the header information item concerning the audio sample whose playback start time is 20 s, which is stored in mdat_10, is stored in moof_10 as the last sample.
Therefore, when trying to play back the content from the playback time of 20 seconds, the MP4 file playback apparatus, after obtaining the header information item of the video sample stored in moof_1, must search data up to moof_10 in order to obtain the header information item of the audio sample, which makes the efficiency of the data access deteriorate.
FIG. 11B illustrates the second problem that the calculation amount needed for searching randomly-accessible samples becomes huge.
The header information item concerning the “i”th randomly-accessible video sample stored in the last part of the mdat_1 is stored as the last sample in moof_1, and the header information concerning the “i+1”th randomly-accessible video sample that is stored in the last part of the mdat_3 is stored as the last sample in moof_3.
Therefore, the MP4 file playback apparatus must search up to the last sample of moof when trying to perform random access, and thus the calculation amount necessary for searching becomes huge.
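The cost of this search can be illustrated by the following minimal sketch, which counts how many sample header items a playback apparatus examines before reaching the randomly-accessible sample in one moof. The function name find_random_access_sample, the dictionary key "random_access" and the packet size of 100 samples are hypothetical and are used only for illustration.

```python
def find_random_access_sample(entries):
    """Scan entries in order; return (index, number of entries examined)."""
    for examined, entry in enumerate(entries, start=1):
        if entry["random_access"]:
            return examined - 1, examined
    return None, len(entries)


def make_entries(count, intra_index):
    """Hypothetical moof contents with one intra frame at `intra_index`."""
    return [{"random_access": i == intra_index} for i in range(count)]


# Intra frame stored as the last sample of the packet, as in FIG. 11B.
last_case = find_random_access_sample(make_entries(100, 99))

# For comparison: intra frame stored as the leading sample of the packet.
first_case = find_random_access_sample(make_entries(100, 0))

print(last_case, first_case)
```

Under these assumptions the search examines every entry of the packet when the intra frame is stored last, but only one entry when it is stored first.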
Further, in addition to the first and the second problems, as the number of seeks needed for obtaining the sample data becomes large under the structure of the extension part of the MP4 file that is generated by the conventional multiplexer, there is another problem that this structure is not appropriate for random access playback in an apparatus which has a slow seek speed such as an optical disc playback apparatus.
This problem will be explained with reference to FIG. 11B again. In the case of trying to perform random access to the “i”th randomly-accessible video sample of moof_1, the playback apparatus moves a reading pointer to the leading point of moof_1 in order to obtain the header information item of the “i”th randomly-accessible video sample first and then analyzes data in moof_1 in sequence. At this time the first seek becomes necessary.
After that, the playback apparatus obtains the information as to which part of mdat_1 stores the entity data of the “i”th randomly-accessible video sample and moves the reading pointer to the starting position of the entity data. At that time, as the entity data of the “i”th randomly-accessible video sample is stored in the end of mdat_1, it is impossible to obtain the entity data of a sample by moving the reading pointer in sequence from the leading position of moof_1, and thus the second seek becomes necessary.
In other words, as respective seek operations are performed at the time of moving the reading pointer to the leading location of moof_1 and to the starting position of the entity data, it takes a lot of time to perform random access playback in the case where the playback apparatus has a slow seek speed. Especially, in the case where the entity data item of an audio sample or the like that is synchronized with the “i”th randomly-accessible video sample is stored in a place such as a different packet away from the entity data of the video sample, additional seek operation becomes necessary and it is impossible to perform an immediate random access playback.
The present invention is conceived considering these problems, and an object of the present invention is to provide a multiplexer which has a high efficiency of data access at the time of playing back a multiplexed media data file and which can multiplex media data so that the calculation amount needed for searching samples can be reduced.
Also, another object is to provide a multiplexer which can multiplex media data so that an apparatus with a slow seek speed can perform random access playback of a multiplexed file.
Further, another object is to provide a demultiplexer which can obtain the file multiplexed by the multiplexer and demultiplex the multiplexed file.