MP4 has recently attracted attention as a file format for multiplexing and storing multimedia data (that is, content such as video, audio, text, or still pictures).
The range of applications of MP4 is expanding; for example, it is employed as a file format for MPEG-4, which is a video compression coding method, and for JPEG 2000, which is a next-generation version of JPEG, a natural picture compression coding method.
This MP4 file format was initially composed of a data box in which multiplexed content is stored and a header box in which information on the content of the data box is stored.
Therefore, a multiplexing apparatus for generating MP4 file data from multimedia data obtains the multimedia data, performs coding and multiplexing, and, after finishing the processing of all the multimedia data, makes a header box to complete the MP4 file data.
When the amount of multimedia data is large, this multiplexing apparatus needs a long time to make the header box after it finishes obtaining and processing the multimedia data.
For example, when the multiplexing apparatus is used as a cellular phone with a video function, a time lag occurs on receiving a call while videotaping, because a header box containing information on all the videotaped content must be made. There was also the problem that, when priority is given to receiving the call, the making of the header box is cancelled midway and the MP4 file data cannot be completed.
Therefore, the MP4 file format has been improved so that plural pairs of a data box and a header box are connected to each other.
FIG. 1 is a data block diagram showing a basic file format of the conventional MP4 file data that is improved as mentioned above.
This MP4 file data 900 is composed of a non-fragment data part 900a and a fragmented data part 900b. 
The non-fragment data part 900a comprises a movie box 910 as the above-mentioned header box (hereinafter “moov”) and a media data box 920 as the above-mentioned data box (hereinafter “mdat”).
The moov 910 is further composed of a movie header box 911 (hereinafter “mvhd”), a plurality of track boxes 912 (hereinafter “trak”), and a movie extend box 913 (hereinafter “mvex”).
The fragmented data part 900b is made so that plural pairs of a movie fragment box 930 as the above-mentioned header box (hereinafter “moof”) and a media data box 940 as the above-mentioned data box (hereinafter “mdat”) are connected to each other.
The moof 930 is further composed of a movie fragment header box 931 (hereinafter “mfhd”) and a plurality of track fragment boxes 932 (hereinafter “traf”).
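The box layout described above can be illustrated with a minimal Python sketch. It assumes the common box header of the ISO base media file format (a 4-byte big-endian size that counts the header itself, followed by a 4-byte type code). The helper names `make_box` and `iter_boxes` are hypothetical, the box payloads are empty placeholders rather than valid field data, and 64-bit box sizes are not handled.

```python
import struct

def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build one box: 4-byte big-endian size (header included), 4-byte type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def iter_boxes(data: bytes):
    """Yield (type, payload) for each box stored back to back in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            raise ValueError("malformed or truncated box")
        yield box_type.decode("ascii"), data[offset + 8:offset + size]
        offset += size

# Non-fragment data part 900a: moov 910 (mvhd, trak, mvex) followed by mdat 920.
moov = make_box(b"moov", make_box(b"mvhd") + make_box(b"trak") + make_box(b"mvex"))
# Fragmented data part 900b: pairs of moof 930 (mfhd, traf) and mdat 940.
moof = make_box(b"moof", make_box(b"mfhd") + make_box(b"traf"))
pair = moof + make_box(b"mdat", b"\x00" * 4)
file_data = moov + make_box(b"mdat", b"\x00" * 4) + pair * 2

top_level = [t for t, _ in iter_boxes(file_data)]
moov_payload = next(p for t, p in iter_boxes(file_data) if t == "moov")
moov_children = [t for t, _ in iter_boxes(moov_payload)]
```

Here `top_level` lists one moov and mdat pair followed by two moof and mdat pairs, and `moov_children` lists the boxes nested in the moov. The same `iter_boxes` call is reused for nested boxes because a container box's payload is itself a sequence of boxes.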
FIG. 2 is a data block diagram showing the detailed structure of the moov 910.
The mvhd 911 contains a movie duration field 911a for storing non-fragment content duration information indicating the duration needed for playback of the content contained in the non-fragment data part 900a (the non-fragment content duration).
The trak 912 further contains a track header box 914 (hereinafter “tkhd”), and the tkhd 914 contains a track ID field 916 for storing track identification information for identifying a track and a track duration field 917. This track duration field 917 stores non-fragment track duration information indicating the duration needed for playback of the track that is indicated by the track identification information and contained in the non-fragment data part 900a (the non-fragment track duration).
A track here is a kind of content, such as video, audio, or text.
The mvex 913 contains as many track extend boxes 915 (hereinafter “trex”) as there are tracks contained in the fragmented data part 900b.
This trex 915 contains a track ID field 918 for storing track identification information for identifying a track and a first default duration field 919 for storing default sample duration information indicating the duration preset as a default for the samples of the track, specified by the track identification information, in the fragmented data part 900b.
A sample here is the minimum unit of the MP4 file format. A sample is a frame or a picture when the track is video, and is a piece of audio information of, for example, 20 msec when the track is audio.
FIG. 3 is a data block diagram showing the detailed structure of a moof 930.
A traf 932 contained in the moof 930 further contains a track fragment header box 933 (hereinafter “tfhd”) and a plurality of track fragment run boxes 936 (hereinafter “trun”).
This tfhd 933 contains a track ID field 934 for storing track identification information for identifying a track and a second default duration field 935 for storing default sample duration information. This information indicates the duration preset as a default for those samples, among the samples of the track specified by the track identification information in the fragmented data part 900b, that are associated with the traf 932 in which the tfhd 933 is stored. The second default duration field 935 can be omitted, in which case the first default duration field 919 is referred to.
Also, the trun 936 contains a sample duration field 937 for storing sample duration information indicating the durations of the respective samples (sample durations) contained in the track of the fragmented data part 900b specified by the above-mentioned track identification information.
In this way, the MP4 file format contains a pair of an mdat 920 and a moov 910 and plural pairs of an mdat 940 and a moof 930; in other words, plural pairs of a box storing multiplexed content and a box storing information on that content are connected to each other. The conventional multiplexing apparatus can therefore complete the MP4 file data 900 each time it obtains and processes multimedia data, which is effective for real-time recording. Also, the conventional multiplexing apparatus made as a cellular phone with a videotaping function can avoid the occurrence of a time lag on receiving a call while videotaping.
FIG. 4 is a block diagram showing the structure of the conventional multiplexing apparatus for generating the above-mentioned MP4 file data 900.
This multiplexing apparatus 700 comprises a video data analysis unit 701, an audio data analysis unit 702, a first selector switch 703, a second selector switch 704, a control unit 708, a moof generation unit 705, a moov generation unit 706, and a file generation unit 707.
The video data analysis unit 701 generates video specification information indicating the time stamp or the size of the video data by obtaining and analyzing the video data. After that, the video data analysis unit 701 codes the video data, makes video coded data, and outputs the video coded data and the video specification information.
The audio data analysis unit 702 generates audio specification information indicating the time stamp or the size of the audio data by obtaining and analyzing the audio data. After that, the audio data analysis unit 702 codes the audio data, makes the audio coded data, and outputs the audio coded data and the audio specification information.
The moov generation unit 706 generates and outputs the data (moov data) stored in a moov 910 according to the control from the control unit 708.
The moof generation unit 705 generates and outputs the data (moof data) stored in a moof 930 according to the control from the control unit 708.
The control unit 708 obtains the video specification information from the video data analysis unit 701 and the audio specification information from the audio data analysis unit 702. By switching the first selector switch 703 and the second selector switch 704 based on the video specification information and the audio specification information, the control unit 708 has the second selector switch 704 output first the data to be stored in the non-fragment data part 900a, which contains the moov data, and next the data to be stored in the fragmented data part 900b, which contains the moof data.
The file generation unit 707 performs multiplexing processing on the respective data output from the second selector switch 704, and generates and outputs the MP4 file data 900.
FIG. 5 is a flow chart showing the operation concerning the generation of the moov data and the moof data of the conventional multiplexing apparatus 700.
First, the multiplexing apparatus 700 generates moov data (step S100).
Next, the multiplexing apparatus 700 generates moof data (step S102).
After that, the multiplexing apparatus 700 judges whether or not unprocessed data remains in the inputted video data and audio data to be coded and multiplexed (step S104). When the multiplexing apparatus 700 judges that unprocessed data remains (step S104: Yes), it repeats the operation from step S102; when it judges that no unprocessed data remains (step S104: No), it finishes the processing.
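The flow of steps S100 to S104 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `multiplex` is hypothetical, the input is assumed to be already split into chunks, the first chunk is assumed to go into the non-fragment data part, and each later chunk is assumed to fill exactly one moof and mdat pair.

```python
from collections import deque

def multiplex(chunks):
    """Follow steps S100 to S104 and return the boxes in the order they are written."""
    pending = deque(chunks)
    if len(pending) < 2:
        # Assumption of this sketch: one chunk for the moov, at least one for a moof.
        raise ValueError("this sketch assumes at least two chunks")
    written = []

    # Step S100: generate the moov data for the non-fragment data part.
    first = pending.popleft()
    written += [("moov", first), ("mdat", first)]

    while True:
        # Step S102: generate the moof data for the next fragment.
        chunk = pending.popleft()
        written += [("moof", chunk), ("mdat", chunk)]
        # Step S104: finish when no unprocessed data remains, otherwise repeat S102.
        if not pending:
            break
    return written

order = multiplex(["chunk0", "chunk1", "chunk2"])
```

Each header in `order` describes only its own chunk, so no header has to wait for the end of the whole recording.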
In this way, the conventional multiplexing apparatus 700 using the MP4 file data 900 is effective for real-time recording because it makes the moof data sequentially.
Incidentally, the above-mentioned MP4 file data 900 is suitable for streaming because plural pairs of a data box and a header box are connected to each other as shown in FIGS. 1 to 3. In other words, a demultiplexing apparatus for playing back content can, by obtaining and demultiplexing the MP4 file data 900 distributed as a stream, play the content back sequentially (perform download playback) before it finishes downloading all the MP4 file data 900.
FIG. 6 is a block diagram showing the structure of the conventional demultiplexing apparatus for playing back a content based on the above-mentioned MP4 file data 900.
This demultiplexing apparatus 800 obtains and plays back the MP4 file data 900, and comprises a data obtainment unit 810 operable to obtain the MP4 file data 900, a decoding unit 813 operable to demultiplex and decode the MP4 file data 900 obtained by the data obtainment unit 810, a time processing unit 811 operable to perform time processing based on the obtained MP4 file data 900, a random access processing unit 812 operable to perform random access processing on the obtained MP4 file data 900, and a playback unit 814 operable to output video and audio based on the data output from the decoding unit 813.
Also, the time processing unit 811 has a duration specification unit 811a operable to specify the duration needed for playback of part of the content contained in the MP4 file data 900 and a playback time specification unit 811b operable to specify the playback time. The playback time here means the time taken by playback from its start to the present when playback is performed from the head of the content.
FIG. 7 is a block diagram showing the internal structure of the duration specification unit 811a. 
The duration specification unit 811a has a first demultiplexing unit 821, a second demultiplexing unit 822, a third demultiplexing unit 823, a first analysis unit 824, and a second analysis unit 825.
The first demultiplexing unit 821 demultiplexes the moov data from the MP4 file data 900 and outputs it upon obtaining the MP4 file data 900.
The second demultiplexing unit 822 demultiplexes the data stored in the mvhd 911 (mvhd data) and the data stored in the trak 912 (trak data) from the moov data and outputs them upon obtaining the moov data.
The third demultiplexing unit 823 demultiplexes the data stored in the tkhd 914 (tkhd data) from the trak data and outputs it upon obtaining the trak data.
The first analysis unit 824 analyzes the mvhd data and outputs the non-fragment content duration information stored in the movie duration field 911a upon obtaining the mvhd data from the second demultiplexing unit 822.
The second analysis unit 825 analyzes the tkhd data and outputs the track identification information stored in the track ID field 916 upon obtaining the tkhd data from the third demultiplexing unit 823, and it also outputs the non-fragment track duration information stored in the track duration field 917 in association with the track identification information.
The duration specification unit 811a configured in this way outputs the non-fragment content duration information to the playback unit 814, and it also outputs, to the playback unit 814, the track identification information and the non-fragment track duration information corresponding to the track of the non-fragment data part 900a indicated by the track identification information.
After that, upon obtaining the non-fragment content duration information, the non-fragment track duration information, and the track identification information, the playback unit 814 may display the non-fragment content duration or the non-fragment track durations of the respective tracks as the need arises. Also, while playing back the video or the audio, the playback unit 814 displays the playback time based on the result specified by the playback time specification unit 811b.
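The chain from the moov data down to the duration fields can be sketched in Python as follows. The classes and the function `specify_durations` are hypothetical stand-ins for already demultiplexed boxes, not a byte-level parser, and the duration values are arbitrary time units chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Tkhd:
    track_id: int         # track ID field 916
    track_duration: int   # track duration field 917

@dataclass
class Trak:
    tkhd: Tkhd

@dataclass
class Mvhd:
    movie_duration: int   # movie duration field 911a

@dataclass
class Moov:
    mvhd: Mvhd
    traks: list = field(default_factory=list)

def specify_durations(moov: Moov):
    """Return (non-fragment content duration, {track ID: non-fragment track duration})."""
    # Role of the first analysis unit 824: read the movie duration from the mvhd data.
    content_duration = moov.mvhd.movie_duration
    # Role of the second analysis unit 825: pair each track ID with its track duration.
    track_durations = {t.tkhd.track_id: t.tkhd.track_duration for t in moov.traks}
    return content_duration, track_durations

moov = Moov(Mvhd(12000), [Trak(Tkhd(1, 12000)), Trak(Tkhd(2, 11500))])
content_duration, track_durations = specify_durations(moov)
```

Note that every value returned here comes from the moov alone, so it covers only the non-fragment data part.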
FIG. 8 is a block diagram showing the structure of the random access processing unit 812.
Upon obtaining, from outside, the target duration information indicating a target time, the random access processing unit 812 searches the saved MP4 file data 900 for the sample corresponding to the target time as a target sample. It comprises a data storage unit 830a, a first demultiplexing unit 830, a second demultiplexing unit 831, a third demultiplexing unit 832, a fourth demultiplexing unit 835, a fifth demultiplexing unit 834, a first analysis unit 836, a second analysis unit 837, a third analysis unit 838, a track control unit 833, a determination unit 839 and a control unit 840. The target time here means the playback starting time when a user tries to perform playback from the middle of the content. For example, when a user wants to play back content with a total duration of ten minutes while skipping the first 2 minutes, the target time is 2 minutes. The data storage unit 830a stores the MP4 file data 900 obtained by the data obtainment unit 810.
The first demultiplexing unit 830 demultiplexes the moov data and the moof data from the MP4 file data 900 and outputs them upon obtaining the MP4 file data 900 from the data storage unit 830a.
The third demultiplexing unit 832 demultiplexes data (mvex data) stored in the mvex 913 from the moov data and outputs the mvex data upon obtaining the moov data.
The track control unit 833 outputs the track identification information based on the moov data upon obtaining the moov data from the first demultiplexing unit 830.
The second demultiplexing unit 831 demultiplexes the data (traf data) stored in traf 932 corresponding to the track specified by the track identification information from the moof data and outputs the traf data upon obtaining the track identification information from the track control unit 833.
The fourth demultiplexing unit 835 demultiplexes the data (tfhd data) stored in the tfhd 933 and the data (trun data) stored in the trun 936 from the traf data and outputs them upon obtaining the traf data from the second demultiplexing unit 831.
The fifth demultiplexing unit 834 demultiplexes the data (trex data) stored in the trex 915 corresponding to the track specified by the track identification information from the mvex data and outputs the trex data upon obtaining the mvex data from the third demultiplexing unit 832 and track identification information from the track control unit 833.
The first analysis unit 836 analyzes the trex data and outputs the default sample duration information contained in the trex data as the first duration information upon obtaining the trex data from the fifth demultiplexing unit 834.
The second analysis unit 837 analyzes the tfhd data and outputs the default sample duration information contained in the tfhd data as the second duration information upon obtaining the tfhd data from the fourth demultiplexing unit 835.
The third analysis unit 838 analyzes the trun data and outputs the sample duration information contained in the trun data as the third duration information upon obtaining the trun data from the fourth demultiplexing unit 835.
The determination unit 839 selects one piece of information from among the first duration information obtained from the first analysis unit 836, the second duration information obtained from the second analysis unit 837, and the third duration information obtained from the third analysis unit 838, and outputs the selected information as the duration information. Specifically, the determination unit 839 gives the highest priority to the third duration information, selects the second duration information when the third duration information cannot be obtained because it is omitted, and selects the first duration information when the second duration information also cannot be obtained because it is omitted.
Also, the determination unit 839 outputs the sample identification information for identifying the sample corresponding to the duration information when outputting the duration information.
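The priority rule applied by the determination unit 839 can be sketched in Python as follows. The function name `determine_duration` and the use of `None` to represent an omitted field are assumptions made for illustration.

```python
from typing import Optional

def determine_duration(
    trun_duration: Optional[int],   # third duration information (sample duration field 937)
    tfhd_default: Optional[int],    # second duration information (second default duration field 935)
    trex_default: int,              # first duration information (first default duration field 919)
) -> int:
    """Pick the duration of one sample: trun first, then tfhd, then trex."""
    if trun_duration is not None:
        return trun_duration
    if tfhd_default is not None:
        return tfhd_default
    return trex_default

per_sample = determine_duration(30, 40, 50)         # the trun value is used
per_fragment = determine_duration(None, 40, 50)     # falls back to the tfhd default
per_track = determine_duration(None, None, 50)      # falls back to the trex default
```

The most specific value wins: a per-sample duration overrides the default of its track fragment, which in turn overrides the default of the whole track.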
Upon obtaining the target duration information showing the target time, the control unit 840 adds, in order, the durations indicated in the duration information output from the determination unit 839. The control unit 840 continues this addition until the addition result reaches the target time indicated in the target duration information and, when the target time is reached, outputs the sample identification information obtained from the determination unit 839 as the target sample identification information.
FIG. 9 is a block diagram showing the internal structure of the track control unit 833.
The track control unit 833 comprises a sixth demultiplexing unit 841 operable to obtain the moov data and demultiplex the trak data from the moov data, a seventh demultiplexing unit 842 operable to obtain the trak data and demultiplex the tkhd data from the trak data, and an analysis unit 843 operable to specify and output the track identification information by analyzing the tkhd data.
FIG. 10 is an illustration explaining the outline operation of the random access processing unit 812.
As shown in this FIG. 10, the conventional random access processing unit 812 specifies the target sample corresponding to the target time by adding the sample durations in order.
The random access processing unit 812 then outputs the MP4 file data 900 and the target sample identification information to the decoding unit 813, and has the decoding unit 813 and the playback unit 814 perform playback of the content from the target sample indicated by the target sample identification information.
The conventional demultiplexing apparatus 800 configured in this way can, upon obtaining the MP4 file data 900 distributed as a stream, perform download playback based on it and, by having the random access processing unit 812, perform random access to all the stored MP4 file data 900.
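The search for the target sample by adding sample durations in order can be sketched in Python as follows. The function name `find_target_sample`, the representation of samples as (sample ID, duration) pairs, and the handling of a target time that falls exactly on a sample boundary are assumptions of this sketch.

```python
def find_target_sample(samples, target_time):
    """Return the ID of the sample at which the accumulated duration reaches target_time.

    samples is an iterable of (sample_id, duration) pairs in playback order.
    Returns None when the target time lies beyond the stored samples.
    """
    elapsed = 0
    for sample_id, duration in samples:
        elapsed += duration            # add the sample durations in order
        if elapsed >= target_time:     # the addition result has reached the target time
            return sample_id
    return None

# Four samples of 40 time units each; a target time of 100 falls inside sample 2.
samples = [(0, 40), (1, 40), (2, 40), (3, 40)]
target = find_target_sample(samples, 100)
beyond = find_target_sample(samples, 1000)
```

Because the durations are accumulated one sample at a time, the cost of this search grows with the number of samples that precede the target time.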
However, the above-mentioned conventional demultiplexing apparatus 800 has the following problem: when performing download playback of the MP4 file data 900, it can display the duration needed for playback of the content included in the non-fragment data part 900a, but it cannot display the total duration of all the content including the content in the fragmented data part 900b, and thus it is not user-friendly.
FIG. 11 is an illustration explaining the problem in the conventional demultiplexing apparatus 800.
The demultiplexing apparatus 800 made as a cellular phone downloads the MP4 file data 900 distributed as a stream via a base station 990, and plays back the downloaded MP4 file data 900 sequentially. At that time, the demultiplexing apparatus 800 displays the playback time, but it cannot specify the duration needed for playback of all the content until it obtains all the MP4 file data 900. As a result, it cannot display the total duration of the content during download or playback. For example, under a rate system in which fees are charged according to the time spent using the demultiplexing apparatus 800, a user may worry about being unable to estimate the fee, or, when the remaining battery capacity is nearly exhausted, a user may worry that the power will be cut off in the middle of the download or the playback. In this way, the conventional demultiplexing apparatus 800 cannot let the user know the total duration of the content, and thus it lacks user-friendliness.
Therefore, the present invention has been made in view of the above-mentioned problems, and its object is to provide a demultiplexing apparatus and a multiplexing apparatus that improve user-friendliness at the time of download playback of content while keeping the effectiveness of download playback and real-time recording.