1. Field of the Invention
The present invention relates to a method and device for editing a composite content file including a plurality of types of multiplexed media data, and to a reproducing apparatus for the composite content file.
2. Description of the Prior Art
Recently, composite content files including a plurality of types of multiplexed media data, such as video data, audio data or text data, have come into use in content delivery services and streaming broadcasts directed to mobile terminals. One file format for such a composite content file is the MP4 file format (hereinafter referred to as “MP4”), which is defined in Part 14 of the ISO/IEC 14496 standard.
The system layer of MP4 includes a plurality of mixed types of media (media data) and is provided with a header portion that stores information such as conditions for reproducing the media and a media data portion that stores only the media streams. In this system layer, individual media are stored in packets and multiplexed in temporal order. The header portion (moov box), which holds media information as header information of the stored packets, and the media data portion (media data box), which holds the media data itself, are completely separated from each other. In this respect, MP4 differs from system layers such as MPEG-2 PS or TS.
FIG. 20 is a diagram showing an example of a conventional MP4 file format FT1.
As shown in FIG. 20, a file type box BXA of the MP4 file format FT1 stores information indicating compatibility of the file. A moov box BXB, which is the header portion, stores information about the reproduction condition of each media data stored in a media data box BXC, including position information, time information, size information and the like for each media frame. The media data box BXC stores the media data itself, such as video data, audio data or text data.
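As an illustration of the layout in FIG. 20, the following minimal sketch lists the top-level boxes of an MP4 byte string. It assumes the box header layout of the ISO base media file format (a 4-byte big-endian size followed by a 4-byte type, where a size of 1 signals a 64-bit size field and a size of 0 means the box runs to the end of the file). The function names `list_top_level_boxes` and `make_box` and the sample bytes are hypothetical and are not taken from the format FT1 itself.

```python
import struct


def list_top_level_boxes(data: bytes):
    """Return (type, offset, size) for each top-level box in `data`."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header_len:
            break  # malformed box; stop rather than loop forever
        boxes.append((box_type.decode("ascii", errors="replace"), offset, size))
        offset += size
    return boxes


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (hypothetical helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Hypothetical sample: file type box, empty moov box, 3-byte media data box.
sample = (
    make_box(b"ftyp", b"isom" + b"\x00\x00\x00\x00")
    + make_box(b"moov", b"")
    + make_box(b"mdat", b"\x01\x02\x03")
)
boxes = list_top_level_boxes(sample)
```

Here the header portion (moov) and the media data portion (mdat) appear as separate sibling boxes, which reflects the separation described above.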
The time information that the MP4 file format holds for each media frame is not a reproduction time but a reproduction time length. In other words, the time information indicates, for example, that a first frame of the video data is reproduced for ◯◯ milliseconds and a second frame is reproduced for ΔΔ milliseconds. Therefore, the video data is reproduced for exactly the total reproduction time length of the video data, while the audio data is reproduced for exactly the total reproduction time length of the audio data.
A user of a mobile terminal can receive such a composite content file of the MP4 file format on his or her mobile terminal and reproduce the file. However, the maximum size of content that a mobile terminal can handle depends on the type of the mobile terminal. Therefore, if the size of the content exceeds the maximum size that the mobile terminal can handle, a server that delivers the content is required to divide the content into a plurality of files (composite content files), while the mobile terminal is required to reproduce the plurality of files continuously.
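The consequence of storing only reproduction time lengths can be sketched as follows. This is a minimal illustration with hypothetical frame durations (100 ms video frames, 64 ms audio frames) and a hypothetical helper name `start_times`; it only shows that each stream's timeline is derived independently from its own durations.

```python
def start_times(durations_ms):
    """Derive each frame's start time by accumulating the preceding durations."""
    starts = []
    elapsed = 0
    for duration in durations_ms:
        starts.append(elapsed)
        elapsed += duration
    return starts


# Hypothetical per-frame reproduction time lengths, in milliseconds.
video_durations = [100, 100, 100]
audio_durations = [64, 64, 64, 64, 64]

video_starts = start_times(video_durations)
audio_starts = start_times(audio_durations)

# Each stream plays for the sum of its own durations, so the totals can differ.
video_total = sum(video_durations)
audio_total = sum(audio_durations)
```

With these assumed values the video stream lasts 300 ms and the audio stream 320 ms, and nothing in the per-frame durations ties a given audio frame to a given video frame.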
As a device that divides an MMS message having a size above the transmission capacity of a server into files having a size that the mobile terminal can transmit, there is proposed a device described in U.S. patent application publication No. 2005/0054287. The device disclosed in the publication includes a receiving portion that receives an input such as an image signal, an audio signal and the like; a control portion that controls individual portions of the mobile terminal and encodes the image signal and the audio signal received via the receiving portion into multimedia data, which is divided into a specific size and stored as divided data in a designated order; a buffer that stores the multimedia data and the divided data as individual files; a memory portion that stores the individual files stored in the buffer by the control portion in corresponding areas in accordance with the order; an output portion that delivers operational information of the mobile terminal, the image signal or the audio signal in accordance with the control portion; and a radio frequency portion that wirelessly transmits the files stored in the memory portion.
However, in the conventional method, synchronization information for each media data is not stored as part of the information about the reproduction condition of each divided media data, so the following problem may arise.
FIGS. 21 and 22 are diagrams showing examples of the method for dividing the media data.
Positions on a time base at which each media data included in the content is divided (hereinafter referred to as “division points”) are usually based on the video data and usually lie on boundaries between pictures of the video data, as shown in FIG. 21. The reason is that the video data should be divided so that an I-picture frame, which can be reproduced by itself, becomes the head of the video data after division, and therefore the division points necessarily depend on the positions of the I-pictures.
In this case, therefore, if, for example, a section of an elementary stream is designated for filing, the data of the designated section must be included completely. As a result, the section taken from each medium has a range slightly wider than the designated section.
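The dependence of division points on I-picture positions can be sketched as follows. This is a minimal illustration only: the frame list, the 100 ms durations and the function name `candidate_division_points` are hypothetical, and the sketch simply collects the start times of the I-pictures as the positions at which the video data could be divided.

```python
def candidate_division_points(frames):
    """Return the start time (ms) of every I-picture.

    `frames` is a list of (picture_type, duration_ms) pairs in display order.
    """
    points = []
    elapsed = 0
    for picture_type, duration_ms in frames:
        if picture_type == "I":
            points.append(elapsed)
        elapsed += duration_ms
    return points


# Hypothetical video stream: an I-picture every third frame, 100 ms per frame.
frames = [("I", 100), ("P", 100), ("P", 100), ("I", 100), ("P", 100)]
points = candidate_division_points(frames)
```

In this assumed stream the only division point other than the start of the content is at 300 ms, regardless of where the boundaries of the other media happen to fall.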
However, there is very little possibility that the division point of the video data decided as described above matches a boundary between audio frames completely. Therefore, as shown in FIG. 21, the audio data is divided at a boundary between frames that is closest to the position corresponding to the division point of the video data.
In this case, when the individual media data divided as described above are reproduced with the heads of the media data aligned at the start of reproduction, the reproduction timing of the audio data in a second file is delayed from that of the video data by shift time T1 of the division point. In addition, the reproduction end timings of the individual media data are shifted from each other in both a first file and the second file.
Such a timing shift may make the user uncomfortable, for example when the motion of the picture does not match the sound, when the sound continues after the picture has finished, or when the sound is interrupted.
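The origin of the shift time T1 can be sketched numerically as follows. This is a minimal illustration under stated assumptions: a constant audio frame length, a hypothetical video division point of 300 ms and a hypothetical audio frame length of 64 ms. The function name `nearest_audio_boundary` is introduced only for this example.

```python
def nearest_audio_boundary(video_division_ms, audio_frame_ms):
    """Return the audio frame boundary (ms) closest to the video division point.

    Assumes all audio frames have the same length; integer arithmetic is used
    so that the result does not depend on floating-point rounding.
    """
    index = (video_division_ms + audio_frame_ms // 2) // audio_frame_ms
    return index * audio_frame_ms


video_division_ms = 300  # hypothetical I-picture boundary
audio_frame_ms = 64      # hypothetical audio frame length

audio_division_ms = nearest_audio_boundary(video_division_ms, audio_frame_ms)

# Signed offset between the two cuts; its magnitude corresponds to T1.
# A positive value means the audio cut falls after the video cut.
shift_ms = audio_division_ms - video_division_ms
```

With these assumed values the audio is cut at 320 ms while the video is cut at 300 ms, so aligning the heads of the two media in the second file leaves them offset from each other by 20 ms.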
In addition, a method may be considered for matching positions on a time base between the video data and the audio data when they are reproduced, in which each media data is divided at boundaries between audio frames as shown in FIG. 22.
In this case, however, the video data is divided within an I-picture frame. Since each of the two resulting parts has to be reproducible by itself, a complete I-picture frame must be used for each of them.
Therefore, when the division is performed as described above, each of the first and the second files includes a large I-picture, so the reproduction time of the content that can be included in one file becomes shorter.