Various kinds of data, such as video, music, audio, and text, are used in an integrated manner in the multimedia environment. It is essential to digitize the data of information media such as newspapers, magazines, television, radio, and telephones.
The data transmission rate required by these information media is extremely high. Television-quality video requires a data transmission rate of 80 Mbps, whereas telephone-quality sound requires 64 kbps. Transmitting television image and sound data directly over a telecommunication line therefore requires a data transmission rate equivalent to about one thousand regular telephone lines (80 Mbps divided by 64 kbps is 1,250). In this raw form, the digitized information media cannot be processed efficiently.
Data compression technology is therefore required. The ISO has been standardizing MPEG-4, a general multimedia encoding protocol which allows video transmission at 64 kbps. A detailed description of the MPEG-4 data compression technology is omitted here because it does not directly relate to the present invention.
In MPEG-4 (ISO/IEC 14496-1), video data and audio data are multiplexed into a single piece of data. The “MP4 file format” is the format provided for storing MPEG-4 contents, and it stores the information needed to reproduce the video data and the audio data in synchronization. The MP4 file format is designed so that media data can be reproduced, transferred, maintained, and edited easily (expression data such as encoded video and audio data are hereinafter collectively referred to as “media data”). An MP4 file is either reproduced in the system where the file itself is located, such as a personal computer, or converted to a streaming format used in a LAN and delivered by streaming.
The MP4 file format is independent of any particular transmission system. In a server, the MP4 file stores information on the characteristics of each transmission system as meta data. The server refers to the meta data, which includes the information on the transmission system of the server, and obtains the information required for transmitting the media data.
As a result, the media data in the MP4 file can be transmitted over various transmission systems without depending on any one of them.
FIG. 2 shows the structure of a “box”, the basic unit of an MP4 file. Each box comprises a size field 101 which identifies the size (number of bytes) of the box, a type field 102 which identifies the type of the box with a string of four characters, and a data field 103 which stores the actual data of the box. The data field can itself store other boxes, so the MP4 file has a hierarchical structure of boxes.
The basic structure of the MP4 file is described below with reference to FIGS. 3 to 7.
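The box layout described above can be illustrated with a minimal sketch. The text only states that a box has a size field, a four-character type field, and a data field; the sketch additionally assumes that the size is a 32-bit big-endian integer which counts the 8 header bytes, and it ignores special size values such as extended 64-bit sizes. The function names `make_box` and `iter_boxes` are hypothetical and serve only this illustration.

```python
import struct

HEADER_SIZE = 8  # assumed: 4-byte size field + 4-byte type field


def make_box(box_type, payload=b""):
    """Build one box: size field, four-character type field, data field."""
    size = HEADER_SIZE + len(payload)
    return struct.pack(">I4s", size, box_type.encode("ascii")) + payload


def iter_boxes(buf, start=0, end=None):
    """Yield (type, data_start, data_end) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + HEADER_SIZE <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, pos)
        if size < HEADER_SIZE or pos + size > end:
            raise ValueError("box at offset %d has an unsupported size" % pos)
        yield raw_type.decode("ascii"), pos + HEADER_SIZE, pos + size
        pos += size


# Because a data field may hold other boxes, nesting is a second call
# to iter_boxes over the data range of the parent box.
example = make_box("moov", make_box("mvhd", b"\x00" * 4)) + make_box("mdat", b"abc")
for box_type, data_start, data_end in iter_boxes(example):
    print(box_type, data_end - data_start)
```

Walking the hierarchy this way needs no knowledge of a box's contents: an unknown box can be skipped using its size field alone.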
The structure of the media data of the MP4 file is shown in FIGS. 3A to 3C. FIG. 3A shows the basic MP4 file, which is structured with one piece of meta data and one piece of media data. The meta data stores the information required for decoding or reproducing the media data. A MovieBox (hereinafter referred to as a ‘moov’) 110 is a box which stores all the meta data. A MediaDataBox (hereinafter referred to as an ‘mdat’) 111 is a box which stores all the media data.
FIG. 3B shows the structure inside the mdat. The mdat is structured as a sequence of data units called “chunks.” In FIGS. 3A to 3C, chunks of different media types are arranged one after another (audio chunk (1)AC1, video chunk (1)VC1, audio chunk (2)AC2, video chunk (2)VC2, and so on). The MP4 file format defines neither the order nor the number of chunks in the mdat. FIG. 3C shows the structure inside each chunk, which is structured with consecutive “samples” of one media type. A sample is the minimum unit of encoded data that can be separated from other samples, and is equivalent to one frame of video or audio data. In FIG. 3C, the audio chunk (1)AC1 includes consecutive audio samples (1)AS1 to (A1)ASA1, and the video chunk (1)VC1 includes consecutive video samples (1)VS1 to (V1)VSV1. The MP4 file format does not define the number of samples in each chunk.
FIG. 4 shows the structure inside the moov. The boxes shown in FIG. 4 are defined as required boxes of the MP4 file format; boxes of other types may optionally be stored in the MP4 file. The moov 110 is structured with a MovieHeaderBox (hereinafter, an ‘mvhd’) 150, which represents header information, and a plurality of TrackBoxes (hereinafter, ‘traks’). A trak is a box which stores information on one of the media, such as video, audio, still images, or characters, that form a scene of the MP4 file.
In FIG. 4, the moov is structured with a trak 151 for audio data and a trak 152 for video data. Each trak has a hierarchy made up of a number of boxes. The traks 151 and 152 each include a SampleSizeBox (hereinafter, ‘stsz’) 162, a SampleToChunkBox (hereinafter, ‘stsc’) 163, and a ChunkOffsetBox (hereinafter, ‘stco’) 164. The roles of these boxes are briefly described below. Other boxes are detailed in ISO/IEC 14496-1: 2001; their description is omitted here because they do not relate to the present invention.
FIG. 5 illustrates the stsz 162 in detail. The stsz 162 is a box which contains the size (number of bytes) of each sample of each medium, and is used in accessing a given piece of data in the mdat. Each of the traks 151 and 152 contains one stsz 162 as a required box.
FIG. 6 illustrates the stsc 163 in detail. The stsc 163 is a box which contains information on the samples, such as frames, of each chunk of each medium (the number of samples and a description index of the samples), and is used in accessing a given piece of data in the mdat. Each of the traks 151 and 152 contains one stsc 163 as a required box.
FIG. 7 illustrates the stco 164 in detail. The stco 164 is a box which contains the file offset value of each chunk of each medium. Each of the traks 151 and 152 contains one stco 164 as a required box. In the audio trak 151, the stco 164 contains the number of all the audio chunks CT1 (in FIG. 7, M denotes the number of all the audio chunks) and the first offset values (CF1 to CFM) of the M audio chunks. The first offset value of a chunk is the offset of the first byte of the first sample among the samples forming the chunk. The first offset values (CF1 to CFM) of the audio chunks are thus the offsets of the first bytes of the first samples (AS1, ASA1, and so on) of the audio chunks (AC1, AC2, and so on) described in FIGS. 3A to 3C.
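As a rough illustration of the chunk count and offset list described above, the following sketch reads the data field of an stco. The text states only that the box holds the number of chunks followed by the first offset value of each chunk. The 4-byte version/flags word before the count, the 32-bit big-endian width of each value, and the function name `parse_stco_data` are assumptions made for this example.

```python
import struct


def parse_stco_data(data):
    """Return the list of chunk offsets stored in an stco data field.

    Assumed layout: version/flags (4 bytes), chunk count M (4 bytes),
    then M offsets of 4 bytes each, all big-endian.
    """
    if len(data) < 8:
        raise ValueError("stco data field is too short")
    _version_flags, count = struct.unpack_from(">II", data, 0)
    if len(data) < 8 + 4 * count:
        raise ValueError("stco data field is truncated")
    return list(struct.unpack_from(">%dI" % count, data, 8))


# Three chunks whose first bytes lie at file offsets 100, 500 and 900.
sample_data = struct.pack(">II3I", 0, 3, 100, 500, 900)
print(parse_stco_data(sample_data))
```

Each offset is counted from the start of the file, not from the start of the mdat, so the offsets depend on where the mdat is located.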
The position and size of a given media sample in an MP4 file are obtained by referring to the information contained in the stsz 162, the stsc 163, and the stco 164. When a sample is specified by its sample number, it is located as follows. The stsz 162 gives the size of the specified sample. The stsc 163 gives the number of the chunk which contains the specified sample. The stsz 162 also gives the total size S of the samples from the first sample in that chunk up to, but not including, the specified sample. The stco 164 gives the first position T of that chunk. Adding the size S to the first position T yields the first position of the specified sample. In this way, the position and the size of a given sample in the file are obtained by analyzing the stsz 162, the stsc 163, and the stco 164 in each trak (151, 152), so that the data of the given sample can be read out from the file.
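The lookup described above can be sketched as follows, with the three boxes represented as plain Python lists that are assumed to have been parsed already. The list of (first chunk, samples per chunk) pairs for the stsc, in which one entry applies to every chunk up to the next entry, is an assumption about how the per-chunk sample counts are stored; the description index is ignored. The function names are hypothetical.

```python
def samples_in_chunk(chunk_number, stsc_entries):
    """Number of samples in a 1-based chunk, from (first_chunk, count) pairs."""
    count = None
    for first_chunk, samples_per_chunk in stsc_entries:
        if first_chunk > chunk_number:
            break
        count = samples_per_chunk
    if count is None:
        raise ValueError("no stsc entry covers chunk %d" % chunk_number)
    return count


def locate_sample(sample_number, sample_sizes, stsc_entries, chunk_offsets):
    """Return (file_offset, size) of a 1-based sample number."""
    if not 1 <= sample_number <= len(sample_sizes):
        raise IndexError("sample number out of range")
    first_sample = 1  # number of the first sample in the current chunk
    for chunk_number in range(1, len(chunk_offsets) + 1):
        per_chunk = samples_in_chunk(chunk_number, stsc_entries)
        if sample_number < first_sample + per_chunk:
            # S: total size of the samples that precede it in this chunk
            s = sum(sample_sizes[first_sample - 1:sample_number - 1])
            # T: first position of the chunk, taken from the stco
            t = chunk_offsets[chunk_number - 1]
            return t + s, sample_sizes[sample_number - 1]
        first_sample += per_chunk
    raise ValueError("sample is not covered by any chunk")


# Five samples in three chunks: chunks 1 and 2 hold two samples each,
# chunk 3 holds one.
sizes = [10, 20, 30, 40, 50]
stsc = [(1, 2), (3, 1)]
offsets = [100, 500, 900]
print(locate_sample(4, sizes, stsc, offsets))  # (530, 40)
```

In the example, sample 4 is the second sample of chunk 2, so S is the size of sample 3 (30 bytes) and T is 500, giving a first position of 530.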
In the MP4 file format standardized by ISO/IEC 14496-1: 2001, in addition to the required boxes described above, there may be optional boxes, which are used when needed, and user-defined boxes. Vendors of MP4 file reproducing devices employ various optional boxes in their products. Thus an MP4 reproducing device which supports one optional box may not correctly reproduce MP4 files which contain other optional boxes.
It is common for an application to require a box of its own in order to meet its specific requirements, and for the file specification of the application to add that box to the required boxes of the MP4 file format. The required boxes of the MP4 file format may therefore vary depending on the file specification of each application.
The Wireless Multimedia Forum (hereinafter, WMF), which was established to realize a platform for wireless delivery of multimedia contents, defines a streaming delivery specification, the RTFD (Recommended Technical Framework Document) Version 1.1 (hereinafter referred to as RTFD 1.1). The RTFD 1.1 employs the MP4 file format as the format for storing multimedia contents in a delivery server. It strongly recommends that some optional boxes be treated as required, and defines WMF-specific boxes as required boxes of the RTFD 1.1-defined MP4 file format.
FIG. 8 illustrates the structure of the RTFD 1.1-defined MP4 file format. In the RTFD 1.1, a FileTypeBox (hereinafter, an ‘ftyp’) 170 is defined as a required box. In addition, a UserDataBox (hereinafter, a ‘udta’) 172 always exists in the moov 171, and the udta 172 always contains a WMFSetSessionAtom (hereinafter, a ‘wmfs’) 173 and a WMFSetMediaAtom (hereinafter, a ‘wmfm’) 174, both of which are WMF-specific boxes. These boxes are detailed in the RTFD 1.1. The other boxes of FIG. 8 (mdat 111, mvhd 150, trak 151, trak 152) are the same as described above.
Even when the encoded media data (mdats) comply with the MP4 file format, the box structures contained in the corresponding moovs may differ from one another. When a box required by an application does not exist in the moov of an MP4 file, the application may be unable to reproduce the MP4 file. In this case, the application can be made to reproduce the MP4 file by converting only the moov (that is, by adding the box required by the application), without changing the mdat.
In the prior art described above, when the moov is positioned before the mdat and the size of the converted moov increases, the position of the mdat within the MP4 file must change. Thus, after the moov is written, the mdat must be rewritten. FIG. 23 conceptually illustrates a conventional process of converting an MP4 file. In FIG. 23, an MP4 file comprised of a moov 110 and an mdat 111 is converted to an MP4 file (as defined by the RTFD 1.1) comprised of an ftyp 170, a moov 171, and the mdat 111. In this conversion, a sufficiently large temporary space 180 is first secured in memory or on a disk. The moov 171 is generated based on the ftyp 170 and the mdat 111, and is then copied into the temporary space 180. Next, the mdat 111 is copied into the temporary space 180 without modification. The conversion of the MP4 file is completed by deleting the original moov 110 and the original mdat 111. If necessary, the converted MP4 file is then moved from the temporary space 180 to a specified location. As the duration of the scene encoded in the MP4 file becomes longer, the mdat becomes extremely large, and the time needed to rewrite the mdat therefore becomes extremely long. In addition, the memory size of the temporary space, which is required for copying the mdat, also becomes large.
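The cost of the conventional conversion can be made concrete with a minimal sketch. It assumes that the new header (the ftyp and the converted moov) has already been built as a byte string and that the mdat runs from a known offset to the end of the source file. The function name `convert_conventionally` and its parameters are hypothetical; file-like objects stand in for the original file and the temporary space.

```python
import io


def convert_conventionally(src, dst, new_header, mdat_start, block_size=64 * 1024):
    """Write new_header to dst, then copy the mdat from src unchanged.

    Returns the number of mdat bytes that had to be copied.
    """
    dst.write(new_header)
    src.seek(mdat_start)
    copied = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(block)  # every byte of the mdat is rewritten
        copied += len(block)
    return copied


# Stand-ins: a 7-byte "old moov" followed by 1000 bytes of media data.
source = io.BytesIO(b"OLDMOOV" + b"M" * 1000)
temporary_space = io.BytesIO()
copied = convert_conventionally(source, temporary_space, b"FTYP" + b"NEWMOOV++", 7)
print(copied, len(temporary_space.getvalue()))
```

The number of bytes copied, and the size of the temporary space, grow with the mdat even though only the meta data changed.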
When the position of the mdat changes within an MP4 file, the stco in the moov must also be rewritten, because it holds the file offset values of the chunks. Therefore, as the scene duration of the MP4 file becomes longer, the time needed to rewrite the stco also becomes longer.
To solve the problems described above, an object of the present invention is to provide a storing device and a file converting device which enable efficient file conversion of a multimedia data file, in particular efficient conversion of the meta data format (moov) of a file in the MP4 file format, while occupying less memory space.
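The rewriting of the stco can be sketched as follows, assuming the chunk offsets have already been read into a list and are counted from the start of the file. The function name `shift_chunk_offsets` and the byte counts in the example are hypothetical.

```python
def shift_chunk_offsets(chunk_offsets, old_mdat_start, new_mdat_start):
    """Return the chunk offsets after the mdat has moved within the file."""
    delta = new_mdat_start - old_mdat_start
    return [offset + delta for offset in chunk_offsets]


# Example: the mdat originally starts at byte 1000. After conversion, a
# 24-byte ftyp is added and the moov grows by 400 bytes, so the mdat
# starts at byte 1424 and every chunk offset moves by 424 bytes.
old_offsets = [1008, 5008, 9008]
print(shift_chunk_offsets(old_offsets, 1000, 1424))  # [1432, 5432, 9432]
```

Every entry has to be updated, so the work is proportional to the number of chunks, which increases with the scene duration.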