Digital equipment such as digital video recorders, digital cameras, PDAs and mobile-phones has rapidly come into widespread use in recent years, and digitization of television broadcasting has also proceeded. As a result, it has become possible to handle a wide variety of data, from video and audio to still pictures and text, as digital format data (“digital data”). Given this background, there is extensive research on multimedia technology that handles a wide variety of data comprehensively. Within multimedia technology, digital data compression technology is very important, and within digital data compression technology, MPEG-4 is a standard for generating moving pictures and reproducing interactive media. MPEG-4 can be applied to generate a large variety of moving pictures of differing quality, e.g., from a quality suited to transmission over a low-speed line to that of high-definition television pictures. The ISO (International Organization for Standardization) is proceeding with work on the standardization of MPEG-4. MPEG-4 data compression technology is not directly relevant to the present invention, and therefore a detailed description thereof is omitted here.
As an MPEG-4-compatible file format for storing contents, there is a so-called MP4 file format specified by ISO/IEC 14496-14. The MP4 file format is composed of metadata, which describes information relating to the media, and media data, which is encoded video and audio data. All the data is contained in a data structure called a “Box” (or “atom”).
FIG. 1 is a diagram showing the data structure of a Box. As shown in FIG. 1, a Box 101 is composed of a size field 102, a type field 103 and a data field 104. The size of the entire Box (that is, the number of bytes), including the size field, is contained in the size field 102, and a Box identifier (usually four letters) is stored in the type field 103. Actual header data and media data are stored in the data field 104.
Although the foregoing is the basic data structure of the Box, in addition there may also be a version field (1 byte) and/or a flags field (3 bytes) in front of the data field 104. A Box that has these fields is called a Full Box. The metadata portion that forms the MP4 file format described above using such a Box structure is called a MovieBox (hereinafter “moov”). Similarly, the media data portion is called a MediaDataBox (hereinafter “mdat”).
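The Box and Full Box layouts described above can be sketched in a few lines of Python. This is a minimal illustration rather than a complete implementation: the function names are hypothetical, and it assumes a 4-byte big-endian size field and a 4-byte type field, ignoring any extended size encodings.

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a basic Box: size field, type field, then the data field."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly four bytes")
    size = 8 + len(payload)  # the size covers the whole Box, size field included
    return struct.pack(">I4s", size, box_type) + payload

def make_full_box(box_type: bytes, version: int, flags: int, payload: bytes) -> bytes:
    """Serialize a Full Box: version (1 byte) and flags (3 bytes) precede the data."""
    version_flags = struct.pack(">B", version) + flags.to_bytes(3, "big")
    return make_box(box_type, version_flags + payload)

def parse_box_header(buf: bytes, offset: int = 0):
    """Return (size, type) read from the size and type fields at the given offset."""
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    return size, box_type
```

For example, `make_box(b"mdat", payload)` yields a Box whose size field equals `8 + len(payload)`, and a Full Box with an empty data field occupies 12 bytes.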
In addition, the MP4 file format Box defined in ISO/IEC 14496-14 consists not only of required boxes but also of optional boxes that may be used as necessary, or boxes that are freely defined by the user. These include, for example, a FileTypeBox (hereinafter “ftyp”). The ftyp must be at the head of the MP4 file. FIG. 2 shows an example of an MP4 Box structure using ftyp. Although the ftyp 202 must be placed at the head of the file, the remaining Boxes, such as the moov 203 and the mdat 204, may be placed in any order.
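Because each Box records its own size, the top-level Boxes of a file can be enumerated by hopping from one size field to the next. The following sketch, with hypothetical function names, lists the top-level Boxes of an in-memory file and checks that the ftyp comes first. It assumes plain 32-bit sizes and stops at anything it does not understand.

```python
import struct

def list_top_level_boxes(data: bytes):
    """Return [(type, offset, size), ...] for the Boxes laid end to end in data."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            break  # special or malformed sizes are not handled in this sketch
        boxes.append((box_type.decode("ascii", "replace"), offset, size))
        offset += size
    return boxes

def ftyp_is_first(data: bytes) -> bool:
    """True when the first top-level Box is the ftyp, as the format requires."""
    boxes = list_top_level_boxes(data)
    return bool(boxes) and boxes[0][0] == "ftyp"
```

The order of the Boxes after the ftyp is not checked, since the moov and the mdat may appear in either order.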
A detailed description of moov and mdat is now given, using FIG. 6. For simplicity, the description begins with an mdat 620. The mdat 620 is composed of a sequence of data units called chunks. In the example shown in FIG. 6, the mdat 620 has a structure in which audio chunks and video chunks are arranged in alternating sequence, that is, Audio chunk 1 (621), Video chunk 1 (622), Audio chunk 2, Video chunk 2, . . . and so forth.
The order of the chunks as well as the number of chunks that form the mdat are arbitrary. In addition, one chunk is composed of a sequence of data units called samples, each corresponding to 1 frame of video or audio data. Using the example of a Video chunk 1 (622) shown in FIG. 6, the Video chunk 1 (622) is composed of a sequence of several video samples, that is, Video sample 1 (623), Video sample 2 (624), Video sample 3, . . . and so forth.
Next, a description is given of the structure of the moov 601. The moov 601 is further layered into Boxes, with a required Box in the form of a MovieHeaderBox (mvhd 602) that contains header information as a whole, and a plurality of TrackBoxes such as a trak(Audio) 603 and a trak(Video) 604 used as examples in FIG. 6. These TrackBoxes are further layered into Boxes. A description of the structure of the lower layer is given using trak(Video) 604 as an example.
Descending through the layers in the trak(Video) 604 shown in FIG. 6, it can be seen that there is a SampleTableBox (stbl) 605. Data on at least one of the chunks and the samples of the mdat 620 is contained in this Box, linked to each item. To describe the items in the stbl 605 briefly: stts 606 is the duration of the sample, stsd 607 is sample details, stsz 608 is sample size, stsc 609 is the number of samples included in a chunk, that is, the number of frames, and stco 610 is a chunk offset, each stored in association with the samples.
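As an illustration of how these tables work together, the sketch below locates the byte position of a sample in the file from the chunk offsets (stco), the per-chunk sample counts (stsc) and the sample sizes (stsz). The function name and the plain-list representation are assumptions made for this example. In particular, the stsc information is flattened here into one count per chunk, which is simpler than the encoding actually stored in the file.

```python
def sample_offset(sample_index, samples_per_chunk, sample_sizes, chunk_offsets):
    """Return the file offset of the sample with the given 0-based index.

    samples_per_chunk: number of samples in each chunk (stsc, flattened)
    sample_sizes:      size in bytes of each sample (stsz)
    chunk_offsets:     file offset of each chunk (stco)
    """
    first_in_chunk = 0  # index of the first sample in the current chunk
    for chunk_index, count in enumerate(samples_per_chunk):
        if sample_index < first_in_chunk + count:
            # Skip over the samples that precede the target within its chunk.
            preceding = sum(sample_sizes[first_in_chunk:sample_index])
            return chunk_offsets[chunk_index] + preceding
        first_in_chunk += count
    raise IndexError("sample index out of range")
```

With two chunks holding 2 and 3 samples at offsets 1000 and 5000, and sample sizes of 100, 120, 90, 80 and 110 bytes, the fifth sample (index 4) is found at 5000 + 90 + 80 = 5170.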
With such structure and data, it is possible to reproduce an MP4 file while manipulating actual media data of the mdat.
Examining what happens when creating an MP4 file, it can be seen that a variety of information that is stored in the moov, such as offset values and the like, is created while creating the mdat that is actual encoded data. As a result, conventionally, as shown in FIG. 3, when creating an MP4 file from data encoded in an audio/video encoding process 301 with an MP4 file generation process 310, the following method is used: First, in an mdat generation process 311, the mdat is written at the head of the file. Then, from that mdat information, the moov is generated in a moov generation process 312 in a memory or in a temporary file. Finally, in an mdat/moov file generation process 313, the moov generated in the memory or the temporary file is written behind the mdat. This method is thought to minimize the required memory and make rapid file creation possible.
However, what the user values most in contents reproduction is an adequately short waiting time from contents request to the start of contents display. In order to satisfy this requirement, the moov which is the contents metadata must be present at the head of the file, and moreover, its size must not be too large.
Conventionally, when generating an MP4 file in which the moov is at the head of the file and the mdat comes after the moov as described above, a method like that shown in FIG. 4 is employed. In other words, when generating an MP4 file by an MP4 file generation process 410 from the encoded data encoded by the audio/video encoding process 301, first, the mdat is written to a temporary file in the mdat generation process 311. Then, in the moov generation process 312 the moov is generated in a memory or in a temporary file. Finally, in a moov/mdat file generation process 413 a new MP4 file is composed in the order of moov, mdat.
This technique, however, is redundant, because once both the moov and the mdat are generated in the memory or the temporary file, they are then written to the final MP4 file (copied).
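The difference between the two conventional orderings can be made concrete with a toy model. In the sketch below every name is hypothetical, the "moov" is merely a placeholder Box holding a list of 32-bit chunk offsets rather than a real MovieBox, the ftyp is omitted for brevity, and chunk offsets are assumed to be measured from the start of the file. It shows that in the moov-first case the mdat has to be held aside and then written a second time, because its position in the file depends on the moov that precedes it.

```python
import struct

def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a Box with a 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def chunk_offsets(chunks, mdat_start):
    """File offsets of each chunk when the mdat Box begins at mdat_start."""
    offsets, pos = [], mdat_start + 8  # chunks follow the 8-byte mdat header
    for chunk in chunks:
        offsets.append(pos)
        pos += len(chunk)
    return offsets

def toy_moov(offsets) -> bytes:
    """Placeholder metadata Box: NOT the real moov layout, only an offset list."""
    return box(b"moov", b"".join(struct.pack(">I", o) for o in offsets))

def build_mdat_first(chunks):
    """FIG. 3 ordering: the mdat is written once, then the moov is appended."""
    mdat = box(b"mdat", b"".join(chunks))
    offsets = chunk_offsets(chunks, mdat_start=0)
    return mdat + toy_moov(offsets), offsets

def build_moov_first(chunks):
    """FIG. 4 ordering: the mdat is staged, then copied in behind the moov."""
    temp_mdat = box(b"mdat", b"".join(chunks))  # stands in for the temporary file
    moov_size = 8 + 4 * len(chunks)             # size of the toy moov
    offsets = chunk_offsets(chunks, mdat_start=moov_size)
    moov = toy_moov(offsets)
    return moov + temp_mdat, offsets            # second write of the mdat: the copy
```

In this model both files have the same length, but every chunk offset in the moov-first file is larger by the size of the moov, and the staged mdat must be copied into the final file.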
In addition, the definition of the moov portion sometimes differs between applications depending on the optional Boxes and the user-defined Boxes. In that case, depending on the application, it may be impossible to reproduce a file created on another application. It is possible to solve this problem of compatibility between applications by converting only the moov portion.
A technique for efficiently generating MP4 files beginning with moov and maintaining compatibility between applications has been proposed in, for example, Japanese Laid-Open Patent Publication No. 2003-173625. This technique involves storing in the apparatus in advance vendor candidates that are expected to be used in conversion, together with their attendant MP4 file format metadata information, and then, from that information, reserving a metadata size that includes free data and generating an MP4 file.
A description is given of the outlines of the technique proposed in Japanese Laid-Open Patent Publication No. 2003-173625 using FIG. 5. Specifically, when the MP4 file is generated by an MP4 file generation process 510 from the encoded data encoded by the audio/video encoding process 301, first the mdat is created in a memory or in a temporary file in an mdat generation process 511. Then, the moov is generated in the memory or in the temporary file from that mdat information in a moov generation process 512.
Next, a moov+free space calculation process 513 calculates the sum of the moov size obtained with the moov generation process 512 and a free space that may be required at file conversion time in order to maintain compatibility. Finally, in a moov+free/mdat file generation process 514, the moov is written in at the head of the MP4 file and the mdat is written in after the moov+free space (that is, a free space is left open between the moov and the mdat). The MP4 file is thus generated.
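A rough sketch of this reservation scheme follows. It is an illustration under stated assumptions, not the method of the cited publication: the function names are hypothetical, the ftyp is omitted, and the reserved region is modeled as a Box of type `free` filled with zero bytes. The second function shows why the reservation helps at conversion time: a larger moov can overwrite the reserved region without moving the mdat.

```python
import struct

def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a Box with a 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def free_box(size: int) -> bytes:
    """A padding Box of exactly `size` bytes (header included)."""
    if size < 8:
        raise ValueError("a free Box needs at least its 8-byte header")
    return box(b"free", bytes(size - 8))

def build_with_reserve(moov: bytes, mdat: bytes, reserve: int) -> bytes:
    """Lay the file out as moov, reserved free space, then mdat."""
    return moov + free_box(reserve) + mdat

def convert_moov(file_bytes: bytes, old_moov_len: int, reserve: int, new_moov: bytes) -> bytes:
    """Swap in a converted moov, shrinking the free space so the mdat stays put."""
    region = old_moov_len + reserve   # bytes available ahead of the mdat
    slack = region - len(new_moov)    # space left over after the new moov
    if slack < 0 or 0 < slack < 8:
        raise ValueError("new moov does not fit the reserved region")
    padding = free_box(slack) if slack else b""
    return new_moov + padding + file_bytes[region:]
```

If no conversion ever takes place, the `reserve` bytes simply remain in the file unused.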
In the conventional art proposed in Japanese Laid-Open Patent Publication No. 2003-173625, free space is reserved for MP4 file format conversion, making it possible to shorten file conversion processing time and to delete temporary space such as the temporary copies required for conversion processing. However, if file format conversion is not required, the free space remains reserved as is without being used, thus increasing file size unnecessarily and wasting storage device capacity when storing.
Although in the conventional art described above efficient file conversion with little memory is possible, when analyzed in terms of MP4 file storage, in the case of small mobile equipment such as mobile-phones and PDAs (Personal Digital Assistants), the storage capacity is at most several tens to several hundreds of MB, which is quite limited compared to devices such as a personal computer (PC) having a large-scale storage capacity of several tens to over 100 GB. Therefore, particularly in devices with little storage capacity, it is desirable that files be stored efficiently in limited storage space.
In addition, when analyzed in terms of file creation time, in an image pickup apparatus such as a digital video recorder or a digital camera, or a mobile-phone or a PDA equipped with an image pickup function, it can sometimes take a long time to complete the creation of a picked-up or edited multimedia file. This waiting time (that is, the time required for file creation) includes, as described using FIG. 3 and FIG. 4 of the background art, a delay attendant upon the copying of data carried out during file creation and upon the order of processing in file editing. Therefore, it is also desirable to reduce file creation time so that the user can move quickly to the next action after moving picture pickup is completed.