In recent years, the processing capability of mobile phones, PDAs, personal computers (PCs), and various other types of computing devices has improved significantly, enabling such devices to readily handle multimedia data such as video and audio.
Techniques for compression-encoding such video and audio have also improved, and known techniques include the MPEG-2 and MPEG-4 methods that are standardized by ISO (International Organization for Standardization) as international standards. In those MPEG compression encoding methods, images are constructed from pictures of three types: I, P, and B (VOP), depending on which predictive coding scheme is applied. An I picture is an intra-frame encoded image, a P picture is an inter-frame forward predictive-coded image that uses a past image, and a B picture is a bi-directional predictive-coded image that uses both a past image and a future image.
Among these three types of pictures, only I pictures can be decoded without other pictures. Thus, operations such as special reproduction (e.g. random access and fast-forward) that disturb the order of frames are realized by accessing an I picture that serves as the reference for decoding.
However, since access to an I picture during random access is a complicated process, it is desirable to make decoding processing more efficient and reduce delay. Thus, a technique has been proposed that reliably obtains I picture data and implements special reproduction smoothly in a recording/reproducing apparatus for image data that is compressed and encoded in MPEG-2 (see Japanese Patent Laid-Open Nos. 11-261964 and 2000-224543, for example).
In addition, ISO has standardized the ISO Base Media File Format (hereinafter referred to simply as the “ISO file format”) as a basic format for a flexible and extensible media file format that facilitates exchange, management, and editing of media (ISO/IEC 14496-12).
On the basis of the ISO file format, file formats that are extended for recording data of a certain encoding format have also been standardized, including the MP4 file format for recording MPEG-4 video and audio coded data (ISO/IEC 14496-14), the Motion JPEG 2000 file format (ISO/IEC 15444-3), the AVC file format (ISO/IEC 14496-15), the 3GPP file format of the 3rd Generation Partnership Project (3GPP), which is a standardization project for the third-generation mobile communication system, and the 3GPP2 file format of the 3rd Generation Partnership Project 2 (3GPP2), which is a project derived from 3GPP.
The ISO file format (and the above-described file formats that are based on the same) consists of a nested data structure called “Box” as shown below:
    class Box {
        unsigned int(32) size;
        byte type[4];
        byte data[ ];
    };
The “size” is a field that indicates the data length of the Box.
The “type” is a field indicating the type of the Box, for which an identifier of four characters (four bytes) assigned to each Box is set.
The “data” is the substance of the data included in the Box, having different contents depending on “type”. As a Box can have a hierarchical structure and include one or more Boxes, “data” may be another Box.
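The layout above can be walked with a few lines of code. The following is a minimal Python sketch, not a complete parser. It assumes big-endian fields and that “size” counts the whole Box including its 8-byte header, as in ISO/IEC 14496-12. The name `iter_boxes` is hypothetical, and extended 64-bit sizes and the “size 0” convention are deliberately not handled.

```python
import struct

def iter_boxes(buf, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each Box in buf[start:end].

    Assumption: 'size' is a big-endian 32-bit count of the whole Box,
    header included. Parsing stops at the first Box that does not fit.
    """
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > end:
            break  # malformed or unsupported size: stop rather than guess
        yield box_type.decode("ascii", "replace"), pos + 8, pos + size
        pos += size
```

Because “data” may itself contain Boxes, the children of a container Box can be listed by calling `iter_boxes` again on the payload range that the first call returned.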
FIG. 1 shows a listing of the Boxes that are defined in the ISO file format.
In the figure, the leftmost to the sixth columns indicate the types of Boxes that are defined in the ISO file format and the inclusion relationship among them. The rightmost column provides a description of the Box in the corresponding row.
As mentioned above, each Box is assigned a four-byte “type” for identifying its type, which is represented by four alphanumeric characters. This four-character “type” is given in the six left columns. In the following discussion, this “type” will be used to refer to a particular Box.
A relationship in which one Box includes another Box is indicated by the positional relationship of columns. In the table, a Box in one column includes the Boxes that are indicated in the column positioned to the right of that column. For example, “moov” shown in the second row from the top includes “mvhd”, “trak”, and “mvex”.
FIG. 2 illustrates a simplified data structure of the ISO file format. As shown, the Box structure of the ISO file format typically consists of moov 201, which stores metadata for media data such as management information for video and audio samples, and mdat 202, which stores the actual media data such as sample data of video and audio.
Moov 201 includes a Box called “trak” 203 for each piece of media data. Also, media data in mdat 202 is divided into sets of multiple consecutive samples of video and audio that are called “chunks”, and an array of chunks is stored in the data field of mdat 202. The size of a chunk and the number of samples contained in it are not specifically limited, and a chunk having any size and any number of samples can be constructed and stored in accordance with environment and context.
For such ISO file format, a data structure has been defined for identifying a random access point as information for random access (special reproduction) processing.
Random access is performed mainly by using the child Boxes included in stbl 204.
Stts (time-to-sample) 205 indicates to which time a track sample corresponds. This Box is used to find the first sample prior to a given time. Although stts 205 has been referred to as an example here, the Box called ctts in FIG. 1 can also be used to find the first sample prior to a given time.
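The time-to-sample lookup can be sketched as follows. This is a minimal Python illustration under stated assumptions: the table is taken to be a run-length list of (sample_count, sample_delta) pairs as in ISO/IEC 14496-12, times are non-negative and expressed in the track's own time units, and samples are numbered from 1. The function name `sample_at_time` is hypothetical.

```python
def sample_at_time(stts_entries, target_time):
    """Return the 1-based number of the last sample starting at or before target_time.

    stts_entries: list of (sample_count, sample_delta) runs (assumed layout).
    A target past the end of the track is clamped to the last sample.
    """
    sample = 1  # number of the first sample in the current run
    time = 0    # start time of that sample
    for count, delta in stts_entries:
        run_duration = count * delta
        if delta > 0 and target_time < time + run_duration:
            return sample + (target_time - time) // delta
        sample += count
        time += run_duration
    return sample - 1
```

For example, with runs `[(3, 100), (2, 50)]` the five samples start at times 0, 100, 200, 300, and 350, so a request for time 250 resolves to sample 3.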
However, since the sample found may not be a random access point, subsequent Boxes need to be further checked to find the closest random access point.
Stss (sync sample table) 206 indicates which samples are actual random access points. By using this Box, the number of the first random access sample prior to a given time can be searched for. In the absence of stss 206, random access is easy because it means that all the samples are random access points.
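The search in the sync sample table (type “stss” in the ISO file format) can be sketched as a binary search. This minimal Python illustration assumes the table is available as an ascending list of 1-based sample numbers, and it uses `None` to represent a missing table. The function name `sync_sample_at_or_before` is hypothetical.

```python
from bisect import bisect_right

def sync_sample_at_or_before(sync_samples, sample):
    """Return the closest random access sample number that is <= sample.

    sync_samples: ascending list of 1-based sample numbers, or None when the
    track has no sync sample table (every sample is then a random access point).
    Returns None if no random access point precedes the given sample.
    """
    if sync_samples is None:
        return sample
    i = bisect_right(sync_samples, sample)
    return sync_samples[i - 1] if i > 0 else None
```

With random access points at samples 1, 31, and 61, a request that lands on sample 45 is moved back to sample 31.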
At this point, a sample number that should be used for random access has been found. Next, stsc (sample-to-chunk) 207 is used in order to determine in which chunk the sample number is positioned.
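The sample-to-chunk step can be sketched as below. This minimal Python illustration assumes the table is a run-length list of (first_chunk, samples_per_chunk) pairs with 1-based chunk numbers, where each entry applies until the next entry's first chunk, as in ISO/IEC 14496-12. The sample description index that the real table also carries is omitted, and the name `locate_sample` is hypothetical.

```python
def locate_sample(stsc_entries, sample):
    """Return (chunk_number, index_in_chunk) for a 1-based sample number.

    stsc_entries: list of (first_chunk, samples_per_chunk), ascending by
    first_chunk, with samples_per_chunk > 0 (assumed layout).
    chunk_number is 1-based; index_in_chunk is 0-based.
    """
    if sample < 1 or not stsc_entries:
        raise ValueError("sample numbers start at 1 and the table must not be empty")
    first_sample = 1  # number of the first sample covered by the current entry
    last = len(stsc_entries) - 1
    for i, (first_chunk, per_chunk) in enumerate(stsc_entries):
        if i < last:
            # This entry covers the chunks up to the next entry's first chunk.
            run_samples = (stsc_entries[i + 1][0] - first_chunk) * per_chunk
            if sample >= first_sample + run_samples:
                first_sample += run_samples
                continue
        offset = sample - first_sample
        return first_chunk + offset // per_chunk, offset % per_chunk
```

For the table `[(1, 3), (3, 2)]`, chunks 1 and 2 hold three samples each and every later chunk holds two, so sample 7 is the first sample of chunk 3.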
Further, stco (chunk offset) 208 is used for determining where the chunk starts. While stco has been referred to as an example, the Box called co64 (64-bit chunk offset) shown in FIG. 1 may also be used to obtain the offset position of a chunk.
Starting with this offset, the size of each of the samples that precede the target sample within that chunk is obtained from stsz (sample size) 209, and the total sum of those sizes is calculated. The target chunk offset position plus this total size is the position at which the target sample starts.
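The offset arithmetic just described can be sketched as follows. This minimal Python illustration assumes the chunk offsets and sample sizes have already been read into plain lists indexed from the first chunk and the first sample, and that the caller knows the 0-based position of the target sample inside its chunk. The name `sample_byte_range` and its parameters are hypothetical.

```python
def sample_byte_range(chunk_offsets, sample_sizes, chunk, index_in_chunk, sample):
    """Return (start_offset, size) of a sample.

    chunk_offsets: offset of each chunk, list index 0 = chunk 1.
    sample_sizes:  size of each sample, list index 0 = sample 1.
    chunk, sample: 1-based numbers; index_in_chunk: 0-based position of the
    sample inside its chunk.
    """
    first_in_chunk = sample - index_in_chunk  # 1-based number of the chunk's first sample
    # Sum the sizes of the samples that precede the target within the chunk.
    preceding = sum(sample_sizes[first_in_chunk - 1 : sample - 1])
    return chunk_offsets[chunk - 1] + preceding, sample_sizes[sample - 1]
```

For example, if chunk 2 starts at offset 5000 and holds samples 4 and 5 with sizes 400 and 500, then sample 5 starts at 5000 + 400 = 5400 and is 500 bytes long.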
The start position and size of the sample thus obtained can be used to identify sample data that should be used for random access from mdat 202.
However, the technique proposed by Japanese Patent Laid-Open No. 11-261964 realizes increased efficiency of high-speed reproduction by using a management file that is recorded separately from the multiplexed video and audio data. Thus, the technique cannot attain efficient high-speed playback if the association between the content and its corresponding management file is broken, for example by copying of the content data.
Also, the technique proposed by the Japanese Patent Laid-Open No. 2000-224543 is a technique for realizing efficient special reproduction of MPEG-2 image data in a recording/reproducing apparatus. The patent does not mention how to efficiently perform processing of ISO Base Media File Format in special reproduction.
Processing using a data structure that identifies a random access point as outlined above may allow access to the desired data for special reproduction of data in the ISO file format. However, the processing is time-consuming because the structure of the headers and the correlation among them are complicated.