1. Field of Invention
The present invention relates to a data reproduction device which demultiplexes data such as video and audio multiplexed in a bitstream, and decodes and reproduces such data.
2. Description of the Related Art
In recent years, with the increase in capacity of storage media and communication networks and the advance of data transmission technology, devices and services involving coded multimedia data, such as video and audio, have come into wide use.
For example, in the broadcasting sector, broadcasting of digitally coded media data has replaced conventional analog broadcasting. Although current digital broadcasting is directed only to landline receivers, broadcasting for mobile devices such as cellular phones is scheduled to commence. In the communication sector, for example, video distribution services for third generation cellular phones have started, and an environment for handling multimedia data has been created not only on landline terminals but also on mobile terminals. Accordingly, it is expected that multimedia will be used increasingly in various manners, in which, for example, content data received via broadcasting or the Internet is recorded on a memory card such as a secure digital (SD) card or on an optical disk such as a digital versatile disk random access memory (DVD-RAM) and shared between devices.
Here, the Advanced Audio Coding (AAC) standard developed by the Moving Picture Experts Group (MPEG), which is widely used in digital broadcasting, video distribution services for third generation cellular phones, and the like, is taken as a typical example of an audio data coding format.
Generally, in coding of audio data, the upper limit of the frequency band for reproduction is lowered as the compression ratio increases, and the sound quality degrades accordingly. This is because not enough bits are allocated to coding of high frequency components. In order to recover the missing high frequency components, a technique called Spectral Band Replication (SBR), which generates high frequency components through artificial extension of bandwidth, has been developed. More specifically, supplementary information for estimating high frequency components from low frequency components is stored in the stream, and bandwidth extension processing is performed on the coded data using this information. This makes it possible to reproduce high quality sound from coded data even if it is compressed at a higher ratio and thus at a lower bitrate. Here, assuming that the AAC coded data included in the data of one frame is called basic data, frame data is made up of such basic data and SBR data. With the SBR tool, double the bandwidth of the basic data can typically be reconstructed, and therefore, for example, output data of 32 kHz can be obtained from basic data of 16 kHz. Note that a coding format enhanced by adding an SBR function to conventional AAC is called AAC-plus. An AAC-plus frame that does not include SBR data is decoded as data in AAC format. Since AAC-plus is compatible with AAC, a decoding unit for AAC-plus can decode coded data in AAC format. Conversely, a decoding unit for AAC can decode only the basic data of AAC-plus, by skipping the reading of the SBR data. In the following description, AAC-plus denotes a coding format including both MPEG-2 and MPEG-4 in a comprehensive manner, while MPEG-2 AAC and MPEG-4 AAC denote separate coding formats.
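The relationship between the frequency of the basic data and the frequency of the bandwidth-extended output described above can be illustrated with a minimal sketch. The function name and the `has_sbr` parameter are hypothetical, and the fixed factor of two reflects only the typical case mentioned in the text, not a general rule.

```python
def output_sampling_frequency(basic_hz: int, has_sbr: bool) -> int:
    """Return the expected output sampling frequency in Hz.

    Assumption: when SBR data is present and applied, the output
    frequency is double that of the basic data (the typical case).
    Without SBR, the frame is decoded as plain AAC at the basic rate.
    """
    return basic_hz * 2 if has_sbr else basic_hz


# Basic data of 16 kHz, decoded with and without bandwidth extension.
with_sbr = output_sampling_frequency(16_000, has_sbr=True)
without_sbr = output_sampling_frequency(16_000, has_sbr=False)
print(with_sbr, without_sbr)
```

An AAC-only decoding unit corresponds to the `has_sbr=False` case even when the frame carries SBR data, since it skips that data.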
As described above, since AAC-plus is particularly effective at lower bitrates, its use is expected to expand to services for mobile devices. For example, it is to be used for third generation mobile terminals, digital terrestrial broadcasting for mobile devices, or the like. Note that MPEG-2 AAC is used in digital terrestrial broadcasting for mobile devices. FIG. 1 is a diagram showing an overview of digital terrestrial broadcasting for mobile devices. Audio data and video data multiplexed in a transport stream (TS) in MPEG-2 format are transmitted from a broadcast station. A TS is a stream of fixed length packets of 188 bytes each, called TS packets, and a cellular phone, an in-vehicle terminal or the like receives these TS packets. Here, in a TS, a data unit called a section, which stores TV show information, is transmitted along with the audio data and video data. The reception side analyzes the TV show information in the section and then starts receiving the TS packets storing the audio data and video data. A section showing TV show information is called a program map table (PMT).
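The fixed-length packet structure described above can be sketched as follows. The 188-byte packet size comes from the text; the sync byte value 0x47 and the position of the 13-bit packet identifier (PID) come from the MPEG-2 Systems packet header rather than from this description. The function names, the dummy packets, and the PID values are hypothetical and serve only as a demonstration.

```python
TS_PACKET_SIZE = 188  # fixed TS packet length stated in the text
SYNC_BYTE = 0x47      # first byte of every MPEG-2 TS packet


def iter_ts_packets(data: bytes):
    """Yield (pid, packet) for each complete 188-byte packet in data."""
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte offset {offset}")
        # The PID is the low 5 bits of byte 1 followed by all of byte 2.
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        yield pid, packet


def make_packet(pid: int) -> bytes:
    """Build a dummy packet (4-byte header plus zero padding)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))


# Two dummy packets with arbitrary PIDs, concatenated into one stream.
stream = make_packet(0x0000) + make_packet(0x0100)
pids = [pid for pid, _ in iter_ts_packets(stream)]
print(pids)
```

A receiver would use the PID to separate packets carrying sections such as the PMT from packets carrying audio data and video data.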
When carrying coded data in AAC or AAC-plus format via a TS packet, the frames of the coded data are carried after being converted to audio data transport stream (ADTS) frames in MPEG-2 format. FIG. 2 shows a data structure of an ADTS frame. The header of an ADTS frame stores information such as a sampling frequency, the number of channels, and the like of audio data stored in the payload, and the payload of the ADTS frame stores data of one frame in AAC or AAC-plus format. In the case of AAC-plus, since the sampling frequency stored in the ADTS header indicates the sampling frequency of basic data, the sampling frequency of bandwidth-extended data cannot be obtained from the ADTS header.
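To illustrate why the ADTS header alone cannot reveal the bandwidth-extended frequency, the following sketch reads the sampling frequency and channel configuration from a 7-byte ADTS header. The bit layout and the frequency table follow the ADTS fixed and variable header definitions in the MPEG AAC standards, not this description. The class and function names are hypothetical, and the example header bytes were built by hand for demonstration.

```python
from dataclasses import dataclass

# Sampling frequencies addressed by the 4-bit sampling_frequency_index.
SAMPLING_FREQUENCIES = [
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350,
]


@dataclass
class AdtsHeader:
    sampling_frequency: int      # frequency of the basic data, in Hz
    channel_configuration: int   # 3-bit channel configuration index
    frame_length: int            # header plus payload, in bytes


def parse_adts_header(h: bytes) -> AdtsHeader:
    """Parse the first 7 bytes of an ADTS frame."""
    if len(h) < 7 or h[0] != 0xFF or (h[1] & 0xF0) != 0xF0:
        raise ValueError("ADTS syncword (0xFFF) not found")
    index = (h[2] >> 2) & 0x0F
    if index >= len(SAMPLING_FREQUENCIES):
        raise ValueError(f"reserved sampling_frequency_index {index}")
    channels = ((h[2] & 0x01) << 2) | (h[3] >> 6)
    frame_length = ((h[3] & 0x03) << 11) | (h[4] << 3) | (h[5] >> 5)
    return AdtsHeader(SAMPLING_FREQUENCIES[index], channels, frame_length)


# Hand-built header: index 8 (16 kHz), configuration 2, 256-byte frame.
example = bytes([0xFF, 0xF9, 0x60, 0x80, 0x20, 0x1F, 0xFC])
header = parse_adts_header(example)
print(header)
```

For an AAC-plus frame, the 16 kHz value above would describe only the basic data. Nothing in these header fields indicates whether SBR data is present in the payload or what the output frequency would be after bandwidth extension.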
Next, recording of digital terrestrial broadcasts for mobile devices received on a mobile terminal is described. With the commencement of digital broadcasting for mobile terminals, it is expected that broadcasts will be recorded. The MP4 file format (hereinafter referred to as MP4) is expected to be used as the multiplexing format for recording them, from the standpoint of ensuring interconnectability with third generation mobile terminals. Here, MP4 is a file format standardized by ISO/IEC JTC1/SC29/WG 11, and is adopted in the Transparent end-to-end packet switched streaming service (TS 26.234) defined, as a wireless video distribution standard, by the Third Generation Partnership Project (3GPP), which is an international standardization organization aimed at standardization of a third generation mobile communications system. In the 3GPP standard, MPEG-4 AAC is used as AAC. Since MPEG-4 AAC has backward compatibility with MPEG-2 AAC, a terminal which is compliant with MPEG-4 AAC can correctly decode and reproduce MPEG-2 AAC coded data. Even a terminal which is compliant only with MPEG-2 AAC can correctly decode and reproduce MPEG-4 AAC coded data if the data is coded without using a function specific to MPEG-4 AAC.
A method for multiplexing access unit (AU) data in MP4 is described below. Here, an AU is equivalent to one picture in a video sequence or one frame in an audio sequence. In MP4, media data is handled in units of samples. One sample is equivalent to one AU, and sample numbers, which are incremented one-by-one in decoding time order, are assigned to the respective samples. Furthermore, header information and media data per sample are managed in units of objects called Boxes. FIG. 3A shows the structure of a Box, which is made up of the following fields:
(1) size: total size of a Box, including the size field;
(2) type: identifier of a Box, typically represented by four alphabetical letters (the field length is 4 bytes, and a Box in an MP4 file is searched for by judging whether or not 4 consecutive bytes of data match the identifier stored in the type field);
(3) version: version number of a Box;
(4) flags: flag information set for each Box; and
(5) data: header information or media data is stored therein.
Note that since “version” and “flags” are not mandatory fields, some Boxes do not contain these fields. Identifiers of type fields are used in referring to Boxes in the following description. For example, the Box whose type is “moov” is called “moov”. The Box structure in the MP4 file is shown in FIG. 3B. The MP4 file is composed of “ftyp”, “moov”, and “mdat” or “moof”, and “ftyp” is positioned at the beginning of the MP4 file. Information for identifying an MP4 file is included in “ftyp”, and media data is stored in “mdat”. Each media data included in “mdat” is called a track, and each track is identified by a track ID. Header information on the samples included in each track of “mdat” is stored in “moov”. In “moov”, as shown in FIG. 4A, Boxes are hierarchically placed, and header information for audio media data and header information for video media data are separately stored in respective “trak” Boxes. In a “trak”, Boxes are also hierarchically placed, and each Box in “stbl” stores information such as the size, decoding time and display starting time of each sample, or information on each randomly-accessible sample (FIG. 4B). Such randomly-accessible samples are called Sync samples, and a list of the sample numbers of the Sync samples is shown by “stss” in “stbl”. In the above description, the header information of all the samples in a track is stored in “moov”, but it is also possible to divide the track into fragments and store the header information on a fragment-by-fragment basis.
The header information on each unit obtained by dividing the track is stored in “moof”. In the example of a fragmented MP4 file in FIG. 5, the header information of the samples stored in “mdat#1” is stored in “moof#1”.
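The Box layout described above (a size field followed by a 4-byte type identifier, with Boxes nested hierarchically) can be traversed as in the following minimal sketch. The function names and the hand-built sample file are hypothetical. The sketch assumes that every size field is a 32-bit big-endian value covering the whole Box, and it does not handle the special size values 0 and 1 that the file format also allows.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each Box in a range."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > end:
            raise ValueError(f"unsupported or bad Box size at {offset}")
        yield box_type.decode("ascii"), offset + 8, offset + size
        offset += size  # size includes the size and type fields


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a Box with a placeholder payload, for demonstration only."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample_file = (
    make_box(b"ftyp", b"isom" + bytes(4))
    + make_box(b"moov", make_box(b"trak") + make_box(b"trak"))
    + make_box(b"mdat", bytes(16))
)

top_level = [name for name, _, _ in iter_boxes(sample_file)]
print(top_level)

# Descend into "moov" to list the Boxes placed inside it.
for name, lo, hi in iter_boxes(sample_file):
    if name == "moov":
        children = [child for child, _, _ in iter_boxes(sample_file, lo, hi)]
        print(children)
```

The same walk applies to a fragmented file, where “moof” and “mdat” Boxes alternate at the top level.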
FIG. 6 is a diagram showing a structure example of a conventional MP4 file in which broadcast data is recorded. Received AAC data is recorded in a conventional MP4 file as MPEG-2 AAC data. Therefore, identification information indicating that the audio track in the MP4 file is in MPEG-2 AAC format is stored in “moov”. In addition, since the recorded MPEG-2 AAC coded data is different from MPEG-4 AAC data, the type of the coded data stored in the MP4 file does not comply with the 3GPP standard. Furthermore, the header of an MP4 file storing MPEG-2 AAC data contains no identification information indicating whether or not the SBR function is valid, and only the frequency of the basic data in AAC-plus format is indicated there.
In addition, since a conventional brand defined for each operational standard such as SD is used, it is not possible to judge from the brand stored in “ftyp” whether or not digital terrestrial broadcast data is recorded in the MP4 file.
FIG. 7 is a block diagram showing a configuration of a conventional data reproduction device 1000 which reproduces a conventional MP4 file. The data reproduction device 1000 includes a header separation unit 1001, an input frequency obtainment unit 1002, a decoding unit 1003 and an output unit 1004, and demultiplexes coded audio data and coded video data from an input MP4 file, decodes them, and reproduces them (see, for example, Patent Document 1). The following describes operations for AAC reproduction; a description of operations for video reproduction is omitted. Note that the audio coding format in the present invention is not limited to AAC or AAC-plus, and it may be AC3, MP3, or any other format, or a format having a bandwidth extension function in addition to such a coding format.
The header separation unit 1001 separates the header from the MP4 file, outputs, to the input frequency obtainment unit 1002, the header information Hdr including at least information indicating an audio sampling frequency, and outputs the sample data separated from “mdat” to the decoding unit 1003. Here, in AAC-plus, the frequency of the basic data is indicated as a sampling frequency. The input frequency obtainment unit 1002 analyzes the header information Hdr, obtains the input frequency FSin that is the frequency of the basic data, and outputs it to the decoding unit 1003. The decoding unit 1003 decodes the sample data Sp1Dat based on the input frequency FSin, and outputs, to the output unit 1004, the decoded frame Fdata which is the decoding result and the output frequency FSo which is the sampling frequency of the decoded frame Fdata. The output unit 1004 outputs the decoded frame Fdata in accordance with the output frequency FSo.
Patent Document 1: Japanese Laid-Open Patent Application No. 2003-114845.
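The data flow through units 1001 to 1004 of the conventional data reproduction device 1000 can be pictured with the following sketch. All class and function names are hypothetical. The decoding unit is a stand-in that performs no AAC decoding: it returns the sample bytes unchanged and assumes FSo equals FSin, which holds only when no bandwidth extension is applied.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Mp4File:
    sampling_frequency: int  # frequency of the basic data, from the header
    samples: List[bytes]     # sample data stored in "mdat"


def header_separation_unit(mp4: Mp4File) -> Tuple[Dict[str, int], List[bytes]]:
    """Unit 1001: split the file into header information Hdr and samples."""
    hdr = {"sampling_frequency": mp4.sampling_frequency}
    return hdr, mp4.samples


def input_frequency_obtainment_unit(hdr: Dict[str, int]) -> int:
    """Unit 1002: obtain the input frequency FSin from Hdr."""
    return hdr["sampling_frequency"]


def decoding_unit(sample: bytes, fs_in: int) -> Tuple[bytes, int]:
    """Unit 1003: stand-in decoder returning (Fdata, FSo).

    Assumption: no real decoding is done, and FSo is set equal to FSin.
    """
    return sample, fs_in


def output_unit(fdata: bytes, fs_o: int) -> str:
    """Unit 1004: describe the output instead of playing audio."""
    return f"{len(fdata)} bytes at {fs_o} Hz"


def reproduce(mp4: Mp4File) -> List[str]:
    hdr, samples = header_separation_unit(mp4)
    fs_in = input_frequency_obtainment_unit(hdr)
    return [output_unit(*decoding_unit(s, fs_in)) for s in samples]


log = reproduce(Mp4File(16_000, [bytes(4), bytes(6)]))
print(log)
```

In this arrangement, the only frequency available before decoding is FSin, the frequency of the basic data recorded in the header.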