1. Field of the Invention
The present invention relates to a playback method and apparatus, a program, and a recording medium for decode-processing and playing back coded audio data which is transmitted with stereo process information intermittently multiplexed into coded information of a monaural audio signal.
2. Description of Related Art
Playback apparatuses are known which are supplied with a monaural audio signal and stereo process information, and which generate stereo audio signals by stereo processing the monaural audio signal on the basis of the stereo process information.
A typical stereo process of this kind, based on a monaural audio signal and stereo process information, will now be described with reference to the drawings. FIG. 6 is a block diagram showing a configuration example of a typical stereo process apparatus, and FIG. 7 is a diagram showing an example of a signal to be supplied to the stereo process apparatus of FIG. 6. The stereo process information may be transmitted in multiplexed form.
In FIG. 6, a monaural audio signal is supplied to an input terminal 41, and stereo process information is supplied to an input terminal 42. The monaural audio signal from the input terminal 41 is delivered to a band divider 44 via a selector switch 43 to be band-divided, and resultant band-divided monaural audio signals are delivered to a stereo processor 45. The stereo processor 45 is supplied with the stereo process information from the input terminal 42, and stereo-processes the band-divided monaural audio signals into left-channel (Lch) and right-channel (Rch) stereo signals. The Lch, Rch stereo signals are delivered to an Lch band synthesizer 51 and an Rch band synthesizer 52, respectively. An Lch audio signal from the band synthesizer 51 is delivered to a selector switch 53, where one of this Lch audio signal and a signal supplied from the selector switch 43 via a delay section 46 is selected, and the selected signal is delivered to a selector switch 54 and an output terminal 55. An Rch audio signal from the band synthesizer 52 is delivered to the selector switch 54, where one of this Rch audio signal and the signal from the selector switch 53 is selected, and the selected signal is delivered to an output terminal 56.
FIG. 7 shows an example of a signal to be supplied to the stereo process apparatus of FIG. 6. The signal is numbered #0, #1, #2, and so on, in transmission units of coded audio data, such as frames or blocks. In the figure, M denotes a monaural audio signal, and S denotes stereo process information. In the example of FIG. 7, the monaural audio signal M is transmitted in every transmission unit, whereas the stereo process information S is multiplexed into only one of every five transmission units. In this case, the stereo process information S contained in the transmission unit #0 is used for the stereo process during the period corresponding to the transmission units #0 to #4, and is replaced by the next stereo process information S at the timing corresponding to the transmission unit #5. The stereo process information S delivered at the timing corresponding to the transmission unit #5 is used during the period corresponding to the transmission units #5 to #9. Thereafter, the most recently delivered stereo process information S is likewise used until the next stereo process information S is delivered.
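The holding behavior described for FIG. 7 may be sketched as follows. This is only an illustrative model, not part of the described apparatus: the function name `active_stereo_info`, the labels `S0` and `S5`, and the use of `None` for a transmission unit that carries only the monaural signal M are all assumptions made for the example.

```python
from typing import List, Optional


def active_stereo_info(units: List[Optional[str]]) -> List[Optional[str]]:
    """Return, for each transmission unit, the stereo process information in effect.

    units[i] is the information S carried by unit #i, or None when the unit
    carries only the monaural signal M (hypothetical representation).
    """
    current: Optional[str] = None  # nothing usable until the first S arrives
    in_effect = []
    for s in units:
        if s is not None:
            current = s  # switch to the newly delivered S
        in_effect.append(current)
    return in_effect


# Hypothetical stream modelled on FIG. 7: S is multiplexed into every fifth unit.
stream = [f"S{i}" if i % 5 == 0 else None for i in range(10)]
held = active_stereo_info(stream)
print(held)
```

In this model, units #0 to #4 use the information delivered in unit #0 and units #5 to #9 use the information delivered in unit #5. Feeding the same function a stream that starts partway through (for example `stream[2:]`) yields `None` for the first units, which corresponds to having no usable stereo process information.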
In the configuration of FIG. 6, when stereo process information is supplied, the selector switches 43, 53, 54 are switched to selectable terminals B. Namely, the monaural audio signal supplied from the input terminal 41 is band-divided by the band divider 44, and the stereo signals are generated by the stereo processor 45 on the basis of the stereo process information. The generated stereo signals are band-synthesized by the band synthesizers 51, 52 of the respective channels, and then outputted as the Lch, Rch stereo audio signals from the output terminals 55, 56, respectively.
Meanwhile, in discontinuous frame playback, such as fast-forward playback in which frames (transmission units) are decimated, or in playback from an arbitrary frame, multiplexed coded information may drop out in some cases. When coded audio data is supplied from an arbitrary frame (transmission unit) due to such discontinuous frame playback or the like, usable stereo process information may be absent. For example, when the input starts at a position corresponding to the transmission unit #2 of FIG. 7, the stereo process information S contained in the transmission unit #0 is missing due to frame decimation or the like, so that there is no usable stereo process information during the period corresponding to the transmission units #2 to #4.
In the apparatus of FIG. 6, in order to prevent the number of channels of the output audio signals from changing depending on whether the stereo process information is present or absent, the monaural audio signal is output to both the stereo left and right channels even in the absence of usable stereo process information (e.g., during the period corresponding to the transmission units #2 to #4 of FIG. 7). Specifically, by switching the selector switches 43, 53, 54 to selectable terminals A, the apparatus outputs identical monaural audio signals from the output terminals 55, 56. Here, when the selector switch 43 is switched to its selectable terminal A, the monaural audio signal from the input terminal 41 is delivered to the delay section 46. The delay section 46 gives the supplied monaural audio signal the same delay as occurs at the band divider 44, because the band divider 44 holds a state variable as in, e.g., an FIR filtering process, and updates the state variable and causes a delay every time it performs the process. Since the band synthesizers and the like perform their band synthesis in a manner causing no delay, the delay section 46 compensates only for the delay at the band divider 44. The monaural audio signal from the delay section 46 is outputted from the Lch output terminal 55 via the selector switch 53, and also outputted from the Rch output terminal 56 via the selector switch 54. It is noted that internal state variables of the band divider 44 and the like are initialized when there is no usable stereo process information, such as in the period corresponding to the transmission units #2 to #4 of FIG. 7.
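The role of the delay section 46 may be illustrated with a small sketch. The three-tap symmetric FIR filter below is merely a stand-in for the band divider 44, whose actual filter is not specified here; the tap values, the helper names `fir_filter` and `delay`, and the test impulse are assumptions chosen so that the delay is easy to see.

```python
from collections import deque


def fir_filter(samples, taps):
    """Direct-form FIR filter; `state` holds the past inputs (the state variable)."""
    state = [0.0] * (len(taps) - 1)
    out = []
    for x in samples:
        history = [x] + state
        out.append(sum(t * h for t, h in zip(taps, history)))
        state = history[:-1]  # update the state variable on every sample
    return out


def delay(samples, n):
    """Pure delay of n samples, playing the role of the delay section."""
    line = deque([0.0] * n)
    out = []
    for x in samples:
        line.append(x)
        out.append(line.popleft())
    return out


taps = [0.25, 0.5, 0.25]             # hypothetical symmetric (linear-phase) filter
group_delay = (len(taps) - 1) // 2   # delay of a symmetric odd-length FIR, in samples

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
processed = fir_filter(impulse, taps)  # path through the filter (terminal B side)
aligned = delay(impulse, group_delay)  # bypass path (terminal A side)

print(processed)
print(aligned)
```

With these values the filtered impulse peaks at the same sample index at which the delayed bypass impulse appears, so switching between the two paths does not shift the signal in time.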
Accordingly, if the data is supplied at the position corresponding to the transmission unit #2 of FIG. 7, in the stereo process apparatus of FIG. 6, the internal state variables are initialized, and also the selector switches 43, 53, 54 are switched to their selectable terminals A during the period corresponding to the above-mentioned transmission units #2 to #4. Then, upon input of the data at the position corresponding to the transmission unit #5, the selector switches 43, 53, 54 are switched to selectable terminals B, and also the internal state variables are updated. It is noted that switching operations of the selector switches 43, 53, 54, and processing operations of the relevant sections are controlled by a control section, not shown, in accordance with the content of input data, internal states, or the like.
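The control just described may be summarized by the following sketch. It is a simplified model under stated assumptions: the function name `control_sequence`, the boolean list marking which transmission units carry stereo process information, and the action strings are hypothetical, and the not-shown control section may take other inputs into account.

```python
def control_sequence(carries_s, start):
    """Plan the selector terminal and state handling for each unit from `start`.

    carries_s[n] is True when transmission unit #n carries stereo process
    information S (hypothetical representation of the stream of FIG. 7).
    """
    have_info = False  # no usable S at the start of a discontinuous playback
    plan = []
    for n in range(start, len(carries_s)):
        if carries_s[n]:
            have_info = True
        if have_info:
            plan.append((n, "B", "update state variables"))
        else:
            plan.append((n, "A", "initialize state variables"))
    return plan


carries_s = [n % 5 == 0 for n in range(10)]  # S in units #0 and #5
for unit, terminal, action in control_sequence(carries_s, start=2):
    print(f"#{unit}: terminal {terminal}, {action}")
```

Starting at unit #2, the model selects terminals A for units #2 to #4 and terminals B from unit #5 onward, matching the sequence described above.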
Here, a specific example of a coding system will be described below, by which part of coding information for the stereo process and the like is multiplexed into a monaural audio signal to be transmitted.
Audio data coded by, e.g., an HE AAC (High Efficiency Advanced Audio Coding, International Standard ISO/IEC 14496-3) coding system, particularly, an HE AAC v2 (version 2) coding system, is transmitted with part of coded information required for decoding, multiplexed thereinto. This HE AAC v2 coding system is configured by combining three technologies, i.e., an advanced audio coding (AAC) process, a spectral band replication (SBR) process, and a parametric stereo (PS) process. Coded information for the SBR process and the PS process is transmitted as partially multiplexed.
The AAC process is a coding process in an audio compression algorithm standardized by MPEG (Moving Picture Experts Group) audio. The SBR process is a coding process for band extension, which divides an input signal into a plurality of subbands and replicates high frequency bands from the lower frequency bands thereof. The PS process is a spatial coding process using spatial information and the like required for generating stereo signals from a monaural signal.
Coded audio data which is coded by the above-mentioned HE AAC v2 system includes AAC core coded information equivalent to monaural audio data coded by the above-mentioned AAC coding system, the coded information for the above-mentioned SBR process, and the coded information for the above-mentioned PS process. The coded information for the SBR process includes coded information (sbr header, hereinafter SBR header) which is multiplexed and intermittently transmitted, and coded information (sbr data, hereinafter SBR data) which is always transmitted. The SBR header is required for decoding the SBR data. The content of the SBR header can be changed under a specific rule, and its transmission timing depends on operational practice. The coded information (ps data, hereinafter PS data) for the PS process is transmitted in an extended area of the SBR data. Thus, the SBR header is likewise required for decoding the PS data. In other words, the SBR header is stereo process information required for acquiring the PS data used for the stereo process. FIG. 8 shows an example of audio data coded by the HE AAC v2 coding system. In FIG. 8, AC denotes the AAC core coded information, SH denotes the SBR header, and SD denotes the SBR data.
As shown in FIG. 8, the intermittently transmitted SBR header SH is required for decoding the SBR data SD and the PS data contained in its extended area. However, in a playback from an arbitrary frame such as mentioned above, the multiplexed SBR header SH may drop out in some cases. In that case, unless a higher-level system or the like constantly monitors the frames carrying the multiplexed information, a decoding process using the AAC core coded information AC is performed to generate output audio signals until a frame arrives from which the multiplexed SBR header SH can be acquired. The decoding process in this case includes the above-mentioned AAC decoding process, and an up-sampling process that uses the band division and band synthesis of the above-mentioned SBR process.
Upon arrival of a frame containing multiplexed SBR header SH, the above-mentioned SBR data SD and the PS data contained in its extended area are decoded using this SBR header SH. Then, a “complete” decoding process (including the stereo process) using these SBR data and PS data is performed to generate output stereo audio signals. In the decoding process for the above-mentioned HE AAC v2 coded audio data, the above-mentioned AAC decoding process is performed, and then in the above-mentioned SBR process, band division and generation of high frequency (HF) components are performed, after which stereo signals are generated from the band-divided monaural signals on the basis of spatial information coded in the above-mentioned PS process, and finally output stereo audio signals are generated by a band synthesis process in the SBR process.
FIG. 9 is a block diagram showing a configuration example of a playback apparatus for coded audio data which is coded by the above-mentioned HE AAC v2 system. A coded audio stream is supplied, by transmission, to an input terminal 11 of FIG. 9. The coded audio stream contains the AAC core coded information, the HF generation coded information (SBR data), and the PS coded information (PS data). Part of the coded information is transmitted as multiplexed. For decoding the HF generation coded information (SBR data) and the PS coded information (PS data), an SBR header SH which is transmitted as multiplexed is required, as mentioned above.
In the HE AAC v2 coding system, when part of the SBR header SH differs from that contained in a previous frame, an initialization for the SBR process needs to be performed. By the initialization for the SBR process, state variables (delay signals) in QMF analyzers/synthesizers, a hybrid analyzer, and the like, later-described, are initialized. A state variable (delay signal) herein used is intended to mean data (signal) held at a delay element within a filter. In a filtering process, a delay occurs within a period from the input to the output of a signal in accordance with a filtering length, and the state variable means this delay signal.
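The meaning of a state variable and the effect of its initialization may be illustrated with a generic FIR filter processed block by block. This is a sketch only: the class `FirFilter`, its two-tap averaging filter, and the test signal are assumptions made for illustration and are not the QMF or hybrid filters of the SBR process.

```python
class FirFilter:
    """FIR filter whose state (past inputs) persists between blocks."""

    def __init__(self, taps):
        self.taps = list(taps)
        self.reset()

    def reset(self):
        # Initialization: clear the delay signal held in the delay elements.
        self.state = [0.0] * (len(self.taps) - 1)

    def process(self, block):
        out = []
        for x in block:
            history = [x] + self.state
            out.append(sum(t * h for t, h in zip(self.taps, history)))
            self.state = history[:-1]
        return out


taps = [0.5, 0.5]               # hypothetical two-tap averaging filter
signal = [1.0, 1.0, 1.0, 1.0]   # constant input split into two blocks

# State carried over from the first block to the second.
f = FirFilter(taps)
continuous = f.process(signal[:2]) + f.process(signal[2:])

# State initialized between the two blocks.
f = FirFilter(taps)
first = f.process(signal[:2])
f.reset()
after_reset = first + f.process(signal[2:])

print(continuous)
print(after_reset)
```

When the state is kept, the second block continues smoothly from the first. When the state is initialized between blocks, the first output sample of the second block is computed as if silence had preceded it, which is the kind of discontinuity an initialization introduces.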
Monaural audio data acquired by decoding the AAC coded information which is coded by the HE AAC v2 coding system is up-sampled by carrying out QMF analysis and QMF synthesis in the SBR process. For example, the apparatus SBR-processes the AAC-decoded monaural audio data, whose sampling rate is 24 kHz, and thereby outputs audio data whose sampling rate is 48 kHz.
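The factor-of-two up-sampling may be expressed as simple arithmetic. The band counts below (32 analysis bands and 64 synthesis bands) are not stated in this description; they are assumed here as the usual dual-rate SBR arrangement, and the function name `sbr_output_rate` is hypothetical.

```python
ANALYSIS_BANDS = 32   # assumption: analysis QMF bank size in dual-rate SBR
SYNTHESIS_BANDS = 64  # assumption: synthesis QMF bank size in dual-rate SBR


def sbr_output_rate(core_rate_hz: int) -> int:
    """Output sampling rate when synthesis uses more bands than analysis."""
    return core_rate_hz * SYNTHESIS_BANDS // ANALYSIS_BANDS


print(sbr_output_rate(24_000))  # prints 48000
```

Under these assumptions, each time slot of 32 analysis subband samples produces 64 output samples at synthesis, so the output rate is twice the AAC core rate.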
In FIG. 9, the coded audio data from the input terminal 11 is delivered to a payload deformatter 12, where it is separated into AAC core coded information, which is delivered to an AAC core decoder 13, and HF generation coded information (SBR data)/PS coded information (PS data). The AAC core decoder 13 decodes the supplied AAC core coded information, generates an AAC core monaural signal, and delivers the generated signal to an SBR processor 20. A parser 14 of the SBR processor 20 acquires multiplexed information such as the HF generation coded information and the like from the payload deformatter 12, checks its content, and judges whether or not an initialization for the SBR process is needed. If the initialization is needed, the parser 14 outputs an initialization control signal from a terminal 14t, so that an initialization for the SBR process will be performed on relevant sections, as described later. The monaural audio signal delivered to the SBR processor 20 from the AAC core decoder 13 is band-divided by a QMF analyzer 21, and resultant band-divided signals are delivered to a selector switch 22. If the HF generation coded information (SBR data) is supplied, the selector switch 22 is switched for connection to a selectable terminal B or C, so that the signals from the QMF analyzer 21 are delivered to an HF generator 23. The HF generator 23 generates HF signals, and an envelope adjuster 24 makes an envelope adjustment on them. Resultant signals are delivered to a selector switch 25.
If stereo process information is acquired from the above-mentioned PS coded information (PS data), the selector switches 22, 25 are switched for connection to selectable terminals C. Signals from the selectable terminal C of the selector switch 25 are delivered to a hybrid analyzer 27. The hybrid analyzer 27 further band-divides low frequency (LF) signals of the supplied band-divided signals, and supplies resultant signals to a signal de-correlator 29 and a stereo processor 30. The signal de-correlator 29 de-correlates the supplied signals, makes an acoustic adjustment thereon, and supplies resultant signals to the stereo processor 30. The stereo processor 30 generates Lch, Rch stereo signals from the supplied band-divided signals and stereo process information. For the generated Lch, Rch stereo signals, hybrid synthesizers 31, 32 of the respective channels band-synthesize the band-divided signals obtained by the above-mentioned hybrid analyzer 27, and further, QMF synthesizers 33, 34 band-synthesize the band-divided signals obtained by the above-mentioned QMF analyzer 21, to generate Lch, Rch stereo output audio signals. The Lch audio signal from the QMF synthesizer 33 is delivered to a selector switch 36 and an output terminal 37. The Rch audio signal from the QMF synthesizer 34 is delivered to the selector switch 36, where one of this Rch audio signal and the signal from the QMF synthesizer 33 is selected, and the selected signal is delivered to an output terminal 38.
If multiplexed information such as the above-mentioned stereo process information is not transmitted, the selector switches 22, 25, 35, 36 of FIG. 9 are switched for connection to their selectable terminals A or B. In order to keep the sampling frequency of the output audio signals fixed, only up-sampling is performed using the QMF analyzer 21 and the QMF synthesizer 33. Additionally, in order to keep the number of output channels fixed, the Lch audio signal is copied to the Rch to generate the output signals.
FIG. 10 is a flowchart for illustrating a decoding operation such as mentioned above, e.g., in the configuration of the above-mentioned FIG. 9.
In FIG. 10, in step S101, a decoding (deformatting) process for data coded by the above-mentioned HE AAC v2 system is performed on coded information, such as the coded audio stream supplied to the above-mentioned input terminal 11, to extract the HF generation coded information and spatial coded information mentioned above as multiplexed coded information. Further, in step S102, an AAC signal process is performed on the above-mentioned AAC core information. In the following step S103, it is judged whether or not the above-mentioned SBR process is to be performed, and if YES, the process proceeds to step S104, whereas if NO, the process proceeds to step S114. These processes correspond to, e.g., the processing performed by the payload deformatter 12 and the AAC core decoder 13 of FIG. 9.
In step S104, a QMF band division process is performed by, e.g., the above-mentioned QMF analyzer 21. In the following step S105, it is judged whether or not the multiplexed coded information is already decoded, and if YES, the process proceeds to step S106, whereas if NO, the process proceeds to step S113. In step S106, an HF signal generation process is performed using the multiplexed HF generation coded information (already decoded information) by, e.g., the above-mentioned HF generator 23, and then, in the following step S107, it is judged whether or not the PS process is to be performed.
If it is judged YES (the PS process is to be performed) in step S107, control proceeds to step S108, where a hybrid analysis process is performed. Then, in step S109, a stereo signal generation process based on the spatial information is performed, and further in step S110, a hybrid synthesis process is performed. Thereafter, control proceeds to step S111. These processes correspond to, e.g., processing extending from the processing performed by the hybrid analyzer 27 to the processing performed by the hybrid synthesizers 31, 32 of FIG. 9. If it is judged NO (the PS process is not to be performed) in step S107, control proceeds to step S111.
In step S111, an Lch QMF band synthesis process is performed, and in step S112, an Rch QMF band synthesis process is performed, and resultant audio signals are outputted. On the other hand, in the above-mentioned step S113, the Lch QMF band synthesis process is performed, and in step S114, the monaural signal is replicated, as necessary, to generate stereo signals, and resultant audio signals are outputted. These processes correspond to, e.g., the processing performed by the QMF synthesizers 33, 34 via the selector switches 22, 35, 36 of the above-mentioned FIG. 9.
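The branching of FIG. 10 as described above may be traced with the following sketch. It only lists the step labels visited for a given set of judgments and performs no signal processing; the function name `decode_steps` and its three boolean parameters are hypothetical, and the branch taken when the PS process is not performed follows the description as written (control proceeds to step S111).

```python
def decode_steps(sbr_enabled, side_info_decoded, ps_enabled):
    """Return the step labels of FIG. 10 visited for one frame."""
    steps = ["S101", "S102", "S103"]      # deformat, AAC process, SBR judgment
    if not sbr_enabled:
        return steps + ["S114"]           # replicate monaural signal as necessary
    steps += ["S104", "S105"]             # QMF band division, decoded-info judgment
    if not side_info_decoded:
        return steps + ["S113", "S114"]   # Lch QMF synthesis, then replicate
    steps += ["S106", "S107"]             # HF generation, PS judgment
    if ps_enabled:
        steps += ["S108", "S109", "S110"]  # hybrid analysis, stereo, hybrid synthesis
    return steps + ["S111", "S112"]       # Lch and Rch QMF band synthesis


# Frame for which the multiplexed coded information has not yet been decoded.
print(decode_steps(True, False, True))
# Frame decoded with both HF generation and the PS process.
print(decode_steps(True, True, True))
```

The first call models the situation at the start of a playback from an arbitrary frame, where only up-sampling and replication of the monaural signal are possible; the second models the complete decoding path including the stereo process.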
As related-art technologies, Published Translation of International Patent Application (KOHYO) No. 2004-535145 (Patent Reference 1) and Japanese Patent Application Publication (KOKAI) No. JP 2006-085183 (Patent Reference 2) disclose technologies for generating stereo audio signals by stereo-processing a monaural audio signal on the basis of stereo process information. ISO/IEC 14496-3:2005, Information technology - Coding of audio-visual objects - Part 3: Audio (Non-patent Reference 1) discloses a standard of the above-mentioned HE AAC (High Efficiency Advanced Audio Coding) coding system.