1. Field of the Invention
The present invention relates to an audio decoding apparatus used in AV (audio visual) equipment for decoding an encoded bit stream into PCM data. The present invention also relates to a signal processing device, a sound image localization device, a sound image control method, an audio signal processing device, and an audio signal high-rate reproduction method also used in AV equipment.
2. Description of the Related Art
A conventional audio decoding apparatus 550 will be described with reference to FIGS. 6, 7 and 8. FIG. 6 is a block diagram illustrating a structure of the conventional audio decoding apparatus 550. The audio decoding apparatus 550 includes an integrated semiconductor device 508. The integrated semiconductor device 508 includes an input bit stream syntax analyzer 501, an exponential section decoder 502, a mantissa data bit allocator 503, a mantissa section decoder 504, an IMDCT 505, a down-mix operator 506, and an internal memory device 507. The integrated semiconductor device 508 exchanges data with an external memory device 500.
A bit stream is first stored in the external memory device 500 and then input to the input bit stream syntax analyzer 501. The input bit stream syntax analyzer 501 analyzes the syntax of the bit stream and extracts data required for decoding. Such data is sent to the exponential section decoder 502. The exponential section decoder 502 forms exponential data for a frequency domain from the data required for decoding, and outputs the exponential data to the mantissa data bit allocator 503 and the IMDCT 505. The mantissa data bit allocator 503 calculates a mantissa data bit allocation amount from the exponential data for the frequency domain and the data stored in the external memory device 500, and outputs the mantissa data bit allocation amount to the mantissa section decoder 504. The mantissa section decoder 504 forms mantissa data for the frequency domain from the mantissa data bit allocation amount and outputs the mantissa data to the IMDCT (inverse modified discrete cosine transformer) 505. The IMDCT 505 forms decoded audio data in a time domain from the exponential data and the mantissa data for the frequency domain, and stores the decoded audio data in the external memory device 500. The down-mix operator 506 forms PCM data from the decoded audio data stored in the external memory device 500, performs interleaving and then stores the resultant data in the external memory device 500. The PCM data is then output from the external memory device 500.
FIG. 7 is a memory map of the audio decoding apparatus 550 shown in FIG. 6. The memory map shown in FIG. 7 includes an area 600 for storing one-block PCM data, an area 601 for storing one-block decoded audio data for channel 0, an area 602 for storing one-block decoded audio data for channel 1, an area 603 for storing one-block decoded audio data for channel 2, an area 604 for storing one-block decoded audio data for channel 3, an area 605 for storing one-block decoded audio data for channel 4, and an area 606 for storing one-block decoded audio data for channel 5.
FIG. 8 is a flowchart illustrating a method for decoding one-block encoded audio data for each channel.
In step S11, a register (not shown), the internal memory device 507 (FIG. 6), and the external memory device 500 are initialized. In step S12, the bit stream stored in the external memory device 500 is input to the integrated semiconductor device 508 (receipt of encoded data).
Then, in step S13, the syntax of the bit stream is analyzed, and data required for decoding is extracted (bit stream analysis). In step S14, exponential data for a frequency domain is formed using the extracted data. In step S15, a mantissa data bit allocation amount is calculated using the exponential data for the frequency domain. In step S16, mantissa data for the frequency domain is formed using the mantissa data bit allocation amount. In step S17, decoded audio data is formed using the mantissa data for the frequency domain and the exponential data for the frequency domain. In step S18, the resultant decoded audio data is stored in the external memory device 500.
The above-described steps are repeated for each of the channels included in one block until it is confirmed in step S19 that the steps have been repeated the required number of times. As a result, as many pieces of decoded audio data as there are channels in one block are formed and stored in the external memory device 500.
In step S20, one-block decoded audio data for each channel in the external memory device 500 is input to the integrated semiconductor device 508. In step S21, the one-block decoded audio data for each channel is converted into one-block PCM data (down-mix calculation). In step S22, the one-block PCM data is output to the external memory device 500.
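The conventional flow of steps S20 through S22 can be sketched as follows. This is only an illustrative sketch: the block length of 256 samples, the down-mix coefficients, and the names `downmix_block`, `left_coeffs` and `right_coeffs` are assumptions introduced here and do not appear in the description. The point it illustrates is that the whole block for every channel is read, and the whole block of PCM data is written, in a single down-mix calculation.

```python
# Hypothetical sketch of the conventional flow (steps S20 to S22).
BLOCK_LEN = 256      # samples per channel in one block (assumed value)
NUM_CHANNELS = 6     # channels 0 to 5, as in the memory map of FIG. 7


def downmix_block(decoded, left_coeffs, right_coeffs):
    """Down-mix one whole block in a single pass and interleave L/R samples."""
    pcm = []
    for i in range(len(decoded[0])):
        left = sum(c * ch[i] for c, ch in zip(left_coeffs, decoded))
        right = sum(c * ch[i] for c, ch in zip(right_coeffs, decoded))
        pcm.extend([left, right])   # interleaving: L0, R0, L1, R1, ...
    return pcm


# Stand-in decoded audio data: channel k holds the constant value k + 1.
decoded_block = [[float(k + 1)] * BLOCK_LEN for k in range(NUM_CHANNELS)]
left_coeffs = [1.0, 0.0, 0.5, 0.5, 0.0, 0.0]    # assumed coefficients
right_coeffs = [0.0, 1.0, 0.5, 0.0, 0.5, 0.0]   # assumed coefficients

pcm = downmix_block(decoded_block, left_coeffs, right_coeffs)
words_read = NUM_CHANNELS * BLOCK_LEN   # data read from memory in step S20
words_written = len(pcm)                # data written to memory in step S22
```

Under these assumptions, one down-mix calculation moves `words_read + words_written` words over the memory bus in a single burst.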
In the conventional audio decoding apparatus 550, one-block PCM data is calculated in one down-mix calculation. Accordingly, the amount of data transferred for reading the decoded audio data from the external memory device 500 before the down-mix calculation and for writing the PCM data to the external memory device 500 after the down-mix calculation is sufficiently large to occupy a significant part of the memory bus. Such an occupation has an adverse effect on other processing performed using the external memory device 500.
A conventional signal processing device will be described. A part of the encoded data of a plurality of channels can be commonly shared by the channels. For example, high frequency band encoded data which is included in at least one of the plurality of channels and shared by the plurality of channels is decoded to form high frequency band decoded data. Low frequency band encoded data for each channel is decoded to form low-frequency band decoded data. The low-frequency band decoded data is coupled with the high-frequency band decoded data to form decoded data for each channel.
Such decoding will be described with reference to FIGS. 19, 20 and 21.
FIG. 20 is a block diagram of a conventional signal processor 1350 for performing the above-described signal decoding. As shown in FIG. 20, the bit stream is temporarily stored in an internal memory device 1301, and analyzed by a bit stream syntax analyzer 1300. Thus, required data is extracted. Exponential data for a frequency domain is formed by an exponential section decoder 1302 based on the extracted data. A mantissa data bit allocation amount is determined by a mantissa data bit allocator 1303 based on the exponential data for the frequency domain. Mantissa data is formed by a mantissa section decoder 1304 based on the mantissa data bit allocation amount. Frequency domain data is formed by a frequency domain data forming device 1305 based on the data formed by the exponential section decoder 1302 and the mantissa section decoder 1304.
The frequency domain data forming device 1305 decodes encoded data for an arbitrary channel according to the following rule. High frequency band encoded data which is included in at least one of a plurality of channels and shared by the plurality of channels is decoded to obtain high frequency band decoded data, and the high frequency band decoded data is multiplied by the ratio of the signal power of a prescribed channel obtained by an encoder with respect to the signal power of an arbitrary channel. The result is coupled with the low frequency band decoded data for the arbitrary channel. Thus, decoded data for the arbitrary channel is obtained.
The obtained frequency domain decoded data is converted into time domain decoded data by a frequency domain-time domain converter 1306, and the result is converted into PCM data, which is output.
FIG. 21 schematically shows decoding of encoded data for an arbitrary channel.
In step 141, data in a prescribed channel 1400 is decoded to form a low frequency domain decoded data area 1402 and a high frequency band decoded data area 1403 which is shared by a plurality of channels. In step 142, the high frequency band decoded data area 1403 is multiplied by a ratio α of a signal power for the prescribed channel 1400 obtained by the encoder with respect to the high frequency band decoded data 1404 for an arbitrary channel 1401, thereby forming high frequency decoded data 1404 for the arbitrary channel 1401. In step 143, low frequency band decoded data 1405 for the arbitrary channel 1401 is coupled to the high frequency band decoded data 1404 to form decoded data for the channel 1401.
By using high frequency band encoded data which is shared by a plurality of channels, it is not necessary to transfer the high frequency band encoded data for each of the channels. Thus, transfer efficiency is improved.
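Steps 141 through 143 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `decode_coupled_channel`, the sample values and the value of the ratio are hypothetical, and the decoded data is modeled as a plain list of frequency domain coefficients with the low band first.

```python
def decode_coupled_channel(low_band, shared_high_band, alpha):
    """Scale the shared high band by the ratio alpha and append it to the low band."""
    scaled_high = [alpha * x for x in shared_high_band]   # step 142
    return list(low_band) + scaled_high                   # step 143


# The shared high frequency band is decoded once (step 141); values are made up.
shared_high_band = [0.5, -0.25, 0.125]
low_band_arbitrary = [1.0, 2.0]     # low band data of the arbitrary channel

coupled = decode_coupled_channel(low_band_arbitrary, shared_high_band, alpha=2.0)
```

Each channel carries only its own low frequency band data and its ratio, while the high frequency band data appears once in the bit stream.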
For performing such decoding, a bit stream stored in the internal memory device 1301 (FIG. 20) is indicated by a plurality of pointers while required data is extracted from the bit stream. This operation will be described with reference to FIG. 19.
The prescribed channel 1400 is decoded. Then, a mantissa section 1201 and an exponential section 1202 of low frequency band encoded data for the arbitrary channel 1401 included in a bit stream 1200 are indicated by respective pointers 1203 and 1204 and thus read to decode the low frequency encoded data. A mantissa section 1201 and an exponential section 1202 of high frequency band encoded data for the prescribed channel 1400 are indicated by respective pointers 1203 and 1204 and thus read to decode the high frequency encoded data.
Accordingly, the movement of the pointers 1203 and 1204 needs to be controlled to rewind as indicated by arrows 1205 and 1206. Furthermore, the bit stream needs to be stored in the memory device until data in all the channels sharing the high frequency band encoded data are decoded. Decoding of data in all the channels sharing the high frequency band encoded data requires a sufficiently large memory capacity to store the bit stream.
Moreover, decoding of the high frequency band encoded data imposes a larger load than decoding of ordinary low frequency band encoded data, and a reduction of this load is demanded.
In the fields of movies and broadcasting, multichannel (e.g., 5.1 channels) recording and reproduction are performed using a digital audio compression technology. However, reproduction of a multi-channel audio signal at home is limited, since most of the general home-use TVs have two or fewer output channels. It has been demanded that multi-channel reproduction be realized even by AV equipment having an audio reproduction function of two or fewer channels, using sound field control or sound image control technologies.
Recently, a frequency domain conversion technology such as, for example, MDCT has often been used as an audio compression technology. Herein, a conventional sound image control technology will be described as well as an audio compression technology which uses frequency domain-time domain conversion.
FIG. 23 is a block diagram showing a basic structure of a conventional sound image localization device (sound image reproducer) 2500. First, a method of localizing a sound image to the right and forward of a listener 2010 using speakers 2008-1 and 2008-2 will be described. The speakers 2008-1 and 2008-2 are located forward with respect to the listener 2010. As shown in FIG. 23, the sound image localization device 2500 includes a signal source 2004, a signal divider 2006, signal processors 2001-1 and 2001-2, D/A converters 2007-1 and 2007-2, and control speakers 2008-1 and 2008-2.
The signal source 2004 receives a PCM audio signal S(t). The signal divider 2006 distributes the audio signal S(t) to left (L) and right (R) channels. The signal processor 2001-1 is a digital filter having a transmission characteristic hL(n), and the signal processor 2001-2 is a digital filter having a transmission characteristic hR(n). A digital output from the signal processor 2001-1 is converted into an analog signal by the D/A converter 2007-1 and sent to the control speaker 2008-1 provided on the left of the sheet of FIG. 23. A digital output from the signal processor 2001-2 is converted into an analog signal by the D/A converter 2007-2 and sent to the control speaker 2008-2 provided on the right of the sheet of FIG. 23.
FIG. 24 is a block diagram of the signal processor 2001-1. The signal processor 2001-2 has the same structure. The signal processor 2001-1 is a FIR filter including n pieces of delay circuits 2011-1 through 2011-n, n+1 pieces of multipliers 2012-1 through 2012-(n+1), and an adder 2013. The multipliers 2012-1 through 2012-(n+1) are connected to inputs and outputs of the delay circuits 2011-1 through 2011-n, and the outputs from the multipliers 2012-1 through 2012-(n+1) are added together by the adder 2013 and output.
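The structure of FIG. 24 corresponds to a direct-form FIR filter, which can be sketched as follows. The function name `fir_filter` and the coefficient values are illustrative assumptions; the sketch uses a delay line with n elements and n+1 coefficients, matching the n delay circuits and n+1 multipliers described above.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: len(coeffs) - 1 delay elements,
    len(coeffs) multipliers and one adder."""
    delay_line = [0.0] * (len(coeffs) - 1)
    out = []
    for x in samples:
        taps = [x] + delay_line                 # the input plus every delay output
        out.append(sum(c * t for c, t in zip(coeffs, taps)))   # the adder
        delay_line = taps[:-1]                  # shift every delay by one sample
    return out


# Feeding a unit impulse returns the coefficients, i.e. the impulse response.
coeffs = [0.5, 0.25, 0.125]                     # assumed values (n = 2)
response = fir_filter([1.0, 0.0, 0.0, 0.0], coeffs)
```

Each output sample costs n+1 multiplications, which is why the load grows with both the number of taps and the number of filters.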
With reference to FIGS. 23 and 24, the conventional sound image localization device 2500 operates in the following manner. In FIG. 23, the transfer function between the speaker 2008-1 and the ear of the listener 2010 is referred to as “impulse response”, and the value of the impulse response between the speaker 2008-1 and the left ear of the listener 2010 is h1(t). Hereinafter, the operation in the time domain will be described using the impulse response. The impulse response h1(t) is, more accurately, a response at the position of the left eardrum of the listener 2010 caused when an audio signal is input to the speaker 2008-1. For simplicity, measurement is always performed at the inlet of the ceruminous gland. The same effect is obtained when considered with respect to the frequency domain.
The value of the impulse response between the speaker 2008-1 and the right ear of the listener 2010 is h2(t). The value of the impulse response between the speaker 2008-2 and the left ear of the listener 2010 is h3(t). The value of the impulse response between the speaker 2008-2 and the right ear of the listener 2010 is h4(t). A speaker 2009 is assumed as a virtual sound source positioned to the right and forward of the listener 2010. The value of the impulse response between the virtual speaker 2009 and the left ear of the listener 2010 is h5(t). The value of the impulse response between the virtual speaker 2009 and the right ear of the listener 2010 is h6(t).
In such a structure, when an audio signal S(t) from the signal source 2004 is output from the virtual speaker 2009, the sound reaching the left ear of the listener 2010 is expressed by expression (1), and the sound reaching the right ear of the listener 2010 is expressed by expression (2).
L(t) = S(t) * h5(t)    (1)
R(t) = S(t) * h6(t)    (2)
In expressions (1) and (2), the symbol “*” represents a convolution operation. In actuality, the transfer function of the speaker and the like are multiplied, but these elements are ignored here. Alternatively, the transfer function of the speaker and the like can be considered to be included in h5(t) and h6(t).
The impulse responses and signals S(t) are considered to be discrete digital signals and respectively expressed as:
L(t) → L(n)
R(t) → R(n)
h5(t) → h5(n)
h6(t) → h6(n)
S(t) → S(n)
In the above representations, the letter “n” indicates an integer. Where T is a sampling time, “n” in parentheses is more accurately written as nT. Here, “T” is omitted.
Expressions (1) and (2) are respectively expressed as expressions (3) and (4), in which the symbol “*” representing the convolution operation is replaced by “×”, which represents multiplication.
L(n) = S(n) × h5(n)    (3)
R(n) = S(n) × h6(n)    (4)
The signal S(t) which is output from the speakers 2008-1 and 2008-2 and reaches the left ear of the listener 2010 is expressed by expression (5).
L′(t) = S(t) * hL(t) * h1(t) + S(t) * hR(t) * h3(t)    (5)
The signal S(t) which is output from the speakers 2008-1 and 2008-2 and reaches the right ear of the listener 2010 is expressed by expression (6).
R′(t) = S(t) * hL(t) * h2(t) + S(t) * hR(t) * h4(t)    (6)
Expressions (5) and (6) are expressed as expressions (8) and (9) using the impulse response.
L′(n) = S(n) × hL(n) × h1(n) + S(n) × hR(n) × h3(n)    (8)
R′(n) = S(n) × hL(n) × h2(n) + S(n) × hR(n) × h4(n)    (9)
Here, hL(n) represents the transmission characteristic of the signal processor 2001-1, and hR(n) represents the transmission characteristic of the signal processor 2001-2.
The following description is performed with the premise that when the transfer function between the ear and the speaker is the same, the sound is output in the same direction. This premise is generally correct. When expression (10) is assumed, expression (11) is generated.
L(n) = L′(n)    (10)
h5(n) = hL(n) × h1(n) + hR(n) × h3(n)    (11)
Similarly, when expression (12) is assumed, expression (13) is generated.
R(n) = R′(n)    (12)
h6(n) = hL(n) × h2(n) + hR(n) × h4(n)    (13)
In order that the listener 2010 can hear prescribed sound from the right and forward of the listener 2010, where the virtual speaker 2009 is assumed to exist, the values of hL(n) and hR(n) are determined so as to fulfill expressions (11) and (13). For example, when expressions (11) and (13) are written in the frequency domain representation, the convolution operation is replaced by multiplication, and the other elements are replaced by transfer functions obtained by performing an FFT of the values of the impulse responses. Since the transfer functions other than those of the FIR filters are known, the transfer functions of the FIR filters are obtained from these two expressions.
In the case where the signal S(n) convolved with hL(n) is output from the speaker 2008-1 and the signal S(n) convolved with hR(n) is output from the speaker 2008-2, using hL(n) and hR(n) determined in this manner, the listener 2010 perceives the sound as being output from the right and forward position where the virtual speaker 2009 is assumed to exist. FIG. 24 shows a structure of an FIR filter. The FIR filter shown in FIG. 24 localizes a sound image at an arbitrary position by the above-described signal processing.
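The determination of the two filter characteristics from expressions (11) and (13) can be written out in the frequency domain as follows. The capital letters are notation introduced here for illustration: each H denotes the transfer function obtained by performing an FFT of the corresponding impulse response h, and the solution assumes that the denominator is nonzero at every frequency of interest.

```latex
\begin{aligned}
H_5(\omega) &= H_L(\omega)\,H_1(\omega) + H_R(\omega)\,H_3(\omega) \\
H_6(\omega) &= H_L(\omega)\,H_2(\omega) + H_R(\omega)\,H_4(\omega) \\[4pt]
H_L(\omega) &= \frac{H_5(\omega)\,H_4(\omega) - H_6(\omega)\,H_3(\omega)}
                    {H_1(\omega)\,H_4(\omega) - H_2(\omega)\,H_3(\omega)} \\[4pt]
H_R(\omega) &= \frac{H_1(\omega)\,H_6(\omega) - H_2(\omega)\,H_5(\omega)}
                    {H_1(\omega)\,H_4(\omega) - H_2(\omega)\,H_3(\omega)}
\end{aligned}
```

The first two lines restate expressions (11) and (13); the last two lines are the solution of that pair of linear equations for the two unknown filter transfer functions.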
However, the above-described structure requires an FIR filter to be provided for each of the channels and a convolution operation to be performed many times, in order to provide an actual head-related transfer function. When the number of filters and/or the number of channels increases, the load imposed on the operation rate and the hardware becomes excessively large for practical use. The number of taps of the FIR filters can be reduced for practical use, but a certain number of taps are necessary to maintain the precision of the head-related transfer function. When the number of taps is excessively small, the sound image is blurred or the sound quality deteriorates.
A system for reproducing a medium including video data and audio data in a compressed format, such as a DVD (digital video disk), will be described. In such a system, the video and audio input data are divided into a plurality of packets and then multiplexed. Video and audio are reproduced by separating the video data (also referred to as the “video signal”) and the audio data (also referred to as the “audio signal”) from such input data and decoding such separated data. A conventional system will be described using a DVD as an example.
Video data is compressed by MPEG2 and includes three types of picture data, i.e., I picture, P picture and B picture. In the NTSC standard, each picture is recorded at the unit of 1/60 sec. in the case of a field structure and at the unit of 1/30 sec. in the case of a frame structure.
Exemplary audio standards used in the DVD include AC-3 and MPEG-2BC. In such standards, one frame includes 1536 audio samples, with the sampling frequency of 48 kHz. The data is recorded in a DVD in the state of being compressed at the unit of 32 ms.
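The 32 ms unit follows directly from the frame size and the sampling frequency, as the short calculation below shows. The variable names are illustrative, and the video units are the nominal 1/30 sec. and 1/60 sec. figures given above.

```python
from fractions import Fraction

SAMPLES_PER_FRAME = 1536
SAMPLING_HZ = 48_000

audio_frame_sec = Fraction(SAMPLES_PER_FRAME, SAMPLING_HZ)   # 4/125 sec.
audio_frame_ms = float(audio_frame_sec * 1000)               # 32.0 ms

video_frame_sec = Fraction(1, 30)    # frame structure
video_field_sec = Fraction(1, 60)    # field structure

# One audio frame spans 24/25 of a video frame, so the two units never line up
# one-to-one.
frames_per_audio_frame = audio_frame_sec / video_frame_sec
```

Because one audio frame is not a whole number of video frames or fields, the two streams cannot simply be advanced in lockstep.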
In order to reproduce audio and video data which are recorded by different time units, synchronization of the data is required. In the case of a DVD, video and audio data are synchronized for output under the control of a presentation time stamp (PTS) attached to each packet. In other words, the time for reproducing the video data and the time for reproducing the audio data are independently adjusted.
High-rate reproduction performed in such a system will be described. In general, the following methods are used for reproducing video data at a high rate.
(1-1) Reproduce only I picture (reproduction rate: about 6 to 7 times normal)
(1-2) Reproduce only I and P pictures (reproduction rate: about 1.5 to 3 times normal)
(1-3) Reproduce I and P pictures and a part of B picture (reproduction rate: about 1 to 1.5 times normal)
Since the number of pictures of each type varies in accordance with the method of encoding, bit rate and the like, the reproduction rate for high-rate reproduction is not constant and possibly becomes as diverse as about 1.5 to about 7 times by any of the methods (1-1), (1-2) and (1-3).
The following methods are used for reproducing audio data at a high rate.
(2-1) Thin out output data and smooth non-continuous points.
(2-2) Delete silent parts.
According to the method (2-1), the reproduction rate is fixed. Therefore, when the reproduction rate of the video data is higher than the reproduction rate of the audio data, the sound continues, but the video cannot be reproduced at a higher rate than that of the audio data. When the reproduction rate of the video data is lower than the reproduction rate of the audio data, the sound does not continue.
The method (2-2) is difficult to practically use due to the problems that it is difficult to raise the reproduction rate of the audio data up to the highest reproduction rate of the video data (maximum rate), and that the processing for detecting a silent part requires a heavy load.
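Method (2-1) can be sketched as follows. The description does not specify how the thinning or the smoothing is carried out, so the choices here are assumptions made for illustration: whole blocks are dropped at a fixed interval, and each join is smoothed with a short linear cross-fade. The function name `thin_out` and its parameters are hypothetical.

```python
def thin_out(samples, block_len, keep_every, fade_len):
    """Keep one block out of every `keep_every` and cross-fade at each join."""
    blocks = [samples[i:i + block_len] for i in range(0, len(samples), block_len)]
    kept = blocks[::keep_every]                 # thinning at a fixed interval
    out = list(kept[0]) if kept else []
    for block in kept[1:]:
        n = min(fade_len, len(out), len(block))
        for i in range(n):
            w = (i + 1) / (n + 1)               # weight of the incoming block
            out[-n + i] = (1 - w) * out[-n + i] + w * block[i]
        out.extend(block[n:])                   # rest of the block, unchanged
    return out


ramp = [float(v) for v in range(16)]
fast = thin_out(ramp, block_len=4, keep_every=2, fade_len=2)
```

The thinning interval is a fixed parameter, which reflects why the reproduction rate of this method cannot follow a video reproduction rate that varies.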
Generally, high-rate reproduction of a recording medium is mostly used by the consumer in order to search for a scene. In most of the DVDs which are conventionally available, only the video data is reproduced for high-rate reproduction without outputting audio data.
According to an aspect of the invention, an audio decoding apparatus is provided for receiving a bit stream on a block-by-block basis, decoding one block of the bit stream to form decoded audio data for a plurality of channels, and storing the decoded audio data for each of the plurality of channels in a memory device, thereby down-mixing the decoded audio data for each of the plurality of channels. The audio decoding apparatus includes an operation section for down-mixing the decoded audio data for each of the plurality of channels corresponding to a first block of the bit stream in the memory section while a second block of the bit stream is decoded.
In one embodiment of the invention, the second block of the bit stream is converted into the decoded audio data for each channel by a plurality of separate decoding operations, and the operation section divides the decoded audio data for each channel corresponding to the first block of the bit stream in the memory section and down-mixes the divided decoded audio data sequentially each time the decoding operation is performed.
In one embodiment of the invention, the second block of the bit stream is converted into the decoded audio data for each channel by repeating a decoding operation by the number of the plurality of channels, and the operation section divides the decoded audio data for each channel corresponding to the first block of the bit stream in the memory section and down-mixes the divided decoded audio data sequentially each time the decoding operation is performed.
In one embodiment of the invention, the decoded audio data obtained as a result of down-mixing is stored in the memory section and then output.
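The divided down-mix described in the preceding embodiments can be sketched as follows. This is a sketch under stated assumptions rather than the implementation of the invention: the function name `decode_and_downmix`, the callback `decode_channel`, the single-output (monaural) down-mix and the coefficient values are all hypothetical. After each per-channel decoding operation on the second block, one slice of the first block is down-mixed, so that the memory transfers are spread out instead of occurring in one burst.

```python
def decode_and_downmix(prev_block, decode_channel, num_channels, coeffs):
    """Decode the current block channel by channel; after each decoding
    operation, down-mix one slice of the previously decoded block."""
    block_len = len(prev_block[0])
    slice_len = -(-block_len // num_channels)       # ceiling division
    pcm, current = [], []
    for ch in range(num_channels):
        current.append(decode_channel(ch))          # one decoding operation
        start = ch * slice_len
        for i in range(start, min(start + slice_len, block_len)):
            pcm.append(sum(c * prev_block[k][i] for k, c in enumerate(coeffs)))
    return current, pcm


# Stand-in data: three channels, six samples per block (assumed sizes).
first_block = [[1.0] * 6, [2.0] * 6, [3.0] * 6]
second_block, pcm_first = decode_and_downmix(
    first_block,
    decode_channel=lambda ch: [float(ch)] * 6,      # hypothetical decoder stub
    num_channels=3,
    coeffs=[1.0, 0.5, 0.5],
)
```

In this sketch each decoding operation is followed by a transfer of only about 1/3 of the first block, and the PCM data for the first block is complete when the last channel of the second block has been decoded.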
According to another aspect of the invention, an audio decoding apparatus is provided for decoding a bit stream which is obtained as a result of converting each of audio signals in a plurality of channels into frequency domain data and encoding the frequency domain data so as to be represented by mantissa sections and exponential sections. The audio decoding apparatus includes a bit stream syntax analyzer for analyzing a syntax of the bit stream and extracting data necessary for decoding from the bit stream; an internal memory section for storing the data necessary for decoding; an exponential section decoder for forming exponential data for a frequency domain corresponding to the audio signal based on the data stored in the internal memory section; a mantissa data bit allocator for calculating a mantissa data bit allocation amount from the exponential data output from the exponential section decoder; a mantissa section decoder for forming mantissa data for the frequency domain corresponding to the audio signal based on the data bit allocation amount output from the mantissa data bit allocator; an IMDCT section for performing frequency domain-time domain conversion of the exponential data formed by the exponential section decoder and the mantissa data formed by the mantissa section decoder so as to form decoded audio data for each of the plurality of channels; and a down-mix operator for forming PCM data from the decoded audio data for each of the plurality of channels and processing the PCM data by interleaving. The bit stream, decoded audio data and the PCM data are stored in an external memory section. The bit stream is received on a block-by-block basis, and while a second block of the bit stream is decoded, the PCM data is formed from the decoded audio data for each of the plurality of channels corresponding to a first block of the bit stream stored in the external memory section.
In one embodiment of the invention, the external memory section includes a PCM data storage area and a decoded audio data storage area corresponding to each of the plurality of channels. The PCM data storage area has a sufficient capacity to store the PCM data corresponding to one block of the bit stream including an amount of data of a plurality of channels × a plurality of pieces of data. The decoded audio data storage area includes a plurality of areas respectively corresponding to the plurality of channels, and each of the plurality of areas has a sufficient capacity to store the decoded audio data corresponding to more than one block of the bit stream.
In one embodiment of the invention, the audio decoding apparatus further includes a decoded audio data write pointer corresponding to each of the plurality of channels for writing the decoded audio data into the external memory section; a decoded audio data read pointer corresponding to each of the plurality of channels for reading the decoded audio data from the external memory section; a PCM write pointer for writing the PCM data into the external memory section; and final address data in the decoded audio data storage area and decoded audio data pointer return data, both corresponding to each of the plurality of channels, for updating the decoded audio data write pointer and the decoded audio data read pointer. The decoded audio data write pointer and the decoded audio data read pointer are independently updated and circulated in an area allocated for the respective channel.
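The circulating pointers of this embodiment can be sketched as follows. The class name `ChannelArea` and its attributes are hypothetical, and one interpretation is assumed here for illustration: the final address data marks the last address of a channel's area, and the pointer return data is the amount subtracted from a pointer that has passed the final address.

```python
class ChannelArea:
    """Storage area for one channel with independently circulating pointers."""

    def __init__(self, base, size):
        self.final_address = base + size - 1   # final address data of the area
        self.pointer_return = size             # pointer return data (assumed meaning)
        self.write_ptr = base                  # decoded audio data write pointer
        self.read_ptr = base                   # decoded audio data read pointer

    def _advance(self, ptr):
        ptr += 1
        if ptr > self.final_address:           # passed the end of the area
            ptr -= self.pointer_return         # return to the start of the area
        return ptr

    def write(self, memory, data):
        for value in data:
            memory[self.write_ptr] = value
            self.write_ptr = self._advance(self.write_ptr)

    def read(self, memory, count):
        out = []
        for _ in range(count):
            out.append(memory[self.read_ptr])
            self.read_ptr = self._advance(self.read_ptr)
        return out


memory = [0] * 32                       # stands in for the external memory section
area = ChannelArea(base=8, size=6)      # area larger than one 4-sample block
area.write(memory, [1, 2, 3, 4])        # decoded data of a first block
first = area.read(memory, 4)            # read back for down-mixing
area.write(memory, [5, 6, 7, 8])        # second block wraps around the area
second = area.read(memory, 4)
```

Since each area holds more than one block, the write pointer can run ahead into the second block while the read pointer is still within the first block.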
In one embodiment of the invention, the down-mix operator processes the decoded audio data for each of the plurality of channels by N number of separate operations.
According to still another aspect of the invention, a signal processing device is provided for receiving a bit stream including encoded data for a plurality of channels, decoding encoded data which is included in at least one of the plurality of channels and is shared by the channels to form common decoded data, decoding channel encoded data inherent to each of the plurality of channels on a channel-by-channel basis to form channel decoded data, and coupling the channel decoded data and the common decoded data so as to form decoded data for each of the plurality of channels. The signal processing device includes a memory section for storing the common decoded data formed as a result of decoding the common encoded data; and a control section for reading the common decoded data from the memory section each time the channel encoded data is decoded to form the channel decoded data, and causing coupling of the common decoded data and the channel decoded data.
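The role of the memory section and the control section in this aspect can be sketched as follows. This is an illustrative sketch only: the class name `CommonDataDecoder`, the callback `decode_common` and the sample values are hypothetical, and coupling is modeled as simple concatenation. The common encoded data is decoded once, stored, and then read back each time channel decoded data is formed.

```python
class CommonDataDecoder:
    """Decodes the common encoded data once and reuses the stored result."""

    def __init__(self, decode_common):
        self._decode_common = decode_common    # hypothetical decoding callback
        self._stored = None                    # memory section for common decoded data
        self.common_decodes = 0                # how often the common data was decoded

    def _common(self, common_encoded):
        if self._stored is None:
            self._stored = self._decode_common(common_encoded)
            self.common_decodes += 1
        return self._stored

    def decode_channel(self, channel_decoded, common_encoded):
        # Read the stored common decoded data and couple it with the channel data.
        return list(channel_decoded) + list(self._common(common_encoded))


decoder = CommonDataDecoder(decode_common=lambda enc: [x / 2 for x in enc])
outputs = [decoder.decode_channel(ch, [2, 4]) for ch in ([1.0], [3.0], [5.0])]
```

In this sketch the common encoded data is touched only for the first channel, so the bit stream pointers need not be rewound for the remaining channels.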
According to still another aspect of the invention, a signal processing device is provided for receiving a bit stream including encoded data for a plurality of channels, decoding encoded data which is included in at least one of the plurality of channels and is shared by the channels to form common decoded data, decoding channel encoded data inherent to each of the plurality of channels on a channel-by-channel basis to form channel decoded data, and coupling the channel decoded data and the common decoded data so as to form decoded data for each of the plurality of channels. The signal processing device includes a memory section for storing intermediate data obtained while decoding the common encoded data; and a control section for reading the intermediate data from the memory section each time the channel encoded data is decoded to form the channel decoded data, forming the common decoded data from the intermediate data, and causing coupling of the common decoded data and the channel decoded data.
According to still another aspect of the invention, a signal processing device is provided for decoding a bit stream which is obtained as a result of converting each of audio signals in a plurality of channels into frequency domain data and encoding the frequency domain data so as to be represented by mantissa sections and exponential sections, decoding high frequency band encoded data which is included in at least one of the plurality of channels and is shared by the channels to form high frequency band decoded data, decoding low frequency band encoded data for each of the plurality of channels to form low frequency band decoded data, and coupling the high frequency band decoded data and the low frequency band decoded data so as to form decoded data for each of the plurality of channels. The signal processing device includes a bit stream syntax analyzer for analyzing a syntax of the bit stream and extracting data necessary for decoding from the bit stream; an internal memory section for storing the data necessary for decoding; an exponential section decoder for forming exponential data for a frequency domain corresponding to the audio signal based on the data stored in the internal memory section; a mantissa data bit allocator for calculating a mantissa data bit allocation amount from the exponential data output from the exponential section decoder; a mantissa section decoder for forming mantissa data for the frequency domain corresponding to the audio signal based on the data bit allocation amount output from the mantissa data bit allocator; and a data forming section for synthesizing the high frequency band decoded data and the low frequency band decoded data for each of the plurality of channels based on the exponential data formed by the exponential section decoder and the mantissa data formed by the mantissa section decoder, coupling the low frequency band decoded data for each of the plurality of channels and the high frequency band decoded data, and performing frequency domain-time domain conversion of the resultant data so as to form decoded data for each of the plurality of channels. The high frequency band decoded data is stored in the internal memory section, and for forming the low frequency band decoded data for each of the plurality of channels, the high frequency band decoded data is read from the internal memory section and the low frequency band decoded data is coupled with the high frequency band decoded data.
In one embodiment of the invention, the high frequency band decoded data is compressed and stored in the internal memory section.
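The reuse of the shared high frequency band decoded data can be illustrated with a minimal sketch. Everything named here is hypothetical: the stand-in decoders `decode_high_band` and `decode_low_band`, the dictionary used as the internal memory section, and the modeling of coupling as simply placing the shared high band bins above each channel's low band bins. Compression of the stored data is not modeled. The only point shown is that the shared band is decoded once and then read back for every channel.

```python
# Hypothetical sketch: decode the shared high band once, keep it in an
# "internal memory" dictionary, and read it back for each channel.

decode_calls = {"high": 0}  # counts how often the shared band is decoded


def decode_high_band(encoded_high):
    """Stand-in decoder for the shared high band (assumed: scale by 0.5)."""
    decode_calls["high"] += 1
    return [0.5 * x for x in encoded_high]


def decode_low_band(encoded_low):
    """Stand-in decoder for one channel's low band (assumed: identity)."""
    return [float(x) for x in encoded_low]


def decode_frame(encoded_low_per_channel, encoded_high):
    """Return one list of frequency domain bins per channel."""
    internal_memory = {}
    # Decode the shared high band a single time and store the result.
    internal_memory["high_band"] = decode_high_band(encoded_high)

    decoded = []
    for encoded_low in encoded_low_per_channel:
        low = decode_low_band(encoded_low)
        # Read the stored high band instead of decoding it again.
        high = internal_memory["high_band"]
        # Coupling is modeled here as low bins followed by shared high bins.
        decoded.append(low + high)
    return decoded


frame = decode_frame([[1, 2], [3, 4], [5, 6]], [8, 10])
```

In this sketch the number of high band decodes stays at one regardless of the number of channels, which is the saving the stored data is meant to provide.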
According to still another aspect of the invention, a signal processing device is provided for decoding a bit stream which is obtained as a result of converting each of audio signals in a plurality of channels into frequency domain data and encoding the frequency domain data so as to be represented by mantissa sections and exponential sections, decoding high frequency band encoded data which is included in at least one of the plurality of channels and is shared by the channels to form high frequency band decoded data, decoding low frequency band encoded data for each of the plurality of channels to form low frequency band decoded data, and coupling the high frequency band decoded data and the low frequency band decoded data so as to form decoded data for each of the plurality of channels. The signal processing device includes a bit stream syntax analyzer for analyzing a syntax of the bit stream and extracting data necessary for decoding from the bit stream; an internal memory section for storing the data necessary for decoding; an exponential section decoder for forming exponential data for a frequency domain corresponding to the audio signal based on the data stored in the internal memory section; a mantissa data bit allocator for calculating a mantissa data bit allocation amount from the exponential data output from the exponential section decoder; a mantissa section decoder for forming mantissa data for the frequency domain corresponding to the audio signal based on the mantissa data bit allocation amount output from the mantissa data bit allocator; and a data forming section for synthesizing the high frequency band decoded data and the low frequency band decoded data for each of the plurality of channels based on the exponential data formed by the exponential section decoder and the mantissa data formed by the mantissa section decoder, coupling the low frequency band decoded data for each of the plurality of channels and the high frequency band decoded data, and performing frequency domain-time domain conversion of the resultant data so as to form decoded data for each of the plurality of channels. Intermediate data obtained while decoding the high frequency band encoded data is stored in the internal memory section, and for forming the low frequency band decoded data for each of the plurality of channels, the intermediate data is read from the internal memory section, the high frequency band decoded data is formed from the intermediate data, and the low frequency band decoded data is coupled with the high frequency band decoded data.
In one embodiment of the invention, the high frequency band decoded data is compressed and stored in the internal memory section.
In one embodiment of the invention, the intermediate data is exponential data output from the exponential section decoder.
In one embodiment of the invention, the intermediate data is a mantissa data bit allocation amount output from the mantissa data bit allocator.
In one embodiment of the invention, the intermediate data is mantissa data output from the mantissa section decoder.
According to still another aspect of the invention, a sound image localization device includes a signal source for outputting an audio signal; a signal divider for dividing the audio signal output from the signal source into two digital audio signals respectively for two channels; a first signal processor for receiving one of the two digital signals and processing the digital signal so as to localize a virtual sound image using a filter having a first frequency characteristic; a first D/A converter for converting the digital signal output from the first signal processor into an analog signal; a second D/A converter for receiving the other digital signal obtained from the signal divider and converting the signal into an analog signal; a first control speaker for outputting the audio signal obtained by the first D/A converter to a prescribed space area; and a second control speaker for outputting the audio signal obtained by the second D/A converter to a prescribed space area.
In one embodiment of the invention, the first frequency characteristic of the first signal processor is determined so that sounds reaching from the first and second control speakers to left and right ears of a listener have a difference which is identical with a difference between sounds reaching from the virtual sound image to the left and right ears of the listener.
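The signal path of this aspect can be sketched as follows, under stated assumptions: the filter of the first signal processor is modeled as a plain FIR filter, the tap values are arbitrary placeholders rather than a measured first frequency characteristic, and the D/A converters and speakers are omitted. The function names `fir_filter` and `localize` are illustrative only.

```python
def fir_filter(signal, taps):
    """Direct-form FIR filter: out[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def localize(signal, first_taps):
    """Divide one signal into two channels and filter only the first one."""
    # Signal divider: two identical digital signals, one per channel.
    first_channel = list(signal)
    second_channel = list(signal)
    # First signal processor: applies the (assumed FIR) filter.
    first_out = fir_filter(first_channel, first_taps)
    # The second channel goes to its D/A converter unprocessed.
    return first_out, second_channel


# Unit impulse in, placeholder taps: the first output is the taps themselves.
first_out, second_out = localize([1.0, 0.0, 0.0, 0.0], [0.5, 0.25])
```

Only one channel carries a filter in this arrangement, so the per-sample operation count is that of a single filter rather than one per speaker.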
According to still another aspect of the invention, a sound image localization device includes a signal source for outputting an audio signal; a second signal processor for processing the audio signal output from the signal source using a filter having a second frequency characteristic; a signal divider for dividing the audio signal output from the second signal processor into two digital audio signals respectively for two channels; a first signal processor for receiving one of the two digital signals and processing the digital signal so as to localize a virtual sound image using a filter having a first frequency characteristic; a first D/A converter for converting the digital signal output from the first signal processor into an analog signal; a second D/A converter for receiving the other digital signal obtained from the signal divider and converting the signal into an analog signal; a first control speaker for outputting the audio signal obtained by the first D/A converter to a prescribed space area; and a second control speaker for outputting the audio signal obtained by the second D/A converter to a prescribed space area.
In one embodiment of the invention, the first frequency characteristic of the first signal processor is determined so that sounds reaching from the first and second control speakers to left and right ears of a listener have a difference which is identical with a difference between sounds reaching from the virtual sound image to the left and right ears of the listener. The second frequency characteristic of the second signal processor corrects at least one of a sound quality, a sound volume change and a phase characteristic of the first frequency characteristic of the first signal processor.
According to still another aspect of the invention, a sound image localization device includes a signal source for outputting an audio signal for a frequency domain; a third signal processor for processing the audio signal for the frequency domain output from the signal source using a filter having a third frequency characteristic; a frequency domain-time domain converter for converting the audio signal for the frequency domain output from the third signal processor into an audio signal in a time domain; a signal divider for dividing the audio signal output from the frequency domain-time domain converter into two digital audio signals respectively for two channels; a first signal processor for receiving one of the two digital signals and processing the digital signal so as to localize a virtual sound image using a filter having a first frequency characteristic; a first D/A converter for converting the digital signal output from the first signal processor into an analog signal; a second D/A converter for receiving the other digital signal obtained from the signal divider and converting the signal into an analog signal; a first control speaker for outputting the audio signal obtained by the first D/A converter to a prescribed space area; and a second control speaker for outputting the audio signal obtained by the second D/A converter to a prescribed space area.
In one embodiment of the invention, the first frequency characteristic of the first signal processor is determined so that sounds reaching from the first and second control speakers to left and right ears of a listener have a difference which is identical with a difference between sounds reaching from the virtual sound image to the left and right ears of the listener. The third frequency characteristic of the third signal processor corrects at least one of a sound quality, a sound volume change and a phase characteristic of the first frequency characteristic of the first signal processor in the frequency domain.
According to still another aspect of the invention, a sound image localization device includes a signal source for outputting an audio signal for a frequency domain; a third signal processor for processing the audio signal for the frequency domain output from the signal source using a filter having a third frequency characteristic; a frequency domain-time domain converter for converting the audio signal for the frequency domain output from the third signal processor into an audio signal in a time domain; a second signal processor for processing the audio signal output from the frequency domain-time domain converter using a filter having a second frequency characteristic; a signal divider for dividing the audio signal output from the second signal processor into two digital audio signals respectively for two channels; a first signal processor for receiving one of the two digital signals and processing the digital signal so as to localize a virtual sound image using a filter having a first frequency characteristic; a first D/A converter for converting the digital signal output from the first signal processor into an analog signal; a second D/A converter for receiving the other digital signal obtained from the signal divider and converting the signal into an analog signal; a first control speaker for outputting the audio signal obtained by the first D/A converter to a prescribed space area; and a second control speaker for outputting the audio signal obtained by the second D/A converter to a prescribed space area.
In one embodiment of the invention, the first frequency characteristic of the first signal processor is determined so that sounds reaching from the first and second control speakers to left and right ears of a listener have a difference which is identical with a difference between sounds reaching from the virtual sound image to the left and right ears of the listener. A coupled frequency characteristic of the third frequency characteristic of the third signal processor and the second frequency characteristic of the second signal processor corrects at least one of a sound quality, a sound volume change and a phase characteristic of the first frequency characteristic of the first signal processor in the frequency domain.
According to still another aspect of the invention, a sound image control method is provided for localizing a sound image at a position of a virtual sound image corresponding to an audio signal from a signal source, using a first control speaker and a second control speaker respectively provided in a space to the left of a listener and a space to the right of the listener. The method includes the steps of providing a signal processor for processing a signal to be input to the first control speaker; and obtaining a frequency characteristic G(n) for providing a state in which sounds reaching from the first and second control speakers to the left and right ears of the listener have a difference which is identical with a difference between sounds reaching from the virtual sound image to the left and right ears of the listener, and causing the signal processor to have the frequency characteristic G(n) so as to localize the audio signal at the position of the virtual sound image.
In one embodiment of the invention, the frequency characteristic G(n) is obtained by the following steps:
where the impulse response between the first control speaker and the left ear of the listener is h1(t), the impulse response between the first control speaker and the right ear of the listener is h2(t), the impulse response between the second control speaker and the left ear of the listener is h3(t), the impulse response between the second control speaker and the right ear of the listener is h4(t), a virtual sound image localized in an arbitrary direction is a virtual speaker, the impulse response between the virtual speaker and the left ear of the listener is h5(t), and the impulse response between the virtual speaker and the right ear of the listener is h6(t),
(1) obtaining a sound reaching the left ear of the listener by L(t)=S(t)*h5(t) and obtaining a sound reaching the right ear of the listener by R(t)=S(t)*h6(t), where an audio signal S(t) from a signal source is output from the virtual speaker;
(2) converting signals L(t), R(t), h5(t), h6(t), and S(t) on a time axis into discrete signals L(n), R(n), h5(n), h6(n), and S(n);
(3) obtaining L(n)=S(n)×h5(n) and R(n)=S(n)×h6(n);
(4) calculating the sound output from the first and second control speakers and reaching the left ear of the listener by
L′(t)=S(t)*hL(t)*h1(t)+S(t)*hR(t)*h3(t);
(5) calculating the sound output from the first and second control speakers and reaching the right ear of the listener by
R′(t)=S(t)*hL(t)*h2(t)+S(t)*hR(t)*h4(t);
(6) converting L′(t) into
L′(n)=S(n)×hL(n)×h1(n)+S(n)×hR(n)×h3(n);
(7) converting R′(t) into
R′(n)=S(n)×hL(n)×h2(n)+S(n)×hR(n)×h4(n);
(8) assuming L(n)=L′(n), which gives
h5(n)=hL(n)×h1(n)+hR(n)×h3(n);
(9) assuming R(n)=R′(n), which gives
h6(n)=hL(n)×h2(n)+hR(n)×h4(n);
and
(10) calculating hL(n) and hR(n) from steps (8) and (9) and obtaining G(n) based on G(n)=hL(n)/hR(n).
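Steps (8) to (10) amount to solving, for each discrete index n, two linear equations in the two unknowns hL(n) and hR(n). A minimal sketch follows, assuming that h1(n) to h6(n) are already available as lists of complex values and that neither the determinant h1(n)×h4(n)-h2(n)×h3(n) nor hR(n) is zero at any n. The function name `solve_g` and the sample values are illustrative, not taken from the specification.

```python
def solve_g(h1, h2, h3, h4, h5, h6):
    """For each index n, solve steps (8) and (9) and return G(n) = hL(n)/hR(n).

    Step (8): h5 = hL*h1 + hR*h3
    Step (9): h6 = hL*h2 + hR*h4
    Solved with Cramer's rule; assumes a nonzero determinant and nonzero hR.
    """
    g = []
    for a1, a2, a3, a4, a5, a6 in zip(h1, h2, h3, h4, h5, h6):
        det = a1 * a4 - a2 * a3
        hl = (a5 * a4 - a6 * a3) / det
        hr = (a1 * a6 - a2 * a5) / det
        g.append(hl / hr)
    return g


# Illustrative check: choose hL and hR, build h5 and h6 from steps (8) and (9),
# then confirm that solve_g recovers hL/hR.
h1 = [1 + 0j, 2 + 0j]
h2 = [0.5 + 0j, 1j]
h3 = [0.5 + 0j, 1 + 0j]
h4 = [1 + 0j, 2 + 0j]
hl_true = [2 + 0j, 1 + 1j]
hr_true = [4 + 0j, 2 + 0j]
h5 = [l * a + r * c for l, r, a, c in zip(hl_true, hr_true, h1, h3)]
h6 = [l * b + r * d for l, r, b, d in zip(hl_true, hr_true, h2, h4)]

g = solve_g(h1, h2, h3, h4, h5, h6)
```

Because only the ratio G(n) is needed, the determinant cancels, and G(n) can equally be written as (h5(n)×h4(n)-h6(n)×h3(n))/(h1(n)×h6(n)-h2(n)×h5(n)).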
According to still another aspect of the invention, an audio signal processor includes a control section for indicating a reproduction rate; an input signal processor for processing an input signal obtained as a result of multiplexing an audio signal and a video signal and outputting an audio signal and a video signal; an audio stream buffer for temporarily storing the audio signal output by the input signal processor; a video stream buffer for temporarily storing the video signal output by the input signal processor; an audio processor for extracting the audio signal from the audio stream buffer and processing the audio signal so as to form an output audio signal; a video processor for extracting the video signal from the video stream buffer and processing the video signal, and performing high-rate reproduction of the video signal in response to an instruction from the control section to form an output video signal; and a buffer controller for supervising a state of the audio stream buffer and controlling data input and output so that the audio processor performs the high-rate reproduction of the audio signal when a free capacity of the audio stream buffer becomes smaller than a prescribed level.
According to still another aspect of the invention, an audio signal processor includes a control section for indicating a reproduction rate; an input signal processor for processing an input signal obtained as a result of multiplexing an audio signal and a video signal and outputting an audio signal and a video signal; an audio stream buffer for temporarily storing the audio signal output by the input signal processor; a video stream buffer for temporarily storing the video signal output by the input signal processor; an audio processor for extracting the audio signal from the audio stream buffer and processing the audio signal so as to form an output audio signal; a video processor for extracting the video signal from the video stream buffer and processing the video signal, and performing high-rate reproduction of the video signal in response to an instruction from the control section to form an output video signal; and a buffer controller for supervising a state of the video stream buffer and controlling data input and output so that the audio processor performs the high-rate reproduction of the audio signal when a remaining data amount in the video stream buffer becomes smaller than a prescribed level.
According to still another aspect of the invention, an audio signal processor includes a control section for indicating a reproduction rate; an input signal processor for processing an input signal obtained as a result of multiplexing an audio signal and a video signal and outputting an audio signal and a video signal; an audio stream buffer for temporarily storing the audio signal output by the input signal processor; a video stream buffer for temporarily storing the video signal output by the input signal processor; an audio processor for extracting the audio signal from the audio stream buffer and processing the audio signal so as to form an output audio signal; a video processor for extracting the video signal from the video stream buffer and processing the video signal, and performing high-rate reproduction of the video signal in response to an instruction from the control section to form an output video signal; and a buffer controller for supervising a state of the audio stream buffer and the video stream buffer and controlling data input and output so that the audio processor performs the high-rate reproduction of the audio signal when a free capacity of the audio stream buffer or a remaining data amount in the video stream buffer becomes smaller than a prescribed level.
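The condition supervised by the buffer controller in the three aspects above can be sketched as a simple predicate. The `StreamBuffer` class, its field names, and the threshold parameters are hypothetical; the specification only states that the audio processor performs high-rate reproduction when the free capacity of the audio stream buffer, or the remaining data amount in the video stream buffer, falls below a prescribed level.

```python
from dataclasses import dataclass


@dataclass
class StreamBuffer:
    """Hypothetical model of a stream buffer, sized in bytes."""
    capacity: int
    used: int = 0

    @property
    def free(self):
        """Free capacity still available for incoming data."""
        return self.capacity - self.used


def needs_audio_high_rate(audio, video, min_audio_free, min_video_data):
    """Return True when the audio processor should perform high-rate reproduction.

    Combines both conditions: audio buffer nearly full (audio is being
    consumed too slowly) or video buffer nearly empty (video is being
    consumed faster than audio).
    """
    audio_nearly_full = audio.free < min_audio_free
    video_nearly_empty = video.used < min_video_data
    return audio_nearly_full or video_nearly_empty


audio = StreamBuffer(capacity=4096, used=4000)
video = StreamBuffer(capacity=65536, used=30000)
trigger = needs_audio_high_rate(audio, video, min_audio_free=256, min_video_data=1024)
```

Checking only `audio_nearly_full` or only `video_nearly_empty` would correspond to the first and second of the three aspects, respectively.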
In one embodiment of the invention, the method for performing high-rate reproduction of an audio signal in the audio signal processor includes the step of thinning out the audio signal by intermittently flushing a content in the audio stream buffer to reduce an amount of audio data to be reproduced while performing high-rate reproduction of the video signal.
In one embodiment of the invention, the method for performing high-rate reproduction of an audio signal in the audio signal processor includes the step of stopping, for a prescribed time period, transfer of the audio signal from the input signal processor to the audio stream buffer to reduce an amount of audio data to be reproduced while performing high-rate reproduction of the video signal.
In one embodiment of the invention, the method for performing high-rate reproduction of an audio signal in the audio signal processor includes the step of skipping a prescribed amount of data input from the audio stream buffer to the audio processor to reduce an amount of audio data to be reproduced while performing high-rate reproduction of the video signal.
In one embodiment of the invention, the method for performing high-rate reproduction of an audio signal in the audio signal processor includes the step of stopping, for a prescribed time period, an output of the audio signal from the audio processor to reduce an amount of audio data to be reproduced while performing high-rate reproduction of the video signal.
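Two of the data reduction steps described above, flushing the audio stream buffer and skipping a prescribed amount of data on the way to the audio processor, can be sketched as follows. The buffer is modeled as a `collections.deque` of audio frames, and the function names `flush_buffer` and `read_with_skip` are hypothetical. The other two steps, stopping transfer into the buffer and stopping output from the audio processor, are timing controls and are not modeled here.

```python
from collections import deque


def flush_buffer(audio_buffer):
    """Discard everything currently in the buffer; return the number dropped."""
    dropped = len(audio_buffer)
    audio_buffer.clear()
    return dropped


def read_with_skip(audio_buffer, skip_frames):
    """Skip up to skip_frames frames, then return the next frame (or None)."""
    for _ in range(min(skip_frames, len(audio_buffer))):
        audio_buffer.popleft()
    return audio_buffer.popleft() if audio_buffer else None


# Six queued frames, numbered 0 to 5 for illustration.
buf = deque(range(6))
next_frame = read_with_skip(buf, 2)   # frames 0 and 1 skipped, frame 2 read
remaining = len(buf)                  # frames 3, 4 and 5 are still queued
dropped = flush_buffer(buf)           # intermittent flush discards the rest
```

Both operations reduce the amount of audio data that is reproduced; they differ only in whether data is discarded in bulk from the buffer or in small amounts at the input of the audio processor.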
Thus, the invention described herein makes possible the advantages of (1) providing an audio decoding apparatus for realizing efficient use of a memory bus; (2) providing a signal processing device for alleviating decoding processing of encoded data which is shared by all channels without requiring a memory device to store encoded data for all channels until the decoding processing is completed; (3) providing a sound image localization device for providing, by a smaller amount of operation, a similar level of feeling of localization to that obtained when a larger number of taps of digital filters are used, and a method for controlling the sound image using such a sound image localization device; and (4) providing an audio signal processing device for simplifying signal processing and reproducing audio data in accordance with the reproduction rate of the video data with less sound disconnection, and a method for performing high-rate reproduction of audio data using such an audio signal processing device.