1. Field of the Invention
The present invention relates to a coded voice signal format converting apparatus, and more particularly to a coded voice signal format converting apparatus that converts the format of a voice signal coded by compression or the like between two different voice coding/decoding systems.
The present application claims priority of Japanese Patent Application No. 2000-052037 filed on Feb. 28, 2000, which is hereby incorporated by reference.
2. Description of the Related Art
With the recent progress of communications technology, voice signals are generally handled in coded form, using a compression method or the like. This creates a need for a coded voice signal format converting apparatus that converts the signal format of voice signals coded in this way. When the format of a coded voice signal is converted by such an apparatus, it is desirable that the conversion be carried out with a reduced amount of computation. Signal format converting technology of this kind is applied not only to voice signals but also to image signals.
One example of a conventional coded signal format converting apparatus adapted to convert, with a reduced amount of computation, the format of an image signal coded by a compression method or the like is disclosed in Japanese Patent Application Laid-open No. Hei 10-336672. This conventional coded signal format converting apparatus, as shown in FIG. 6, is made up of a decoding section 51, a motion vector memory 52, a resolution converting section 53 and a coding section 54 having a motion compensating section 55 and a coding processing section 56.
In the configuration described above, a coded moving picture (image signal) made up of MPEG-2 (Moving Picture Experts Group-2) video and input through an input terminal 61 is decoded into its original moving picture by the decoding section 51. At the same time, the motion vector that was produced at the time of coding and is contained in each piece of coded data is stored in the motion vector memory 52. The decoded moving picture is input to the resolution converting section 53 and, after being resized to suit the method by which it is to be re-coded, is further input to the coding section 54. In the coding section 54, the moving picture is re-coded based on the motion vector obtained by the motion compensating section 55 from the motion vector memory 52 and is then output to outside communication devices or the like through an output terminal 62.
However, the conventional coded signal format converting apparatus disclosed in the above Japanese Patent Application Laid-open No. Hei 10-336672 has a problem in that, since it is intended for converting the format of image signals made up of moving pictures, it cannot be applied to voice signals, which carry no motion vector information. This apparatus therefore cannot be expected to provide a coded voice signal format converting apparatus capable of converting the format of a voice signal with a reduced amount of computation.
In the conventional coded voice signal format converting apparatus, a decoding device is generally connected in series to a coding device. For example, suppose that the format of a coded voice signal compressed by a coding device operating in accordance with a first coding/decoding system (voice coding/decoding system) is to be converted into a format that can be decoded by a decoding device operating in accordance with a second coding/decoding system (voice coding/decoding system). First, the coded voice signal whose format has not yet been converted is decoded by the decoding device operating in accordance with the first coding/decoding system, and a voice signal is obtained. Then, the obtained voice signal is coded by the coding device operating in accordance with the second coding/decoding system, and a coded voice signal that can be decoded by the decoding device operating in accordance with the second coding/decoding system is obtained. In general, existing decoding and coding devices may be used as the decoding device and the coding device making up the conventional coded voice signal format converting apparatus.
The first coding/decoding system operates in accordance with, for example, any one of the MPEG Audio, MPEG-2AAC and Dolby AC-3 systems. The second coding/decoding system likewise operates in accordance with any one of these three systems. However, although both are chosen from among these three systems, the configuration of the first coding/decoding system differs from that of the second coding/decoding system.
The MPEG Audio system is described in detail in, for example, “ISO/IEC/11172-3, Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mb/s” (hereinafter referred to as “Reference 1”). The MPEG-2AAC system is described in detail in, for example, “ISO/IEC/13818-7, Generic Coding of Moving Pictures and Associated Audio Information, 1993” (hereinafter referred to as “Reference 2”). The Dolby AC-3 system is described in detail in, for example, “Advanced Television Systems Committee A/52, Digital Audio Compression Standard (AC-3), 1995” (hereinafter referred to as “Reference 3”).
Next, the configuration of a conventional coded voice signal format converting apparatus will be described by referring to FIG. 5. As shown in FIG. 5, in the conventional coded voice signal format converting apparatus, a first decoding device 310 adapted to operate in accordance with a first coding/decoding system is connected in series to a second coding device 320 adapted to operate in accordance with a second coding/decoding system. A voice signal which has been coded in advance in accordance with the first coding/decoding system is decoded by the first decoding device 310 and is then coded by the second coding device 320 into a coded voice signal that can be decoded by a decoding device adapted to operate in accordance with the second coding/decoding system.
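The series connection of FIG. 5 can be illustrated with a minimal sketch. The two "systems" below are hypothetical toy codecs (uniform quantizers that differ only in step size) and the names `make_decoder`, `make_coder` and `convert_format` are assumptions introduced for illustration; they do not correspond to any of the systems of Reference 1 to Reference 3.

```python
from typing import Callable, List, Sequence


def make_decoder(step: float) -> Callable[[Sequence[int]], List[float]]:
    """Hypothetical toy decoding device: integer codes -> voice samples."""
    def decode(coded: Sequence[int]) -> List[float]:
        return [q * step for q in coded]
    return decode


def make_coder(step: float) -> Callable[[Sequence[float]], List[int]]:
    """Hypothetical toy coding device: voice samples -> integer codes."""
    def code(signal: Sequence[float]) -> List[int]:
        return [round(x / step) for x in signal]
    return code


def convert_format(coded_in, first_decoder, second_coder):
    """Tandem conversion: fully decode, then fully re-code."""
    voice_signal = first_decoder(coded_in)  # role of the first decoding device 310
    return second_coder(voice_signal)       # role of the second coding device 320


first_decoder = make_decoder(step=0.5)    # assumed "first system"
second_coder = make_coder(step=0.25)      # assumed "second system"
coded_out = convert_format([2, -3, 0, 4], first_decoder, second_coder)
print(coded_out)
```

The point of the sketch is structural: the converter has no knowledge of how the input was coded beyond what the decoder returns, so the second coding device has to redo all of its own analysis on the decoded voice signal.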
The first decoding device 310 includes a mapped signal generating section 311, an inverse mapping converting section 312 and a quantizing accuracy information decoding section 313. Whichever of the MPEG Audio, MPEG-2AAC and Dolby AC-3 systems is employed by the first decoding device 310, this overall configuration is common to all three systems. However, the internal configurations of the mapped signal generating section 311, inverse mapping converting section 312 and quantizing accuracy information decoding section 313 vary from system to system, and details for each of the three systems are provided in the above Reference 1 to Reference 3.
The second coding device 320 includes a mapping converting section 321, a mapped signal coding section 322 and a quantizing accuracy calculating section 323. Similarly, whichever of the MPEG Audio, MPEG-2AAC and Dolby AC-3 systems is employed, this overall configuration of the second coding device 320 is common to all three systems. However, the internal configurations of the mapping converting section 321, mapped signal coding section 322 and quantizing accuracy calculating section 323 vary from system to system, and details for each of the three systems are provided in Reference 1 to Reference 3 as described above.
Next, operations of the coded voice signal format converting apparatus will be described by referring to FIG. 5. A coded voice signal which has been coded in advance in accordance with the first coding/decoding system, and whose format is to be converted, is input through an input terminal 300 to both the mapped signal generating section 311 and the quantizing accuracy information decoding section 313 in the first decoding device 310. The quantizing accuracy information decoding section 313 obtains, by decoding a part of the input coded voice signal, quantizing accuracy information indicating how finely each of the frequency components of the voice signal has been quantized. The mapped signal generating section 311 first obtains, by decoding a part of the coded voice signal, a quantized value of a mapped signal. Then, the mapped signal generating section 311 obtains a first mapped signal by inverse-quantizing the obtained quantized value of the mapped signal based on the quantizing accuracy designated by the quantizing accuracy information output from the quantizing accuracy information decoding section 313.
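The inverse quantizing step can be sketched as follows. This is a minimal illustration that assumes the quantizing accuracy information is simply a number of bits per frequency component and that a uniform quantizer is used; the actual representation differs in each of the systems of Reference 1 to Reference 3, and the function name and `full_scale` parameter are hypothetical.

```python
from typing import List, Sequence


def inverse_quantize(quantized: Sequence[int],
                     bits_per_component: Sequence[int],
                     full_scale: float = 1.0) -> List[float]:
    """Rebuild mapped-signal values from quantized values.

    Assumption: a component coded with b bits uses a uniform step of
    full_scale / 2**(b - 1); b == 0 means the component was not sent.
    """
    mapped = []
    for q, bits in zip(quantized, bits_per_component):
        if bits == 0:
            mapped.append(0.0)
        else:
            step = full_scale / 2 ** (bits - 1)
            mapped.append(q * step)
    return mapped


# Quantized values and the accuracy (bits) decoded for each component.
first_mapped_signal = inverse_quantize([3, -1, 0, 5], [3, 2, 0, 4])
print(first_mapped_signal)
```

A component quantized with more bits has a smaller step, so the same integer value maps back to a finer-grained amplitude.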
The inverse mapping converting section 312 obtains a first voice signal by performing inverse mapping conversion on the first mapped signal output from the mapped signal generating section 311. The inverse mapping conversion corresponds to the sub-band synthesis filter processing described in Reference 1 and to the inverse modified discrete cosine transform processing described in Reference 2 and Reference 3.
The first voice signal output from the inverse mapping converting section 312 in the first decoding device 310 is input to the mapping converting section 321 and the quantizing accuracy calculating section 323 in the second coding device 320. The mapping converting section 321 obtains a second mapped signal by performing mapping conversion on the input voice signal. The mapping conversion corresponds to the sub-band analysis filter processing described in Reference 1 and to the modified discrete cosine transform processing described in Reference 2 and Reference 3. The mapped signal represents the frequency components of the input voice signal.
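As an illustration of a mapping conversion and its inverse, the following sketch implements a textbook modified discrete cosine transform (MDCT) with a sine window and 50% overlap-add. It is a direct, unoptimized form written for clarity, not the block-switching transforms specified in Reference 2 or Reference 3; the block size and the function names are assumptions.

```python
import math
from typing import List, Sequence


def sine_window(n_half: int) -> List[float]:
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_half))
            for n in range(2 * n_half)]


def _kernel(n: int, k: int, n_half: int) -> float:
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def mdct(block: Sequence[float]) -> List[float]:
    """Mapping conversion: 2N windowed time samples -> N frequency components."""
    n_half = len(block) // 2
    w = sine_window(n_half)
    return [sum(w[n] * block[n] * _kernel(n, k, n_half)
                for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(coeffs: Sequence[float]) -> List[float]:
    """Inverse mapping conversion: N components -> 2N windowed, aliased samples."""
    n_half = len(coeffs)
    w = sine_window(n_half)
    return [w[n] * (2.0 / n_half) * sum(coeffs[k] * _kernel(n, k, n_half)
                                        for k in range(n_half))
            for n in range(2 * n_half)]


def analyze_synthesize(signal: Sequence[float], n_half: int) -> List[float]:
    """Round trip; len(signal) is assumed to be a multiple of n_half."""
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        block = padded[start:start + 2 * n_half]
        for i, y in enumerate(imdct(mdct(block))):
            out[start + i] += y  # overlap-add cancels the time-domain aliasing
    return out[n_half:n_half + len(signal)]


voice = [0.1, 0.4, -0.2, 0.7, 0.0, -0.5, 0.3, 0.2]
restored = analyze_synthesize(voice, n_half=4)
max_error = max(abs(a - b) for a, b in zip(voice, restored))
print(max_error)
```

Each block of 2N samples yields only N components, yet overlapping consecutive blocks by N samples lets the inverse conversion recover the signal, which is why the transform can be used without increasing the number of values to be coded.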
The quantizing accuracy calculating section 323 analyzes the input voice signal and determines how finely the mapped signal representing each of the frequency components of the voice signal is to be quantized. That is, finer quantizing is performed on frequency components that are easily perceived by the human ear, and coarser quantizing is performed on frequency components that are not easily perceived by the human ear. Whether a frequency component is easily perceived by the human ear is determined by analyzing the input voice signal using a method that imitates a perception model of the human ear. The analysis method is described in detail in Reference 1 and Reference 2, and its explanation is omitted accordingly. The method that imitates the perception model of the human ear is called “psychological auditory sense analysis”; its processing is very complicated and, in general, it requires a very large amount of computation.
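The idea of assigning finer quantizing to more audible components can be sketched with a simple greedy bit allocation. This is a deliberately simplified stand-in and not the analysis of Reference 1 or Reference 2: the per-component signal levels and masking thresholds (`signal_db`, `mask_db`) are assumed to be given, whereas computing the masking thresholds is precisely the expensive part of the real analysis. All names and numeric values are hypothetical.

```python
from typing import List, Sequence

DB_PER_BIT = 6.02  # approximate noise reduction per extra bit (uniform quantizer)


def allocate_bits(signal_db: Sequence[float],
                  mask_db: Sequence[float],
                  bit_pool: int,
                  max_bits: int = 15) -> List[int]:
    """Greedy allocation: repeatedly give one bit to the component whose
    quantizing noise is furthest above its masking threshold."""
    bits = [0] * len(signal_db)
    # With 0 bits the noise is taken to be as large as the signal itself.
    noise_to_mask = [s - m for s, m in zip(signal_db, mask_db)]
    for _ in range(bit_pool):
        audible = [i for i in range(len(bits))
                   if bits[i] < max_bits and noise_to_mask[i] > 0.0]
        if not audible:
            break  # all remaining noise is already below the thresholds
        worst = max(audible, key=lambda i: noise_to_mask[i])
        bits[worst] += 1
        noise_to_mask[worst] -= DB_PER_BIT
    return bits


bits = allocate_bits(signal_db=[60.0, 40.0, 20.0, 10.0],
                     mask_db=[30.0, 35.0, 25.0, 5.0],
                     bit_pool=8)
print(bits)
```

In this example the third component lies below its masking threshold and receives no bits, while the first component, which stands far above its threshold, receives most of the pool.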
The mapped signal coding section 322 quantizes the mapped signal output from the mapping converting section 321 based on the quantizing accuracy calculated by the quantizing accuracy calculating section 323 to obtain a quantized value. Then, the mapped signal coding section 322 converts the obtained quantized value into coded strings to obtain a coded voice signal. The coded voice signal whose format has thus been converted is output from an output terminal 301.
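The quantizing and string-conversion steps described above can be sketched as follows. The sketch assumes a uniform quantizer and fixed-length two's-complement fields; the systems of Reference 1 to Reference 3 each define their own quantizers and bitstream syntax, so the function names and the layout of the coded string are hypothetical.

```python
from typing import List, Sequence


def quantize(mapped: Sequence[float],
             bits_per_component: Sequence[int],
             full_scale: float = 1.0) -> List[int]:
    """Quantize each mapped-signal value with its own accuracy (bits)."""
    out = []
    for x, bits in zip(mapped, bits_per_component):
        if bits == 0:
            out.append(0)  # component is dropped entirely
            continue
        step = full_scale / 2 ** (bits - 1)
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        out.append(max(lo, min(hi, round(x / step))))  # clamp to the field range
    return out


def pack(quantized: Sequence[int], bits_per_component: Sequence[int]) -> str:
    """Concatenate two's-complement fields into a string of '0' and '1'."""
    fields = []
    for q, bits in zip(quantized, bits_per_component):
        if bits:
            fields.append(format(q & ((1 << bits) - 1), "0{}b".format(bits)))
    return "".join(fields)


accuracy = [3, 2, 0, 4]  # assumed output of the accuracy calculation
quantized = quantize([0.75, -0.5, 0.3, 0.625], accuracy)
coded_string = pack(quantized, accuracy)
print(quantized, coded_string)
```

Components given zero bits contribute nothing to the coded string, which is how coarse quantizing of inaudible components translates into a lower bit rate.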
However, the above conventional coded voice signal format converting apparatus has a problem in that it includes a configuration element requiring a large amount of computation, which makes it difficult to perform voice signal format conversion with a reduced amount of computation. That is, in the conventional coded voice signal format converting apparatus, as shown in FIG. 5, the first decoding device 310 adapted to operate in accordance with the first coding/decoding system is connected in series to the second coding device 320 adapted to operate in accordance with the second coding/decoding system, and since the second coding device 320 includes the quantizing accuracy calculating section 323, which requires a large amount of computation, the apparatus as a whole requires a large amount of computation.
The quantizing accuracy calculating section 323 determines, based on the psychological auditory sense analysis described above, the quantizing accuracy defining how finely the mapped signal representing each of the frequency components of the input voice signal is quantized. However, this processing is very complicated and requires a large amount of computation, which in turn makes the amount of computation required for converting voice signal formats large.