1. Technical Field
This invention relates to a method and system for coding (e.g., encoding or decoding) speech information using an adaptive codebook with different resolution levels within a variable resolution scheme.
2. Related Art
Speech encoding may be used to increase the traffic handling capacity of an air interface of a wireless system. A wireless service provider generally seeks to maximize subscriber revenue by maximizing the number of active subscribers served by the wireless communications service for an allocated bandwidth of electromagnetic spectrum. A wireless service provider may pay tariffs, licensing fees, and auction fees to governmental regulators to acquire or maintain the right to use an allocated bandwidth of frequencies for the provision of wireless communications services. Thus, the wireless service provider may select speech encoding technology to get the most return on its investment in wireless infrastructure.
Certain speech encoding schemes store a detailed database at an encoding site and a duplicate detailed database at a decoding site. Encoding infrastructure transmits reference data for indexing the duplicate detailed database to conserve the available bandwidth of the air interface. Instead of modulating a carrier signal with the entire speech signal at the encoding site, the encoding infrastructure merely transmits the shorter reference data that represents the original speech signal. The decoding infrastructure reconstructs a replica of the original speech signal by using the shorter reference data to access the duplicate detailed database at the decoding site.
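The shared-database arrangement described above can be illustrated with a minimal Python sketch. It is not the coder of this disclosure: the random codebook contents, the squared-error search, and the names build_codebook, encode, and decode are assumptions chosen only to show that a short index, rather than the waveform itself, crosses the air interface.

```python
import random


def build_codebook(num_entries, vector_len, seed):
    """Build a hypothetical codebook; the same seed yields the same table at both sites."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(vector_len)]
            for _ in range(num_entries)]


def encode(segment, codebook):
    """Return the index of the entry with the smallest squared error to the segment."""
    def squared_error(vec):
        return sum((a - b) ** 2 for a, b in zip(segment, vec))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


def decode(index, codebook):
    """Look up the entry that the transmitted index refers to."""
    return codebook[index]


# Encoding site and decoding site hold duplicate databases.
encoder_book = build_codebook(num_entries=64, vector_len=8, seed=1234)
decoder_book = build_codebook(num_entries=64, vector_len=8, seed=1234)

segment = encoder_book[5]                 # stand-in for a speech segment
index = encode(segment, encoder_book)     # only this index is transmitted
replica = decode(index, decoder_book)     # decoder rebuilds the segment locally
```

With 64 entries the transmitted reference data fits in 6 bits, whereas the 8-sample segment itself would need far more.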
The quality of the speech signal may suffer if the detailed database (e.g., codebook) contains an insufficient variety of excitation vectors to accurately represent the speech underlying the original speech signal. The number of code identifiers supported by the maximum number of bits of the shorter reference data is one limitation on the variety of excitation vectors in the detailed database. Code identifiers may represent different values of pitch lags, or vice versa. Pitch lag refers to a temporal measurement of the repetition component (e.g., generally periodic waveform) that is observable in voiced speech or a voiced component of speech. Pitch lag values may be used as an index to search for or find excitation vectors in the detailed database. The granularity of the excitation vectors refers to the step size between adjacent cells of excitation vectors in the detailed database. Reducing the granularity (i.e., using a smaller step size) may improve the quality of reproduction of the speech signal by reducing quantization error in the speech coding process. However, the granularity of the excitation vectors is generally limited to what can be represented by a fixed number of bits for transmission over the air interface to conserve spectral bandwidth.
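The relationship between the bit budget and the granularity can be made concrete with a short Python sketch. The lag range of 20 to 147 samples and the 7-bit and 8-bit budgets are hypothetical values chosen for illustration, and uniform_step and quantize_lag are illustrative names, not part of the described system.

```python
def uniform_step(lag_min, lag_max, num_bits):
    """Step size when 2**num_bits code identifiers evenly cover [lag_min, lag_max]."""
    levels = 2 ** num_bits
    return (lag_max - lag_min) / (levels - 1)


def quantize_lag(lag, lag_min, step, num_bits):
    """Map a pitch lag to the nearest code identifier and its reconstructed lag."""
    index = round((lag - lag_min) / step)
    index = max(0, min(2 ** num_bits - 1, index))   # clamp to the valid identifiers
    return index, lag_min + index * step


# Hypothetical range and bit budgets (assumptions, not from the source text).
step_7_bits = uniform_step(20.0, 147.0, 7)   # 1.0 sample per step
step_8_bits = uniform_step(20.0, 147.0, 8)   # about 0.498 sample per step

index, quantized = quantize_lag(57.3, 20.0, step_7_bits, 7)
error = abs(57.3 - quantized)                # never exceeds step / 2 inside the range
```

Each added bit roughly halves the step size, and therefore the worst-case quantization error, at the cost of a longer index on the air interface.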
The limited number of possible excitation vectors, represented by a fixed maximum number of bits, may not permit an accurate or intelligible representation of the speech signal by the excitation vectors. Accordingly, at times the reproduced speech may be artificial-sounding, distorted, unintelligible, or not perceptually palatable to subscribers. Thus, a need exists for enhancing the quality of reproduced speech, while adhering to the bandwidth constraints imposed by the transmission of reference or indexing information within a limited number of bits.
In one prior art configuration, the excitation vectors in the adaptive codebook may have a uniform resolution regardless of the actual value of the pitch lag. However, the proper selection of excitation vectors for lower pitch lag values often has a greater impact on the speech quality of the reproduced speech than the proper selection of excitation vectors for higher pitch lag values. Thus, a uniform resolution versus pitch lag may result in lower perceptual quality of the reproduced speech than otherwise possible.
In another prior art configuration, the excitation vectors in the adaptive codebook may have several discrete resolution levels that may be expressed as a coarse step function with coarse granularity. Although a coarse step function may be tailored to capture some voice quality benefits of the lower pitch lag values, the coarse step function provides reference to only a limited number of discrete excitation vectors. Accordingly, the discrete resolution levels may represent the encoded speech signal inaccurately because of quantization error. The coarse step function cannot generally be converted to a fine step function with fine granularity and improved speech reproduction because the number of bits allocated to the adaptive codebook indices is limited based on the available bandwidth or transmission capacity of the air interface. Thus, a need exists for associating adaptive codebook indices with corresponding excitation vectors in a nonuniform quantization manner according to the pitch lag to enhance speech quality.
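The bit-budget obstacle described for the coarse step function can be checked with simple arithmetic. In the following Python sketch, the three lag segments and their step sizes are hypothetical and do not correspond to any particular prior art coder; the names step_grid and bits_needed are likewise illustrative.

```python
def step_grid(segments):
    """Build a lag table from (start, stop, step) segments; stop is exclusive."""
    grid = []
    for start, stop, step in segments:
        count = round((stop - start) / step)
        grid.extend(start + k * step for k in range(count))
    return grid


def bits_needed(num_entries):
    """Smallest number of bits that can index num_entries table entries."""
    return max(1, (num_entries - 1).bit_length())


# Hypothetical coarse step function: finer steps for the lower pitch lags.
COARSE = [(20.0, 40.0, 0.25), (40.0, 90.0, 0.5), (90.0, 146.0, 1.0)]
# The same segments with every step halved (a finer step function).
FINE = [(start, stop, step / 2) for start, stop, step in COARSE]

coarse_grid = step_grid(COARSE)   # 80 + 100 + 56 = 236 entries, 8 bits
fine_grid = step_grid(FINE)       # 472 entries, 9 bits
```

Under these assumed segments the coarse table fits an 8-bit index, while halving every step pushes the table past 256 entries and requires a ninth bit, which a fixed bit allocation does not provide.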
A speech coding system features an enhanced variable resolution scheme with generally continuously variable or finely variable resolution levels for an intermediate range of pitch lags. The enhanced variable resolution scheme facilitates quality enhancement of reproduced speech, while conserving the available bandwidth of an air interface of a wireless system. The speech coding system reduces or minimizes the quantization error associated with the selection of excitation vectors because of the generally continuously variable nature or finely variable nature of the resolution levels within the intermediate range. Accordingly, the continuously variable or finely variable resolution levels contribute toward a faithful reproduction of an input speech signal. Further, the lower pitch lags within the intermediate range have a greater resolution than the higher pitch lags within the intermediate range to represent the perceptually significant portions of the input speech signal in an accurate manner.
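One way to picture a finely variable resolution is a lag table whose step grows smoothly with the pitch lag, so that every index has its own step size and the lower lags are spaced most closely. The Python sketch below uses geometric spacing purely as a stand-in; the text above does not specify the mapping, and the lag range, the 8-bit budget, the application of the spacing to the whole range rather than only an intermediate range, and the names geometric_lag_grid and nearest_index are all assumptions.

```python
def geometric_lag_grid(lag_min, lag_max, num_bits):
    """Lag table with 2**num_bits entries whose spacing grows in proportion to the lag."""
    count = 2 ** num_bits
    ratio = lag_max / lag_min
    return [lag_min * ratio ** (i / (count - 1)) for i in range(count)]


def nearest_index(lag, grid):
    """Adaptive codebook index whose table lag is closest to the requested lag."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - lag))


# Hypothetical lag range and bit budget (assumptions, not from the source text).
grid = geometric_lag_grid(20.0, 147.0, 8)
steps = [b - a for a, b in zip(grid, grid[1:])]

finest_step = steps[0]      # spacing between the two lowest lags
coarsest_step = steps[-1]   # spacing between the two highest lags
index = nearest_index(57.3, grid)
```

The table still fits the same fixed number of bits as a uniform table of 256 entries; only the placement of the entries changes, concentrating resolution where the lag is short.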
The speech coding system may be applied to speech encoders, speech decoders, or both. For example, an encoder or decoder includes an adaptive codebook containing excitation vector data associated with corresponding adaptive codebook indices (e.g., pitch lags). Different excitation vectors in the adaptive codebook may have different resolution levels. The resolution levels include a first resolution range of generally continuously variable resolution levels or sufficiently finely variable resolution levels to provide a desired level of perceptual quality. A gain adjuster scales selected excitation vector data or preferential excitation vector data from the adaptive codebook. A synthesis filter produces a synthesized speech signal in response to an input of the scaled excitation vector data.
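The signal path just described (adaptive codebook, gain adjuster, synthesis filter) can be sketched in a few lines of Python. The sketch rests on several assumptions: the adaptive codebook is modeled as past excitation read back at an integer pitch lag, fractional lags and their interpolation are omitted, and the synthesis filter is a generic all-pole filter with coefficients a[k] such that s[n] = e[n] - sum of a[k] * s[n-k]. All function names and numeric values are illustrative.

```python
def adaptive_excitation(past_excitation, lag, length):
    """Read an excitation vector from past excitation delayed by an integer lag."""
    buffer = list(past_excitation)
    vector = []
    for _ in range(length):
        sample = buffer[-lag]     # sample from one pitch period earlier
        vector.append(sample)
        buffer.append(sample)     # lets lags shorter than the vector repeat
    return vector


def scale(vector, gain):
    """Gain adjuster: multiply every sample by the gain."""
    return [gain * x for x in vector]


def synthesis_filter(excitation, coeffs):
    """All-pole filter: s[n] = e[n] - sum(coeffs[k] * s[n-1-k])."""
    history = [0.0] * len(coeffs)   # most recent output sample first
    output = []
    for e in excitation:
        s = e - sum(a * h for a, h in zip(coeffs, history))
        output.append(s)
        if history:
            history = [s] + history[:-1]
    return output


# Hypothetical decoder step: lag 2, gain 0.5, one filter coefficient.
selected = adaptive_excitation([1.0, 2.0, 3.0], lag=2, length=5)
scaled = scale(selected, 0.5)
speech = synthesis_filter(scaled, [-0.5])
```

An encoder would typically run the same path for each candidate lag and keep the candidate whose synthesized output best matches the input speech; that search is outside the scope of this sketch.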
Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.