The present invention relates generally to the field of speech coding and, in particular, to the quantization of successive pitch periods.
Owing to the human speech processing mechanism, the pitch period contour of voiced speech evolves slowly in time. Many current speech coders exploit this phenomenon by coding the difference between successive pitch periods, thereby increasing the coding efficiency. In a typical coder operating on a subframe basis, such as the code excited linear predictive (CELP) coder, the absolute pitch period is sent at least once per frame.
The difference between successive pitch periods is generally referred to as a delta period. In the prior art, the delta periods may attain uniformly distributed values from a limited range, which facilitates their coding. This can be interpreted as a multi-dimensional rectangular lattice populated uniformly by points that define the delta periods over the frame. Accordingly, coding of the delta periods is carried out by using a uniform quantizer. That is, similar quantizers are used to code several successive delta periods independently of one another. An encoder that uses such an approach is also known as a multi-dimensional rectangular lattice quantizer. In a multi-dimensional lattice quantizer, each dimension represents a pitch period in a corresponding subframe. Usually, the first dimension of a lattice is indicative of the absolute pitch period in the first subframe, while each of the remaining dimensions represents the difference between the pitch periods of the current and the preceding subframe. Thus, in a speech coding scheme where a speech frame is divided into four subframes for speech processing, the encoder for use in the quantization of successive pitch periods is referred to as a four-dimensional lattice quantizer, and the absolute pitch period in the first dimension and the delta periods in the remaining three dimensions are represented by a point (p, d1, d2, d3) in a four-dimensional pitch space. In the present invention, special attention is paid to a lattice structure containing the dimensions only for the delta periods (d1, d2, d3, . . . , dn).
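By way of illustration only, the representation of a four-subframe frame as a point (p, d1, d2, d3) can be sketched as follows. The function names and the sample pitch values are assumptions made for this example and do not appear in the specification.

```python
def to_delta_vector(pitch_periods):
    """Map per-subframe pitch periods to (p, d1, ..., dn).

    p is the absolute pitch period of the first subframe; each dj is the
    difference between the current and the preceding subframe.
    """
    first = pitch_periods[0]
    deltas = [cur - prev for prev, cur in zip(pitch_periods, pitch_periods[1:])]
    return [first] + deltas


def from_delta_vector(vector):
    """Rebuild the per-subframe pitch periods from (p, d1, ..., dn)."""
    periods = [vector[0]]
    for delta in vector[1:]:
        periods.append(periods[-1] + delta)
    return periods


# Hypothetical pitch periods (in samples) for four successive subframes.
frame = [52, 53, 53, 51]
point = to_delta_vector(frame)        # [52, 1, 0, -2]
restored = from_delta_vector(point)   # [52, 53, 53, 51]
```

Because the contour evolves slowly, the delta periods stay within a small range, which is what makes them cheaper to code than the absolute periods.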
In most prior art speech coders utilizing differential coding, the lattice structure for n delta periods is described as a set of points with a regular arrangement in an n-dimensional pitch space such that the points are uniformly spaced throughout the pitch space. In addition to the uniform spacing of the points in the pitch space, the key feature of the prior art speech coders is the rectangular shape of the projection of the lattice points onto a two-dimensional plane. The structure of the lattice is usually constant regardless of the pitch period in the previous segment. An example of a typical two-dimensional lattice for delta periods is presented in FIG. 1, where the lattice L is defined by
$$L = \{(d_1, d_2) \mid d_{1\min} \le d_1 \le d_{1\max} \,\wedge\, d_{2\min} \le d_2 \le d_{2\max}\} \qquad (1)$$
The lattice covers all possible combinations of d1 and d2 between their respective minimum and maximum values. While the lattice, as shown in FIG. 1, is two-dimensional, higher dimensional lattices can be easily derived from the two-dimensional case. In general, the minimum and maximum possible delta periods for the jth dimension are denoted by djmin and djmax, respectively.
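A rectangular lattice of the kind defined by Equation (1) can be enumerated as a Cartesian product of the per-dimension ranges. The following is a minimal sketch, assuming integer delta periods and hypothetical bounds of -2 to 2 in each dimension; the function name is likewise an assumption.

```python
from itertools import product


def rectangular_lattice(bounds):
    """Enumerate every point of a rectangular lattice.

    bounds is a list of (djmin, djmax) pairs, one pair per dimension,
    with integer limits that are both included in the lattice.
    """
    axes = [range(lo, hi + 1) for lo, hi in bounds]
    return list(product(*axes))


# Hypothetical two-dimensional case: d1 and d2 each range over -2..2.
lattice_2d = rectangular_lattice([(-2, 2), (-2, 2)])
# Adding a third (djmin, djmax) pair extends the same lattice to three dimensions.
lattice_3d = rectangular_lattice([(-2, 2), (-2, 2), (-2, 2)])
```

With these bounds the two-dimensional lattice holds 25 points and the three-dimensional lattice holds 125, including corner points such as (-2, 2) that correspond to abrupt changes in the pitch contour.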
Once the shape and the region of the lattice quantizer are defined, an important parameter is the density of the lattice, for the density determines the bit rate of the coder. The bit rate is a monotonically increasing function of the density. Thus, the density of the lattice quantizer reflects the accuracy used for pitch period information. Normally, fractional values are used instead of integers to improve the quality of the synthesized speech.
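The relation between lattice density and bit rate can be illustrated with a short calculation. The sketch below assumes that all lattice points are indexed jointly with a fixed-length code, so that the number of bits is the smallest integer able to distinguish every point; the function names, the bounds, and the half-sample resolution are assumptions for this example.

```python
def points_per_dimension(d_min, d_max, steps_per_sample=1):
    """Count lattice values between d_min and d_max, both included.

    steps_per_sample=1 gives integer resolution, 2 gives half-sample
    resolution, and so on.
    """
    return (d_max - d_min) * steps_per_sample + 1


def bits_for_lattice(num_points):
    """Smallest number of bits that gives every point a distinct index."""
    return (num_points - 1).bit_length()


# Two dimensions, each covering delta periods from -2 to 2.
integer_points = points_per_dimension(-2, 2) ** 2                 # 25 points
half_points = points_per_dimension(-2, 2, steps_per_sample=2) ** 2  # 81 points

integer_bits = bits_for_lattice(integer_points)   # 5 bits
half_bits = bits_for_lattice(half_points)         # 7 bits
```

Under these assumptions, doubling the resolution in both dimensions raises the cost from 5 to 7 bits per pair of delta periods.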
In a typical lattice quantizer for delta periods, attention is usually paid to the boundary values (djmin, djmax) of the lattice while the rectangular shape of the lattice is kept constant. Attention is not paid, however, to the selection of a suitable set of lattice points to cover the regions of pitch space containing most of the source probability.
It is known that in a speech signal where pitch is a meaningful parameter, the evolution of pitch is smooth due to the characteristics of the human speech processing mechanism. In general, the pitch period contour of voiced speech evolves slowly in time, and abrupt changes in the contour are very unlikely to happen. It has been found that a rectangular lattice structure is far from optimal regarding the selection of lattice points to cover the regions of pitch space. Furthermore, in the prior art, the search for differential pitch values is performed independently in each dimension. Neither the rectangular lattices nor the search method has been optimized to reflect the known behavior of human speech.
It is advantageous and desirable to provide an improved method and system for the quantization of successive pitch periods in speech coders, taking advantage of the source probability in the pitch space to improve the quality of synthesized speech.
It is a primary object of the present invention to increase the efficiency of coding successive pitch periods, thereby improving the quality of synthesized speech in a speech coder utilizing differential coding to code the difference between successive pitch periods. This object can be achieved by defining an optimized, or more efficient, lattice structure which is shaped to cover the region of pitch space where the most probable points are located, based on a priori knowledge of the behavior of successive delta periods in voiced speech. Furthermore, regions with different point densities, representing different time resolutions for pitch periods, can be defined within the optimized lattice structure. With such an optimized lattice structure, a new method for assigning an index to a point in the optimized lattice structure and for searching for the index in a codebook can be provided.
Thus, according to the first aspect of the present invention, a method of coding a sound signal in a plurality of signal frames each having a pitch period indicative of the sound signal in the respective signal frame, wherein each signal frame comprises a plurality of signal segments each representing a dimension in a pitch space, and the sound signal in each of the signal segments is characterized by a pitch value, and wherein the pitch values are representable by a point distribution pattern characteristic of the sound signal in a lattice structure for defining codebook indices in the pitch space, said method comprising the steps of:
shaping the lattice structure based on the point distribution pattern; and
providing a codebook index representing the pitch value in each dimension of the pitch space according to the shaped lattice structure for facilitating coding of the sound signal.
According to the first aspect of the present invention, the method further comprises the steps of:
obtaining an open-loop estimate of the pitch period by an open-loop search from the shaped lattice structure considering all of the dimensions of the pitch space; and
refining the open-loop estimate in each of the dimensions in the pitch space separately by a closed-loop search from the shaped lattice structure for obtaining a closed-loop search value indicative of the pitch value in the respective signal segment.
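The two search steps above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the scoring functions, the search radius, and the diamond-shaped example lattice are all hypothetical, and a real coder would derive the open-loop score and the closed-loop error from the speech signal.

```python
def open_loop_search(lattice, score):
    """Pick the lattice point with the best total score over all dimensions.

    score(j, d) is a hypothetical per-dimension measure (higher is better).
    """
    return max(lattice, key=lambda pt: sum(score(j, d) for j, d in enumerate(pt)))


def closed_loop_refine(lattice, start, error, radius=1):
    """Refine one dimension at a time, staying on the shaped lattice.

    error(j, d) is a hypothetical per-dimension measure (lower is better).
    Only candidates within `radius` of the current value are tried, and a
    candidate is kept only if the resulting point belongs to the lattice.
    """
    allowed = set(lattice)
    point = list(start)
    for j in range(len(point)):
        candidates = [
            d for d in range(point[j] - radius, point[j] + radius + 1)
            if tuple(point[:j] + [d] + point[j + 1:]) in allowed
        ]
        point[j] = min(candidates, key=lambda d: error(j, d))
    return tuple(point)


# Hypothetical shaped lattice: points with |d1| + |d2| <= 2.
lattice = [(a, b) for a in range(-2, 3) for b in range(-2, 3) if abs(a) + abs(b) <= 2]

open_target = (1, 0)     # stands in for an open-loop pitch analysis result
closed_target = (2, 1)   # stands in for an analysis-by-synthesis result

estimate = open_loop_search(lattice, lambda j, d: -abs(d - open_target[j]))
refined = closed_loop_refine(lattice, estimate, lambda j, d: abs(d - closed_target[j]))
```

In this example the open-loop step selects (1, 0) and the closed-loop step moves the first dimension to 2. The second dimension then stays at 0, because (2, 1) lies outside the shaped lattice.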
According to the present invention, the pitch value is indicative of a differential pitch period or an absolute pitch period.
According to the present invention, the pitch value in at least one of the signal segments is indicative of an absolute pitch period and the pitch value in each of the remaining signal segments is indicative of a differential pitch period.
Accordingly, when the signal segments comprise sequentially a first signal segment and three second signal segments, the pitch value in the first signal segment is indicative of an absolute pitch period and the pitch value in each of the second signal segments is indicative of a differential pitch period.
Alternatively, each of the signal frames comprises four signal segments, and the pitch value in each of the four signal segments is indicative of a differential pitch period.
According to the present invention, the signal segments can be arranged in successive subframes. Thus, the pitch value in the first subframe can be an absolute pitch period or a differential pitch period, and the pitch value in each of the remaining subframes is a differential pitch period.
Preferably, each point in the lattice structure represents a distance from a reference point of the pitch space and the lattice structure is shaped to eliminate points that exceed a predetermined distance.
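A minimal sketch of such shaping is given below. The distance measure is not specified in this passage; the sum of absolute delta periods is assumed here purely for illustration, and the function name, the bounds, and the threshold are likewise hypothetical.

```python
from itertools import product


def shape_lattice(bounds, max_distance, reference=None):
    """Keep only the lattice points within max_distance of the reference point.

    bounds is a list of (djmin, djmax) integer pairs. The distance is the
    sum of absolute coordinate differences, which is an assumption of this
    sketch rather than a measure taken from the specification.
    """
    if reference is None:
        reference = tuple(0 for _ in bounds)
    axes = [range(lo, hi + 1) for lo, hi in bounds]
    return [
        pt for pt in product(*axes)
        if sum(abs(d - r) for d, r in zip(pt, reference)) <= max_distance
    ]


rectangular = shape_lattice([(-2, 2), (-2, 2)], max_distance=4)  # nothing removed
shaped = shape_lattice([(-2, 2), (-2, 2)], max_distance=2)       # corners removed
```

With these values the rectangular lattice has 25 points and the shaped lattice has 13. The eliminated points, such as (2, 2), correspond to large changes in two successive subframes, which are unlikely in voiced speech.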
In particular, the shaped lattice structure of the present invention is composed of a union of non-overlapping hypercubes, which are defined by the delta period range and the time resolution in each dimension of the pitch space, and wherein each hypercube is representable by a plurality of edges comprising a number of lattice points. The index of the optimized lattice, according to the present invention, is indicative of the number of lattice points on the edges of the hypercubes.
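One way to derive an index from the number of lattice points on the hypercube edges is sketched below. The layout is an assumption made for illustration: each hypercube is described per dimension by a (start, count, step) triple in units of the finest time resolution, hypercubes are numbered in list order, and points inside a hypercube are numbered in mixed-radix order. The specification may define the index differently.

```python
from math import prod


def point_to_index(hypercubes, point):
    """Return the index of `point`, or raise ValueError if it is not in the lattice."""
    offset = 0
    for cube in hypercubes:
        coords = []
        for (start, count, step), value in zip(cube, point):
            k, rem = divmod(value - start, step)
            if rem != 0 or not 0 <= k < count:
                break                      # point is not in this hypercube
            coords.append(k)
        else:
            index = 0
            for (_, count, _), k in zip(cube, coords):
                index = index * count + k  # mixed radix over points per edge
            return offset + index
        offset += prod(count for _, count, _ in cube)
    raise ValueError("point is not in the lattice")


def index_to_point(hypercubes, index):
    """Inverse of point_to_index."""
    for cube in hypercubes:
        size = prod(count for _, count, _ in cube)
        if index < size:
            point = []
            for start, count, step in reversed(cube):
                index, k = divmod(index, count)
                point.append(start + k * step)
            return tuple(reversed(point))
        index -= size
    raise ValueError("index out of range")


# Hypothetical lattice in half-sample units: a fine 5 x 5 hypercube around
# the origin and a coarser 2 x 3 hypercube (integer resolution) beside it.
cubes = [
    [(-2, 5, 1), (-2, 5, 1)],
    [(4, 2, 2), (-2, 3, 2)],
]
centre_index = point_to_index(cubes, (0, 0))   # 12
coarse_index = point_to_index(cubes, (6, 2))   # 30
```

Because the hypercubes do not overlap, every point receives exactly one index, and the 31 points of this example fit in 5 bits.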
It should be noted that a codebook index is provided and conveyed by an encoding means to a decoding means having information indicative of the shaped lattice, and that the decoding means synthesizes a speech signal from the codebook index based on the shaped lattice.
According to the second aspect of the present invention, an apparatus for encoding a sound signal in a plurality of signal frames each having a pitch period indicative of the sound signal in the respective signal frame, wherein each signal frame comprises a plurality of signal segments each representing a dimension in a pitch space, and the sound signal in each of the signal segments is characterized by a pitch value, and wherein the pitch values are representable by a point distribution pattern characteristic of the sound signal in a lattice structure for defining codebook indices in the pitch space, and the lattice structure is shaped based on the point distribution pattern for defining a shaped lattice structure, said apparatus comprising:
means, responsive to the sound signal, for obtaining an open-loop estimate of the pitch period by an open-loop search from the shaped lattice structure considering all of the dimensions of the pitch space for providing an open-loop search value indicative of the open-loop estimate; and
means, responsive to the open-loop search value, for refining the open-loop estimate in each of the dimensions in the pitch space separately by a closed-loop search from the shaped lattice structure for obtaining a closed-loop search value indicative of the pitch value in the respective signal segment.
According to the third aspect of the present invention, a system for coding a sound signal in a plurality of signal frames each having a pitch period indicative of the sound signal in the respective signal frame, wherein each signal frame comprises a plurality of signal segments each representing a dimension in a pitch space, and the sound signal in each of the signal segments is characterized by a pitch value, and wherein the pitch values are representable by a point distribution pattern characteristic of the sound signal in a lattice structure for defining codebook indices in the pitch space, and the lattice structure is shaped based on the point distribution pattern for defining a shaped lattice structure, said system comprising:
an encoder having:
means, responsive to the sound signal, for obtaining an open-loop estimate of the pitch period by an open-loop search from the shaped lattice considering all of the dimensions of the pitch space for providing an open-loop search value indicative of the open-loop estimate; and
means, responsive to the open-loop search value, for refining the open-loop estimate in each of the dimensions in the pitch space separately by a closed-loop search from the shaped lattice structure for obtaining a closed-loop search value indicative of the pitch value in the respective signal segment for providing information indicative of the shaped lattice structure and the codebook indices; and
a decoder having means, responsive to the information, for synthesizing a further sound signal from the codebook indices based on the shaped lattice structure.
The present invention will become apparent upon reading the description taken in conjunction with FIGS. 2 to 6.