1. Technical Field
The present invention relates generally to speech coding; and, more particularly, it relates to long term pre-processing of speech coding without any delay.
2. Related Art
Conventional long term (LT) pre-processing in code-excited linear prediction (CELP) speech coding saves a number of bits in coding a pitch lag of a speech signal, but the conventional methods of performing long term (LT) pre-processing inherently introduce a variable delay at an end of a speech frame of the speech signal. No conventional speech coding method provides a way to perform long term (LT) pre-processing to code the pitch lag of a speech signal without introducing some form of extra delay at an end of a speech frame.
Further limitations and disadvantages of conventional and traditional systems will become apparent to one of skill in the art through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.
Various aspects of the present invention can be found in a speech codec having a pitch track coding circuitry that operates on a speech signal. The pitch track coding circuitry contains, among other things, a pitch lag selection circuitry and a residual (or weighted speech) modification and warping circuitry. The pitch lag selection circuitry selects an end-of-frame pitch lag from a speech frame of the speech signal, and the end-of-frame pitch lag determines a global pitch track for the speech frame. The residual (or weighted speech) modification and warping circuitry adjusts a local pitch track of the speech frame on a speech sub-frame basis. The sub-frame size may be variable. The speech signal contains a number of speech frames, each speech frame contains a number of speech sub-frames, and each speech sub-frame has a corresponding pitch lag. The residual modification and warping circuitry adjusts the corresponding pitch lag.
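As an illustration of how an end-of-frame pitch lag could determine a global pitch track, the following minimal sketch interpolates linearly between the end-of-frame lag of the previous frame and that of the current frame. The linear interpolation, the function name `global_pitch_track`, and all parameter names are assumptions made for this example; the description above does not state how the track is derived.

```python
def global_pitch_track(prev_end_lag, end_lag, frame_len, subframe_len):
    """Return one pitch lag per sub-frame, sampled at each sub-frame end.

    Assumption (not from the source): the track is a straight line from the
    previous frame's end-of-frame lag to the current end-of-frame lag.
    """
    if frame_len <= 0 or subframe_len <= 0:
        raise ValueError("frame_len and subframe_len must be positive")
    # Sub-frame end positions; the last sub-frame may be shorter than the rest.
    ends = list(range(subframe_len, frame_len, subframe_len)) + [frame_len]
    slope = (end_lag - prev_end_lag) / frame_len
    return [prev_end_lag + slope * e for e in ends]


if __name__ == "__main__":
    # Four 40-sample sub-frames in a 160-sample frame.
    print(global_pitch_track(40, 48, 160, 40))
```

In this sketch the final entry always equals the end-of-frame pitch lag, so only that one value would need to be known to rebuild the track from the previous frame's value.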
In certain embodiments of the invention, a speech coding residual is received by the pitch lag selection circuitry. The speech coding residual is used to calculate an open-loop pitch, and the open-loop pitch is used to select the end-of-frame pitch lag. If desired, the end-of-frame pitch lag is searched by maximizing a long term processing gain of the speech frame of the speech signal. In this embodiment of the invention, the search favors a long term processing gain close to an end of the speech frame. In other embodiments of the invention, each speech frame of the speech signal contains two end-points, and the end-points of each of the speech frames are not adjusted by the residual modification and warping circuitry. Each speech frame also contains a number of internal-points. At least one of the corresponding pitch lags of the speech sub-frames is a pitch lag corresponding to one of the internal-points, and that pitch lag is adjusted using the residual modification and warping circuitry. In addition, a long term processing gain for all the speech sub-frames of the speech frame is maximized to assist in determining the adjustment of the at least one of the corresponding pitch lags by the residual modification and warping circuitry. In certain embodiments of the invention, more than one pitch lag of the speech sub-frames of the speech frames of the speech signal is adjusted using the residual modification and warping circuitry. The adjustment at the end of the frame is kept at zero.
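The end-of-frame pitch lag search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the long term processing gain is approximated by a weighted normalized correlation, the preference for the end of the frame is modeled by a linearly increasing sample weight, and the names `weighted_lt_score`, `select_end_of_frame_lag`, `search_range`, and `end_weight` are hypothetical.

```python
def weighted_lt_score(residual, start, length, lag, end_weight):
    """Weighted normalized correlation between the frame and its lagged copy.

    Assumption: samples near the end of the frame get a larger weight
    (from 1.0 up to 1.0 + end_weight) so that the end of the frame is favored.
    """
    num = e_cur = e_past = 0.0
    for i in range(length):
        w = 1.0 + end_weight * i / max(length - 1, 1)
        x = residual[start + i]
        y = residual[start + i - lag]
        num += w * x * y
        e_cur += w * x * x
        e_past += w * y * y
    if num <= 0.0 or e_cur == 0.0 or e_past == 0.0:
        return 0.0
    return num * num / (e_cur * e_past)


def select_end_of_frame_lag(residual, start, length, open_loop_pitch,
                            search_range=5, end_weight=1.0):
    """Pick the lag near the open-loop pitch with the highest weighted score."""
    best_lag, best_score = None, -1.0
    low = max(1, open_loop_pitch - search_range)
    for lag in range(low, open_loop_pitch + search_range + 1):
        if start - lag < 0:
            continue  # not enough past samples for this candidate
        score = weighted_lt_score(residual, start, length, lag, end_weight)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    # Synthetic residual with an exact period of 40 samples.
    residual = [float((n % 40) - 20) for n in range(320)]
    print(select_end_of_frame_lag(residual, 160, 160, open_loop_pitch=42))
```

Setting `end_weight` to zero reduces the score to an unweighted normalized correlation over the whole frame; larger values make the selected lag track the pitch near the frame end more closely.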
The speech codec of the invention contains an encoder circuitry, and the adjustment of the pitch lags of the speech sub-frames of the speech frames of the speech signal is performed exclusively in the encoder circuitry of the speech codec.
Other aspects of the present invention can be found in a speech codec having a pitch track coding circuitry that operates on a speech signal. In this embodiment of the invention, the speech codec contains a pitch lag selection circuitry and a residual modification and warping circuitry. The pitch lag selection circuitry selects a first pitch lag for a speech frame of the speech signal. The first pitch lag determines a global pitch track for the speech frame. The residual modification and warping circuitry adjusts a local pitch track of the speech frame on a speech sub-frame basis. The local pitch track of the speech frame is adjusted by modifying and warping a selected number of points within the speech frame.
In certain embodiments of the invention, the speech codec contains an encoder circuitry, and the adjustment of the pitch lags of the speech sub-frames of the speech frames of the speech signal is performed exclusively in the encoder circuitry of the speech codec. Each speech frame of the speech signal has two end-points, and the end-points of each of the speech frames are not adjusted by the residual modification and warping circuitry. The first pitch lag for the speech frame is selected by maximizing a long term processing gain of the speech frame and by favoring a long term processing gain close to an end of the speech frame. The total adjustment of the selected plurality of points within the speech frame sums to zero.
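To illustrate how internal points of a frame can be warped while both end-points stay fixed, the sketch below applies a piecewise-linear time warp to one frame. It is a hedged example only: the piecewise-linear mapping, the linear interpolation between samples, and the names `warp_frame`, `internal_points`, and `shifts` are assumptions introduced here and are not taken from the description.

```python
def warp_frame(frame, internal_points, shifts):
    """Warp a frame so that output index p reads from input position p + shift.

    Assumption: the first and last samples are anchors with zero shift, and
    positions between anchors are mapped linearly (piecewise-linear warp).
    """
    n = len(frame)
    if n < 2 or len(internal_points) != len(shifts):
        raise ValueError("need at least 2 samples and one shift per point")
    dst = [0] + list(internal_points) + [n - 1]
    src = [0.0] + [p + s for p, s in zip(internal_points, shifts)] + [n - 1.0]
    for seq in (dst, src):
        if any(b <= a for a, b in zip(seq, seq[1:])):
            raise ValueError("anchor positions must be strictly increasing")
    out, seg = [], 0
    for i in range(n):
        # Advance to the segment that contains output index i.
        while seg < len(dst) - 2 and i > dst[seg + 1]:
            seg += 1
        d0, d1 = dst[seg], dst[seg + 1]
        s0, s1 = src[seg], src[seg + 1]
        pos = s0 + (i - d0) * (s1 - s0) / (d1 - d0)
        k = min(int(pos), n - 2)
        frac = pos - k
        out.append((1.0 - frac) * frame[k] + frac * frame[k + 1])
    return out


if __name__ == "__main__":
    ramp = [float(i) for i in range(160)]
    warped = warp_frame(ramp, [40, 80, 120], [1.5, -2.0, 0.5])
    print(len(warped), warped[0], warped[40], warped[-1])
```

Because the end-points are anchored, the output has the same length as the input and the stretch or compression applied to the individual segments necessarily cancels out over the frame, so the warp in this sketch adds no delay at the frame end.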
Other aspects of the present invention can be found in a method that modifies and warps a speech coding residual of a speech signal (or of a weighted speech signal). The method includes calculating the speech coding residual of the speech signal so that the speech coding residual contains an initial estimate of a pitch track. In addition, the method includes determining an initial estimate for a pitch track of the speech signal, and modifying and warping the speech coding residual to provide a better fit of the pitch track of the speech coding residual.
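The first step of the method, calculating the speech coding residual, can be sketched under the assumption that the residual is a linear prediction residual obtained by inverse filtering the speech with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. That interpretation, the sign convention, the zero initial filter state, and the name `lp_residual` are assumptions for this example and are not stated in the description.

```python
def lp_residual(speech, lpc):
    """Inverse-filter speech with A(z) = 1 + sum(lpc[k-1] * z^-k).

    Assumption: samples before the start of `speech` are treated as zero.
    """
    residual = []
    for n in range(len(speech)):
        acc = speech[n]
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * speech[n - k]
        residual.append(acc)
    return residual


if __name__ == "__main__":
    # Synthesize s[n] = 0.9 * s[n-1] + e[n], then recover e[n].
    excitation = [1.0, 0.0, 0.0, 0.5, 0.0]
    speech = []
    for n, e in enumerate(excitation):
        speech.append(e + (0.9 * speech[n - 1] if n > 0 else 0.0))
    print(lp_residual(speech, [-0.9]))
```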
In certain embodiments of the invention that perform the method, the speech signal contains a number of speech frames, and each speech frame contains a plurality of speech sub-frames. The step of the method that determines the initial estimate for the pitch track of the speech signal further includes maximizing a long term processing gain for the speech frames of the speech signal. In doing this, a long term processing gain close to an end of the speech frame is favored. In other embodiments of the invention, the modification and warping of the speech coding residual to provide the better fit of the pitch track of the speech coding residual further includes maximizing a long term processing gain of the plurality of speech sub-frames of the speech signal. In these embodiments, each speech frame of the speech signal has two end-points, and the end-points of each of the speech frames are not modified or warped.
Other aspects, advantages and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.