The present invention relates, in general, to data processing and, in particular, to speech signal processing.
Systems for transmitting speech to a receiver often digitize the speech, divide the digitized speech into frames, encode each frame using a particular voice encoder (vocoder) algorithm, and transmit the frames to the receiver.
Some of the problems encountered by these systems include unnecessary complexity, recognizing background noise as speech when no speech is present, transmitting too many frames that do not contain speech, sending frames encoded using a format other than the chosen vocoder, and so on.
Some speech transmission systems are unnecessarily complex. Such systems tend to be more expensive than simpler systems because of the additional software required to perform a complex function. Also, a complex system may be too slow for a particular purpose because of the additional time required to complete a complex function.
Some speech systems set thresholds for background noise that are based on a theoretical model of noise. Such systems are susceptible to erroneously determining that speech is present in a frame when it is not, because the actual background noise changes unexpectedly from one transmission to the next. Also, some systems do not adjust the background noise thresholds once set, or do not adjust them often enough to keep pace with a rapidly changing noise background. The same points apply to how systems set the threshold for determining whether or not speech is present within a frame.
Speech transmission systems that send too many frames that do not contain speech waste bandwidth that could have been used to transmit frames that do contain speech, and they run the risk that the receiver will mistakenly conclude, for lack of any voice activity, that the transmission is over.
Some speech transmission systems send additional frames (e.g., comfort noise) that are not encoded using the chosen vocoder but are instead sent as special frames. Using special frames adds complexity to the receiver because the receiver must be able to recognize them. Also, special frames may cause bothersome noise at the receiver, since they were not encoded using the chosen vocoder algorithm.
U.S. Pat. No. 3,832,491, entitled “DIGITAL VOICE SWITCH WITH AN ADAPTIVE DIGITALLY-CONTROLLED THRESHOLD,” discloses a voice switch that adjusts the threshold for determining the presence of speech only after a theoretically optimum threshold has been exceeded 1,220 times, and that adjusts a minimum speech threshold based on noise. U.S. Pat. No. 3,832,491 does not perform the steps of the present invention and does not adjust the speech threshold in the same manner, or as often, as the present invention does. U.S. Pat. No. 3,832,491 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 4,008,375, entitled “DIGITAL VOICE SWITCH FOR SINGLE OR MULTIPLE CHANNEL APPLICATIONS,” discloses a voice switch that adjusts the threshold for determining the presence of speech based on a statistical analysis of whether or not the number of times the speech threshold is exceeded is uniform or non-uniform. U.S. Pat. No. 4,008,375 does not perform the steps of the present invention and does not adjust the speech threshold as often as the present invention does. U.S. Pat. No. 4,008,375 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,612,955, entitled “MOBILE RADIO WITH TRANSMIT COMMAND CONTROL AND MOBILE RADIO SYSTEM”; U.S. Pat. No. 5,812,965, entitled “PROCESS AND DEVICE FOR CREATING COMFORT NOISE IN A DIGITAL SPEECH TRANSMISSION”; and U.S. Pat. No. 5,835,889, entitled “METHOD AND APPARATUS FOR DETECTING HANGOVER PERIODS IN A TDMA WIRELESS COMMUNICATION SYSTEM USING DISCONTINUOUS TRANSMISSION,” each disclose transmitting a special silence descriptor (SID) frame when silence is encountered and the transmission of speech is discontinued. This special frame may cause bothersome noise at the receiver, whereas the method of the present invention does not. U.S. Pat. Nos. 5,612,955; 5,812,965; and 5,835,889 are hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 4,351,983, entitled “SPEECH DETECTOR WITH VARIABLE THRESHOLD,” discloses a device for and method of detecting speech by adjusting the threshold for determining speech, but not in the manner of the present invention. Also, U.S. Pat. No. 4,351,983 does not employ comfort noise and discontinuous transmission as the present invention does. U.S. Pat. No. 4,351,983 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 4,672,669, entitled “VOICE ACTIVITY DETECTION PROCESS AND MEANS FOR IMPLEMENTING SAID PROCESS,” discloses a device for and method of detecting voice activity by comparing the energy of a signal to a threshold. The signal is determined to be voice if its power is above the threshold; if its power is below the threshold, the rate of change of the spectral parameters is tested. U.S. Pat. No. 4,672,669 does not employ comfort noise or discontinuous transmission as the present invention does. U.S. Pat. No. 4,672,669 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,255,340, entitled “METHOD FOR DETECTING VOICE PRESENCE ON A COMMUNICATION LINE,” discloses a method of detecting voice activity by determining whether a block of the signal is stationary or non-stationary and comparing the result to the results of the last M blocks; it does not employ the steps of the present method. U.S. Pat. No. 5,255,340 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,276,765, entitled “VOICE ACTIVITY DETECTION,” discloses a device for and a method of detecting voice activity by performing an autocorrelation on weighted and combined coefficients of the input signal to provide a measure that depends on the power of the signal. The measure is then compared against a variable threshold to determine voice activity. However, the speech threshold is not adjusted during speech periods, as it is in the present invention. U.S. Pat. No. 5,276,765 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. Nos. 5,459,814 and 5,649,055, both entitled “VOICE ACTIVITY DETECTOR FOR SPEECH SIGNALS IN VARIABLE BACKGROUND NOISE,” disclose a device for and method of detecting voice activity by measuring short-term time-domain characteristics of the input signal, including the average signal level and the absolute value of any change in average signal level; they do not employ the steps of the present method. U.S. Pat. Nos. 5,459,814 and 5,649,055 are hereby incorporated by reference into the specification of the present invention.
U.S. Pat. Nos. 5,533,118 and 5,619,565, both entitled “VOICE ACTIVITY DETECTION METHOD AND APPARATUS USING THE SAME,” disclose a device for and method of distinguishing voice activity from two tones by dividing the square of the maximum value of the received signal by its energy and comparing this ratio to three different thresholds; they do not employ the steps of the present method. U.S. Pat. Nos. 5,533,118 and 5,619,565 are hereby incorporated by reference into the specification of the present invention.
U.S. Pat. Nos. 5,598,466 and 5,737,407, both entitled “VOICE ACTIVITY DETECTOR FOR HALF-DUPLEX AUDIO COMMUNICATION SYSTEM,” disclose a device for and method of detecting voice activity by determining an average peak value and a standard deviation, updating a power density function, and detecting voice activity if the average peak value exceeds the power density function; they do not employ the steps of the present method. U.S. Pat. Nos. 5,598,466 and 5,737,407 are hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,619,566, entitled “VOICE ACTIVITY DETECTOR FOR AN ECHO SUPPRESSOR AND AN ECHO SUPPRESSOR,” discloses a device for detecting voice activity that includes a whitening filter and a means for measuring energy, and that uses the energy level to determine the presence of voice activity; it does not employ the steps of the present method. U.S. Pat. No. 5,619,566 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,732,141, entitled “DETECTING VOICE ACTIVITY,” discloses a device for and method of detecting voice activity by computing the autocorrelation coefficients of a signal, identifying a first autocorrelation vector, identifying a second autocorrelation vector, subtracting the first autocorrelation vector from the second, and computing a norm of the resulting difference vector, which indicates whether or not voice activity is present; it does not employ the steps of the present method. U.S. Pat. No. 5,732,141 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,749,067, entitled “VOICE ACTIVITY DETECTOR,” discloses a device for and method of detecting voice activity by comparing the spectrum of a signal to a noise estimate, updating the noise estimate, computing a linear predictive coding prediction gain, and suppressing updates of the noise estimate if the gain exceeds a threshold; it does not employ the steps of the present method. U.S. Pat. No. 5,749,067 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,867,574, entitled “VOICE ACTIVITY DETECTION SYSTEM AND METHOD,” discloses a device for and method of detecting voice activity by computing an energy term based on an integral of the absolute value of a derivative of a speech signal, computing a ratio of the energy to a noise level, and comparing the ratio to a voice activity threshold; it does not employ the steps of the present method. U.S. Pat. No. 5,867,574 is hereby incorporated by reference into the specification of the present invention.
It is an object of the present invention to transmit encoded frames of digitized speech.
It is another object of the present invention to transmit encoded comfort noise after a user-definable number of frames that do not contain speech have been detected.
It is another object of the present invention to discontinue transmission after a user-definable number of frames are detected that do not contain speech.
It is another object of the present invention to resume transmission, after it has been discontinued, upon the detection of a frame containing speech.
It is another object of the present invention to adjust the threshold for determining the presence of speech based on the energy of the frame on a frame-by-frame basis.
It is another object of the present invention to adjust a minimum energy threshold on a frame-by-frame basis.
It is another object of the present invention to adjust a maximum energy threshold on a frame-by-frame basis.
The present invention is a method of transmitting speech.
The first step is setting a silence counter to zero.
The second step is setting a transmit counter to one.
The third step is setting a blank period counter to zero.
The fourth step is receiving a frame of digitized information that may or may not contain speech.
The fifth step is determining if the frame contains speech.
The sixth step is checking if the transmit counter is equal to zero and the blank period counter is less than x, where x is a positive integer.
The seventh step is checking if the transmit counter is equal to zero, the blank period counter is greater than x−1, and the frame does not contain speech.
The eighth step is checking if the transmit counter is equal to zero, the blank period counter is greater than x−1, and the frame contains speech.
The ninth step is checking if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is less than y.
The tenth step is checking if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is greater than y+z−2, where y and z are both positive integers.
The eleventh step is checking if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is greater than y−1.
The twelfth, and last, step is checking if the transmit counter is equal to one, the frame contains speech and the silence counter is less than y+z.
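The conditions in steps six through twelve can be written as plain predicates over the three counters. The following is a minimal sketch under stated assumptions: the names `TxState` and `matching_checks` are hypothetical, and because this section lists only the conditions, not the action taken when each one holds, the sketch merely reports which checks a frame satisfies.

```python
from dataclasses import dataclass


@dataclass
class TxState:
    """Hypothetical container for the counters; defaults follow steps one to three."""
    silence: int = 0       # silence counter, set to zero (step one)
    transmit: int = 1      # transmit counter, set to one (step two)
    blank_period: int = 0  # blank period counter, set to zero (step three)


def matching_checks(state: TxState, has_speech: bool, x: int, y: int, z: int) -> list[int]:
    """Return the step numbers (6 to 12) whose conditions hold for this frame.

    Only the conditions are modeled; the actions taken on each branch are
    not described in this section and are deliberately omitted.
    """
    if min(x, y, z) < 1:
        raise ValueError("x, y and z must be positive integers")
    t, b, s = state.transmit, state.blank_period, state.silence
    checks = {
        6: t == 0 and b < x,
        7: t == 0 and b > x - 1 and not has_speech,
        8: t == 0 and b > x - 1 and has_speech,
        9: t == 1 and not has_speech and s < y,
        10: t == 1 and not has_speech and s > y + z - 2,
        11: t == 1 and not has_speech and s > y - 1,
        12: t == 1 and has_speech and s < y + z,
    }
    # Dicts keep insertion order, so matches come back in step order.
    return [step for step, holds in checks.items() if holds]
```

For some counter values more than one check holds (with y = 5 and z = 2, a silence count of 6 satisfies both the tenth and eleventh checks), so the sketch returns every match rather than choosing one.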
In the preferred embodiment, the energy of a frame is calculated using the following equation.
$$E = \sqrt{\frac{A^{H} \times A}{\mathrm{FrameSize}}}$$
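This section does not define A or A^H. Assuming A is the vector of the frame's samples and A^H is its conjugate (Hermitian) transpose, A^H × A is the sum of the squared sample magnitudes, and E is the frame's root-mean-square amplitude. A minimal sketch under that assumption, using the hypothetical name `frame_energy`:

```python
import math
from typing import Sequence


def frame_energy(samples: Sequence[complex]) -> float:
    """E = sqrt((A^H A) / FrameSize), assuming A is the frame's sample vector."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    # A^H A reduces to the sum of squared magnitudes of the samples.
    power_sum = sum(abs(a) ** 2 for a in samples)
    # FrameSize is taken to be the number of samples in the frame (assumption).
    return math.sqrt(power_sum / len(samples))
```

Real-valued samples work unchanged, since their conjugate equals the sample itself.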
A minimum energy threshold is set.
A maximum energy threshold is set.
A speech threshold is set as T = (0.07 × maximum energy threshold) + (K × minimum energy threshold), where K is a user-definable value.
The energy of the frame is compared to the speech threshold.
If the energy of the frame is less than the speech threshold, it is concluded that the frame contains no speech; otherwise, it is concluded that the frame contains speech.
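The speech threshold and the per-frame decision can be sketched as two small functions. The names are hypothetical, and the sketch assumes the frame energy and both energy thresholds are expressed in the same units as E.

```python
def speech_threshold(max_thresh: float, min_thresh: float, k: float) -> float:
    """T = (0.07 x maximum energy threshold) + (K x minimum energy threshold)."""
    return 0.07 * max_thresh + k * min_thresh


def contains_speech(energy: float, max_thresh: float, min_thresh: float, k: float) -> bool:
    """A frame is treated as speech unless its energy is below the threshold."""
    return not energy < speech_threshold(max_thresh, min_thresh, k)
```

For example, with a maximum energy threshold of 1000, a minimum energy threshold of 50, and K = 2, T = 0.07 × 1000 + 2 × 50 = 170, so a frame with energy 150 is classified as silence and a frame with energy 200 as speech.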
The minimum energy threshold is increased by a first user-definable percentage.
Additionally, the energy of the frame may be compared with the minimum energy threshold. If the energy is less than the minimum energy threshold, the first user-definable percentage is reset to its initial value. If the energy is greater than the minimum energy threshold, the first user-definable percentage is increased by a second user-definable percentage.
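The minimum-threshold update leaves several details open, so the following is only a sketch under explicit assumptions: the percentage increase is applied multiplicatively, the frame energy is compared with the threshold after it has been raised, and the second percentage is added to the first in percentage points. The class `MinThresholdAdapter` and its fields are hypothetical. This section does not say how the minimum threshold is ever lowered, so the sketch only raises it.

```python
from dataclasses import dataclass, field


@dataclass
class MinThresholdAdapter:
    """Hypothetical tracker for the minimum energy threshold.

    Percentages are in percent (1.0 means 1 %).
    """
    min_thresh: float
    initial_pct: float  # first user-definable percentage, as initially set
    step_pct: float     # second user-definable percentage
    pct: float = field(init=False)  # current value of the first percentage

    def __post_init__(self) -> None:
        self.pct = self.initial_pct

    def update(self, energy: float) -> float:
        """Apply one frame's update and return the new minimum threshold."""
        # Assumption: the threshold grows multiplicatively by pct percent.
        self.min_thresh *= 1 + self.pct / 100
        # Assumption: the frame energy is compared with the raised threshold.
        if energy < self.min_thresh:
            self.pct = self.initial_pct  # restore the initial percentage
        elif energy > self.min_thresh:
            self.pct += self.step_pct    # assumption: additive, in percentage points
        return self.min_thresh
```

Starting from a threshold of 100 with percentages of 1 % and 0.5 %, two loud frames raise the first percentage to 2 %, and a following quiet frame resets it to 1 %.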
In an alternate embodiment, the maximum energy threshold may be modified in a fashion similar, but complementary, to that used for the minimum energy threshold.