The present invention generally relates to voice detection apparatuses, and more particularly to a voice detection apparatus for detecting voiced and silent intervals of a voice signal.
Recently, there has been an increasing demand for communication systems which transmit data efficiently by use of a high-speed channel such as a high-speed packet or ATM channel. In such a communication system, the data transmission is controlled depending on the existence of the voice signal so as to realize the efficient data transmission. For example, the quantity of transmission data is compressed by not transmitting the signal in the silent interval of the voice signal. Accordingly, in order to realize the efficient data transmission, it is essential that the voiced and silent intervals of the voice signal are detected with a high accuracy by a voice detection apparatus.
FIG. 1 shows an example of a conventional voice detection apparatus which comprises a signal power calculation part 1, a zero crossing counting part 2 and a discriminating part 3. The signal power calculation part 1 extracts the voice signal frame by frame and calculates the voice signal power of each frame. The zero crossing counting part 2 counts the number of times the polarity of the voice signal is inverted. The discriminating part 3 discriminates the voiced and silent intervals of the voice signal based on the outputs of the signal power calculation part 1 and the zero crossing counting part 2.
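The two quantities supplied to the discriminating part 3 can be illustrated by the following minimal sketch. It assumes that the signal power is the mean squared amplitude of a frame and that a sample of exactly zero is treated as non-negative; neither convention is specified above, and the function names and the frame length parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (assumed definition of signal power)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame: Sequence[float]) -> int:
    """Count polarity inversions between consecutive samples.

    Assumption: a sample equal to zero counts as non-negative.
    """
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def frame_features(signal: Sequence[float], frame_len: int) -> List[Tuple[float, int]]:
    """Split the signal into non-overlapping frames and return (SP, ZC) per frame.

    Trailing samples that do not fill a whole frame are ignored.
    """
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        features.append((frame_power(frame), zero_crossings(frame)))
    return features
```

For example, `frame_features(samples, 160)` would yield one (SP, ZC) pair per 160-sample frame, where 160 is merely an illustrative frame length.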
FIG. 2 is a flow chart for explaining the operation of the discriminating part 3 of the voice detection apparatus. A step S0 discriminates whether or not a voice signal power SP calculated in the signal power calculation part 1 is greater than a threshold value SP.sub.th. When the discrimination result in the step S0 is YES, a voiced interval is detected, a step S1 sets the threshold value SP.sub.th to SP.sub.th2, and the process returns to the step S0. On the other hand, when the discrimination result in the step S0 is NO, a step S2 compares a zero crossing number ZC which is counted in the zero crossing counting part 2 with threshold values ZC.sub.v and ZC.sub.f.
FIG. 3 shows the relationship of the threshold values ZC.sub.v and ZC.sub.f, the voiced interval (voiced and voiceless sounds) and the silent interval (noise). It is known that the silent interval occurs only when ZC.sub.v <ZC<ZC.sub.f. Accordingly, when ZC>ZC.sub.f or ZC<ZC.sub.v, the voiced interval is detected in the step S2 and the process returns to the step S0 via the step S1. However, when ZC.sub.v <ZC<ZC.sub.f, the silent interval is detected in the step S2, a step S3 sets the threshold value SP.sub.th to SP.sub.th1, and the process returns to the step S0.
FIG. 4 shows the relationship of the threshold values SP.sub.th1 and SP.sub.th2. A hysteresis characteristic is given to the threshold value which is used when the voiced and silent intervals are detected. The threshold value is set to SP.sub.th1 for the transition from the silent interval to the voiced interval, and is set to SP.sub.th2 for the transition from the voiced interval to the silent interval, so that no chattering is generated in the detection result.
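The operation of the discriminating part 3 described with reference to FIGS. 2 through 4 can be sketched as follows. The class name, the method name and all numeric values are hypothetical. The sketch also assumes that the apparatus starts in the silent state (SP.sub.th =SP.sub.th1) and, for the usage example only, that SP.sub.th1 is greater than SP.sub.th2, which is the usual arrangement for a hysteresis characteristic but is not stated in the text.

```python
class Discriminator:
    """Per-frame voiced/silent decision with a hysteresis power threshold."""

    def __init__(self, sp_th1: float, sp_th2: float, zc_v: int, zc_f: int):
        self.sp_th1 = sp_th1  # threshold used while in the silent interval
        self.sp_th2 = sp_th2  # threshold used while in the voiced interval
        self.zc_v = zc_v      # lower zero crossing threshold
        self.zc_f = zc_f      # upper zero crossing threshold
        self.sp_th = sp_th1   # assumption: start in the silent state

    def step(self, sp: float, zc: int) -> bool:
        """Return True for a voiced frame and False for a silent frame."""
        if sp > self.sp_th:
            # Step S0 is YES: voiced by signal power alone.
            voiced = True
        else:
            # Step S2: silent only when ZC lies strictly between ZC_v and ZC_f.
            voiced = not (self.zc_v < zc < self.zc_f)
        # Step S1 (voiced) or step S3 (silent) updates the power threshold.
        self.sp_th = self.sp_th2 if voiced else self.sp_th1
        return voiced


# Usage with made-up values: a power of 6.0 is silent before speech starts
# but is still treated as voiced once a louder frame has been detected.
d = Discriminator(sp_th1=10.0, sp_th2=4.0, zc_v=5, zc_f=20)
decisions = [d.step(sp, zc) for sp, zc in [(6.0, 10), (12.0, 10), (6.0, 10), (3.0, 10)]]
```

With these values the same power of 6.0 gives a different decision before and after the loud frame, which is the behavior that suppresses chattering near the threshold.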
However, the response of this conventional voice detection apparatus is poor because the voiced and silent intervals are detected based solely on the signal power and the zero crossing number. For this reason, there is a problem in that a beginning of speech and an end of speech cannot be detected accurately.
In order to eliminate this problem, the conventional voice detection apparatus stores the voice signal for a predetermined time, and the stored data is read out when the voiced interval is detected so as to avoid a dropout at the beginning of speech. In addition, at the end of speech, the voiced interval is deliberately continued for a predetermined time so as to eliminate a dropout at the end of speech. However, because a delay element is provided to prevent the dropout of the voice data, there are problems in that a delay is inevitably introduced in the voice detection operation, and in that the provision of the delay element is undesirable in view of the structure of a coder which is used with the voice detection apparatus.
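The effect of these two countermeasures on the detection result can be illustrated by the following sketch. It operates offline on a list of per-frame decisions, whereas the conventional apparatus would realize the same effect in real time with the delay element. The function name and the parameters `preroll` and `hangover`, both counted in frames, are hypothetical.

```python
from typing import List


def extend_voiced(decisions: List[bool], preroll: int, hangover: int) -> List[bool]:
    """Extend each voiced frame backward by `preroll` and forward by `hangover` frames."""
    n = len(decisions)
    keep = list(decisions)
    for i, voiced in enumerate(decisions):
        if not voiced:
            continue
        # Frames stored before the detection are read out (beginning of speech).
        for j in range(max(0, i - preroll), i):
            keep[j] = True
        # The voiced interval is deliberately continued (end of speech).
        for j in range(i + 1, min(n, i + 1 + hangover)):
            keep[j] = True
    return keep


raw = [False, False, False, True, True, False, False, False, False]
extended = extend_voiced(raw, preroll=1, hangover=2)
# extended == [False, False, True, True, True, True, True, False, False]
```

In a real-time arrangement, reading out `preroll` stored frames means that the output lags the input by at least that many frames, which corresponds to the delay noted above.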