A server-client voice recognition system is known in which a client device transmits a signal containing voice (human speech) to a server device, and the server device performs voice recognition and returns the recognition result to the client device. To reduce the communication volume from the client device to the server device in this type of system, a process has been developed in which the client device detects the section of the received signal that contains voice (hereinafter referred to as a speech section) and transmits only the signal in the detected speech section to the server device.
When the speech section is detected in the client device, accurate detection is difficult because the resources of the client device are limited compared with those of the server device. Furthermore, because the voice conditions on the client side differ according to the environment and change frequently, spoken voice may fail to be captured. Consequently, there is a demand for accurately detecting the speech section while reducing the communication volume.
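
For illustration only, client-side speech section detection of the kind described above can be sketched as a simple energy-threshold detector. This is a minimal sketch under stated assumptions, not the detection method of any particular system: the function names, the frame length of 160 samples (10 ms at an assumed 16 kHz sampling rate), and the fixed energy threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def detect_speech_sections(
    samples: Sequence[float],
    frame_len: int = 160,      # hypothetical: 10 ms at an assumed 16 kHz rate
    threshold: float = 0.01,   # hypothetical fixed energy threshold
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of consecutive frames whose
    energy exceeds the threshold. Trailing samples that do not fill a
    whole frame are ignored."""
    sections: List[Tuple[int, int]] = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        is_speech = frame_energy(frame) > threshold
        if is_speech and start is None:
            start = i * frame_len                    # a speech section begins
        elif not is_speech and start is not None:
            sections.append((start, i * frame_len))  # the speech section ends
            start = None
    if start is not None:                            # signal ended during speech
        sections.append((start, n_frames * frame_len))
    return sections


def select_for_transmission(
    samples: Sequence[float], sections: List[Tuple[int, int]]
) -> List[Sequence[float]]:
    """Keep only the samples inside the detected speech sections,
    that is, the part the client would send to the server."""
    return [samples[s:e] for s, e in sections]


# Synthetic signal: silence, a loud segment, then silence again.
signal = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
sections = detect_speech_sections(signal)
chunks = select_for_transmission(signal, sections)
```

A fixed threshold keeps the computation light enough for a resource-limited client, but it is also the kind of detector that may miss speech when the environment changes, which is the difficulty noted above.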