Systems such as server-client voice recognition systems are known, in which a server device performs a predetermined process on voice input to a client terminal. In this type of system, to reduce the amount of communication from the client terminal to the server device, the client terminal detects, from the input signals, a voice segment, that is, a part in which a person is speaking, and transmits only the signals corresponding to the detected voice segment to the server device. However, because the client terminal has limited resources compared with the server device, the client terminal often cannot detect voice segments with sufficient accuracy, and some voice may be left out without being transmitted. Accordingly, there has been a demand for a new mechanism that reduces the amount of voice left out of transmission while also suppressing the amount of communication.