As a conventional signal processing apparatus that extracts a pair of people in conversation, there is an apparatus that judges the degree of establishment of a conversation from the correlation between pieces of time-series data obtained through voice/silence evaluation of speech signals, and thereby extracts effective speech (see patent literature 1).
The signal processing apparatus described in patent literature 1 exploits the phenomenon that, in an established conversation, speech appears alternately between two excitations. The apparatus performs voice/silence evaluation of the separated excitation signals and calculates the degree of establishment of a conversation according to the combination of voice and silence between the two excitations. FIG. 1 shows the concept of the method of calculating the degree of establishment of a conversation described in patent literature 1. When one of a target speech signal and a received signal is voice and the other is silence, points are added to the degree of establishment of a conversation, whereas when both signals are voice or both are silence, points are deducted. A conversation is assumed to be established for a combination of excitations having a large degree of establishment of a conversation.
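The point-based calculation described above may be illustrated by the following minimal sketch. It is not taken from patent literature 1. The function name `conversation_establishment_degree`, the per-frame boolean representation of the voice/silence evaluation results, and the point values `add_points` and `deduct_points` (both 1.0 by default) are assumptions introduced here for illustration only. The voice/silence evaluation itself is assumed to have been performed beforehand.

```python
from typing import Sequence


def conversation_establishment_degree(
    target_voiced: Sequence[bool],
    received_voiced: Sequence[bool],
    add_points: float = 1.0,      # assumed value, not specified in the source
    deduct_points: float = 1.0,   # assumed value, not specified in the source
) -> float:
    """Accumulate a score over frames of voice/silence evaluation results.

    Each element is True when the frame was evaluated as voice and
    False when it was evaluated as silence.
    """
    if len(target_voiced) != len(received_voiced):
        raise ValueError("both sequences must cover the same number of frames")

    degree = 0.0
    for target, received in zip(target_voiced, received_voiced):
        if target != received:
            # One signal is voice and the other is silence: add points.
            degree += add_points
        else:
            # Both signals are voice, or both are silence: deduct points.
            degree -= deduct_points
    return degree


# Speech alternates between the two excitations: the score grows.
alternating = conversation_establishment_degree(
    [True, True, False, False], [False, False, True, True]
)

# Both excitations are voiced and silent at the same time: the score falls.
overlapping = conversation_establishment_degree(
    [True, True, False, False], [True, True, False, False]
)
```

Under these assumptions, a pair of excitations whose speech alternates accumulates a larger degree than a pair whose speech overlaps or whose silences coincide, which corresponds to the criterion by which a conversation is assumed to be established.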