1. Field of the Invention
The present invention relates to a sensor network system, such as a microphone array network system provided for acquiring speech of high sound quality, and to a communication method therefor.
2. Description of the Related Art
Conventionally, in application systems that utilize vocal sound (e.g., an audio teleconference system in which a plurality of microphones are connected, a speech recognition robot system, and systems having various speech interfaces), various kinds of speech processing such as speech source localization, speech source separation, noise cancellation, and echo cancellation are performed in order to utilize the vocal sound with a high sound quality. In particular, microphone arrays mainly intended for speech source localization and speech source separation have been broadly researched for the purpose of acquiring a vocal sound with a high sound quality. Here, speech source localization specifies the direction and position of a speech source from sound arrival time differences, and speech source separation extracts a specific speech source in a specific direction by removing sound sources that act as noise, utilizing the results of the speech source localization.
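As an illustration of how a direction can be specified from a sound arrival time difference, the following is a minimal sketch for a single pair of microphones. It is not taken from the cited documents. The function names, the plain cross-correlation search, the far-field (plane wave) assumption, and the speed of sound value are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) by which sig_b trails sig_a.

    The lag is found by maximizing the plain cross-correlation over
    the range [-max_lag, max_lag] (hypothetical helper).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_samples, sample_rate_hz, mic_spacing_m):
    """Far-field arrival angle, measured from the broadside of a two-microphone pair."""
    path_difference_m = SPEED_OF_SOUND * delay_samples / sample_rate_hz
    # Clamp so that measurement error cannot push the ratio outside asin's domain.
    ratio = max(-1.0, min(1.0, path_difference_m / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# Usage: a pulse that reaches the second microphone three samples later.
mic_a = [0.0] * 32
mic_b = [0.0] * 32
mic_a[10] = 1.0
mic_b[13] = 1.0
delay = estimate_delay_samples(mic_a, mic_b, max_lag=5)
angle = arrival_angle_deg(delay, sample_rate_hz=16000, mic_spacing_m=0.1)
```

A single pair yields only a direction. Estimating a position requires time differences from several pairs, which is one reason the number of microphones matters for localization accuracy.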
It is known that speech processing using microphone arrays generally improves in performance, in noise processing and the like, as the number of microphones increases. Moreover, in such speech processing, there are a number of techniques that use the position information of a speech source obtained by speech source localization (See, for example, Non-Patent Document 1). The speech processing becomes more effective as the results of the speech source localization become more accurate. In other words, it is required to concurrently improve the accuracy of the speech source localization and the noise cancellation intended for higher sound quality by increasing the number of microphones.
In a speech source localization method using a conventional large-scale microphone array, the positional range of a speech source is divided into a mesh of smaller positional ranges, and the speech source position is stochastically calculated for each interval. For this calculation, the practice has been to collect all speech data in a speech processing server, such as a workstation, in one place and to process all the speech data collectively to estimate the position of the speech source (See, for example, Non-Patent Document 2). When all speech data is processed collectively as described above, the signal wiring length and the communication traffic between the microphones for vocal sound collection and the speech processing server, as well as the calculation amount in the speech processing server, become vast. There is a problem in that the number of microphones cannot be increased due to the following:
(a) the increase in the wiring length, the communication traffic, and the calculation amount in the speech processing server; and
(b) the physical limitation that a large number of A/D converters cannot be arranged in one place at the speech processing server.
Moreover, there is also the problem that noise occurs as the signal wiring length increases. It has therefore been difficult to increase the number of microphones for the purpose of higher sound quality.
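The mesh-based localization described above can be illustrated by the following minimal sketch, in which a two-dimensional range is divided into cells and each cell is given a normalized score. The sketch is not the method of Non-Patent Document 2. The Gaussian scoring of arrival time difference errors, the parameter `sigma`, the function names, and the microphone layout are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def predicted_tdoa(cell, mic_a, mic_b):
    """Arrival time difference (s) at mic_b relative to mic_a for a source at cell."""
    return (math.dist(cell, mic_b) - math.dist(cell, mic_a)) / SPEED_OF_SOUND


def grid_posterior(mics, measured, x_range, y_range, step, sigma=1e-4):
    """Score every mesh cell and return {(x, y): probability}, summing to 1.

    measured maps a pair of microphone indices (i, j) to a time difference
    in seconds. sigma is an assumed standard deviation of the timing error.
    """
    nx = int(round((x_range[1] - x_range[0]) / step)) + 1
    ny = int(round((y_range[1] - y_range[0]) / step)) + 1
    log_scores = {}
    for ix in range(nx):
        for iy in range(ny):
            cell = (x_range[0] + ix * step, y_range[0] + iy * step)
            log_like = 0.0
            for (i, j), tdoa in measured.items():
                err = tdoa - predicted_tdoa(cell, mics[i], mics[j])
                log_like -= err * err / (2.0 * sigma * sigma)
            log_scores[cell] = log_like
    # Subtract the peak before exponentiating to avoid numerical underflow.
    peak = max(log_scores.values())
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Usage: four microphones at the corners of a 4 m x 4 m area (assumed layout).
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
true_source = (1.0, 3.0)
measured = {(0, j): predicted_tdoa(true_source, mics[0], mics[j]) for j in (1, 2, 3)}
posterior = grid_posterior(mics, measured, (0.0, 4.0), (0.0, 4.0), step=0.5)
best_cell = max(posterior, key=posterior.get)
```

In this sketch the work grows with the number of cells multiplied by the number of microphone pairs, and every measurement must reach the one process that evaluates the mesh. This mirrors the growth in calculation amount and communication traffic noted above for collective processing.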
As a method for addressing the above problems, there is known a speech processing system with a microphone array in which a plurality of microphones are grouped into small arrays and the small arrays are aggregated (See, for example, Non-Patent Document 3). However, even in such a speech processing system, the speech data of all the microphones obtained in the small arrays is aggregated into a speech server in one place via a network, which leads to the problem of increased communication traffic in the network. Moreover, there is the problem that a speech processing delay occurs as the communication data amount and the communication traffic amount increase.
Moreover, in order to satisfy future demands for sound pickup in ubiquitous systems and television conference systems, a greater number of microphones is necessary (See, for example, Patent Document 1). However, in the current network system with a microphone array as described above, the speech data obtained by the microphone array is merely transmitted to the server as it is. No system has been found in which the node devices of a microphone array mutually exchange position information of the speech source to reduce the calculation amount in the entire system and to reduce the communication traffic of the network. Therefore, assuming an increase in the scale of the microphone array network system, a system architecture that reduces the calculation amount of the entire system and suppresses the communication traffic of the network becomes important.
As described above, there has been a demand to improve the speech source localization accuracy by using a large number of microphone arrays while suppressing the communication traffic and the calculation amount in the speech processing server, and to effectively perform speech processing such as noise cancellation. Moreover, position measurement systems using a speech source have been proposed in recent years. For example, Patent Document 2 discloses computation of the position of an ultrasonic tag by using the ultrasonic tag and a microphone array. Further, Patent Document 3 discloses sound pickup by using a microphone array.
Prior art documents related to the present invention are as follows: