Traditionally, speech communication between remote users has been provided through direct two-way communication using dedicated devices at each end. Specifically, traditional communication between two users has been via a wired telephone connection or a wireless radio connection between two radio transceivers. However, in recent decades, the variety of and possibilities for capturing and communicating speech have increased substantially, and a number of new services and speech applications have been developed, including more flexible speech communication applications.
For example, the widespread acceptance of broadband Internet connectivity has led to new ways of communication. Internet telephony has significantly lowered the cost of communication. This, combined with the trend of families and friends to be spread around the globe, has resulted in phone conversations lasting for long durations. VoIP (Voice over Internet Protocol) calls lasting for longer than an hour are not uncommon, and user comfort during such long calls is now more important than ever.
In addition, the range of devices owned and used by a user has increased substantially. Specifically, devices equipped with audio capture and typically wireless transmission, such as mobile phones, tablet computers, and notebooks, are becoming increasingly common.
The quality of most speech applications is highly dependent on the quality of the captured speech. Accordingly, most practical applications are based on positioning a microphone close to the mouth of the speaker. For example, mobile phones include a microphone which, when in use, is positioned close to the user's mouth by the user. However, such an approach may be impractical in many scenarios and may provide a user experience which is less than optimal. For example, it may be impractical for a user to have to hold a tablet computer close to the head.
In order to provide a freer and more flexible user experience, various hands-free solutions have been proposed. These include wireless microphones housed in very small enclosures that may be worn, e.g. attached to the user's clothes. However, this is still perceived to be inconvenient in many scenarios. Indeed, enabling hands-free communication with the freedom to move and multi-task during a call, but without having to be close to a device or to wear a headset, is an important step towards improved user experience.
Another approach is to use hands-free communication based on a microphone being positioned further away from the user. For example, conference systems have been developed which, when positioned e.g. on a table, will pick up speakers located around the room. However, such systems do not always provide optimum speech quality, and in particular the speech from more distant users tends to be weak and noisy. Also, the captured speech will in such scenarios tend to have a high degree of reverberation, which may reduce the intelligibility of the speech substantially.
It has been proposed to use more than one microphone for e.g. such teleconferencing systems. However, a problem in such cases is how to combine the plurality of microphone signals. A conventional approach is to simply sum the signals together. However, this tends to provide suboptimal speech quality. Various more complex approaches have been proposed, such as performing a weighted summation based on the relative signal levels of the microphone signals. However, such approaches tend to provide suboptimal performance in many scenarios, e.g. by still including a high degree of reverberation, being sensitive to absolute levels, being complex, requiring centralized access to all microphone signals, being relatively impractical, or requiring dedicated devices.
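As an illustration only, the two conventional combining approaches mentioned above may be sketched as follows. The function names, the use of the RMS value as the signal level, and the normalization of the weights to sum to one are assumptions made for this sketch and are not taken from any specific prior system.

```python
import math


def rms(signal):
    """Root-mean-square level of one microphone signal (assumed level measure)."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def simple_sum(signals):
    """Sample-wise sum of equal-length microphone signals."""
    return [sum(samples) for samples in zip(*signals)]


def level_weighted_sum(signals):
    """Weighted sum with weights proportional to each signal's RMS level.

    The weights are normalized to sum to 1, which is an illustrative choice.
    """
    levels = [rms(s) for s in signals]
    total = sum(levels)
    if total == 0.0:
        # All inputs are silent, so the combined signal is silent as well.
        return [0.0] * len(signals[0])
    weights = [level / total for level in levels]
    return [
        sum(w * x for w, x in zip(weights, samples))
        for samples in zip(*signals)
    ]


# Hypothetical test signals: a microphone near the speaker and a distant one.
near = [0.8, -0.8, 0.8, -0.8]
far = [0.1, -0.1, 0.1, -0.1]

mixed_plain = simple_sum([near, far])
mixed_weighted = level_weighted_sum([near, far])
```

In this sketch the weighting favors the microphone with the higher level, but both functions require all microphone signals to be available at one place, and the weighting depends only on signal level, so a loud but reverberant signal would still dominate the result.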
Hence, an improved approach for capturing speech signals would be advantageous and in particular an approach allowing increased flexibility, improved speech quality, reduced reverberation, reduced complexity, reduced communication requirements, increased adaptability for different devices (including multifunction devices), reduced resource demand and/or improved performance would be advantageous.