Audio communication, or speech communication, is defined by many different standards and can be used in different networks, such as Public Switched Telephone Networks (PSTN), second and third generation telecommunication systems (2G and 3G systems), Third Generation Partnership Project (3GPP TS 24.173) over High Speed Packet Access, Fourth Generation (4G) systems using Voice over Long Term Evolution (VoLTE) speech and audio communication, and Voice over Internet Protocol (VoIP) communications.
In audio communication, an audio signal is picked up, or captured, by a microphone, amplified to a desired level, filtered, digitally sampled, and processed to remove acoustic echo, to compensate for electrical and mechanical acoustic response characteristics, or to reduce background noise. The signal is then encoded to reduce the bitrate before it is transmitted over the transmission channel, generally a radio channel or a wire. Finally, the signal is received by a distant terminal, processed, and played back over an earpiece, headset or loudspeaker.
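The send-side chain described above can be pictured as a sequence of stages applied one after another. The following is a minimal sketch only, assuming the signal has already been sampled into floating-point values in the range [-1, 1]. All names (`amplify`, `moving_average`, `noise_gate`, `quantize`, `run_pipeline`) are hypothetical, and the noise gate and quantizer are crude stand-ins for real noise reduction and speech coding, not the algorithms used in any particular terminal.

```python
from typing import Callable, List

Samples = List[float]
Stage = Callable[[Samples], Samples]


def amplify(gain: float) -> Stage:
    """Return a stage that scales every sample by a fixed gain."""
    return lambda x: [gain * s for s in x]


def moving_average(width: int) -> Stage:
    """Return a crude low-pass filter: mean of the last `width` samples."""
    def stage(x: Samples) -> Samples:
        out = []
        for i in range(len(x)):
            window = x[max(0, i - width + 1): i + 1]
            out.append(sum(window) / len(window))
        return out
    return stage


def noise_gate(threshold: float) -> Stage:
    """Return a stage that zeroes samples whose magnitude is below threshold.

    Assumption: a simple gate stands in for real noise suppression.
    """
    return lambda x: [s if abs(s) >= threshold else 0.0 for s in x]


def quantize(bits: int) -> Stage:
    """Return a stand-in 'encoder': clip to [-1, 1], round to a coarse step."""
    levels = 2 ** (bits - 1)

    def stage(x: Samples) -> Samples:
        clipped = [max(-1.0, min(1.0, s)) for s in x]
        return [round(s * levels) / levels for s in clipped]
    return stage


def run_pipeline(samples: Samples, stages: List[Stage]) -> Samples:
    """Apply each stage in order, feeding the output of one into the next."""
    for stage in stages:
        samples = stage(samples)
    return samples


if __name__ == "__main__":
    captured = [0.01, 0.2, -0.3, 0.02, 0.4]
    chain = [amplify(2.0), moving_average(2), noise_gate(0.05), quantize(8)]
    print(run_pipeline(captured, chain))
```

Keeping each stage as an independent function makes it easy to see where a tunable parameter, such as the noise-suppression threshold, enters the chain.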
During playback, it has been found that users have personal preferences when it comes to noise suppression levels. For example, some people like to have a minimum of background noise when listening to other people, even if the noise suppression algorithm degrades the speech quality with clipping, codec artifacts and sound distortion. Other people like to have a little noise in the background. A little background noise avoids complete silence when the distant person is not talking; such complete silence may give the listening user the impression that the communication is broken. Still other people prefer to have all the background noise information, so that they understand the context of the other person and experience neither artifacts nor clipping in the speech.
The user's preference for the noise suppression level may also differ between situations. For example, when user A is calling user B, who is attending a football game, user A may want little noise suppression in order to better experience the atmosphere at the stadium. Another example is when user A is calling user B, who is in a very noisy factory. In this case, the intelligibility of user B's speech may be very poor due to the high noise level from activities in the factory, so user A may instead want more aggressive noise suppression.
It has also been observed that the user's preference is biased by the cultural environment of the user. For example, US mobile operators seem to favor aggressive noise suppression. This may mean that each individual user has an individual preference regarding the level at which the noise suppression is perceived as optimal from that particular user's point of view. Typically, the quality of the audio, as perceived by a particular user, increases with increasingly aggressive noise suppression. However, at some point, cuts or interruptions will appear in the audio. From that point on, the particular user will consider the quality of the audio to decrease.
Just as the quality of audio varies with the noise suppression level, it also varies with audio bandwidth and speech level. Users, in particular elderly and slightly hearing-impaired users, find speech more intelligible when the speech signal is within certain frequency limits and has certain levels. However, recent advances in terminal acoustics and speech coding, e.g. Adaptive Multi-Rate Wideband (AMR-WB), allow larger audio bandwidths to be represented.
In order to adapt the audio to the user's preference, it has been proposed to apply the user's preference to the audio before the audio is played back to the user. WO2009/113926 discloses a known solution for providing selective control of buffering of at least one media stream. According to the known solution, a communication device includes a jitter buffer and a jitter buffer control unit. The jitter buffer control unit sets a buffer strategy based on an instruction originating from a user input. A data stream is then received and buffered in the jitter buffer according to the buffer strategy. In relation to, for example, the individual or cultural user preferences noted above, a disadvantage of the known solution may be that the user is nevertheless sometimes dissatisfied with the perceived quality of the data stream.
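To make the idea of a user-selected buffer strategy more concrete, the following is a minimal sketch of a jitter buffer whose pre-buffering depth is chosen by a named strategy. It is an illustration under stated assumptions and not the implementation disclosed in WO2009/113926: the strategy names, the depths in `STRATEGIES`, and the `JitterBuffer` class with its methods are all hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical strategies: number of packets to hold before playout starts.
# A deeper buffer tolerates more network jitter at the cost of more delay.
STRATEGIES = {"low_delay": 1, "balanced": 3, "smooth": 6}


class JitterBuffer:
    """Reorders packets by sequence number and delays playout by a set depth."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, bytes]] = []
        self._depth = STRATEGIES["balanced"]
        self._playing = False

    def set_strategy(self, name: str) -> None:
        """Apply a strategy, e.g. one derived from a user input.

        Raises KeyError if the strategy name is unknown.
        """
        self._depth = STRATEGIES[name]

    def push(self, seq: int, payload: bytes) -> None:
        """Store a received packet; arrival order does not matter."""
        heapq.heappush(self._heap, (seq, payload))

    def pop(self) -> Optional[bytes]:
        """Return the next packet for playout, or None while (re)buffering."""
        if not self._playing:
            if len(self._heap) < self._depth:
                return None  # still filling up to the configured depth
            self._playing = True
        if not self._heap:
            self._playing = False  # underrun: fall back to buffering
            return None
        return heapq.heappop(self._heap)[1]


if __name__ == "__main__":
    buf = JitterBuffer()
    buf.set_strategy("low_delay")
    buf.push(2, b"second")
    buf.push(1, b"first")
    print(buf.pop(), buf.pop())
```

In this sketch the strategy only changes how many packets are accumulated before playout begins, which trades delay against robustness to late packets.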