Comfort noise is known in the field of telecommunications and is used to add noise when data transmission ceases or is reduced during periods in which no active speech is present, e.g., when discontinuous transmission (DTX) is used. Without comfort noise, such a “dead” segment of complete silence typically creates a sense of loss or absence of a far-end presence, which can be disconcerting to a listener. Adding comfort noise, i.e., synthetic or statistically generated noise, to fill in the absence of a significant signal in an audio stream due to DTX or other audio processing creates a more perceptually continuous audio stream.
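By way of illustration only, the following is a minimal sketch of one generic way comfort noise may be synthesized: estimate the power spectrum of the background noise while it is still being transmitted, then, during DTX, generate random noise shaped to that spectrum. The function names `estimate_noise_power` and `comfort_noise_frame`, the rFFT-bin representation, and the use of complex Gaussian noise are assumptions made for this sketch, not details taken from this description or from any particular codec.

```python
import numpy as np


def estimate_noise_power(frames: np.ndarray) -> np.ndarray:
    """Hypothetical helper: average |rFFT|^2 over noise-only frames.

    frames: array of shape (n_frames, n_fft) of real background-noise samples.
    Returns the per-bin power spectrum, length n_fft // 2 + 1.
    """
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)


def comfort_noise_frame(target_power: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: one frame of noise whose expected power follows target_power.

    target_power: non-negative power per rFFT bin (length n_fft // 2 + 1).
    """
    n_bins = target_power.shape[0]
    n_fft = 2 * (n_bins - 1)
    # Complex Gaussian noise with unit expected power in every bin
    white = (rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)) / np.sqrt(2.0)
    # Scale each bin so its expected power equals the target power
    spectrum = np.sqrt(target_power) * white
    # Back to the time domain (irfft keeps only the real part of the DC and Nyquist bins)
    return np.fft.irfft(spectrum, n=n_fft)
```

A practical generator would typically window and overlap-add successive frames to avoid discontinuities at frame boundaries; that step is omitted here for brevity.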
A voice conferencing system, including the voice portion of a video conferencing system, e.g., of a telepresence system, allows a possibly large number of participants to communicate by voice simultaneously. Handling DTX by adding comfort noise in such a system can be complicated. A typical system might limit the buildup of noise or comfort noise from the incoming streams by switching between, or selecting, a subset of the active audio streams and mixing only the selected streams together. This may work for a simple mono conference bridge, but is not ideal in many cases.
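The select-and-mix approach mentioned above can be sketched as follows. The description does not specify how the subset is chosen; ranking streams by frame energy and keeping the `max_active` loudest is an assumption made only for this illustration, and `select_and_mix` is a hypothetical name.

```python
import numpy as np


def select_and_mix(frames: np.ndarray, max_active: int) -> tuple[np.ndarray, list[int]]:
    """Hypothetical mono bridge: mix only the highest-energy streams.

    frames: array of shape (n_streams, frame_len), one time-aligned frame per stream.
    Returns the mono mix and the (sorted) indices of the selected streams.
    """
    # Frame energy per stream, used here as an assumed selection criterion
    energy = np.sum(frames.astype(np.float64) ** 2, axis=1)
    # Stream indices ordered from loudest to quietest
    order = np.argsort(energy)[::-1]
    selected = sorted(order[:max_active].tolist())
    # Streams not selected (and any comfort noise they carry) are simply dropped
    mix = frames[selected].sum(axis=0)
    return mix, selected
```

Because the unselected streams never reach the mix, their noise or comfort noise cannot accumulate; the trade-off is that any audio in those streams is lost.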
Some conferencing systems make use of the spatial properties of the audio, which further complicates the use of comfort noise, e.g., by making it difficult to maintain continuity between the intended and synthetic audio segments.
This invention presents a system design to create a sense of presence at a spatial audio conferencing endpoint (also called a spatial audio conferencing client) by adding spatial comfort noise comprising a plurality of spatial noise signals that have spectral properties, e.g., amplitude-metric spectra such as power spectra, which are typical of comfort noise, and at least one spatial property that substantially matches at least one target spatial property.
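As a rough illustration of noise signals that share a comfort-noise-like power spectrum while also matching a target spatial property, the sketch below takes the spatial property to be a frequency-independent inter-channel covariance matrix and imposes it with a Cholesky factor applied to independent noise. Both the choice of spatial property and the Cholesky technique are generic assumptions for this example; they are not presented as the method of this description, and `spatial_comfort_noise_frame` is a hypothetical name.

```python
import numpy as np


def spatial_comfort_noise_frame(target_power: np.ndarray,
                                spatial_cov: np.ndarray,
                                rng: np.random.Generator) -> np.ndarray:
    """Hypothetical generator of multichannel noise with a target covariance.

    target_power: (n_bins,) power per rFFT bin, shared by all channels.
    spatial_cov: (n_ch, n_ch) symmetric positive-definite target covariance
        between channels (assumed frequency-independent in this sketch).
    Returns real time-domain noise of shape (n_ch, n_fft).
    """
    n_ch = spatial_cov.shape[0]
    n_bins = target_power.shape[0]
    n_fft = 2 * (n_bins - 1)
    # Independent unit-power complex Gaussian noise for every channel and bin
    z = (rng.standard_normal((n_ch, n_bins))
         + 1j * rng.standard_normal((n_ch, n_bins))) / np.sqrt(2.0)
    # With L L^T = spatial_cov, the noise L z has expected covariance spatial_cov;
    # np.linalg.cholesky raises LinAlgError if the matrix is not positive definite
    chol = np.linalg.cholesky(spatial_cov)
    correlated = chol @ z
    # Apply the same comfort-noise spectral shape to every channel
    spectra = correlated * np.sqrt(target_power)[np.newaxis, :]
    return np.fft.irfft(spectra, n=n_fft, axis=1)
```

For example, with two channels and `spatial_cov = [[1.0, 0.6], [0.6, 1.0]]`, the generated channels have equal power and a correlation close to 0.6. A real spatial system might instead specify the target per frequency band or in a soundfield representation.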
A typical conferencing system includes a conference server to which endpoints are coupled. Several conferencing architectures are known, e.g., centralized-control, endpoint-mixing, full-mesh, and multicast architectures. For each of these, what is called herein a conference server is the single entity, or the functional combination of a set of distributed entities, that carries out control. One example is a multipoint control unit (MCU), a device commonly used to bridge conferences by mixing the audio (or audiovisual) streams.
One possible approach to conferencing is for a conference server to retain and forward only a restricted set of the active audio streams. Because the remaining streams are dropped at the server, such an approach avoids, by attrition, the buildup of, or potential for, excessive comfort noise. This might be problematic in a conferencing system in which the default action of the server is to combine, or jointly process, several streams. In such a system, no audio is dropped, and therefore there is the issue of how to manage the intended comfort noise from all incoming streams.
The present invention provides a way to achieve the desired perceptual continuity offered by comfort noise by carrying out processing at a receiving, i.e., listening, endpoint, whilst avoiding the complications of managing comfort noise from the set of individual streams that may be heard by the receiving client.