The invention relates to communication systems and more particularly to multitalker communication systems using spatial processing.
In communications tasks that involve more than one simultaneous talker, substantial benefits in overall listening intelligibility can be obtained by digitally processing the individual speech signals to make them appear to originate from talkers at different spatial locations relative to the listener. In all cases, these intelligibility benefits require a binaural communication system that is capable of independently manipulating the audio signals presented to the listener's left and right ears. In situations that involve three or fewer speech channels, most of the benefits of spatial separation can be achieved simply by presenting each talker to the left ear alone, the right ear alone, or both ears simultaneously. However, many complex tasks, including air traffic control, military command and control, electronic surveillance, and emergency service dispatching, require listeners to monitor more than three simultaneous channels. Systems designed to address the needs of these challenging applications require the spatial separation of more than three simultaneous speech signals and thus necessitate more sophisticated signal-processing techniques that reproduce the binaural cues that normally occur when competing talkers are spatially separated in the real world. This can be achieved through the use of linear digital filters that replicate the linear transformations that occur when audio signals propagate from a distant sound source to the listener's left or right ear. These transformations are generally referred to as head-related transfer functions, or HRTFs. If a sound source is processed with digital filters that match the head-related transfer functions of the left and right ears and is then presented to the listener through stereo headphones, it will appear to originate from the location, relative to the listener's head, at which the head-related transfer functions were measured.
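The HRTF principle above amounts to filtering one mono signal through two FIR filters, one per ear. The sketch below illustrates this under loud assumptions: toy_hrir_pair is a hypothetical stand-in that models only an interaural time delay (a Woodworth-style approximation with an assumed 8.75 cm head radius) and an assumed broadband level difference of up to 6 dB. It is not a measured HRTF and ignores the spectral cues real HRTFs carry.

```python
import numpy as np

HEAD_RADIUS_M = 0.0875   # assumed average head radius (hypothetical parameter)
SPEED_OF_SOUND = 343.0   # speed of sound in air, m/s

def toy_hrir_pair(azimuth_deg, fs=8000, taps=32):
    """Hypothetical stand-in for a measured head-related impulse response pair.

    Models only an interaural time difference (Woodworth-style approximation)
    and an assumed broadband level difference. Intended for azimuths in
    [-90, 90] degrees; positive azimuth = listener's right.
    Returns (h_left, h_right) as equal-length FIR coefficient arrays.
    """
    az = np.deg2rad(azimuth_deg)
    itd = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (abs(az) + np.sin(abs(az)))  # seconds
    delay = int(round(itd * fs))            # far-ear delay in whole samples
    ild_db = 6.0 * abs(np.sin(az))          # assumed far-ear attenuation, up to 6 dB
    near = np.zeros(taps)
    far = np.zeros(taps)
    near[0] = 1.0                           # near ear: undelayed, unattenuated
    far[delay] = 10 ** (-ild_db / 20)       # far ear: delayed and attenuated
    if azimuth_deg >= 0:
        return far, near                    # source on the right: right ear is near
    return near, far

def spatialize(x, h_left, h_right):
    """Render mono signal x as a binaural (left, right) pair by FIR convolution."""
    return np.convolve(x, h_left), np.convolve(x, h_right)
```

For a source at 90 degrees sampled at 8 kHz, the left-ear copy arrives about five samples after the right-ear copy and is quieter, which is the kind of binaural cue the text describes.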
Prior research has shown that speech intelligibility in multi-channel speech displays is substantially improved when the different competing talkers are processed with head-related transfer function filters for different locations before they are presented to the listener.
In practice, the methods used to implement spatial processing in a multichannel communication system depend on the architecture used in that system. The basic objective of a multichannel communications system is to allow each of N users to listen to any combination of M input communications channels over a designated audio display device (usually a headset). This can be achieved with either of two architectures: a distributed switching architecture or a central switching architecture. FIG. 1 shows an example of a prior art multitalker communication system that uses a distributed switching architecture. In the FIG. 1 architecture, every high-bandwidth input communications channel (in this case A, B, C and D, represented at 100) is connected to a set of N remote switching systems, illustrated at 101, 105 and 106, that are physically located at or near each of the N users of the system. Each user can use a control panel, such as the panel illustrated at 102 for the remote switching system 101, to select the individual gain level of each of the M input channels (denoted by gi in the figure; one set is illustrated at 103). The input signals are scaled by these gain levels and summed together at 104 before being output to the user's headset.
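The scale-and-sum performed at each remote switching unit of FIG. 1 can be sketched as a single weighted sum. The function name and array layout below are illustrative assumptions; the point is that each station performs M multiply-and-accumulate operations per output sample.

```python
import numpy as np

def remote_station_mix(channels, gains):
    """One user's remote switching unit in a distributed architecture.

    channels: array of shape (M, samples), the M input channels delivered
              to this station.
    gains:    length-M sequence of gains g_i chosen on the control panel.
    Returns the mono headset signal sum_i g_i * x_i.
    """
    channels = np.asarray(channels, dtype=float)
    gains = np.asarray(gains, dtype=float)
    if channels.shape[0] != gains.shape[0]:
        raise ValueError("need exactly one gain per input channel")
    # M multiply-and-accumulates per output sample, performed at the user's location
    return gains @ channels
```

With four channels and gains (1, 0, 2, 0), the user hears channel A plus channel C at twice the amplitude, and channels B and D are muted.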
FIG. 2 shows an example of a prior art multitalker communication system that uses a central switching architecture. In this architecture, the user control panels, illustrated at 200, are remotely connected to the central switching unit 201 with a low bandwidth control signal that allows the user, illustrated at 205, 206 and 207, to select the gains of each output channel, one of which is illustrated at 202. These gains are used to scale and combine the desired speech signals at the location of the central switching unit 201. Then a single high-bandwidth audio signal, one of which is shown at 203 and which occurs for each user, is sent to the remote location of the user and played over headphones 204.
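In the central switching architecture of FIG. 2, the same scale-and-sum moves to the central unit and is repeated once per user. A minimal sketch, assuming the gains received over the low-bandwidth control links are gathered into a hypothetical N×M gain matrix, is one matrix product that yields one mono feed per user.

```python
import numpy as np

def central_switch(channels, gain_matrix):
    """Central switching unit.

    channels:    array of shape (M, samples), all M input channels.
    gain_matrix: array of shape (N, M); row n holds the gains selected
                 by user n over that user's low-bandwidth control link.
    Returns an (N, samples) array: one high-bandwidth mono feed per user.
    """
    channels = np.asarray(channels, dtype=float)
    G = np.asarray(gain_matrix, dtype=float)
    if G.shape[1] != channels.shape[0]:
        raise ValueError("gain matrix must have one column per input channel")
    # M x N multiply-and-accumulates per output sample, all at the central site
    return G @ channels
```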
TABLE 1
Comparison of Central and Distributed Switching

                                     Distributed Switching              Central Switching
Central Processing                   None                               M × N Multiply and Accumulates
Remote Processing (per Station)      M Multiply and Accumulates         None
Central-Remote Connections           M High-Bandwidth Audio Channels    1 High-Bandwidth Audio Channel
Remote-Central Connections           None                               Adjustable gain for each channel
Table 1 compares the advantages and disadvantages of the distributed and central switching architectures. In general, a distributed switching architecture like that illustrated in FIG. 1 offers the most flexibility, because it allows each user station to be tailored to the specific needs of that user without changing the architecture of the remainder of the communication system. However, it has two major disadvantages: 1) it requires a large number of high-bandwidth audio signals to be transmitted to the location of each user; and 2) it requires processing power at the location of each user. In contrast to the distributed switching system, the main advantage of a central switching system like that illustrated in FIG. 2 is that it requires only a single high-bandwidth audio signal to be transmitted from the central switch to each user location. It also concentrates all of the system processing demands in a central unit.
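Tallying Table 1 over the whole system makes the trade-off concrete: both architectures perform the same total number of multiply-and-accumulate operations (M×N per output sample), but the distributed architecture needs M times as many high-bandwidth audio links to the users. The dictionary keys below are hypothetical labels chosen for this sketch.

```python
def switching_costs(M, N):
    """System-wide totals implied by Table 1 for M input channels and N users.

    MAC counts are per output sample; link counts are numbers of connections.
    """
    return {
        "distributed": {
            "central_macs": 0,
            "remote_macs_per_station": M,
            "total_macs": M * N,                # M at each of N stations
            "high_bw_links_to_users": M * N,    # every channel to every station
            "control_links_to_center": 0,
        },
        "central": {
            "central_macs": M * N,
            "remote_macs_per_station": 0,
            "total_macs": M * N,                # all performed at the central unit
            "high_bw_links_to_users": N,        # one mixed feed per user
            "control_links_to_center": N,       # low-bandwidth gain settings
        },
    }
```

For example, with 4 input channels and 3 users, both architectures perform 12 multiply-and-accumulates per sample, but the distributed system needs 12 high-bandwidth links to user locations against 3 for the central system.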
Historically, the costs of physically wiring connections between the locations of remote users and the costs of providing custom switching hardware at the location of each user have made distributed switching systems prohibitively expensive for all systems with more than a handful of possible input communications lines. In the future, however, network protocols such as voice-over-IP (VoIP) that allow multiple voice channels to be transmitted via a single connection point, combined with inexpensive and widely available DSP processing technology, are likely to make distributed switching the preferred architecture for all but the largest-capacity communications systems. Nevertheless, there is good reason to believe that centrally switched systems will continue to be used for many years to come, both because they are the only systems capable of handling switching tasks with thousands or millions of users (such as the telephone system) and because many large and expensive systems using central switching architectures are currently in use in applications where they would be difficult or expensive to replace. Also, in some systems there are security issues that make it difficult to directly connect all possible communications channels to every user of the system.
FIG. 3 and FIG. 4 show how spatial separation would be added under the prior art to systems with the distributed and central switching architectures illustrated in FIGS. 1 and 2, respectively. Following the description of FIG. 1, FIG. 3 shows a spatialized audio implementation with distributed switching. Similarly, following the description of FIG. 2, FIG. 4 shows a spatialized audio implementation with central switching. In both FIGS. 3 and 4, spatial separation is achieved by convolving each input speech channel with two separate finite-impulse-response (FIR) filters, hL(t)θ and hR(t)θ. In FIG. 3 the filters are illustrated at 300 and in FIG. 4 the filters are illustrated at 400. The filters reproduce the amplitudes and phases associated with the signals reaching the listener's left and right ears from a sound source at location θ in the horizontal plane. At an 8 kHz sampling rate, these filters would be on the order of 16-32 points long and would therefore require roughly 256K multiply and accumulate operations per second. In addition to controlling the gain gi associated with each input channel, shown collectively at 301 in FIG. 3 and 401 in FIG. 4, the user has the additional option of selecting the location θi of each speech channel. This selection determines which set of head-related transfer function filters will be used to process each speech channel before it is output to the listener. Note also that the spatially separated system now needs to perform a separate summation for each ear. In FIG. 3 left ear summation is illustrated at 302 and right ear summation is illustrated at 303. In FIG. 4 left ear summation is illustrated at 402 and right ear summation is illustrated at 403. The output to the user's headset is therefore a stereo rather than a mono signal.
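The per-listener processing of FIG. 3 can be sketched as follows: the selected location θi picks a left/right filter pair, each channel is convolved with both filters and scaled by gi, and the results are summed separately for each ear. The hrir_table dictionary (azimuth mapped to a pair of equal-length FIR arrays) is a hypothetical structure that would in practice be filled with measured HRTF filters. The helper fir_macs_per_second reproduces the operation-count arithmetic in the text.

```python
import numpy as np

def spatial_station_mix(channels, gains, azimuths, hrir_table):
    """Spatialized mix for one listener.

    channels:   array of shape (M, samples).
    gains:      length-M gains g_i.
    azimuths:   length-M selected locations theta_i (keys into hrir_table).
    hrir_table: hypothetical dict theta -> (h_left, h_right), all filters
                the same length.
    Returns (left, right): the stereo headset signal.
    """
    channels = np.asarray(channels, dtype=float)
    taps = len(next(iter(hrir_table.values()))[0])
    out_len = channels.shape[1] + taps - 1     # full-convolution length
    left = np.zeros(out_len)
    right = np.zeros(out_len)
    for x, g, theta in zip(channels, gains, azimuths):
        h_l, h_r = hrir_table[theta]           # theta_i selects the filter pair
        left += g * np.convolve(x, h_l)        # left-ear summation
        right += g * np.convolve(x, h_r)       # right-ear summation
    return left, right

def fir_macs_per_second(taps, fs=8000):
    """Multiply-and-accumulate operations per second for one FIR filter."""
    return taps * fs
```

At 8 kHz, a single 32-tap filter needs 32 × 8000 = 256,000 operations per second, as does a left/right pair of 16-tap filters, which matches the figure of roughly 256K given above.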
While the distributed switching system required for the spatialized communication system shown in FIG. 3 is considerably more complex than the distributed switching system associated with the non-spatial system shown in FIG. 1, it has the advantage of modularity: any one user station could be upgraded to three-dimensional audio without affecting any other aspect of the overall communications system. This is in direct contrast with the centrally switched three-dimensional audio system shown in FIG. 4.
The central-switching implementation of FIG. 4 requires the following extensive changes to the central switch: (i) the communications link from the user's control panel 404 to the central switch 405 must be changed to allow the user to select which head-related transfer function filter set to use to process each communications channel; (ii) the central switch 405 must now execute two variable FIR filters, one for the left and one for the right output channel, for each communication signal and each listener (i.e., M×N pairs of FIR filters, or 2×M×N filters in total); and (iii) a second full-bandwidth audio signal must be sent from the central switch 405 to the location of each remote user.
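The scale of change (ii) can be estimated with simple arithmetic: a left and a right FIR filter per channel per listener gives 2×M×N filters, each costing (taps × sampling rate) operations per second. The sketch below uses an 8 kHz default rate from the text; the values M = 16, N = 50 and 32 taps in the usage note are hypothetical and chosen only for illustration.

```python
def central_spatial_load(M, N, taps, fs=8000):
    """Load on a central switch that spatializes every channel for every listener.

    Returns (number of FIR filters, multiply-and-accumulates per second,
    number of full-bandwidth audio feeds sent to users).
    """
    filters = 2 * M * N                    # left + right filter per channel per listener
    macs_per_second = filters * taps * fs  # each filter: taps MACs per output sample
    audio_feeds = 2 * N                    # stereo: two full-bandwidth signals per user
    return filters, macs_per_second, audio_feeds
```

With 16 channels, 50 listeners and 32-tap filters, this gives 1,600 filters, about 410 million multiply-and-accumulates per second, and 100 full-bandwidth feeds, compared with 50 feeds for the non-spatial central switch.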
While these modifications are certainly possible to implement, considerable cost savings could be achieved if some way could be found to spatially separate speech signals in a centrally switched communication system without modifying the central switching architecture in any way. In addition to providing a method and device for adding spatial audio capabilities to an existing centrally switched communication system without modifying the internal operation of the system, the present invention provides a method and device which increases the computational efficiency of spatial processing for all centrally switched systems with more than a few simultaneous end users.