Commercial enterprises frequently use contact centers to provide services such as technical support to customers. A caller typically initiates a contact center transaction by dialing a manufacturer-provided telephone number for help with a problem. After answering certain questions posed by an interactive voice response system, the caller is typically connected to a particular contact center agent based on the caller's answers. The contact center agent begins a dialog with the caller and, ideally, resolves the caller's problem.
Not long ago, a supervisor might have monitored contact center agents by walking near an agent's station and listening to at least the agent's side of the call. Alternatively, the supervisor might have been able to listen in on any particular agent's conversation via telecommunications equipment, such as a multi-line telephone, that allowed the supervisor to select a particular agent's telephone line. Today, however, the supervisor typically uses computer tools that monitor the state of each agent (e.g., time in a call, type of transaction) and then listens in on an agent's call if it exceeds certain parameters. It is also increasingly likely that contact center agents work at a different location from their supervisor, making it more difficult for the supervisor to monitor contact center transactions. Both situations inhibit the supervisor's ability to listen to multiple agents concurrently and to detect difficulty or stress in their interactions as an indicator that they need attention.
Humans locate a sound through differences in the phase (arrival time), intensity, and frequency content of the sound waves received at each ear. Essentially, in the real world, all sounds are stereo, or multi-channel. This ability to distinguish the distance and direction of a sound enables a human to process multiple sounds concurrently. For example, when supervisors strolled around cubicles in the past, they could hear multiple agents speaking at the same time and discern which agent said what based on the direction and distance of each agent's voice. Unlike in nature, most electronic voice communications are single-channel, or monaural, so a listener cannot distinguish the location, or position, of a voice in relation to the listener's own perspective. This is apparent, for example, in a voice conference in which a listener hears multiple individuals speaking concurrently and, to the listener, each individual sounds as if they are at the same location. Studies have shown it is difficult for the human brain to assimilate multiple concurrent conversations in a monaural environment.
A stereo headset, or multiple speakers, used in conjunction with multi-channel audio signals allows each ear to receive a different audio signal. Multi-channel sound is used in certain applications, such as computer gaming environments and virtual reality environments, to enable a participant to generally distinguish the location of sounds occurring in the environment.
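As a rough illustration of how a single-channel voice might be given an apparent direction over two channels, the sketch below delays and attenuates the signal for the ear farther from the source. It is a minimal sketch under stated assumptions, not a method described in this document: the function names, the 8000 Hz sample rate, the 8.75 cm head radius, the use of Woodworth's approximation for the interaural time difference, and the constant-power pan law are all illustrative choices.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature
HEAD_RADIUS = 0.0875    # m, an assumed average head radius

def interaural_time_difference(azimuth_deg):
    """Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)).

    azimuth_deg is assumed to lie in [-90, 90]; 0 is straight ahead,
    positive angles are to the listener's right.
    """
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

def pan_gains(azimuth_deg):
    """Constant-power pan gains (left, right) for an azimuth in [-90, 90]."""
    angle = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(angle), math.sin(angle)

def spatialize(mono, azimuth_deg, sample_rate=8000):
    """Return (left, right) sample lists for a mono signal placed at azimuth_deg."""
    delay = round(abs(interaural_time_difference(azimuth_deg)) * sample_rate)
    gain_left, gain_right = pan_gains(azimuth_deg)
    pad = [0.0] * delay
    left = [gain_left * s for s in mono]
    right = [gain_right * s for s in mono]
    if azimuth_deg > 0:
        # Source on the right: the left ear hears it later.
        left, right = pad + left, right + pad
    else:
        # Source on the left (or centered, where the pad is empty).
        left, right = left + pad, pad + right
    return left, right
```

Calling `spatialize(samples, 45.0)` for one voice and `spatialize(samples, -45.0)` for another, then summing the corresponding channels, would place the two voices on opposite sides of the listener. Real spatial audio systems generally use richer cues, such as head-related transfer functions, than this two-cue approximation.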
Supervisor monitoring of contact center agents would be greatly improved if a contact center supervisor could listen to multiple agents concurrently via a communication channel, without leaving the supervisor's desk, such that each agent's voice is perceived as originating from a unique position with respect to the supervisor. Monitoring would be further improved if the supervisor could easily and intuitively set the perceived position and the volume of each agent's voice.