Voice communications, such as conference calls over a network, wherein the participants are located in multiple geographic areas (referred to as “multi-geo” communications), may be difficult for the participants due to factors such as a lack of familiarity with voices, customs, names and accents, as well as low audio quality due to, for example, poor or intermittent connections and/or problems with equipment. As a result, it may be difficult for one or more participants to identify and understand other attendees during these communications. Additionally, people who participate in a call may often hesitate to inform the other participants about their inability to understand the other callers due to, for example, embarrassment or fear of offending the other participants.
Accordingly, there is a need to detect situations in which participants in a voice communication do not comprehend what is being said, and to provide solutions that improve comprehension and the overall quality of voice communications, such as conference calls.